What is Apache HBase?

November 30, 2017

Drawbacks of Hadoop:

Hadoop is an open source framework whose distributed file system (HDFS) is built for processing large volumes of data sequentially. With sequential access, a user who wants the last-but-one row has to read through all the rows from the top. That is workable when the data set is small, but on a large data set it takes much longer to fetch and process that one row. HBase addresses this problem.
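The difference between the two access patterns can be sketched in plain Python. This is not Hadoop or HBase code; the row layout and the function names sequential_fetch and keyed_fetch are made up for illustration, and a dict stands in for any keyed index.

```python
# Illustrative only: plain Python standing in for sequential vs. keyed access.
rows = [(f"row{i:05d}", f"value{i}") for i in range(10_000)]

def sequential_fetch(rows, wanted_key):
    """Scan from the top and count how many rows were examined."""
    examined = 0
    for key, value in rows:
        examined += 1
        if key == wanted_key:
            return value, examined
    return None, examined

# A keyed index (a dict here) goes straight to the row.
index = dict(rows)

def keyed_fetch(index, wanted_key):
    """Look the row up by key without touching the other rows."""
    return index.get(wanted_key)

last_but_one = rows[-2][0]
value, examined = sequential_fetch(rows, last_but_one)
print(examined)                                   # rows touched by the scan
print(keyed_fetch(index, last_but_one) == value)  # same row, found by key
```

The scan touches almost every row to reach the last-but-one entry, while the keyed lookup does not depend on where the row sits.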

HBase:

HBase is an open source, distributed, column-oriented NoSQL database built on top of Hadoop for random access to large volumes of data. Data can be stored in HDFS either directly or through HBase. HBase sits on top of the Hadoop file system and gives users read and write access, so data consumers can read the data randomly through HBase.

Architecture:

In HBase, tables are split into regions, which are served by region servers. Each region is divided vertically by column family into stores, and stores are saved as files in HDFS.
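A minimal Python sketch of that layout is shown here, assuming a table with two regions split at the row key "m". The Region class, the region_for helper and the sample rows are hypothetical names for illustration; real HBase stores are files in HDFS, not in-memory dicts.

```python
from collections import defaultdict

class Region:
    """Rows in [start_key, end_key), with one store per column family."""
    def __init__(self, start_key, end_key):
        self.start_key = start_key        # inclusive; "" means no lower bound
        self.end_key = end_key            # exclusive; None means no upper bound
        self.stores = defaultdict(dict)   # family -> {(row_key, qualifier): value}

    def contains(self, row_key):
        return row_key >= self.start_key and (
            self.end_key is None or row_key < self.end_key
        )

    def put(self, row_key, family, qualifier, value):
        self.stores[family][(row_key, qualifier)] = value

    def get(self, row_key, family, qualifier):
        return self.stores[family].get((row_key, qualifier))

# A table is a list of regions that together cover the whole key space.
table = [Region("", "m"), Region("m", None)]

def region_for(table, row_key):
    """Find the region whose key range holds this row."""
    return next(r for r in table if r.contains(row_key))

region_for(table, "alice").put("alice", "info", "city", "Pune")
region_for(table, "zed").put("zed", "stats", "visits", 3)
print(sorted(region_for(table, "alice").stores))  # families with a store in the first region
```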

HBase has three major components: the master server, the client library and the region servers. Region servers can be added or removed as requirements change. Let us look at the architecture in more detail.

Master server:

The master server assigns regions to the region servers, with the help of Apache ZooKeeper. Its responsibilities are listed below:

Coordination of region servers:

It assigns regions to new region servers and reassigns regions to existing servers, for example after a failure or to balance load.

It monitors all the region server instances in the cluster.

Admin functions: It provides the interface for creating, deleting and updating tables in the database.
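The assignment and reassignment duties can be sketched in Python as follows. The round-robin and least-loaded policies, and the function names assign_regions and reassign_after_failure, are assumptions chosen for brevity; they are not HBase's actual balancer, and ZooKeeper's role in detecting the failure is left out.

```python
def assign_regions(regions, servers):
    """Spread regions over servers round-robin; returns {server: [regions]}."""
    assignment = {s: [] for s in servers}
    for i, region in enumerate(regions):
        assignment[servers[i % len(servers)]].append(region)
    return assignment

def reassign_after_failure(assignment, failed_server):
    """Move the failed server's regions to the least-loaded surviving servers."""
    orphaned = assignment.pop(failed_server)
    for region in orphaned:
        target = min(assignment, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment

assignment = assign_regions(["r1", "r2", "r3", "r4", "r5"], ["rs1", "rs2", "rs3"])
print(assignment)
assignment = reassign_after_failure(assignment, "rs1")
print(assignment)   # every region is still served, but not by rs1
```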

Region server:

HBase tables are divided horizontally by row key range into regions. A region contains all the rows in the table between the region's start key and end key. Each region server hosts a set of regions and has the following responsibilities:

The region server communicates with the client and handles data-related operations.

The region server handles read and write requests for all the regions under it.

The region server decides the size of each region according to the region size thresholds.
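The sketch below shows, in Python, how a row key can be routed to a region by its start key and how a region might be split once it grows past a threshold. RegionServerSketch and max_rows_per_region are hypothetical; counting rows is a stand-in for HBase's real size thresholds, which are based on store file size.

```python
import bisect

class RegionServerSketch:
    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # stand-in for a region size threshold
        self.start_keys = [""]               # sorted start key of each region
        self.regions = [{}]                  # regions[i] holds rows >= start_keys[i]

    def _region_index(self, row_key):
        # Rightmost region whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key, value):
        i = self._region_index(row_key)
        self.regions[i][row_key] = value
        if len(self.regions[i]) > self.max_rows:
            self._split(i)

    def get(self, row_key):
        return self.regions[self._region_index(row_key)].get(row_key)

    def _split(self, i):
        # Split at the middle key: lower half stays, upper half becomes a new region.
        keys = sorted(self.regions[i])
        mid_key = keys[len(keys) // 2]
        lower = {k: v for k, v in self.regions[i].items() if k < mid_key}
        upper = {k: v for k, v in self.regions[i].items() if k >= mid_key}
        self.regions[i] = lower
        self.start_keys.insert(i + 1, mid_key)
        self.regions.insert(i + 1, upper)

server = RegionServerSketch(max_rows_per_region=4)
for i in range(10):
    server.put(f"k{i:02d}", i)
print(server.start_keys)   # start keys of the regions after the splits
```

Because the start keys stay sorted, each read or write finds its region with a binary search instead of checking every region.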

Features:

Deep integration with Apache Hadoop: Because HBase is built on top of Hadoop, it supports parallelized processing via MapReduce. HBase can be used as both the input and the output of MapReduce jobs. Integration with Apache Hive allows users to query HBase tables using the Hive Query Language, which is similar to SQL.

Strong Consistency: HBase provides strongly consistent reads and writes. A single server in an HBase cluster is responsible for a given subset of the data, and row operations are atomic, which together ensure consistency.

Failure Detection: When a node fails, HBase automatically recovers the writes in progress and the edits that have not been flushed. It then reassigns the regions that the failed node was serving to other region servers.
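HBase recovers unflushed edits by replaying a write-ahead log. A minimal Python sketch of the idea follows, assuming a Python list can stand in for the durable log and a dict for the files already flushed to HDFS. RegionSketch and recover are hypothetical names, not HBase classes.

```python
class RegionSketch:
    def __init__(self, wal):
        self.wal = wal        # write-ahead log; assumed to survive the crash
        self.memstore = {}    # in-memory edits, lost when the node fails
        self.flushed = {}     # stands in for files already written to HDFS

    def put(self, row_key, value):
        self.wal.append((row_key, value))   # log first, then apply in memory
        self.memstore[row_key] = value

    def flush(self):
        self.flushed.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()      # logged edits are now persisted elsewhere

    def get(self, row_key):
        if row_key in self.memstore:
            return self.memstore[row_key]
        return self.flushed.get(row_key)

def recover(wal, flushed):
    """Rebuild the region on another server by replaying unflushed edits."""
    region = RegionSketch(wal)
    region.flushed = flushed
    for row_key, value in list(wal):
        region.memstore[row_key] = value
    return region

wal = []
region = RegionSketch(wal)
region.put("r1", "a")
region.flush()
region.put("r2", "b")          # not flushed when the node "fails"
recovered = recover(wal, region.flushed)
print(recovered.get("r1"), recovered.get("r2"))
```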

Real time Queries: HBase uses Bloom filters, block caches and log-structured merge trees to store and query data efficiently. Together these provide random, real-time access to its data.
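Of these, the Bloom filter is the easiest to show in a few lines. The Python sketch below is a generic Bloom filter, not HBase's implementation; the class name, the bit-array size and the SHA-256 based hashing are assumptions for illustration. It answers "definitely not present" or "possibly present", which lets a read skip files that cannot contain the requested row.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)   # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True means it may have been.
        return all(self.bits[pos] for pos in self._positions(key))

bloom = BloomFilter()
for row_key in ["row-001", "row-002", "row-003"]:
    bloom.add(row_key)
print(bloom.might_contain("row-002"))   # an added key is never reported absent
```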
