EDUCBA

HBase Architecture

By Swati Tawde

Introduction to HBase Architecture

HBase is an open-source, distributed key-value data store and column-oriented database with high write throughput and low-latency random read performance. With HBase, we can perform online real-time analytics. In HBase, data is physically sharded into what are known as regions. Each region is hosted by a single region server, and each region server is responsible for one or more regions. The HBase architecture follows a master-slave design: an HBase cluster has one master node, called HMaster, and several region servers, called HRegionServers. Each region server hosts multiple regions.

HDFS Storage Mechanism


HBase stores its data in tables, which are persisted on HDFS, as shown above.

Each row has a key.

Column: It is a collection of data belonging to one column family, which is included inside the row.

Column Family: Each column family consists of one or more columns.

Each table contains a collection of column families. The column families are part of the table schema, but the columns inside them are not.

HBase has dynamic columns. Different rows can have different columns because the column names are encoded inside the cells.

Column Qualifier: The name of a column within its column family is known as the column qualifier.
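The terms above can be pictured as a nested map. The following is a minimal sketch in plain Python, not the HBase API: the table, the family names (`personal`, `contact`), and the helper functions `put` and `get` are all hypothetical and exist only to show how row keys, column families, and column qualifiers relate.

```python
from collections import defaultdict

# Toy in-memory model of one HBase table (illustration only, not the HBase API).
# Layout: row key -> column family -> column qualifier -> value.
FAMILIES = {"personal", "contact"}  # assumed families, fixed when the table is created

table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, qualifier, value):
    """Store a value; the family must already exist, the qualifier may be anything."""
    if family not in FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    table[row_key][family][qualifier] = value

def get(row_key, family, qualifier):
    """Return the stored value, or None if the cell does not exist."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)

put("row1", "personal", "name", "Asha")
put("row1", "contact", "email", "asha@example.com")
put("row2", "personal", "name", "Ravi")
put("row2", "personal", "city", "Pune")  # row2 has a qualifier that row1 lacks
```

Here `row2` carries a `city` column that `row1` does not have, which is what "dynamic columns" means in practice, while writing to an undeclared family is rejected.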

HBase Architecture Components

The HBase architecture has two main elements, HMaster and Region Server, supported by HDFS and ZooKeeper. HBase saves its data in regions.

1. HMaster

The HMaster node is lightweight and is used for assigning regions to region servers.

HMaster has several main responsibilities:

  • Carrying out administrative tasks, including load balancing and the creation, updating, and deletion of tables.
  • Handling schema changes and metadata modifications as directed by the client application.
  • Handling most of the DDL work on HBase tables.

The methods that the HMaster interface exposes are mainly metadata-oriented:

  • Table (create, remove, enable, disable)
  • ColumnFamily (add column, modify column)
  • Region (move, assign)

The client communicates bi-directionally with both HMaster and ZooKeeper. For read and write operations, it contacts the HRegion servers directly. HMaster assigns regions to the region servers and, in turn, checks the health status of the region servers.
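To make the idea of region assignment concrete, here is a small sketch of a master handing each region to the server that currently hosts the fewest regions. This is a simplified illustration and not HBase's actual balancer; the function `assign_regions` and the region and server names are hypothetical.

```python
def assign_regions(regions, servers):
    """Assign each region to the server that currently hosts the fewest regions."""
    assignment = {server: [] for server in servers}
    for region in regions:
        # On a tie, min() keeps the first server in the list.
        target = min(servers, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment

plan = assign_regions(["r1", "r2", "r3", "r4", "r5"], ["rs1", "rs2"])
```

With five regions and two servers, the plan ends up with three regions on `rs1` and two on `rs2`, so no server carries more than one region above the other.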

2. Region Server

The diagram below gives a rough idea of the region server.

HBase Architecture - Region Server

Region servers are the worker nodes that handle client requests for reading, writing, updating, and deleting. The region server is lightweight and runs on all of the nodes of the Hadoop cluster. Its main task is to store data in regions and serve client requests. Another important task of the HBase region server is load balancing through auto-sharding: it dynamically splits an HBase table across regions when the table becomes too large after data is inserted.

HMaster can contact multiple HRegion servers, which perform the following functions:

  • Hosting and managing regions
  • Splitting regions automatically
  • Handling read and write requests
  • Communicating directly with the client
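The automatic splitting of regions can be sketched as follows. This is a toy model under stated assumptions: a region is represented as a sorted list of row keys, and the threshold `MAX_ROWS_PER_REGION` is a made-up row count chosen for illustration, whereas a real region server decides based on the size of the stored data.

```python
import bisect

MAX_ROWS_PER_REGION = 4  # hypothetical threshold, for illustration only

def insert(regions, row_key):
    """Insert a key into the region covering it and split that region if it grows too large.

    regions is a list of sorted key lists, ordered by key range.
    """
    # Pick the last region whose first key is <= row_key (default: the first region).
    index = 0
    for i, region in enumerate(regions):
        if region and region[0] <= row_key:
            index = i
    region = regions[index]
    bisect.insort(region, row_key)
    if len(region) > MAX_ROWS_PER_REGION:
        mid = len(region) // 2
        # Replace the oversized region with its lower and upper halves.
        regions[index:index + 1] = [region[:mid], region[mid:]]

regions = [[]]
for key in ["a", "b", "c", "d", "e", "f"]:
    insert(regions, key)
```

After the fifth insert the single region exceeds the threshold and is split at its midpoint, so later keys are routed to whichever half covers them.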

3. HDFS

HDFS stands for the Hadoop Distributed File System. It splits every file into several blocks and replicates the blocks across a Hadoop cluster to maintain fault tolerance. HDFS delivers high fault tolerance and runs on low-cost commodity hardware. Adding nodes built from cheap commodity hardware to the cluster to process and store data can give the user better results than relying on the existing hardware alone. The HBase components rely on HDFS to store large amounts of data in a distributed way.
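As a rough illustration of blocks and replication, the sketch below splits a file into fixed-size blocks and places each replica on a different node. The defaults of 128 MB per block and three replicas match common HDFS defaults, but the round-robin placement, the function `place_blocks`, and the node names are assumptions made for this example; real HDFS uses a rack-aware placement policy.

```python
import math

def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Return {block index: [nodes holding a replica]} using simple round-robin placement.

    Assumes replication <= len(nodes) so that the replicas of a block land on distinct nodes.
    """
    block_count = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(block_count):
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement

layout = place_blocks(300, ["dn1", "dn2", "dn3", "dn4"])
```

A 300 MB file needs three blocks, and because every block lives on three different nodes, losing any single node still leaves two copies of each block.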

4. Zookeeper

ZooKeeper is an open-source project. HMaster and the HRegionServers register themselves with ZooKeeper. It provides services such as maintaining configuration information, naming, and distributed synchronization. Distributed synchronization is the process of coordinating the nodes that access a running application. ZooKeeper has ephemeral nodes that represent the region servers. Master servers use these nodes to discover available servers.

These nodes are also used to track network partitions and server failures. ZooKeeper is also the medium through which clients reach the region servers: a client that wants to communicate with a region server first goes through ZooKeeper.
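The behaviour of ephemeral nodes can be sketched with a small stand-in class. This is not the ZooKeeper API: `ToyCoordinator`, its methods, the session ids, and the node paths are hypothetical and only illustrate that an ephemeral node disappears when the session that created it ends.

```python
class ToyCoordinator:
    """Minimal stand-in for ZooKeeper's ephemeral nodes (not the ZooKeeper API)."""

    def __init__(self):
        self._ephemeral = {}  # node path -> id of the session that owns it

    def register(self, session_id, path):
        """A region server registers itself by creating an ephemeral node."""
        self._ephemeral[path] = session_id

    def expire_session(self, session_id):
        """Delete every ephemeral node owned by the expired session."""
        self._ephemeral = {p: s for p, s in self._ephemeral.items() if s != session_id}

    def live_servers(self):
        """What a master would see when it lists the registered servers."""
        return sorted(self._ephemeral)

zk = ToyCoordinator()
zk.register("session-1", "/hbase/rs/rs1")  # paths are illustrative
zk.register("session-2", "/hbase/rs/rs2")
before = zk.live_servers()
zk.expire_session("session-2")  # rs2 crashes or is cut off by a network partition
after = zk.live_servers()
```

Once the session for `rs2` expires, its node vanishes from the listing, which is how a master can notice a failed server without polling it directly.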

How Is a Search Initialized in HBase Architecture?

ZooKeeper stores the location of the META table. Whenever a client sends a read or write request to HBase, the procedure is as follows.

The client asks ZooKeeper where the META table is located. The client then queries the META table for the relevant row key to obtain the location of the region server. The client caches this information along with the META table location. The client does not consult the META table again unless the region is moved or reassigned. In that case, the META server is queried again and the cache is updated. Because clients do not have to look up the region server location on the META server every time, this saves time and speeds up the search process.
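The lookup-and-cache procedure described above can be sketched as follows. This is a simplified model, not the HBase client: `ToyClient` and its methods are hypothetical, the META table is reduced to a dictionary from region start key to server name, the ZooKeeper step is left out, and the cache is keyed by row key rather than by region.

```python
class ToyClient:
    """Caches region server locations so that META is consulted only on a cache miss."""

    def __init__(self, meta_table):
        self.meta_table = meta_table  # stand-in for META: region start key -> server
        self.cache = {}               # row key -> server
        self.meta_lookups = 0

    def locate(self, row_key):
        if row_key in self.cache:
            return self.cache[row_key]
        self.meta_lookups += 1
        # The covering region is the one with the largest start key <= row_key.
        start = max(k for k in self.meta_table if k <= row_key)
        server = self.meta_table[start]
        self.cache[row_key] = server
        return server

    def invalidate(self, row_key):
        """Called when a request fails because the region has moved."""
        self.cache.pop(row_key, None)

meta = {"": "rs1", "m": "rs2"}  # two regions: keys before "m", and keys from "m" on
client = ToyClient(meta)
first = client.locate("apple")
second = client.locate("apple")  # served from the cache
meta[""] = "rs3"                 # the first region moves to another server
client.invalidate("apple")
third = client.locate("apple")   # META is consulted again
```

The second call does not touch the META stand-in at all, and only the invalidation after the region move forces a fresh lookup.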

Features

It integrates easily with Hadoop, both as a source and as a destination.

Distributed storage such as HDFS is supported.

It provides random access, using an internal hash table to store data for faster searches in HDFS files.

Advantages and Disadvantages of HBase Architecture

Below are the advantages and disadvantages:

Advantages of HBase Architecture

The advantages are listed below:

  • It can store large data sets.
  • The database can be sharded.
  • It is cost-effective from gigabytes to petabytes.
  • It offers high availability through replication and failover.

Disadvantages of HBase Architecture

The disadvantages are listed below:

  • It does not support SQL structure.
  • It does not support transactions.
  • Data is sorted only by key.
  • It can run into memory problems on the cluster.

Conclusion

HBase is a NoSQL, column-oriented, distributed database from Apache. Compared with Hadoop or Hive, HBase performs better when retrieving a small number of records. In this article, we discussed the HBase architecture and its important components.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
