Architecture and Data Model

HBase Architecture / HBase Data Model

HBase is designed to handle very large volumes of data. It follows a master-slave design, which makes it easy to scale out across nodes. In this module, we will discuss the HBase architecture and the HBase data model, since understanding both is necessary to understand how HBase works.

The major components of the HBase architecture are:

  • HMaster
  • RegionServer
  • Region
  • ZooKeeper
  • MemStore
  • HFile
  • WAL (Write-Ahead Log)

Let’s discuss the HBase architecture and each of the above components in detail.

[Figure: HBase architecture, part 1]

HMaster:

  • HMaster is the master server in the HBase architecture. It manages all RegionServers.
  • It monitors all RegionServer instances in the cluster and reassigns the regions of a RegionServer that fails.
  • It handles metadata (schema) changes, such as creating, altering, and deleting tables.
  • It assigns regions to RegionServers and balances the load among them.
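To make the HMaster's assignment role more concrete, here is a minimal sketch, assuming a deliberately simple policy: each unassigned region (new, or orphaned by a failed RegionServer) goes to the live RegionServer that currently hosts the fewest regions. The function `balance` and all region and server names are hypothetical; HBase's real balancer weighs many more factors, so this only illustrates the idea.

```python
def balance(regions, live_servers, current=None):
    """Hypothetical, simplified region assignment (not HBase's real balancer).

    Regions already on a live server stay put; every other region is given
    to the live server hosting the fewest regions (ties broken by name).
    """
    current = current or {}
    load = {server: 0 for server in live_servers}
    assignment = {}
    # keep existing placements on servers that are still alive
    for region in regions:
        server = current.get(region)
        if server in load:
            assignment[region] = server
            load[server] += 1
    # place new or orphaned regions on the least-loaded live server
    for region in regions:
        if region not in assignment:
            target = min(live_servers, key=lambda s: (load[s], s))
            assignment[region] = target
            load[target] += 1
    return assignment


regions = ["r1", "r2", "r3", "r4", "r5", "r6"]
first = balance(regions, ["rs1", "rs2", "rs3"])
# rs2 fails: its regions (r2, r5) are spread over rs1 and rs3
after_failure = balance(regions, ["rs1", "rs3"], first)
```

Keeping surviving placements unchanged mirrors the goal of moving as few regions as possible when the cluster changes.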

RegionServer:

  • A RegionServer hosts one or more regions.
  • It handles region splits.
  • Each RegionServer has its own Write-Ahead Log (WAL).
  • Clients communicate directly with RegionServers to read and write data.
  • WAL: every write is first appended to the Write-Ahead Log, a file stored on the distributed file system (HDFS), before it is applied to the MemStore. If the RegionServer fails before the MemStore has been flushed to an HFile, the WAL is replayed to recover those writes.
  • Block Cache: the read cache. Frequently read data is kept in memory, and the least recently used data is evicted when the cache is full.
  • MemStore: the write cache. It holds new writes in memory, sorted by key, until they are flushed to disk. There is one MemStore per column family in each region.
  • HFile: the file in which the actual data is stored on disk as sorted key-values. When a MemStore fills up, its contents are flushed to a new HFile.
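The write and read paths described above can be sketched as a toy, in-memory model, assuming a few simplifications: the flush threshold counts keys instead of bytes, the block cache caches individual lookups instead of HFile blocks, and the WAL is cleared on flush instead of being archived. The class `RegionServerSketch` and its methods are hypothetical names for illustration only.

```python
import bisect
from collections import OrderedDict

_MISSING = object()  # marks "key not in this HFile" in the cache


def _search(hfile, key):
    """Binary-search one sorted HFile (a list of (key, value) pairs)."""
    i = bisect.bisect_left(hfile, (key,))
    if i < len(hfile) and hfile[i][0] == key:
        return hfile[i][1]
    return _MISSING


class RegionServerSketch:
    """Toy model of a RegionServer's write and read paths (hypothetical)."""

    def __init__(self, flush_threshold=3, cache_capacity=2):
        self.wal = []                     # append-only log of recent edits
        self.memstore = {}                # in-memory write buffer
        self.hfiles = []                  # immutable sorted files, oldest first
        self.block_cache = OrderedDict()  # LRU cache of (hfile index, key) lookups
        self.flush_threshold = flush_threshold
        self.cache_capacity = cache_capacity

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log the edit first, for durability
        self.memstore[key] = value     # 2. then apply it to the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. write the MemStore out, sorted by key, as a new immutable HFile
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()  # simplification: flushed edits no longer need replaying

    def crash_and_recover(self):
        self.memstore = {}           # in-memory data is lost on failure...
        for key, value in self.wal:  # ...and rebuilt by replaying the WAL
            self.memstore[key] = value

    def get(self, key):
        if key in self.memstore:     # the newest data is in the MemStore
            return self.memstore[key]
        for i in reversed(range(len(self.hfiles))):  # then HFiles, newest first
            if (i, key) in self.block_cache:
                self.block_cache.move_to_end((i, key))  # mark as recently used
                value = self.block_cache[(i, key)]
            else:
                value = _search(self.hfiles[i], key)    # simulated disk read
                self.block_cache[(i, key)] = value
                if len(self.block_cache) > self.cache_capacity:
                    self.block_cache.popitem(last=False)  # evict least recently used
            if value is not _MISSING:
                return value
        return None
```

For example, with `flush_threshold=2`, two puts produce one HFile; a third put lives only in the MemStore and the WAL, and `crash_and_recover()` restores it from the WAL.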

[Figure: HBase architecture, part 2]

Region:

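  • A region stores the data, for all column families, of a contiguous range of rows of a table.
  • Each region covers the row keys from its start key (inclusive) to its end key (exclusive).
  • When a region grows beyond a configured size (hbase.hregion.max.filesize, 10 GB by default in recent HBase releases; older releases used smaller defaults), it is split into two regions.
  • A region has one store per column family, and each store holds a MemStore and zero or more HFiles.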
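A region split can be sketched as follows, assuming a simplified policy: the row count stands in for the byte-size threshold, and the middle row key becomes the split point. The `Region` class and `split_if_needed` function are hypothetical; the sketch only shows how two daughter regions share a boundary key, with the start key inclusive and the end key exclusive.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    start_key: bytes  # inclusive; b"" means "from the first row"
    end_key: bytes    # exclusive; b"" means "to the last row"
    rows: dict = field(default_factory=dict)

    def contains(self, row_key: bytes) -> bool:
        return row_key >= self.start_key and (self.end_key == b"" or row_key < self.end_key)


def split_if_needed(region: Region, max_rows: int) -> list:
    """Split a region at its middle row key once it holds more than max_rows rows.

    Row count is a hypothetical stand-in for the byte-size threshold HBase uses.
    """
    if len(region.rows) <= max_rows:
        return [region]
    keys = sorted(region.rows)
    mid = keys[len(keys) // 2]  # split point becomes the daughters' shared boundary
    left = Region(region.start_key, mid, {k: v for k, v in region.rows.items() if k < mid})
    right = Region(mid, region.end_key, {k: v for k, v in region.rows.items() if k >= mid})
    return [left, right]


whole = Region(b"", b"", {b"a": 1, b"b": 2, b"c": 3, b"d": 4})
daughters = split_if_needed(whole, max_rows=3)  # splits at b"c"
```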

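HBase META Table:

  • HBase keeps region locations in a special catalog table called the META table (hbase:meta). It is an ordinary HBase table hosted on a RegionServer; ZooKeeper stores only the location of the META table.
  • The META table records every region in the cluster and the RegionServer hosting it. A client first asks ZooKeeper where the META table is, reads it to find the RegionServer for the row key it needs, and caches that location for later requests.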
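The lookup flow above can be sketched with plain dictionaries, assuming one table and simplified META rows (region start key mapped to RegionServer). `ZOOKEEPER`, `REGION_SERVERS`, and `ClientSketch` are hypothetical stand-ins, not the real HBase client; the point is that ZooKeeper is consulted only to find hbase:meta, and cached locations avoid repeated META reads.

```python
import bisect

# Hypothetical in-memory stand-ins for a cluster with one table, "emp".
ZOOKEEPER = {"meta_server": "rs1"}  # ZooKeeper only knows where hbase:meta lives
REGION_SERVERS = {
    # hbase:meta itself is a table hosted on a RegionServer (simplified rows:
    # region start key -> RegionServer hosting that region, sorted by start key)
    "rs1": {"hbase:meta": [(b"", "rs1"), (b"g", "rs2"), (b"p", "rs3")]},
}


class ClientSketch:
    def __init__(self):
        self.cache = []      # known regions as (start_key, end_key, server)
        self.meta_reads = 0  # how many times the META table was consulted

    def locate(self, row_key: bytes) -> str:
        # 1. use a cached region location if one covers this row key
        for start, end, server in self.cache:
            if row_key >= start and (end == b"" or row_key < end):
                return server
        # 2. ask ZooKeeper which RegionServer hosts hbase:meta, then read it there
        meta = REGION_SERVERS[ZOOKEEPER["meta_server"]]["hbase:meta"]
        self.meta_reads += 1
        start_keys = [start for start, _ in meta]
        # 3. the owning region is the last one whose start key is <= row_key
        i = bisect.bisect_right(start_keys, row_key) - 1
        end = start_keys[i + 1] if i + 1 < len(start_keys) else b""
        server = meta[i][1]
        self.cache.append((start_keys[i], end, server))  # 4. cache for next time
        return server
```

A second lookup for a row key in an already-located region (for example `b"bob"` after `b"alex"`) is answered from the cache without reading the META table again.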

HBase Data Model:

Here, we will discuss the HBase data model: how data is stored in HBase and what its components are. Relational databases store data in rows and columns, and each row is uniquely identified by a primary key. HBase takes a different approach. Each value stored in HBase is identified by four attributes: the row key, the column family, the column, and the timestamp (version). These are described below.

Row Key:

The row key is the unique identifier of a record, similar to the primary key in an RDBMS. Every record has a row key, and HBase indexes data by row key, so retrieving data by row key is very fast. Row keys are stored as byte arrays, so any value that can be serialized to bytes can serve as a row key. Rows are sorted in lexicographical (byte-wise) order of their row keys.
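Because row keys are compared byte by byte, numeric IDs stored as plain decimal text do not sort numerically. The short sketch below (plain Python bytes standing in for HBase's byte arrays) shows the effect and two common workarounds: zero-padding to a fixed width, or encoding non-negative integers as fixed-width big-endian bytes.

```python
import struct

ids = [2, 10, 1]

# Plain decimal strings sort as text: "10" comes before "2"
text_keys = sorted(str(i).encode() for i in ids)

# Zero-padding to a fixed width restores numeric order
padded_keys = sorted(f"{i:05d}".encode() for i in ids)

# Fixed-width big-endian integers also sort numerically (non-negative values only)
binary_keys = sorted(struct.pack(">I", i) for i in ids)
decoded = [struct.unpack(">I", k)[0] for k in binary_keys]
```

Here `text_keys` is `[b'1', b'10', b'2']`, while `padded_keys` and `decoded` follow numeric order.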

Column Family:

A column family is a logical grouping of related columns.

For example, columns holding professional details, such as employee ID, employee department, and employee salary, can be grouped into a single column family.

Physically, all columns of a column family are stored together on the file system, and each column family's data is stored in separate files. Storing related columns together reduces I/O when those columns are accessed together.
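The physical separation of column families can be pictured with a minimal sketch, assuming a hypothetical table with two families, `personal` and `professional`, where each family gets its own store (a dictionary standing in for that family's files). Reading one family never touches the other family's store, which is where the I/O saving comes from.

```python
# Hypothetical table "emp" with two column families. Each family has its own
# store, standing in for the separate files HBase keeps per column family.
stores = {
    "personal": {},      # e.g. name
    "professional": {},  # e.g. employee id, department
}


def put(row_key, family, qualifier, value):
    # stores[family] raises KeyError for a family that was never defined
    stores[family].setdefault(row_key, {})[qualifier] = value


def read_family(row_key, family):
    # Only the requested family's store is touched; other families are not read.
    return stores[family].get(row_key, {})


put("1", "personal", "name", "Alex")
put("1", "professional", "emp_id", "10")
put("1", "professional", "dept", "IT")
```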

Column families are defined when the table is created. They can be added, modified, or removed later with an alter operation, but this is a schema change (older HBase versions required disabling the table first), so it is advisable to understand the data before defining column families.

Column:

A column (also called a column qualifier) identifies where a value is stored within a column family. Each column belongs to exactly one column family and is referenced as family:qualifier. There is no restriction on the number of columns in a column family. Unlike column families, columns do not have to be defined as part of the schema, so they can be created on the fly when data is written.
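The contrast between fixed column families and free-form columns can be sketched as follows, assuming a hypothetical `TableSketch` class: the families are declared up front and checked on every write, while any new qualifier is accepted without a schema change and stored under the family:qualifier name.

```python
class TableSketch:
    """Hypothetical table: column families are fixed up front, columns are not."""

    def __init__(self, families):
        self.families = set(families)  # declared in the schema
        self.rows = {}                 # row key -> {"family:qualifier": value}

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            # HBase likewise rejects writes to a column family that does not exist
            raise ValueError(f"unknown column family: {family}")
        # any qualifier is accepted on the fly; no schema change is needed
        self.rows.setdefault(row_key, {})[f"{family}:{qualifier}"] = value


emp = TableSketch(["personal", "professional"])
emp.put("1", "personal", "name", "Alex")
emp.put("2", "personal", "name", "Bob")
emp.put("2", "professional", "dept", "Sales")  # row 2 has a column row 1 lacks
```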

Cell:

The value stored for a given row key, column family, and column is called a cell. Apart from the value itself, each version of a cell carries a timestamp, which identifies the version of the data. When a value is updated, a new version with a new timestamp is written instead of overwriting the old one. HBase retains up to the maximum number of versions configured for the column family (VERSIONS, which defaults to 1 in current releases), and by default a query returns only the value with the latest timestamp.
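Cell versioning can be sketched with a small class, assuming a hypothetical `VersionedCell` whose `max_versions` plays the role of the column family's VERSIONS setting. One simplification to note: this sketch trims excess versions immediately, whereas HBase removes them lazily (for example during compaction). HBase timestamps are normally milliseconds since the epoch; small integers are used here for readability.

```python
class VersionedCell:
    """Hypothetical model of one HBase cell holding timestamped versions of a value."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions  # like a column family's VERSIONS setting
        self.versions = []                # (timestamp, value) pairs, newest first

    def put(self, value, timestamp):
        # an update adds a new version instead of overwriting the old one
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        # simplification: trim eagerly (HBase removes excess versions lazily)
        del self.versions[self.max_versions:]

    def get(self, n_versions=1):
        # a plain read returns only the newest version; more can be requested
        return [value for _, value in self.versions[:n_versions]]


dept = VersionedCell(max_versions=2)
dept.put("IT", timestamp=100)
dept.put("Sales", timestamp=200)
dept.put("IT", timestamp=300)
```

After these three puts, only the two newest versions survive, and `dept.get()` returns just the latest value.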


Together, a row key and all the cells stored under it, across its column families and columns, form a row in HBase.

Traditional Database System:

EmpId   Name   Dept
10      Alex   IT
20      Bob    Sales
30      Dan    IT

HBase Data Model:

Row Key   Column   Value
1         EmpId    10
1         Name     Alex
1         Dept     IT
2         EmpId    20
2         Name     Bob
2         Dept     Sales
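The mapping between the two tables above can be sketched as a small conversion, assuming the same simplification the tables make: the row's position (1, 2, ...) is used as the row key and column families are omitted. The function `to_cells` is a hypothetical helper; each relational row becomes one (row key, column, value) triple per column.

```python
# The first two employees from the relational table, as a list of dicts.
employees = [
    {"EmpId": "10", "Name": "Alex", "Dept": "IT"},
    {"EmpId": "20", "Name": "Bob", "Dept": "Sales"},
]


def to_cells(rows):
    """Flatten each relational row into (row key, column, value) triples,
    using the row's position (1, 2, ...) as the row key, as in the table above."""
    cells = []
    for row_key, row in enumerate(rows, start=1):
        for column, value in row.items():
            cells.append((str(row_key), column, value))
    return cells


cells = to_cells(employees)
```

The resulting six triples match the six lines of the HBase table above, in the same order.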


In this blog, you learned about the HBase architecture and the HBase data model. Let us know in the comment section below if you have any questions about them.
