Before looking at the HBase data model, let’s first understand the difference between a row-oriented database and a column-oriented database.
- A traditional database follows a row-oriented design to store data.
- It is based on a fixed schema, and reading or writing a complete record is straightforward.
- A row and all of its column values are persisted together on disk or in memory. A row-oriented database suits OLTP systems, where you typically insert or update many attributes of one record at a time.
- It is inefficient for aggregate functions over a single column, because the system has to scan the entire table and read every row in full, including the columns the query does not need.
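Example: consider a table scan such as select column_b from table where column_a = 'Something'.
In this case, if there is no index on column_a, the query reads every row in the table, checks column_a, and picks out column_b from each matching row. Because each row is stored together with all of its columns, the unused columns are read as well, which adds a lot of I/O cost.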
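The scan described above can be sketched in a few lines of Python. This is only an illustration of the row-oriented layout, not how any particular database engine is implemented. The table contents, the extra `column_c`, and the helper `select_b_where_a` are hypothetical names made up for the example.

```python
# Hypothetical table stored row by row: each record keeps all of its columns together.
rows = [
    {"column_a": "Something", "column_b": 10, "column_c": "x"},
    {"column_a": "Other", "column_b": 20, "column_c": "y"},
    {"column_a": "Something", "column_b": 30, "column_c": "z"},
]


def select_b_where_a(rows, wanted):
    """Full scan: return column_b for matching rows and count the values loaded."""
    values_read = 0
    result = []
    for row in rows:
        # The whole row is loaded, including column_c, which the query never uses.
        values_read += len(row)
        if row["column_a"] == wanted:
            result.append(row["column_b"])
    return result, values_read


matches, values_read = select_b_where_a(rows, "Something")
print(matches, values_read)
```

The counter `values_read` stands in for I/O: every value of every row is touched, even though the query only needs two of the three columns.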
- A column-oriented database stores data on disk column by column, and data is also retrieved by column.
- This is suitable for OLAP systems.
- Select queries that touch only a few columns are faster, because only those columns are read.
- Aggregations are efficient because they only have to read the relevant columns, which involves less I/O cost.
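As a rough sketch of why aggregation is cheap in a column-oriented layout, the Python snippet below keeps each column as its own list and sums one of them. It is a simplified model rather than a real storage engine. The column names, the sample values, and the helper `sum_column` are assumptions made for illustration.

```python
# Hypothetical table stored column by column: each column is its own contiguous list.
columns = {
    "column_a": ["Something", "Other", "Something"],
    "column_b": [10, 20, 30],
    "column_c": ["x", "y", "z"],
}


def sum_column(columns, name):
    """Aggregate one column and report how many values had to be read."""
    data = columns[name]  # Only this column is accessed; the others stay untouched.
    return sum(data), len(data)


total, values_read = sum_column(columns, "column_b")
print(total, values_read)
```

Here the aggregation reads only the values of `column_b`, whereas a row-oriented layout holding the same table would have to load every column of every row to compute the same sum.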
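Apart from HBase, MariaDB ColumnStore, Greenplum Database, Cassandra, and Apache Kudu are a few examples of column-oriented databases. Strictly speaking, HBase and Cassandra are usually classified as wide-column stores rather than pure columnar databases.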