HDFS File Read Process

To understand the HDFS read and write operations, let’s start with a few assumptions.

Let’s assume a replication factor of 3. We have a client machine where Hadoop is installed with all components except the NameNode and DataNode. The Secondary NameNode is not considered here.

HDFS Read:

As we know, a file in HDFS is split into multiple blocks, and each block is replicated according to the replication factor.
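To make the relationship between blocks and replicas concrete, here is a minimal arithmetic sketch. The class BlockMath and its methods are hypothetical helpers written for this illustration and are not part of the Hadoop API. The example assumes a 128 MB block size (the default in Hadoop 2 and later, configurable through dfs.blocksize) and the replication factor of 3 used in this post.

```java
// Hypothetical helper for illustration only; not a Hadoop class.
final class BlockMath {

    /** Number of blocks needed for a file: ceiling of fileSize / blockSize. */
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    /** Total block replicas stored in the cluster for one file. */
    static long replicaCount(long fileSizeBytes, long blockSizeBytes, int replication) {
        return blockCount(fileSizeBytes, blockSizeBytes) * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed values: a 300 MB file, 128 MB blocks, replication factor 3.
        System.out.println(blockCount(300 * mb, 128 * mb));      // prints 3
        System.out.println(replicaCount(300 * mb, 128 * mb, 3)); // prints 9
    }
}
```

So a 300 MB file under these assumptions occupies 3 blocks, and the cluster holds 9 block replicas in total. When reading, the client only needs one replica of each block.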

  • Because the NameNode holds the locations of all blocks, the client first contacts the NameNode to read a file in HDFS.
  • The client asks the NameNode for the addresses of all blocks of the data file.
  • The client initiates the read by calling the open() method: FSDataInputStream in = fs.open(inFile);
  • In response, the NameNode returns metadata about the blocks, including the addresses of the DataNodes that hold each replica. The client receives this wrapped in an FSDataInputStream.
  • FSDataInputStream wraps a DFSInputStream, which takes care of reading the blocks from the different nodes.
  • The client then invokes the read method (step 3), which causes DFSInputStream to connect to a DataNode holding the first block and stream that block.
  • The same process (steps 4 and 5) repeats until all the blocks have been read.
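The read flow above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions: ToyHdfsRead, NameNode, DataNode, BlockLocation, and the sample path and block ids are all hypothetical names invented here, not Hadoop classes, and the real DFSInputStream handles networking, checksums, and replica selection that this sketch ignores. It shows only the division of labor: the NameNode returns block locations, and the data itself is streamed block by block from DataNodes, falling back to another replica if one is unavailable.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the HDFS read path. None of these are Hadoop classes.
final class ToyHdfsRead {

    /** What the toy NameNode returns per block: its id and the nodes holding a replica. */
    static final class BlockLocation {
        final String blockId;
        final List<String> dataNodes;
        BlockLocation(String blockId, List<String> dataNodes) {
            this.blockId = blockId;
            this.dataNodes = dataNodes;
        }
    }

    /** Toy NameNode: holds metadata only (file path to ordered block locations). */
    static final class NameNode {
        private final Map<String, List<BlockLocation>> files = new HashMap<>();
        void register(String path, List<BlockLocation> blocks) { files.put(path, blocks); }
        List<BlockLocation> getBlockLocations(String path) {
            List<BlockLocation> blocks = files.get(path);
            if (blocks == null) throw new IllegalArgumentException("No such file: " + path);
            return blocks;
        }
    }

    /** Toy DataNode: stores block contents keyed by block id. */
    static final class DataNode {
        private final Map<String, byte[]> blocks = new HashMap<>();
        void store(String blockId, String text) {
            blocks.put(blockId, text.getBytes(StandardCharsets.UTF_8));
        }
        byte[] readBlock(String blockId) { return blocks.get(blockId); } // null if missing
    }

    /** Client side: fetch locations from the NameNode, then read each block in order. */
    static byte[] readFile(NameNode nameNode, Map<String, DataNode> cluster, String path) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (BlockLocation loc : nameNode.getBlockLocations(path)) {
            byte[] data = null;
            for (String nodeName : loc.dataNodes) {   // try replicas in the order given
                DataNode node = cluster.get(nodeName);
                if (node != null) data = node.readBlock(loc.blockId);
                if (data != null) break;              // first available replica wins
            }
            if (data == null) throw new IllegalStateException("No replica for " + loc.blockId);
            out.write(data, 0, data.length);
        }
        return out.toByteArray();
    }

    /** Builds a three-node toy cluster with replication factor 3 and reads one file. */
    static String demo() {
        Map<String, DataNode> cluster = new HashMap<>();
        for (String name : Arrays.asList("dn1", "dn2", "dn3")) cluster.put(name, new DataNode());
        // dn1 deliberately lacks blk_1 to simulate an unavailable replica.
        cluster.get("dn2").store("blk_1", "hello ");
        cluster.get("dn3").store("blk_1", "hello ");
        for (DataNode dn : cluster.values()) dn.store("blk_2", "hdfs");

        List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");
        NameNode nameNode = new NameNode();
        nameNode.register("/data/sample.txt", Arrays.asList(
                new BlockLocation("blk_1", replicas),
                new BlockLocation("blk_2", replicas)));

        return new String(readFile(nameNode, cluster, "/data/sample.txt"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello hdfs"
    }
}
```

Note that the toy NameNode never touches file contents. This mirrors the point made in the list: the NameNode serves only block metadata, while the bytes come from the DataNodes.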
