Interview :: Hadoop

11) What is InputSplit in Hadoop? Explain.

When a Hadoop job runs, it divides the input files into logical chunks and assigns each chunk to a mapper for processing. Each such chunk is called an InputSplit.

12) What is TextInputFormat?

In TextInputFormat, each line of the text file is a record. The key is the byte offset of the line within the file, and the value is the content of the line. For instance, Key: LongWritable, Value: Text.
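
The key/value layout described above can be illustrated with a small Python sketch. It is not Hadoop code: the function name `text_records` and the sample bytes are made up for illustration, and it only mimics how each line is paired with its starting byte offset.

```python
import io

def text_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of `data`."""
    stream = io.BytesIO(data)
    offset = 0
    for raw in stream:
        # Key: byte offset where the line starts. Value: the line without its newline.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)

# Hypothetical sample input with three lines.
records = list(text_records(b"hello world\nfoo\nbar baz\n"))
for key, value in records:
    print(key, value)
```

In real Hadoop the key would be a LongWritable and the value a Text object; plain Python integers and strings stand in for them here.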

13) What is the SequenceFileInputFormat in Hadoop?

In Hadoop, SequenceFileInputFormat is used to read sequence files. A sequence file is a binary file format of key-value pairs, optionally compressed, which is commonly used to pass data from the output of one MapReduce job to the input of another MapReduce job.

14) How many InputSplits are made by the Hadoop framework?

Assuming a 64 MB block size and one split per block, Hadoop makes 5 splits in total for the following three files:

  • One split for a 64 KB file
  • Two splits for a 65 MB file, and
  • Two splits for a 127 MB file
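
The arithmetic behind this answer can be checked with a short Python sketch. It assumes a 64 MB block size and exactly one split per block. The names `BLOCK_SIZE` and `count_splits` are illustrative and not part of Hadoop.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed HDFS block size: 64 MB

def count_splits(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold the file (integer ceiling division)."""
    return (file_size + block_size - 1) // block_size

# The three files from the question: 64 KB, 65 MB and 127 MB.
file_sizes = [64 * 1024, 65 * 1024 * 1024, 127 * 1024 * 1024]
per_file = [count_splits(size) for size in file_sizes]
total = sum(per_file)
print(per_file, total)
```

This is only the block-counting model used in the answer. Hadoop's FileInputFormat also honors configured minimum and maximum split sizes and lets the final split run slightly over the split size, so counts on a real cluster may differ.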

15) What is the use of RecordReader in Hadoop?

An InputSplit describes a unit of work but does not know how to access the data. The RecordReader class is responsible for loading the data from its source and converting it into key-value pairs suitable for reading by the Mapper. The RecordReader instance is defined by the InputFormat.
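
As a rough illustration, here is a simplified Python model of a line-oriented record reader working on one split. It is a sketch, not the Hadoop implementation: `read_split` and the sample data are hypothetical. The boundary rule shown (a split that starts mid-file skips its first line, and a reader may read one line past the end of its split) is modeled on how line-based readers avoid losing or duplicating records.

```python
import io

def read_split(data: bytes, start: int, length: int):
    """Yield (byte_offset, line) records for the split [start, start + length)."""
    end = start + length
    stream = io.BytesIO(data)
    stream.seek(start)
    pos = start
    if start != 0:
        # A split starting mid-file skips its first (possibly partial) line;
        # the reader of the previous split is responsible for that line.
        pos += len(stream.readline())
    while pos <= end:
        raw = stream.readline()
        if not raw:
            break  # end of file
        yield pos, raw.rstrip(b"\r\n").decode("utf-8")
        pos += len(raw)

# Hypothetical file cut into two splits in the middle of the second line.
data = b"alpha\nbeta\ngamma\ndelta\n"
first = list(read_split(data, 0, 10))
second = list(read_split(data, 10, len(data) - 10))
print(first)
print(second)
```

Each line ends up in exactly one split even though the cut at byte 10 does not fall on a line boundary.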

16) What is JobTracker in Hadoop?

JobTracker is a service within Hadoop which runs MapReduce jobs on the cluster.

17) What is WebDAV in Hadoop?

WebDAV is a set of extensions to HTTP that supports editing and uploading files. On most operating systems, WebDAV shares can be mounted as filesystems, so it is possible to access HDFS as a standard filesystem by exposing HDFS over WebDAV.

18) What is Sqoop in Hadoop?

Sqoop is a tool used to transfer data between a Relational Database Management System (RDBMS) and Hadoop HDFS. By using Sqoop, you can import data from an RDBMS such as MySQL or Oracle into HDFS, and export data from HDFS files back to an RDBMS.

19) What are the functionalities of JobTracker?

These are the main tasks of JobTracker:

  • To accept jobs from the client.
  • To communicate with the NameNode to determine the location of the data.
  • To locate TaskTracker Nodes with available slots.
  • To submit the work to the chosen TaskTracker node and monitor the progress of each task.
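
The slot-matching step in the list above can be sketched as a toy Python model. Everything here is hypothetical (the `TaskTracker` dataclass, `pick_tracker`, the host names). The preference for hosts that already hold the data is a simplifying assumption about how the data locations from the NameNode are used. It is not the real scheduler.

```python
from dataclasses import dataclass

@dataclass
class TaskTracker:
    host: str
    free_slots: int

def pick_tracker(trackers, block_hosts):
    """Choose a TaskTracker with a free slot, preferring hosts that hold the data."""
    available = [t for t in trackers if t.free_slots > 0]
    if not available:
        return None  # no capacity right now; the task would have to wait
    local = [t for t in available if t.host in block_hosts]
    chosen = (local or available)[0]
    chosen.free_slots -= 1  # the submitted task now occupies one slot
    return chosen

# Hypothetical cluster: node1 holds the data but is full, node3 holds it and has a slot.
trackers = [TaskTracker("node1", 0), TaskTracker("node2", 2), TaskTracker("node3", 1)]
chosen = pick_tracker(trackers, block_hosts={"node1", "node3"})
print(chosen.host)
```

Once node3 is full, the same call falls back to any tracker with a free slot, which in this example is node2.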

20) Define TaskTracker.

A TaskTracker is a node in the cluster that accepts tasks, such as Map, Reduce, and Shuffle operations, from a JobTracker.