
Interview :: Hadoop

41) What is the distributed cache in Hadoop?

Distributed cache is a facility provided by the MapReduce framework to cache files (text files, archives, etc.) needed by a job at execution time. The framework copies the necessary files to the slave nodes before any task is executed on those nodes, so every task can read them locally.
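
As an illustration, in the current MapReduce API a file is registered with job.addCacheFile() in the driver and read back in the mapper's setup() method. The following is a minimal sketch, not a definitive recipe; the class names, the tab-separated file format, and the HDFS path /user/hadoop/lookup.txt are all assumptions made for the example:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CacheDemo {

    public static class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // The framework has already copied the cached file to this node;
            // by default it is linked into the task's working directory under its own name.
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles != null && cacheFiles.length > 0) {
                String localName = new Path(cacheFiles[0].getPath()).getName();
                try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // Assumed file format: key<TAB>value per line.
                        String[] parts = line.split("\t", 2);
                        if (parts.length == 2) {
                            lookup.put(parts[0], parts[1]);
                        }
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Enrich each input record from the in-memory lookup table.
            context.write(value, new Text(lookup.getOrDefault(value.toString(), "UNKNOWN")));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "distributed-cache-demo");
        job.setJarByClass(CacheDemo.class);
        job.setMapperClass(LookupMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Ship lookup.txt to every slave node before any task starts there.
        job.addCacheFile(new URI("/user/hadoop/lookup.txt"));  // illustrative HDFS path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}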

42) What commands are used to see all jobs running in the Hadoop cluster and to kill a job in Linux?

hadoop job -list

hadoop job -kill <jobId>

43) What is the functionality of the JobTracker in Hadoop? How many instances of the JobTracker run on a Hadoop cluster?

The JobTracker is the service within Hadoop that submits and tracks MapReduce jobs. Only one JobTracker process runs on any Hadoop cluster, and it runs in its own JVM process.

Functionalities of the JobTracker in Hadoop:

  • When a client application submits a job, the JobTracker talks to the NameNode to find the location of the data.
  • It locates TaskTracker nodes with available slots at or near the data.
  • It assigns the work to the chosen TaskTracker nodes.
  • The TaskTracker nodes notify the JobTracker when a task fails, and the JobTracker then decides how to respond: it may resubmit the task on another node, or it may mark the task as one to avoid.

44) How does the JobTracker assign tasks to the TaskTrackers?

Each TaskTracker periodically sends heartbeat messages to the JobTracker to assure the JobTracker that it is still alive. These messages also report the number of available slots on that node, so the JobTracker knows where new tasks can be scheduled.

45) Is it necessary to write jobs for Hadoop in the Java language?

No. There are several ways to work with non-Java code. Hadoop Streaming allows any shell command or executable that reads from standard input and writes to standard output to be used as the map or reduce function.
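
For instance, the invocation below uses ordinary Unix executables as the mapper and reducer. The streaming jar location and the HDFS input/output paths vary by installation, so treat them as placeholders:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /user/hadoop/input \
    -output /user/hadoop/output \
    -mapper /bin/cat \
    -reducer /usr/bin/wc

Here /bin/cat passes each input line through as a map output, and /usr/bin/wc counts the lines, words, and characters it receives; any script or binary with the same stdin/stdout contract can be substituted.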

46) Which data storage components are used by Hadoop?

HBase is the data storage component used by Hadoop.
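
To illustrate, a minimal HBase client interaction in Java might look like the following sketch. The table name "users" and column family "info" are assumptions for the example, and the table is assumed to already exist with that family:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {  // assumed table
            // Write one cell: row key, column family, qualifier, value.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);

            // Read the same cell back.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}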

47) How would you write a custom partitioner for a Hadoop job?

To write a custom partitioner for a Hadoop job, follow these steps (a minimal sketch follows the list):

  • Create a new class that extends the Partitioner class.
  • Override the getPartition() method; the framework calls it for every map output record to choose a reducer.
  • Add the custom partitioner to the job by calling setPartitionerClass() on the Job, or set it as a job configuration property.
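
The following is a minimal sketch of such a partitioner, routing records by the first letter of the key; the class name and the Text/IntWritable key-value types are illustrative choices, not requirements:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends all keys that share a first letter to the same reducer.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (numPartitions == 0 || key.getLength() == 0) {
            return 0;
        }
        // Text.charAt() returns the Unicode code point at that position;
        // mask to non-negative and fold it into [0, numPartitions).
        return (Character.toLowerCase(key.charAt(0)) & Integer.MAX_VALUE) % numPartitions;
    }
}

It would then be registered in the driver with job.setPartitionerClass(FirstLetterPartitioner.class) before the job is submitted.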