Interview :: Hadoop
The following are the network requirements for using Hadoop:
- A password-less SSH connection between nodes.
- Secure Shell (SSH) for launching server processes.
Storage Node: the machine or computer where the file system resides and the data to be processed is stored.
Compute Node: the machine or computer where the actual business logic is executed.
A background in any programming language such as C, C++, PHP, Python, or Java can be helpful, but if you know no Java, it is necessary to learn it, along with basic knowledge of SQL.
There are many ways to debug Hadoop code, but the most popular methods are:
- Using counters.
- Using the web interface provided by the Hadoop framework.
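To illustrate the counter approach, the sketch below mimics Hadoop's `context.getCounter(group, name).increment(1)` pattern with a plain map; the `BAD_RECORDS` counter name and the sample records are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterSketch {
    public static void main(String[] args) {
        // In a real Mapper you would call:
        //   context.getCounter("Quality", "BAD_RECORDS").increment(1);
        // Here a plain map stands in for Hadoop's counter framework.
        Map<String, Long> counters = new HashMap<>();
        String[] records = { "1,ok", "2,ok", "malformed", "3,ok" };
        for (String record : records) {
            if (!record.contains(",")) {
                // Count records we skip instead of failing the job; in a real
                // job these totals appear in the final counter report.
                counters.merge("Quality:BAD_RECORDS", 1L, Long::sum);
                continue;
            }
            // ... normal map logic would emit key/value pairs here ...
        }
        System.out.println("BAD_RECORDS=" + counters.getOrDefault("Quality:BAD_RECORDS", 0L));
    }
}
```

Because counters are aggregated across all tasks, they reveal data-quality problems (like the one malformed record above) without digging through per-task logs.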
Yes, it is possible. The FileInputFormat class provides methods to add multiple directories as input to a Hadoop job.
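A typical job setup might look like the configuration sketch below, assuming the `org.apache.hadoop.mapreduce` API; the directory paths and job name are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MultiInputJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-input");
        // Each call appends another directory to the job's input list.
        FileInputFormat.addInputPath(job, new Path("/data/logs/2023"));
        FileInputFormat.addInputPath(job, new Path("/data/logs/2024"));
        // Alternatively, set several comma-separated paths at once:
        // FileInputFormat.setInputPaths(job, "/data/logs/2023,/data/logs/2024");
    }
}
```

Every file under every added directory is split and fed to the same mapper class, so the directories should contain records of a compatible format.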
In Hadoop, a job is divided into multiple small parts known as tasks.
The logical division of data is called an Input Split, while the physical division of data is called an HDFS Block.
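To make the distinction concrete, here is a back-of-the-envelope sketch: a 1 GB file with a 128 MB block size (a common HDFS default) occupies 8 physical blocks, and when the split size equals the block size (the usual default) it also yields 8 input splits, hence 8 map tasks. The file and block sizes are illustrative assumptions.

```java
public class SplitMath {
    // Ceiling division: how many chunks of `chunk` bytes cover `total` bytes.
    static long chunks(long total, long chunk) {
        return (total + chunk - 1) / chunk;
    }

    public static void main(String[] args) {
        long fileSize = 1024L * 1024 * 1024;  // 1 GB file (illustrative)
        long blockSize = 128L * 1024 * 1024;  // 128 MB HDFS block (common default)
        long splitSize = blockSize;           // by default, split size == block size

        System.out.println("blocks=" + chunks(fileSize, blockSize));  // physical division
        System.out.println("splits=" + chunks(fileSize, splitSize));  // logical division
    }
}
```

The two numbers coincide here, but a custom InputFormat may choose split boundaries that do not line up with block boundaries, which is exactly why the logical and physical divisions are kept as separate concepts.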
| RDBMS | Hadoop |
|---|---|
| RDBMS is a relational database management system. | Hadoop is a node-based flat structure. |
| RDBMS is used for OLTP processing. | Hadoop is used for analytical and big data processing. |
| In RDBMS, the database cluster uses the same data files stored in shared storage. | In Hadoop, data can be stored independently on each processing node. |
| In RDBMS, data must be preprocessed before it is stored. | In Hadoop, no preprocessing is needed before storing data. |
HDFS data blocks are distributed across the local drives of all machines in a cluster, whereas NAS data is stored on dedicated hardware.
Hadoop allows you to increase or decrease the number of mappers without worrying about the volume of data to be processed.