Interview :: Hadoop
The following are the network requirements for using Hadoop:
- A password-less SSH connection between nodes.
- Secure Shell (SSH) for launching server processes.
Storage Node: the machine or computer where the file system resides and the data to be processed is stored.
Compute Node: the machine or computer where the actual business logic is executed.
A background in any programming language such as C, C++, PHP, Python, or Java can be helpful, but if you know no Java, it is necessary to learn it, along with basic knowledge of SQL.
There are many ways to debug Hadoop code, but the most popular methods are:
- Using counters.
- Using the web interface provided by the Hadoop framework.
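To illustrate the counter approach, the sketch below mimics Hadoop's `context.getCounter(group, name).increment(1)` pattern with a plain map; the `BAD_RECORDS` counter name and the sample records are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterSketch {
    public static void main(String[] args) {
        // In a real Mapper you would call:
        //   context.getCounter("Quality", "BAD_RECORDS").increment(1);
        // Here a plain map stands in for Hadoop's counter framework.
        Map<String, Long> counters = new HashMap<>();
        String[] records = { "1,ok", "2,ok", "malformed", "3,ok" };
        for (String record : records) {
            if (!record.contains(",")) {
                // Count records we skip instead of failing the job; in a real
                // job these totals appear in the final counter report.
                counters.merge("Quality:BAD_RECORDS", 1L, Long::sum);
                continue;
            }
            // ... normal map logic would emit key/value pairs here ...
        }
        System.out.println("BAD_RECORDS=" + counters.getOrDefault("Quality:BAD_RECORDS", 0L));
    }
}
```

Because counters are aggregated across all tasks, they reveal data-quality problems (like the one malformed record above) without digging through per-task logs.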
Yes, it is possible. The FileInputFormat class provides methods to add multiple directories as input to a Hadoop job.
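A typical job setup might look like the configuration sketch below, assuming the `org.apache.hadoop.mapreduce` API; the directory paths and job name are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MultiInputJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-input");
        // Each call appends another directory to the job's input list.
        FileInputFormat.addInputPath(job, new Path("/data/logs/2023"));
        FileInputFormat.addInputPath(job, new Path("/data/logs/2024"));
        // Alternatively, set several comma-separated paths at once:
        // FileInputFormat.setInputPaths(job, "/data/logs/2023,/data/logs/2024");
    }
}
```

Every file under every added directory is split and fed to the same mapper class, so the directories should contain records of a compatible format.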
In Hadoop, a job is divided into multiple small parts known as tasks.
The logical division of data is called an Input Split, while the physical division of data is called an HDFS Block.
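To make the distinction concrete, here is a back-of-the-envelope sketch: a 1 GB file with a 128 MB block size (a common HDFS default) occupies 8 physical blocks, and when the split size equals the block size (the usual default) it also yields 8 input splits, hence 8 map tasks. The file and block sizes are illustrative assumptions.

```java
public class SplitMath {
    // Ceiling division: how many chunks of `chunk` bytes cover `total` bytes.
    static long chunks(long total, long chunk) {
        return (total + chunk - 1) / chunk;
    }

    public static void main(String[] args) {
        long fileSize = 1024L * 1024 * 1024;  // 1 GB file (illustrative)
        long blockSize = 128L * 1024 * 1024;  // 128 MB HDFS block (common default)
        long splitSize = blockSize;           // by default, split size == block size

        System.out.println("blocks=" + chunks(fileSize, blockSize));  // physical division
        System.out.println("splits=" + chunks(fileSize, splitSize));  // logical division
    }
}
```

The two numbers coincide here, but a custom InputFormat may choose split boundaries that do not line up with block boundaries, which is exactly why the logical and physical divisions are kept as separate concepts.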
| RDBMS | Hadoop |
|---|---|
| RDBMS is a relational database management system. | Hadoop is a node-based flat structure. |
| RDBMS is used for OLTP processing. | Hadoop is used for analytical and big data processing. |
| In RDBMS, the database cluster uses the same data files stored in shared storage. | In Hadoop, data can be stored independently on each processing node. |
| In RDBMS, data must be preprocessed before it is stored. | In Hadoop, no preprocessing is needed before storing data. |
HDFS data blocks are distributed across the local drives of all machines in a cluster, whereas NAS data is stored on dedicated hardware.
Hadoop allows you to increase or decrease the number of mappers without worrying about the volume of data to be processed.