Name the most common InputFormats defined in Hadoop. Which one is the default?
- TextInputFormat
- KeyValueTextInputFormat
- SequenceFileInputFormat
TextInputFormat is the Hadoop default.
What is the difference between the TextInputFormat and KeyValueTextInputFormat classes?
TextInputFormat: Reads text files line by line. It passes the byte offset of each line to the Mapper as the key and the line itself as the value.
KeyValueTextInputFormat: Reads text files and parses each line into a (key, value) pair. Everything up to the first tab character is sent to the Mapper as the key, and the remainder of the line is sent as the value.
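To make the difference concrete, here is a minimal Python sketch that imitates the (key, value) pairs each format would hand to a Mapper. It is only a simulation, not Hadoop code: the function names text_input_format and key_value_input_format and the sample data are made up for illustration.

```python
def text_input_format(data: bytes):
    """Yield (byte offset, line) pairs, as described for TextInputFormat."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # Key: offset of the line's first byte. Value: the line without its terminator.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def key_value_input_format(data: bytes, separator: str = "\t"):
    """Yield (key, value) pairs split at the first separator on each line."""
    for _, line in text_input_format(data):
        # partition() splits at the FIRST separator only; later tabs stay in the value.
        # Assumption of this sketch: a line with no separator becomes the key,
        # with an empty value.
        key, _, value = line.partition(separator)
        yield key, value


# Hypothetical sample input: three lines, the last one without a tab.
sample = b"apple\tred fruit\nbanana\tyellow\tlong\nplain line\n"

text_pairs = list(text_input_format(sample))
kv_pairs = list(key_value_input_format(sample))

for pair in text_pairs:
    print("TextInputFormat-style:", pair)
for pair in kv_pairs:
    print("KeyValue-style:", pair)
```

Note that the second sample line contains two tabs, and only the first one acts as the separator.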
What is InputSplit in Hadoop?
When a Hadoop job runs, it divides the input files into chunks and assigns each chunk to a mapper for processing. Each such chunk is called an InputSplit.
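As a rough illustration of the idea, the sketch below cuts a file of a given size into (offset, length) chunks. It is a simplification under stated assumptions: compute_splits is a hypothetical helper, and real split sizing in Hadoop depends on the block size and job configuration, which this sketch ignores.

```python
def compute_splits(file_size: int, split_size: int):
    """Return a list of (offset, length) chunks covering a file of file_size bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


# Hypothetical numbers: a 300-byte file cut into 128-byte chunks.
splits = compute_splits(300, 128)
print(splits)
```

Each tuple would then be handed to one mapper. Note that a split is only a description of a byte range; it does not contain the data itself.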
What is the purpose of RecordReader in Hadoop?
An InputSplit defines a slice of work but does not describe how to access it. The RecordReader class actually loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper. The RecordReader instance is defined by the InputFormat.
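The sketch below shows, in plain Python, what a line-oriented record reader has to do with a split: turn a byte range into (offset, line) pairs, even when the range starts or ends in the middle of a line. The boundary rule used here (skip the first line unless the split starts at byte 0, and keep reading until a line starts past the end of the split) is one common convention and is an assumption of this sketch; read_records is a hypothetical name, not a Hadoop class.

```python
def read_records(data: bytes, start: int, length: int):
    """Yield (byte offset, line) pairs for the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Skip to the start of the next line: the reader of the previous
        # split is responsible for the line that runs up to this newline.
        newline = data.find(b"\n", start)
        if newline == -1:
            return
        pos = newline + 1
    # Read every line that starts at or before the end of the split, so a
    # line that crosses the boundary is completed by this reader.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1


# Hypothetical input: three short lines, cut into 4-byte splits that
# deliberately fall in the middle of lines.
data = b"aa\nbb\ncc\n"
splits = [(0, 4), (4, 4), (8, 1)]
records_per_split = [list(read_records(data, s, n)) for s, n in splits]
for split, records in zip(splits, records_per_split):
    print(split, "->", records)
```

Taken together, the splits produce every line exactly once, even though no split boundary lines up with a line boundary.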
What is a Combiner?
The Combiner is a ‘mini-reduce’ process which operates only on data generated by a mapper. It receives as input the data emitted by the Mapper instances on a given node, and its output is then sent to the Reducers in place of the raw Mapper output.
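A word count makes the effect easy to see. The following is a minimal simulation in plain Python, not Hadoop code; map_words and combine are hypothetical names, and the input line is invented for the example.

```python
from collections import defaultdict


def map_words(line: str):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts per key locally, before anything is sent on."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


# Output of one mapper for a single hypothetical input line.
mapper_output = map_words("to be or not to be")
combined = combine(mapper_output)

print(len(mapper_output), "pairs before combining")
print(len(combined), "pairs after combining:", combined)
```

Fewer pairs leave the mapper's node, which is the point of the Combiner. This works here because summing is associative and commutative, so the Reducer computes the same totals from partial sums as it would from the raw (word, 1) pairs.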
How does speculative execution work in Hadoop?
The JobTracker can make different TaskTrackers process the same input. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon those tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully first.
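The "first copy to finish wins" rule can be sketched as a small deterministic simulation. This is not how Hadoop is implemented; resolve_speculative, the tracker names, and the finish times are all hypothetical and exist only to illustrate the rule.

```python
def resolve_speculative(attempts: dict):
    """Pick the winning copy of a task.

    attempts maps a (hypothetical) TaskTracker name to the time, in seconds,
    at which its copy of the task finished.
    Returns (winner, abandoned): the tracker whose output is kept and the
    trackers that are told to abandon their copies.
    """
    # The copy with the earliest finish time becomes the definitive copy.
    winner = min(attempts, key=attempts.get)
    abandoned = sorted(name for name in attempts if name != winner)
    return winner, abandoned


# Three copies of the same map task, with made-up finish times.
winner, abandoned = resolve_speculative(
    {"tracker-a": 42.0, "tracker-b": 17.5, "tracker-c": 30.0}
)
print("definitive copy from:", winner)
print("abandoned, outputs discarded:", abandoned)
```

In a real cluster the slower copies are still running when the first one reports completion; the simulation stands in for that by comparing finish times after the fact.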
What is JobTracker?
JobTracker is the service within Hadoop that accepts MapReduce jobs and farms out their tasks to TaskTracker nodes in the cluster.
What is TaskTracker?
TaskTracker is a node in the cluster that accepts tasks (Map, Reduce and Shuffle operations) from a JobTracker.