Hadoop Training
This Hadoop training starts with the basics: an introduction to Hadoop, the Hadoop Distributed File System, and how MapReduce works. It then covers joining data sets in MapReduce jobs, programming practices, and performance tuning, and introduces Hadoop analytics using R. Finally, it delves deeper into the Hadoop API, introduces Hive and Pig, and covers advanced MapReduce programming.
A few of the clients we have served across industries are:
DHL | PWC | ATOS | TCS | KPMG | Momentive | Tech Mahindra | Kellogg's | Bestseller | ESSAR | Ashok Leyland | NTT Data | HP | SABIC | Lamprell | TSPL | Neovia | NISUM and many more.
MaxMunus has successfully conducted 1000+ corporate trainings in India, Qatar, Saudi Arabia, Oman, Bangladesh, Bahrain, UAE, Egypt, Jordan, Kuwait, Sri Lanka, Turkey, Thailand, Hong Kong, Germany, France, Australia and the USA.
Course Information
Hadoop Course Duration: 30 Hours
Hadoop Training Timings: Weekdays: 1-2 hours per day (or) Weekends: 2-3 hours per day
Hadoop Training Method: Online/Classroom Training
Hadoop Study Material: Soft Copy
Course Content
The Motivation For Hadoop
- Problems with traditional large-scale systems
- Requirements for a new approach
Hadoop: Basic Concepts
- What is Hadoop?
- The Hadoop Distributed File System
- How MapReduce Works
- Anatomy of a Hadoop Cluster
Joining Data Sets in MapReduce Jobs
- Map-Side Joins
- Reduce-Side Joins
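A reduce-side join is the more general of the two techniques: each mapper tags its records with their source, the shuffle groups all records sharing a key, and the reducer pairs them up. The sketch below simulates that flow with plain Java collections (no Hadoop dependency); the `shuffle` and `join` methods, and the customer/order data, are illustrative assumptions, not part of any Hadoop API.

```java
import java.util.*;

// A minimal sketch of a reduce-side join, simulated in plain Java.
// In a real MapReduce job each mapper would tag records with their
// source, and the framework's shuffle would group them by key before
// the reducer performs the join.
public class ReduceSideJoinSketch {

    // "Map" phase + shuffle: tag each record with its source so the
    // reducer can tell customers and orders apart after grouping.
    static Map<String, List<String>> shuffle(Map<String, String> customers,
                                             Map<String, List<String>> orders) {
        Map<String, List<String>> grouped = new TreeMap<>();
        customers.forEach((id, name) ->
            grouped.computeIfAbsent(id, k -> new ArrayList<>()).add("CUST:" + name));
        orders.forEach((id, items) -> items.forEach(item ->
            grouped.computeIfAbsent(id, k -> new ArrayList<>()).add("ORD:" + item)));
        return grouped;
    }

    // "Reduce" phase: for each key, pair the customer record with
    // every order record that shares the key.
    static List<String> join(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            String name = null;
            List<String> items = new ArrayList<>();
            for (String v : values) {
                if (v.startsWith("CUST:")) name = v.substring(5);
                else items.add(v.substring(4));
            }
            if (name != null)
                for (String item : items) out.add(name + " -> " + item);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> customers = Map.of("1", "Alice", "2", "Bob");
        Map<String, List<String>> orders = Map.of(
            "1", List.of("book", "pen"), "2", List.of("lamp"));
        System.out.println(join(shuffle(customers, orders)));
    }
}
```

A map-side join avoids the shuffle entirely by loading the smaller data set into every mapper's memory, which is faster but only works when one side fits in memory.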
Programming Practices & Performance Tuning
- Developing MapReduce Programs
- Local Mode
- Pseudo-distributed Mode
- Monitoring and debugging on a Production Cluster
- Counters
- Skipping Bad Records
- Rerunning failed tasks with Isolation Runner
- Tuning for Performance
- Reducing network traffic with combiner
- Reducing the amount of input data
- Using Compression
- Reusing the JVM
- Running with speculative execution
- Refactoring Code and Rewriting Algorithms
- Parameters Affecting Performance
- Other Performance Aspects
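The single biggest of these tuning levers is usually the combiner: running the reducer's aggregation logic locally on each mapper's output collapses many (word, 1) pairs into one pair per word before anything crosses the network. A plain-Java simulation of the effect (method names and sample text are illustrative assumptions):

```java
import java.util.*;

// Sketch of how a combiner cuts shuffle traffic, simulated in plain
// Java. Each "mapper" emits one (word, 1) pair per token; applying
// the reducer's summing logic locally as a combiner collapses them
// to one pair per distinct word per mapper.
public class CombinerSketch {

    // Map phase: one (word, 1) pair per token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+"))
            out.add(Map.entry(w, 1));
        return out;
    }

    // Combiner (same logic as the reducer): sum counts locally
    // before the pairs would be shuffled across the network.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        pairs.forEach(e -> sums.merge(e.getKey(), e.getValue(), Integer::sum));
        return new ArrayList<>(sums.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> raw = map("to be or not to be");
        List<Map.Entry<String, Integer>> combined = combine(raw);
        System.out.println(raw.size() + " pairs shuffled without a combiner, "
                + combined.size() + " with one");
    }
}
```

A combiner is safe only when the reduce operation is commutative and associative (sums and maxima are; averages are not without extra care).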
Hadoop with Analytics Using R
- Introduction to Big Data analytics
- Applying statistics to big data using R
- Introduction to R
- Using R to build APIs that interact with Hadoop ecosystem components
- Integrating Java, R, Hadoop, Hive, etc.
Graph Manipulation in Hadoop
- Introduction to graph techniques
- Representing Graphs in Hadoop
- Implementing a sample algorithm: Single Source Shortest Path
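In MapReduce, Single Source Shortest Path is typically solved iteratively: each pass, every reachable node "emits" its distance plus one to its neighbours, and a reducer keeps the minimum distance seen for each node; the job repeats until no distance improves. The sketch below simulates that iteration in memory on an unweighted graph (the graph data and method names are illustrative assumptions):

```java
import java.util.*;

// Sketch of Single Source Shortest Path as done in MapReduce:
// repeated passes over the adjacency list, each propagating
// distances one hop further, until nothing changes. On a cluster
// each loop iteration would be one MapReduce job.
public class SsspSketch {

    static Map<String, Integer> shortestPaths(Map<String, List<String>> adj, String source) {
        Map<String, Integer> dist = new HashMap<>();
        for (String node : adj.keySet()) dist.put(node, Integer.MAX_VALUE);
        dist.put(source, 0);
        boolean changed = true;
        while (changed) {                       // each loop = one MapReduce pass
            changed = false;
            for (Map.Entry<String, List<String>> e : adj.entrySet()) {
                int d = dist.get(e.getKey());
                if (d == Integer.MAX_VALUE) continue;   // not yet reached
                for (String nb : e.getValue()) {
                    if (d + 1 < dist.getOrDefault(nb, Integer.MAX_VALUE)) {
                        dist.put(nb, d + 1);    // the "reducer" keeps the minimum
                        changed = true;
                    }
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adj = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("D"),
            "C", List.of("D"),
            "D", List.of());
        System.out.println(shortestPaths(adj, "A"));
    }
}
```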
Writing a MapReduce Program
- Examining a Sample MapReduce Program
- Basic API Concepts
- The Driver Code
- Anatomy of File Read and Write
- Basic Record Reader Anatomy
- Input and Output Format Classes
- The Mapper
- The Reducer
- Hadoop's Streaming API
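The mapper, shuffle, and reducer roles above can be made concrete with the classic word count, simulated here with plain Java collections so each stage is visible without a cluster. The method boundaries mirror Hadoop's Mapper, the framework's sort-and-shuffle, and the Reducer; the names themselves are illustrative, not Hadoop API.

```java
import java.util.*;
import java.util.stream.*;

// The Mapper / shuffle / Reducer flow for the classic word count,
// simulated in plain Java so the three stages of a MapReduce job
// are visible end to end.
public class WordCountFlow {

    // Mapper: one input line -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapper(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle: group all (word, 1) pairs by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey, TreeMap::new,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    // Reducer: (word, [1, 1, ...]) -> (word, total).
    static Map<String, Integer> reducer(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new TreeMap<>();
        grouped.forEach((w, ones) ->
                out.put(w, ones.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : List.of("the quick brown fox", "the lazy dog"))
            mapped.addAll(mapper(line));
        System.out.println(reducer(shuffle(mapped)));
    }
}
```

In a real job the driver code wires these stages together by setting the mapper, reducer, and input/output format classes on the job configuration.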
Integrating Hadoop Into The Workflow
- Relational Database Management Systems
- Storage Systems
- Importing Data from RDBMSs With Sqoop
- Importing Real-Time Data with Flume
Delving Deeper Into The Hadoop API
- Using Combiners
- The configure and close Methods
- SequenceFiles
- Partitioners
- Custom RecordReader
- Custom Input and Output Class
- Counters
- Directly Accessing HDFS
- Tool Runner
- Using The Distributed Cache
Common MapReduce Algorithms
- Sorting and Searching
- Indexing
- Classification/Machine Learning
- Term Frequency - Inverse Document Frequency
- Word Co-Occurrence
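Of these algorithms, word co-occurrence illustrates a common pattern: in the "pairs" approach, the mapper emits a ((word, neighbour), 1) pair for every ordered pair of words in a window, and the reducer sums the counts. The sketch below collapses that flow into one in-memory pass, using a whole line as the window (an illustrative simplification):

```java
import java.util.*;

// Sketch of the "pairs" approach to word co-occurrence: count every
// ordered pair of distinct words appearing in the same window (here,
// the same line). On a cluster the mapper would emit each pair with
// a count of 1 and the reducer would do the summing.
public class CoOccurrenceSketch {

    static Map<String, Integer> coOccurrences(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] words = line.toLowerCase().split("\\s+");
            for (int i = 0; i < words.length; i++)
                for (int j = 0; j < words.length; j++)
                    if (i != j)   // every ordered pair in the window
                        counts.merge(words[i] + "," + words[j], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(coOccurrences(List.of("big data", "big cluster")));
    }
}
```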
Using Hive and Pig
- Hive Basics
- Pig Basics
Debugging MapReduce Programs
- Testing with MRUnit
- Logging
- Other Debugging Strategies
Advanced MapReduce Programming
- A Recap of the MapReduce Flow
- Custom Writables and WritableComparables
- The Secondary Sort
- Creating InputFormats and OutputFormats
- Pipelining Jobs With Oozie
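The secondary sort above deserves a concrete picture: a composite key pairs the natural key with the value to sort by, and a comparator orders by the natural key first, then the value, so each reducer sees its values already sorted. On a real cluster this takes a custom WritableComparable plus a grouping comparator; the plain-Java sketch below (station/temperature data and class names are illustrative assumptions) shows just the ordering effect.

```java
import java.util.*;

// Sketch of the secondary-sort idea: sort composite (naturalKey,
// secondaryValue) keys by natural key first, then by value, so that
// when records are grouped by natural key their values arrive in
// sorted order.
public class SecondarySortSketch {

    record CompositeKey(String station, int temperature) {}

    static List<CompositeKey> secondarySort(List<CompositeKey> records) {
        List<CompositeKey> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(CompositeKey::station)    // partition/group key
                .thenComparingInt(CompositeKey::temperature));     // secondary order
        return sorted;
    }

    public static void main(String[] args) {
        List<CompositeKey> recs = List.of(
                new CompositeKey("b", 3), new CompositeKey("a", 9),
                new CompositeKey("a", 1), new CompositeKey("b", 2));
        System.out.println(secondarySort(recs));
    }
}
```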