Big Data & Hadoop Training
- The Big Data and Hadoop training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer, administrator, tester, or analytics professional. It is a comprehensive course designed by industry experts, taking current industry job requirements into account, to provide in-depth learning of Big Data and the Hadoop modules.
- In-depth knowledge of concepts such as the Hadoop Distributed File System, Hadoop clusters (single / multi node), Hadoop 2.x, Flume, Sqoop, MapReduce, Pig, Hive, HBase, Spark, ZooKeeper, Oozie, etc.
Course Information
Big Data & Hadoop Course Duration: 30 Hours
Big Data & Hadoop Training Method: Classroom Training
Big Data & Hadoop Study Material: Soft Copy
Course Content
Module 1 – Introduction to Data Warehousing & Business Intelligence
- What is a Data Warehouse?
- Data Warehouse Architecture
- Data Warehouse Vs Data Mart
- OLTP Vs OLAP
- Data Modeling
- Relational
- Dimensional
- Star Schema / Snowflake Schema
- Normalization
- Data Normalization
- Data De-Normalization
- Dimension Table
- Categories – Normal & Conformed Dimensions
- Slowly Changing Dimension - Type 1, Type 2 & Type 3
- Level & Hierarchy
- Fact Table
- Categories - Summary / Aggregation Tables
- Type
- Additive
- Semi-Additive
- Non-Additive
- Real-Time Data Warehousing - Change Data Capture
- What is Business Intelligence?
Module 2 – Introduction to Big Data & Hadoop
- What is Big Data?
- Limitations and Solutions of existing Data Analytics Architecture
- Hadoop & Its Features
- Hadoop Ecosystem
- Hadoop 2.x core components
- Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework
- Anatomy of File Write and Read & Rack Awareness
Module 3 – Hadoop Architecture, Installation, Setup & Configuration
- Hadoop 2.x Cluster Architecture - Federation and High Availability
- Hadoop Cluster Modes
- Common Hadoop Shell Commands (see the Java FileSystem API sketch after this list)
- Hadoop 2.x Configuration Files
- Hadoop Job Processes
- MapReduce Job Execution
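To make the shell-command topic concrete, here is a minimal Java sketch of the equivalent file operations through Hadoop's org.apache.hadoop.fs.FileSystem API. The paths and file names are placeholders invented for illustration; the client assumes a configured cluster (core-site.xml / hdfs-site.xml on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOps {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath;
        // fs.defaultFS decides which cluster we talk to.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Equivalent of: hdfs dfs -mkdir -p /user/demo/input
        fs.mkdirs(new Path("/user/demo/input"));

        // Equivalent of: hdfs dfs -put sales.csv /user/demo/input
        fs.copyFromLocalFile(new Path("sales.csv"),
                             new Path("/user/demo/input/sales.csv"));

        // Equivalent of: hdfs dfs -ls /user/demo/input
        for (FileStatus status : fs.listStatus(new Path("/user/demo/input"))) {
            System.out.println(status.getPath() + "  " + status.getLen());
        }
        fs.close();
    }
}
```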
Module 4 – Hadoop MapReduce & YARN Architecture & Framework
- MapReduce Framework
- Traditional way Vs MapReduce way
- Hadoop 2.x MapReduce Architecture & Components
- YARN Architecture, Components & Workflow
- Anatomy of MapReduce Program
- Writing a MapReduce Program (see the WordCount sketch after this list)
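Below is the classic WordCount job, a minimal sketch of a complete MapReduce program using the Hadoop 2.x (org.apache.hadoop.mapreduce) API; input and output paths are supplied on the command line:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Mapper: emits (word, 1) for every token in the input line
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner reuses the reducer
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Assuming the class is packaged as wordcount.jar, it would be submitted with something like `hadoop jar wordcount.jar WordCount /user/demo/input /user/demo/output`.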
Module 5 – Sqoop & Flume
- What is Sqoop?
- Sqoop Installations and Basics
- Importing Data from RDBMS / MySQL to HDFS (see the import sketch after this list)
- Exporting Data from HDFS to RDBMS / MySQL
- Parallelism
- Importing data from RDBMS / MySQL to Hive
- What is Flume?
- Flume Model and Goals
- Features of Flume
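Sqoop is driven from the command line rather than a library API. As a rough illustration, the import described above could be launched from Java by shelling out to the sqoop binary; the JDBC URL, credentials, table name and target directory below are all placeholders, and sqoop is assumed to be on the PATH:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SqoopImportLauncher {
    public static void main(String[] args) throws Exception {
        // Equivalent to typing the sqoop command in a shell.
        // All connection details below are placeholders.
        ProcessBuilder pb = new ProcessBuilder(
            "sqoop", "import",
            "--connect", "jdbc:mysql://dbhost:3306/salesdb",
            "--username", "demo",
            "--password", "demo",
            "--table", "orders",
            "--target-dir", "/user/demo/orders",
            "-m", "4"                      // 4 parallel map tasks (parallelism)
        );
        pb.redirectErrorStream(true);
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) System.out.println(line);
        }
        System.exit(p.waitFor());
    }
}
```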
Module 6 – Pig
- What is Pig?
- MapReduce Vs Pig
- Pig Use Cases
- Programming Structure in Pig
- Pig Running Modes
- Pig Components
- Pig Execution (see the embedded Pig sketch after this list)
- Pig Data Types
- Relational & Group Operators, File Loaders, Union & Joins, Diagnostic Operators & UDFs
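Pig Latin is normally run from the Grunt shell or as a script; as a hedged sketch of Pig execution, the same statements can also be embedded in Java through Pig's PigServer API. The orders.csv file and its schema are invented for illustration:

```java
import java.util.Iterator;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigEmbedded {
    public static void main(String[] args) throws Exception {
        // Local mode; use ExecType.MAPREDUCE to run on the cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Pig Latin statements, registered one by one.
        pig.registerQuery("orders = LOAD 'orders.csv' USING PigStorage(',') "
                + "AS (id:int, region:chararray, amount:double);");
        pig.registerQuery("by_region = GROUP orders BY region;");
        pig.registerQuery("totals = FOREACH by_region "
                + "GENERATE group, SUM(orders.amount);");

        // Iterate over the result of the 'totals' relation.
        Iterator<Tuple> it = pig.openIterator("totals");
        while (it.hasNext()) System.out.println(it.next());
    }
}
```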
Module 7 – Hive
- What is Hive?
- Hive Vs Pig
- Hive Architecture, Components & Limitations
- Metastore in Hive
- Comparison with Traditional Database
- Hive Data Types, Data Models, Partitions and Buckets
- Hive Tables (Managed Tables and External Tables)
- Importing, Querying Data & Managing Outputs (see the JDBC sketch after this list)
- Hive Script & UDF
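One common way to query Hive programmatically is through the HiveServer2 JDBC driver. A minimal sketch, assuming a HiveServer2 instance at hiveserver:10000 and placeholder table, columns and credentials:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Host, port, database and credentials are placeholders.
        String url = "jdbc:hive2://hiveserver:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "demo", "");
             Statement stmt = conn.createStatement()) {

            // An external table keeps the underlying data when dropped.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS orders "
                    + "(id INT, region STRING, amount DOUBLE) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                    + "LOCATION '/user/demo/orders'");

            // Aggregate query, executed by the Hive engine.
            ResultSet rs = stmt.executeQuery(
                    "SELECT region, SUM(amount) FROM orders GROUP BY region");
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
            }
        }
    }
}
```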
Module 8 – HBase
- HBase Data Model
- HBase Shell
- HBase Client API (see the sketch after this list)
- Data Loading Techniques
- ZooKeeper Data Model
- ZooKeeper Service
- ZooKeeper Data Handling
- HBase Filters
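A minimal sketch of the HBase client API (HBase 1.0+ style) that writes one cell and reads it back. The orders table and its d column family are placeholders and are assumed to have been created beforehand, e.g. in the HBase shell:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientDemo {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath (ZooKeeper quorum etc.).
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("orders"))) {

            // Put: row key "row1", column family "d", qualifier "amount".
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"),
                          Bytes.toBytes("149.99"));
            table.put(put);

            // Get the same row back and read the cell we just wrote.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("d"),
                                           Bytes.toBytes("amount"));
            System.out.println("amount = " + Bytes.toString(value));
        }
    }
}
```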
Module 9 – Spark
- What is Spark?
- Spark Architecture & Components
- Spark Algorithms – Iterative Algorithms, Graph Analysis, Machine Learning
- Spark Core
- Spark Libraries
- Spark Demo (a minimal RDD sketch follows)
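As a taste of the Spark demo, here is a minimal RDD sketch using Spark's Java API in local mode; the numbers and the sum-of-squares computation are invented for illustration:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkRddDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("rdd-demo")
                .setMaster("local[*]");   // local mode; submit to YARN on a cluster
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Parallelize a small collection into an RDD.
        JavaRDD<Integer> numbers =
                sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6));

        // Transformations are lazy; reduce() triggers the computation.
        int sumOfSquares = numbers
                .map(n -> n * n)
                .reduce(Integer::sum);

        System.out.println("sum of squares = " + sumOfSquares);
        sc.stop();
    }
}
```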
Module 10 – Big Data & Hadoop Project – Sales Analytics
- Towards the end of the course, you will work on a live project where you will use Sqoop, Flume, Pig, Hive, HBase, MapReduce & Spark to perform Big Data analytics
- You will work with the industry-specific Big Data case studies that are included in our Big Data and Hadoop course
- You will gain in-depth experience in working with Hadoop & Big Data
- Understand your sales pipeline, uncover what leads to successful sales opportunities, and better anticipate performance gaps
- Review product-related information such as Cost, Revenue and Price across Years and Ordering Method. This dataset can also be used with the Explore feature to better understand hidden trends & patterns
Training Objectives
- Big Data & Hadoop concepts, including HDFS and the MapReduce framework & its architecture
- Set up a Hadoop cluster and write complex MapReduce programs
- Learn data loading / ingestion techniques using Sqoop, Flume & HBase
- Perform data analytics using Pig, Hive and YARN, and schedule jobs using Oozie
- Understand Spark and its ecosystem & learn how to work with RDDs in Spark
- Work on a real-life project on Big Data analytics
Prerequisites
- There are no prerequisites as such for learning Hadoop; knowledge of Core Java and SQL basics will be beneficial, but is certainly not mandatory.
- The market for Big Data analytics is growing across the world, and this strong growth pattern translates into a great opportunity for IT professionals.
- Here are a few of the IT professionals who continue to benefit from moving into the Big Data domain:
- Developers, Java Programmers and Architects
- BI /ETL/DW professionals
- Senior IT Professionals
- Testing professionals
- Mainframe professionals
- Freshers