Course Information

Big Data & Hadoop Course Duration: 30 Hours

Big Data & Hadoop Training Method: Classroom Training

Big Data & Hadoop Study Material: Soft Copy

Course Content

Module 1 – Introduction to Data Warehousing & Business Intelligence

  • What is a Data Warehouse?
  • Data Warehouse Architecture
  • Data Warehouse vs. Data Mart
  • OLTP vs. OLAP
  • Data Modeling
    • Relational
    • Dimensional 
      • Star Schema / Snowflake Schema
  • Normalization
    • Data Normalization
    • Data De-Normalization
  • Dimension Table
    • Categories - Normal & Conformed Dimensions
    • Slowly Changing Dimension - Type 1, Type 2 & Type 3
    • Level & Hierarchy
  • Fact Table
    • Categories - Summary / Aggregation Tables
    • Types
      • Additive
      • Semi-Additive
      • Non-Additive
  • Real-Time Data Warehousing - Change Data Capture
  • What is Business Intelligence?
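
To make the Slowly Changing Dimension types above concrete, here is a short plain-Python sketch of Type 1 (overwrite) versus Type 2 (versioned rows). The table, column names, customer IDs and dates are illustrative only, not part of the course material:

```python
from datetime import date

# Hypothetical customer dimension; "sk" is the surrogate key.
dim_customer = [
    {"sk": 1, "customer_id": "C001", "city": "Pune",
     "valid_from": date(2020, 1, 1), "valid_to": None, "current": True},
]

def scd_type1(dim, customer_id, new_city):
    """Type 1: overwrite the attribute in place; history is lost."""
    for row in dim:
        if row["customer_id"] == customer_id and row["current"]:
            row["city"] = new_city

def scd_type2(dim, customer_id, new_city, change_date):
    """Type 2: expire the current row and insert a new versioned row,
    so the full history is preserved."""
    for row in dim:
        if row["customer_id"] == customer_id and row["current"]:
            row["valid_to"] = change_date
            row["current"] = False
    dim.append({"sk": max(r["sk"] for r in dim) + 1,
                "customer_id": customer_id, "city": new_city,
                "valid_from": change_date, "valid_to": None, "current": True})

scd_type2(dim_customer, "C001", "Mumbai", date(2021, 6, 1))
# The dimension now holds two rows for C001: the expired Pune row
# and the current Mumbai row.
```

Type 3, covered in the module, would instead keep a fixed extra column (e.g. `previous_city`) on the same row, trading full history for a simpler table.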

Module 2 – Introduction to Big Data & Hadoop

  • What is Big Data?
  • Limitations of Existing Data Analytics Architectures and Their Solutions
  • Hadoop & Hadoop Features
  • Hadoop Ecosystem
  • Hadoop 2.x core components
  • Hadoop Storage: HDFS, Hadoop Processing: MapReduce Framework
  • Anatomy of File Write and Read
  • Rack Awareness

Module 3 – Hadoop Architecture, Installation, Setup & Configuration

  • Hadoop 2.x Cluster Architecture - Federation and High Availability
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Hadoop Job Processes
  • MapReduce Job Execution

Module 4 – Hadoop MapReduce & YARN Architecture & Framework

  • MapReduce Framework
  • Traditional Way vs. the MapReduce Way
  • Hadoop 2.x MapReduce Architecture & Components
  • YARN Architecture, Components & Workflow
  • Anatomy of a MapReduce Program
  • Writing a MapReduce Program
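
The map → shuffle → reduce flow taught in this module can be sketched in a few lines of plain Python. This is a conceptual word-count illustration, not Hadoop API code (real mappers and reducers are written against the Hadoop Java framework):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input split.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: aggregate all values seen for one key.
    return key, sum(values)

lines = ["Hadoop stores data in HDFS",
         "MapReduce processes data in parallel"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts["data"] == 2 and counts["in"] == 2; all other words appear once
```

The key idea is that mappers and reducers never share state: the framework moves the grouped keys to the reducers, which is what lets the same program scale across a cluster.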

Module 5 – Sqoop & Flume

  • What is Sqoop?
  • Sqoop Installation and Basics
  • Importing Data from RDBMS / MySQL to HDFS
  • Exporting Data from HDFS to RDBMS / MySQL
  • Parallelism
  • Importing data from RDBMS / MySQL to Hive
  • What is Flume?
  • Flume Model and Goals
  • Features of Flume
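
The "Parallelism" topic above refers to how Sqoop runs several mappers at once by partitioning the value range of a numeric split-by column. Here is a simplified sketch of that range-splitting idea (illustrative only, not Sqoop's actual implementation):

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the id range [min_id, max_id] into num_mappers contiguous
    sub-ranges, one per mapper, so each mapper imports its own slice
    of the source table in parallel."""
    size = (max_id - min_id + 1) / num_mappers
    splits = []
    lo = min_id
    for i in range(num_mappers):
        hi = min_id + round(size * (i + 1)) - 1
        if i == num_mappers - 1:
            hi = max_id  # last mapper takes any remainder
        splits.append((lo, hi))
        lo = hi + 1
    return splits

# 100 ids over 4 mappers -> four ranges of 25 ids each
print(split_ranges(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

In practice you control this with Sqoop's `--num-mappers` and `--split-by` options; more mappers means more parallel connections against the source database, which is a trade-off rather than a free speed-up.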

Module 6 – Pig

  • What is Pig?
  • MapReduce vs. Pig
  • Pig Use Cases
  • Programming Structure in Pig
  • Pig Running Modes
  • Pig Components
  • Pig Execution
  • Pig Data Types
  • Relational & Group Operators, File Loaders, Union & Joins, Diagnostic Operators & UDFs

Module 7 – Hive

  • What is Hive?
  • Hive vs. Pig
  • Hive Architecture, Components and Limitations
  • Metastore in Hive
  • Comparison with Traditional Database
  • Hive Data Types, Data Models, Partitions and Buckets
  • Hive Tables (Managed Tables and External Tables)
  • Importing, Querying Data & Managing Outputs
  • Hive Script & UDF
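
Partitions and buckets, listed above, are Hive's two ways of subdividing a table's data on disk. The following sketch shows the idea in plain Python; the table name, columns and paths are made up for illustration:

```python
import hashlib

def partition_path(table, country, year):
    """A partition maps the values of chosen columns to a directory,
    so a query filtering on those columns only scans matching
    directories instead of the whole table."""
    return f"/warehouse/{table}/country={country}/year={year}"

def bucket_id(user_id, num_buckets):
    """A bucket assigns each row to one of a fixed number of files
    (within a partition) via hash(column) mod num_buckets, which
    helps with sampling and bucketed joins. A stable hash is used
    here so the assignment is deterministic."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_buckets

print(partition_path("sales", "IN", 2023))
# /warehouse/sales/country=IN/year=2023
print(bucket_id(42, 8))  # some fixed bucket in 0..7
```

Roughly: partition by columns you frequently filter on (few distinct values), bucket by high-cardinality keys you join or sample on.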

Module 8 – HBase

  • HBase Data Model
  • HBase Shell
  • HBase Client API
  • Data Loading Techniques
  • ZooKeeper Data Model
  • ZooKeeper Service
  • Data Handling
  • HBase Filters
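
The HBase data model covered above is essentially a sparse, sorted, versioned map: row key → column family → qualifier → {timestamp: value}. A toy Python sketch of that structure (the row keys and values are invented for illustration; real access goes through the HBase shell or client API):

```python
# Toy model of an HBase table: nested maps keyed by row key,
# column family, column qualifier, and timestamp.
table = {}

def put(row, family, qualifier, value, ts):
    """Write one cell version; HBase keeps multiple timestamped
    versions of each cell rather than overwriting in place."""
    cell = (table.setdefault(row, {})
                 .setdefault(family, {})
                 .setdefault(qualifier, {}))
    cell[ts] = value

def get(row, family, qualifier):
    """Like HBase, return the most recent version by default."""
    versions = table[row][family][qualifier]
    return versions[max(versions)]

put("user#1001", "info", "city", "Pune", ts=1)
put("user#1001", "info", "city", "Mumbai", ts=2)
print(get("user#1001", "info", "city"))  # Mumbai (latest version wins)
```

Because rows are stored sorted by row key, key design (e.g. the `user#1001` prefix pattern) determines scan performance, which is why the data model gets its own topic in this module.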

Module 9 – Spark

  • What is Spark?
  • Spark Architecture & Components
  • Spark Algorithms - Iterative Algorithms, Graph Analysis, Machine Learning
  • Spark Core
  • Spark Libraries
  • Spark Demo
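
Iterative algorithms such as PageRank are exactly the workloads Spark accelerates by keeping data in memory between iterations, instead of writing to disk after every MapReduce pass. Here is a minimal pure-Python version of the iteration itself (illustrative only, not Spark code; the tiny three-node graph is made up):

```python
def pagerank(links, damping=0.85, iters=20):
    """links: node -> list of outbound neighbours.
    Each iteration redistributes every node's rank over its
    out-links, then applies the damping factor."""
    n = len(links)
    ranks = {node: 1.0 / n for node in links}
    for _ in range(iters):
        contribs = {node: 0.0 for node in links}
        for node, outs in links.items():
            share = ranks[node] / len(outs)
            for out in outs:
                contribs[out] += share
        ranks = {node: (1 - damping) / n + damping * c
                 for node, c in contribs.items()}
    return ranks

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
# "c" receives links from both "a" and "b", so it ends up ranked highest
```

In Spark, `links` and `ranks` would be RDDs cached in memory, so each of the 20 iterations reuses them without re-reading from storage; that reuse is the core advantage over chaining MapReduce jobs.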

Module 10 – Big Data & Hadoop Project – Sales Analytics

  • Towards the end of the course, you will work on a live project where you will use Sqoop, Flume, Pig, Hive, HBase, MapReduce & Spark to perform Big Data analytics
  • You will work with the industry-specific Big Data case studies that are included in our Big Data and Hadoop course
  • You will gain in-depth experience in working with Hadoop & Big Data
  • Understand your sales pipeline, uncover what leads to successful sales opportunities, and better anticipate performance gaps
  • Review product-related information such as Cost, Revenue and Price across Years and Ordering Methods. This dataset can also be used in the Explore feature to better understand hidden trends & patterns

 

Training Objectives

  • Learn Big Data & Hadoop concepts, including HDFS and the MapReduce framework and its architecture
  • Set up a Hadoop cluster and write complex MapReduce programs
  • Learn data loading / ingestion techniques using Sqoop, Flume & HBase
  • Perform data analytics using Pig, Hive and YARN, and schedule jobs using Oozie
  • Understand Spark and its ecosystem, and learn how to work with RDDs in Spark
  • Work on a real-life project on Big Data analytics
 

Pre-Requisites

  • There are no prerequisites for learning Hadoop. Knowledge of Core Java and SQL basics is beneficial, but certainly not mandatory.
  • The market for Big Data analytics is growing across the world, and this strong growth translates into a great opportunity for IT professionals.
  • Here are a few IT professionals who are continuously enjoying the benefits of moving into the Big Data domain:
    • Developers, Java Programmers and Architects
    • BI /ETL/DW professionals
    • Senior IT Professionals
    • Testing professionals
    • Mainframe professionals
    • Freshers

Request For Demo