Hadoop for Developers

Duration: 4 Days
Course Price: $2,995

Overview

Apache Hadoop is a widely used framework for processing Big Data on clusters of servers. This course introduces developers to the Hadoop ecosystem.

Audience

Developers

 

Objectives

· Hadoop & Big Data

· HDFS

· MapReduce

· Pig

· Hive

· HBase

Pre-requisites

· comfortable with the Java programming language

· comfortable in a Linux environment (navigating the command line, editing files with vi / nano)

Outline

1: Introduction to Hadoop

· Hadoop history, concepts

· ecosystem

· distributions

· high-level architecture

· Hadoop myths

· Hadoop challenges

· hardware / software

· lab: first look at Hadoop

2: HDFS

· Design and architecture

· concepts (horizontal scaling, replication, data locality, rack awareness)

· daemons: NameNode, Secondary NameNode, DataNode

· communications / heartbeats

· data integrity

· read / write path

· NameNode High Availability (HA), Federation

· labs: interacting with HDFS

 

3: MapReduce

· concepts and architecture

· daemons (MRv1): JobTracker / TaskTracker

· phases: driver, mapper, shuffle/sort, reducer

· MapReduce version 1 and version 2 (YARN)

· internals of MapReduce

· introduction to Java MapReduce programs

· labs: running a sample MapReduce program

 

4: Pig

· Pig vs Java MapReduce

· Pig job flow

· Pig Latin language

· ETL with Pig

· transformations & joins

· user-defined functions (UDFs)

· labs: writing Pig scripts to analyze data

 

5: Hive

· architecture and design

· data types

· SQL support in Hive

· Creating Hive tables and querying

· partitions

· joins

· text processing

· labs: processing data with Hive

 

6: HBase

· concepts and architecture

· HBase vs RDBMS vs Cassandra

· HBase Java API

· Time series data on HBase

· schema design

· labs: interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
