Advanced Hadoop for Developers

Duration: 4 Days
Course Price: $2,495

Overview

Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course focuses on advanced programming techniques that will be beneficial to experienced Hadoop developers.

Audience

· Developers

· Data Architects

 

Objectives

· Advanced Pig

· Advanced Hive

· Advanced HBase (SQL)

Pre-requisites

· Comfortable with the Java programming language (most programming exercises are in Java)

· Comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi / nano)

· Attended “Hadoop for Developers” or have a working knowledge of Hadoop

Outline

1: Data Management in HDFS

· Various Data Formats (JSON / Avro / Parquet)

· Compression Schemes

· Data Masking

· Labs: analyzing different data formats; enabling compression

 

2: Advanced Pig

· User-defined Functions

· Introduction to Pig Libraries (Elephant Bird / DataFu)

· Loading Complex Structured Data using Pig

· Pig Tuning

· Labs: advanced Pig scripting; parsing complex data types

 

3: Advanced Hive

· User-defined Functions

· Compressed Tables

· Hive Performance Tuning

· Labs: creating compressed tables; evaluating table formats and configuration

 

4: Advanced HBase

· Advanced Schema Modeling

· Compression

· Bulk Data Ingest

· Wide-table / Tall-table comparison

· HBase and Pig

· HBase and Hive

· HBase Performance Tuning

· Labs: tuning HBase; accessing HBase data from Pig & Hive; using Phoenix for data modeling
