Jumpstart for Hadoop Administration

Duration: 3 Days
Course Price: $2,095

Course Overview:
This 3-day hands-on Hadoop for System Administrators class is designed for technical operations personnel whose job is to install and maintain production Hadoop clusters in the real world. We will cover Hadoop architecture and its components, the installation process, and the monitoring and troubleshooting of complex Hadoop issues. The class includes practical hands-on exercises and encourages open discussion of how enterprises are using Hadoop to work with large data sets.

Course Objectives:
By the end of this Hadoop class, students should be able to:

Understand Hadoop's main components and architecture
Be comfortable working with the Hadoop Distributed File System
Understand the MapReduce abstraction and how it works
Plan a Hadoop cluster
Deploy and administer a Hadoop cluster
Optimize a Hadoop cluster for the best performance based on specific job requirements
Monitor a Hadoop cluster and execute routine administration procedures
Deal with Hadoop component failures and recoveries
Get familiar with related Hadoop projects: HBase, Hive and Pig
Know best practices for using Hadoop in the enterprise
Audience:
This course is designed for system administrators and support engineers who will maintain and troubleshoot Hadoop clusters in production or development environments.

This course is designed for people with at least a basic level of Linux system administration experience. Prior knowledge of Hadoop is not required.

Introduction to Hadoop
The volume of data being processed today
What Hadoop is and why it is important
How Hadoop compares with traditional systems
Hadoop history
Hadoop's main components and architecture
Hadoop Distributed File System (HDFS)
HDFS overview and design
HDFS architecture
HDFS file storage
Component failures and recoveries
Block placement
Balancing the Hadoop cluster
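The block placement topic above can be made concrete with a small simulation. This is a simplified model of the default HDFS policy for a replication factor of 3, as described in the Apache HDFS architecture guide. The first replica goes on the writer's node if the writer is a DataNode. The second goes on a node in a different rack. The third goes on a different node in that same remote rack. The function name, hostnames and rack paths are hypothetical. The sketch ignores free space, load, and the fallbacks the real NameNode uses. It also simplifies the non-DataNode writer case to a random node.

```python
import random
from typing import Dict, List, Optional


def place_replicas(topology: Dict[str, str], writer: Optional[str] = None,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick 3 DataNodes for one block (simplified model, hypothetical helper).

    topology maps a DataNode hostname to its rack path, e.g. {"dn1": "/rack1"}.
    """
    rng = rng or random.Random()
    nodes = sorted(topology)
    # Replica 1: the writer's own node if it is a DataNode, else a random node.
    first = writer if writer in topology else rng.choice(nodes)
    # Replica 2: any node on a different (remote) rack.
    remote = [n for n in nodes if topology[n] != topology[first]]
    if not remote:
        raise ValueError("this sketch needs at least two racks")
    second = rng.choice(remote)
    # Replica 3: a different node on the same rack as replica 2.
    same_remote = [n for n in nodes
                   if topology[n] == topology[second] and n != second]
    if not same_remote:
        raise ValueError("this sketch needs two nodes on the remote rack")
    third = rng.choice(same_remote)
    return [first, second, third]
```

With a two-rack topology such as `{"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}` and writer `dn1`, the block survives the loss of either whole rack. This is why rack awareness matters for planning.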
Planning your Hadoop cluster
Planning a Hadoop cluster and its capacity
Hadoop software and hardware configuration
HDFS block replication and rack awareness
Network topology for a Hadoop cluster
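Rack awareness is usually configured by pointing Hadoop at a topology script. The property is `net.topology.script.file.name` in Hadoop 2.x and later; older 1.x releases used `topology.script.file.name`. Hadoop calls the script with one or more hostnames or IP addresses as arguments. It expects one rack path per argument on standard output. Below is a minimal sketch. The subnet-to-rack mapping is a hypothetical example; unknown hosts fall back to `/default-rack`, Hadoop's default rack name.

```python
#!/usr/bin/env python3
"""Minimal rack topology script (sketch; the subnet mapping is hypothetical)."""
import sys

# Hypothetical IP-prefix -> rack mapping; replace with your site's layout.
RACKS = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    """Return the rack path for one hostname or IP address."""
    for prefix, rack in RACKS.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK


def main(argv):
    # One rack path per argument, space-separated on a single line.
    print(" ".join(rack_for(h) for h in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Matching on IP prefixes keeps the script simple. Hostnames that are not listed would need their own entries, or they will all land in the default rack.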
Hadoop Deployment
Different Hadoop deployment types
Hadoop distribution options
Hadoop competitors
Hadoop installation procedure
Distributed cluster architecture
Lab: Hadoop Installation
Working with HDFS
Ways of accessing data in HDFS
Common HDFS operations and commands
Different HDFS commands
Internals of a file read in HDFS
Data copying with ‘distcp’
Lab: Working with HDFS
Map-Reduce Abstraction
What MapReduce is and why it is popular
The big picture of MapReduce
MapReduce process and terminology
MapReduce component failures and recoveries
Working with MapReduce
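The map, shuffle and reduce stages named above can be illustrated with the classic word-count example. This is a single-process sketch in plain Python that shows only the data flow. It is not the Hadoop Java MapReduce API, and it has no distribution, partitioning or fault tolerance. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values by key, keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["the cat", "the dog"])` returns `{"cat": 1, "dog": 1, "the": 2}`. On a real cluster, many mappers and reducers run these steps in parallel on different nodes.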
Hadoop Cluster Configuration
Hadoop configuration overview and important configuration files
Configuration parameters and values
HDFS parameters
MapReduce parameters
Hadoop environment setup
‘Include’ and ‘Exclude’ configuration files
Lab: MapReduce Performance Tuning
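Hadoop's `*-site.xml` files store parameters as `<property>` elements, each with a `<name>` and a `<value>`, inside a `<configuration>` root. The sketch below reads such a file into a dictionary, which can help when auditing parameter values across nodes. The sample values are hypothetical. The exclude-file path in particular is only an illustration of where the `dfs.hosts.exclude` file might live. Hadoop's own `Configuration` class also handles `<final>` flags and `${var}` substitution, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_site_xml(text: str) -> Dict[str, str]:
    """Parse a Hadoop-style *-site.xml document into {name: value}."""
    root = ET.fromstring(text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Hypothetical hdfs-site.xml content for illustration.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/conf/dfs.exclude</value>
  </property>
</configuration>"""
```

Calling `read_site_xml(SAMPLE)["dfs.replication"]` yields the string `"3"`. Values stay strings, as in the XML, so numeric parameters must be converted by the caller.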
Hadoop Administration and Maintenance
Namenode/Datanode directory structures and files
Filesystem image and edit log
The checkpoint procedure
Namenode failure and recovery procedure
Safe Mode
Metadata and Data backup
Potential problems and solutions / What to look for…
Adding and removing nodes
Lab: MapReduce Filesystem Recovery
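The relationship between the filesystem image, the edit log and a checkpoint can be shown with a toy model. On a real cluster, the Secondary NameNode (or the Standby NameNode in HA setups) periodically merges the edit log into a new fsimage. This sketch only mimics that idea. The real fsimage and edits files use Hadoop's own formats, not Python dictionaries. The operation names here are hypothetical and are not actual edit-log opcodes.

```python
from typing import Dict, List, Tuple

# Toy namespace: path -> replication factor (NOT the real fsimage format).
Image = Dict[str, int]
# Toy edit record: (operation, path, argument); operation names are hypothetical.
Edit = Tuple[str, str, int]


def apply_edits(image: Image, edits: List[Edit]) -> Image:
    """Replay edit records, in order, on a copy of the image."""
    new_image = dict(image)
    for op, path, arg in edits:
        if op == "create":
            new_image[path] = arg
        elif op == "delete":
            new_image.pop(path, None)
        elif op == "setrep":
            if path in new_image:
                new_image[path] = arg
        else:
            raise ValueError(f"unknown op {op!r}")
    return new_image


def checkpoint(image: Image, edits: List[Edit]) -> Tuple[Image, List[Edit]]:
    """Merge edits into a new image and start a fresh, empty edit log."""
    return apply_edits(image, edits), []
```

The point of checkpointing is visible even in this toy. Without it, the edit log keeps growing, and a restarting NameNode must replay all of it.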
Hadoop Monitoring and Troubleshooting
Best practices of monitoring a Hadoop cluster
Using logs and stack traces for monitoring and troubleshooting
Using open-source tools to monitor a Hadoop cluster
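Log-based monitoring often starts with counting warnings and errors per component. This sketch assumes log lines follow the common Hadoop log4j layout, `%d{ISO8601} %p %c: %m`; check the `log4j.properties` on your own cluster. Stack-trace continuation lines do not match the pattern and are skipped. The sample lines in the usage note are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger.Class: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>[^:\s]+): (?P<msg>.*)$"
)


def count_problems(lines: Iterable[str],
                   levels: Tuple[str, ...] = ("WARN", "ERROR", "FATAL")) -> Counter:
    """Count matching log lines per (level, logger) for the given levels."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") in levels:
            counts[(m.group("level"), m.group("logger"))] += 1
    return counts
```

For example, feeding it the made-up lines `"2014-03-01 10:15:02,118 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sample message"` and `"2014-03-01 10:15:03,004 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: sample message"` counts one ERROR for the DataNode logger and ignores the INFO line.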
Job Scheduling
How to schedule Hadoop jobs on the same cluster
The default Hadoop FIFO scheduler
Fair Scheduler and its configuration
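FIFO runs jobs strictly in arrival order. Fair sharing instead divides the cluster's capacity among pools of jobs. The sketch below computes a weighted max-min ("water-filling") fair share of task slots. Pools that need less than their slice keep only what they need, and the leftover is redistributed among the rest. This only illustrates the idea behind fair sharing. The real Fair Scheduler adds features such as minimum shares and preemption, which are not modeled here. The pool names and numbers are hypothetical.

```python
from typing import Dict, Optional


def fair_shares(capacity: float, demands: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Weighted max-min fair allocation of `capacity` slots among pools."""
    weights = weights or {p: 1.0 for p in demands}
    alloc = {p: 0.0 for p in demands}
    active = {p for p, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        per_weight = remaining / sum(weights[p] for p in active)
        # Pools whose unmet demand fits inside their weighted slice.
        satisfied = {p for p in active
                     if demands[p] - alloc[p] <= per_weight * weights[p]}
        if not satisfied:
            # Everyone wants more than their slice: split what is left.
            for p in active:
                alloc[p] += per_weight * weights[p]
            break
        for p in satisfied:
            remaining -= demands[p] - alloc[p]
            alloc[p] = demands[p]
        active -= satisfied
    return alloc
```

With 10 slots and demands `{"etl": 2, "adhoc": 5, "reports": 8}`, the `etl` pool gets its 2 slots and the other two pools split the remaining 8 evenly (4 each). Under FIFO, whichever job arrived first could take all 10.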
Introduction to Hive, HBase and Pig
Hive as a data warehouse infrastructure
HBase as the “Hadoop Database”
Using Pig as a scripting language for Hadoop
Hadoop Case studies
How different organizations use Hadoop clusters in their infrastructure