Introduction to Data Science – Building Recommender Systems

Duration: 3 Days
Course Price: $2,295

Course Overview

Data scientists build information platforms to ask and answer previously unimaginable questions. Learn how data science helps companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities. Learn iT!'s three-day course helps participants understand what data scientists do and the problems they solve. Through in-class simulations, participants apply data science methods to real-world challenges in different industries and, ultimately, prepare for data scientist roles in the field.

 

Hands-On Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

The role of data scientists, vertical use cases, and business applications of data science

Where and how to acquire data, methods for evaluating source data, and data transformation and preparation

Types of statistics and analytical methods and their relationship

Machine learning fundamentals and breakthroughs, the importance of algorithms, and data as a platform

How to implement and manage recommenders using Apache Mahout and how to set up and evaluate data experiments

Steps for deploying new analytics projects to production and tips for working at scale

 

Data Scientist Certification
Following successful completion of the training class, attendees receive a Data Science Essentials practice test. Data Science Essentials plus the Data Science Challenge make up the Cloudera Certified Professional: Data Scientist (CCP:DS) certification. Certification is a strong differentiator: it helps establish you as a leader in the field and gives employers and customers tangible evidence of your skills and expertise.

Audience & Prerequisites
This course is suitable for developers, data analysts, and statisticians with basic knowledge of Apache Hadoop: HDFS, MapReduce, Hadoop Streaming, and Apache Hive. Students should have proficiency in a scripting language; Python is strongly preferred, but familiarity with Perl or Ruby is sufficient.

 
Introduction

Data Science Overview

What Is Data Science?

The Growing Need for Data Science

The Role of a Data Scientist

Use Cases

Finance

Retail

Advertising

Defense and Intelligence

Telecommunications and Utilities

Healthcare and Pharmaceuticals
 
Project Lifecycle

Steps in the Project Lifecycle

Lab Scenario Explanation
 
Data Acquisition

Where to Source Data

Acquisition Techniques
 
Evaluating Input Data

Data Formats

Data Quantity

Data Quality

Data Transformation

Anonymization

File Format Conversion

Joining Datasets
 
Data Analysis and Statistical Methods

Relationship Between Statistics and Probability

Descriptive Statistics

Inferential Statistics

Fundamentals of Machine Learning

Overview

The Three Cs of Machine Learning

Spotlight: Naïve Bayes Classifiers

Importance of Data and Algorithms
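As a rough illustration of the Naïve Bayes spotlight, the sketch below trains a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing in plain Python. The function names `train` and `classify`, the input format, and the spam/ham sample labels are hypothetical; this is not the course's lab code and does not use the Hadoop or Mahout tooling covered in class.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Train a multinomial Naive Bayes model.

    docs: list of (label, tokens) pairs (a hypothetical input format).
    Returns (documents per class, token counts per class, vocabulary).
    """
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in docs:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def classify(model, tokens):
    """Return the label with the highest log posterior for `tokens`."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior P(label), estimated from document frequencies.
        score = math.log(n_docs / total_docs)
        # Denominator for add-one (Laplace) smoothed token likelihoods.
        denom = sum(token_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    docs = [
        ("spam", ["win", "money", "now"]),
        ("spam", ["win", "prize"]),
        ("ham", ["meeting", "tomorrow"]),
        ("ham", ["lunch", "tomorrow", "now"]),
    ]
    model = train(docs)
    print(classify(model, ["win", "money"]))  # spam
```

Working in log space avoids floating-point underflow when many small per-token probabilities are multiplied together.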
 
Recommender Overview

What Is a Recommender System?

Types of Collaborative Filtering

Limitations of Recommender Systems

Fundamental Concepts
 
Introduction to Apache Mahout

What Apache Mahout Is (and Is Not)

A Brief History of Mahout

Availability and Installation

Demonstration: Using Mahout’s Item-Based Recommender
 
Implementing Recommenders with Apache Mahout

Overview

Similarity Metrics for Binary Preferences

Similarity Metrics for Numeric Preferences

Scoring
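The similarity and scoring topics in this module can be sketched outside Mahout. The example below is a minimal plain-Python sketch, not Mahout's API. It uses the Tanimoto (Jaccard) coefficient for binary preferences and Pearson correlation for numeric ratings, and it computes an item-based score as a similarity-weighted average of the user's own ratings. All function names and the sample data are hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity for binary preferences: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation over the keys two rating dicts have in common."""
    common = sorted(x.keys() & y.keys())
    n = len(common)
    if n < 2:
        return 0.0
    xs = [x[k] for k in common]
    ys = [y[k] for k in common]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined when either side has no variance
    return cov / math.sqrt(vx * vy)


def item_vectors(ratings):
    """Invert {user: {item: rating}} into {item: {user: rating}}."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items


def predict(ratings, user, target):
    """Item-based score for `target`: similarity-weighted average of the
    user's own ratings, using only items positively correlated with it."""
    items = item_vectors(ratings)
    target_vec = items.get(target, {})
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        s = pearson(target_vec, items[item])
        if s > 0:
            num += s * r
            den += s
    return num / den if den else 0.0


if __name__ == "__main__":
    # Hypothetical ratings on a 1-5 scale.
    ratings = {
        "u1": {"i1": 5, "i2": 4, "i3": 1},
        "u2": {"i1": 4, "i2": 5, "i3": 2},
        "u3": {"i1": 1, "i2": 2, "i3": 5},
        "u4": {"i2": 5, "i3": 1},
    }
    print(predict(ratings, "u4", "i1"))
```

With this sample data, i3 is perfectly negatively correlated with i1 and is skipped, so u4's predicted rating for i1 comes entirely from i2 (rated 5) and evaluates to 5.0, up to floating-point rounding. Ignoring negatively correlated items is one design choice among several; other scorers weight by absolute similarity or subtract per-user means.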
 
Experimentation and Evaluation

Measuring Recommender Effectiveness

Designing Effective Experiments

Conducting an Effective Experiment

User Interfaces for Recommenders
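One common way to measure recommender effectiveness offline is to hold out some of each user's known preferences and check how many of them appear in the top k recommendations. The sketch below computes precision@k and recall@k under that assumption. The held-out split, the function names, and the choice of these two metrics are illustrative assumptions, not taken from the course material.

```python
from typing import Dict, List, Sequence, Set, Tuple


def precision_recall_at_k(
    recommended: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Precision@k and recall@k for one user.

    recommended: ranked item IDs produced by the recommender.
    relevant: held-out items the user is assumed to like.
    """
    hits = sum(1 for item in recommended[:k] if item in relevant)
    # Divides by k even if fewer than k items were recommended.
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(
    recs: Dict[str, List[str]], held_out: Dict[str, Set[str]], k: int
) -> Tuple[float, float]:
    """Average precision@k and recall@k over every user with held-out data."""
    scores = [
        precision_recall_at_k(recs.get(u, []), rel, k)
        for u, rel in held_out.items()
    ]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


if __name__ == "__main__":
    # Hypothetical recommendations and held-out preferences.
    recs = {"u1": ["a", "b"], "u2": ["x", "y"]}
    held_out = {"u1": {"a"}, "u2": {"y", "z"}}
    print(mean_precision_recall(recs, held_out, k=2))  # (0.5, 0.75)
```

Users with no recommendations still count toward the average (with zero hits), which penalizes a recommender that cannot serve some users; whether that is desirable depends on the experiment's design.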

Production Deployment and Beyond

Deploying to Production

Tips and Techniques for Working at Scale

Summarizing and Visualizing Results

Considerations for Improvement

Next Steps for Recommenders

Conclusion

Appendix A: Hadoop Overview

Appendix B: Mathematical Formulas

Appendix C: Language and Tool Reference