ALL DATES GUARANTEED
Check out our full list of training locations and learning formats. Please note that the location you choose may be an Established HD-ILT location with a virtual live instructor.
COURSE DELIVERY OPTIONS
Train face-to-face with the live instructor.
Interact with a live, remote instructor from a specialized, HD-equipped classroom near you.
Attend the live class from the comfort of your home or office.
Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.
- Describe Hadoop, YARN and use cases for Hadoop
- Describe Hadoop ecosystem tools and frameworks
- Describe the HDFS architecture
- Use the Hadoop client to input data into HDFS
- Transfer data between Hadoop and a relational database
- Explain YARN and MapReduce architectures
- Run a MapReduce job on YARN
- Use Pig to explore and transform data in HDFS
- Understand how Hive tables are defined and implemented
- Use Hive to explore and analyze data sets
- Use the new Hive windowing functions
- Explain and use the various Hive file formats
- Create and populate a Hive table that uses the ORC file format
- Use Hive to run SQL-like queries to perform data analysis
- Use Hive to join datasets using a variety of techniques
- Write efficient Hive queries
- Create ngrams and context ngrams using Hive
- Perform data analytics using the DataFu Pig library
- Explain the uses and purpose of HCatalog
- Use HCatalog with Pig and Hive
- Define and schedule an Oozie workflow
- Present the Spark ecosystem and high-level architecture
- Perform data analysis with Spark's Resilient Distributed Dataset API
- Explore Spark SQL and the DataFrame API
50% Hands-on Labs
Course Outline:
DAY 1 – AN INTRODUCTION TO THE HADOOP DISTRIBUTED FILE SYSTEM
- Understanding Hadoop
- The Hadoop Distributed File System
- Ingesting Data into HDFS
- The MapReduce Framework
- Starting an HDP Cluster
- Demonstration: Understanding Block Storage
- Using HDFS Commands
- Importing RDBMS Data into HDFS
- Exporting HDFS Data to an RDBMS
- Importing Log Data into HDFS Using Flume
- Demonstration: Understanding MapReduce
- Running a MapReduce Job
- Introduction to Apache Pig
- Advanced Apache Pig Programming
- Demonstration: Understanding Apache Pig
- Getting Started with Apache Pig
- Exploring Data with Apache Pig
- Splitting a Dataset
- Joining Datasets with Apache Pig
- Preparing Data for Apache Hive
- Demonstration: Computing Page Rank
- Analyzing Clickstream Data
- Analyzing Stock Market Data Using Quantiles
- Apache Hive Programming
- Using HCatalog
- Advanced Apache Hive Programming
- Understanding Hive Tables
- Understanding Partition and Skew
- Analyzing Big Data with Apache Hive
- Demonstration: Computing NGrams
- Joining Datasets in Apache Hive
- Computing NGrams of Emails in Avro Format
- Using HCatalog with Apache Pig
- Advanced Apache Hive Programming (Continued)
- Hadoop 2 and YARN
- Introduction to Spark Core and Spark SQL
- Defining Workflow with Oozie
- Advanced Apache Hive Programming
- Running a YARN Application
- Getting Started with Apache Spark
- Exploring Apache Spark SQL
- Defining an Apache Oozie Workflow
What's Included With This Class?
This course includes a 365-day membership to our neXT Learning Community! As a member, you will join thousands of other IT professionals, get your questions answered, and work toward your learning goals. Upon registration, you will get immediate access to the following resources:
Join thousands of other members in our neXT Learning Community for an entire year!
Thousands of recorded topics, many of which relate to official technology curriculum.
Interact with instructors and other neXT members. You can expect a quick response as discussion boards are monitored daily.
Virtual, interactive sessions including exam prep, open Q&A workshops, lab demos, and featured exclusive topics.
Learning paths can contain videos, blogs, articles, and quizzes combined to help meet specific objectives.