
HDP Developer Apache Pig and Hive

ALL SLI DATES ARE GUARANTEED TO RUN!

Check out our full list of training locations and learning formats. Please note that the location you choose may be an Established HD-ILT location.

What's Included With This Class?

Your neXT membership includes…

  • A 365 Day neXT Learning Membership is included with the class, giving you access to the resources below. Join thousands of other neXT members in your learning journey!

  • Video Reference Library: Thousands of recorded topics, many of which relate to the official technology curriculum, broken down into short, consumable videos. These videos are all on-demand and searchable by subject or course name. Get access to content and recordings from the entire technology stack, not just this class!

  • Online Discussion Forums: Technical discussion boards are available for you to interact with SLI instructors, SMEs, and other neXT Learning members. You can post questions and expect quick responses, as the discussion boards are monitored daily.

  • Tech Talk Webinars: SLI hosts a series of technical webinars each quarter. These are virtual, interactive sessions in which customers, instructors, and SMEs engage on a variety of topics driven by our members. Sessions are recorded and archived for future viewing. Session types: Delta & New Featured Topics, Open Q&A Workshops, Exam Prep & Guidance, Lab Demos. We are always open to new ideas and topics!

  • Goal-Based Learning Paths: Learning paths are available for members who have a specific end goal in sight. SLI instructors have developed these paths, which may contain videos, blogs, articles, or quizzes, combined to help learners meet specific objectives. Example learning paths: CCNA Exam Prep, Scripting for Beginners.


Overview

This course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core and Spark SQL.

Target Audience

Software developers who need to understand and develop applications for Hadoop.
 

Prerequisites

Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Course Objectives

  • Describe Hadoop, YARN, and use cases for Hadoop
  • Describe Hadoop ecosystem tools and frameworks
  • Describe the HDFS architecture
  • Use the Hadoop client to input data into HDFS
  • Transfer data between Hadoop and a relational database
  • Explain YARN and MapReduce architectures
  • Run a MapReduce job on YARN
  • Use Pig to explore and transform data in HDFS
  • Understand how Hive tables are defined and implemented
  • Use Hive to explore and analyze data sets
  • Use the new Hive windowing functions
  • Explain and use the various Hive file formats
  • Create and populate a Hive table that uses the ORC file format
  • Use Hive to run SQL-like queries to perform data analysis
  • Use Hive to join datasets using a variety of techniques
  • Write efficient Hive queries
  • Create ngrams and context ngrams using Hive
  • Perform data analytics using the DataFu Pig library
  • Explain the uses and purpose of HCatalog
  • Use HCatalog with Pig and Hive
  • Define and schedule an Oozie workflow
  • Present the Spark ecosystem and high-level architecture
  • Perform data analysis with Spark's Resilient Distributed Dataset API
  • Explore Spark SQL and the DataFrame API


Format

50% Lecture/Discussion

50% Hands-on Labs

Full Course Outline

DAY 1 – AN INTRODUCTION TO THE HADOOP DISTRIBUTED FILE SYSTEM

OBJECTIVES
  • Understanding Hadoop
  • The Hadoop Distributed File System
  • Ingesting Data into HDFS
  • The MapReduce Framework
LABS
  • Starting an HDP Cluster
  • Demonstration: Understanding Block Storage
  • Using HDFS Commands
  • Importing RDBMS Data into HDFS
  • Exporting HDFS Data to an RDBMS
  • Importing Log Data into HDFS Using Flume
  • Demonstration: Understanding MapReduce
  • Running a MapReduce Job
DAY 2 – AN INTRODUCTION TO APACHE PIG

OBJECTIVES
  • Introduction to Apache Pig
  • Advanced Apache Pig Programming
LABS
  • Demonstration: Understanding Apache Pig
  • Getting Started with Apache Pig
  • Exploring Data with Apache Pig
  • Splitting a Dataset
  • Joining Datasets with Apache Pig
  • Preparing Data for Apache Hive
  • Demonstration: Computing Page Rank
  • Analyzing Clickstream Data
  • Analyzing Stock Market Data Using Quantiles
DAY 3 – AN INTRODUCTION TO APACHE HIVE

OBJECTIVES
  • Apache Hive Programming
  • Using HCatalog
  • Advanced Apache Hive Programming
LABS
  • Understanding Hive Tables
  • Understanding Partition and Skew
  • Analyzing Big Data with Apache Hive
  • Demonstration: Computing NGrams
  • Joining Datasets in Apache Hive
  • Computing NGrams of Emails in Avro Format
  • Using HCatalog with Apache Pig
DAY 4 – WORKING WITH SPARK CORE, SPARK SQL AND OOZIE

OBJECTIVES
  • Advanced Apache Hive Programming (Continued)
  • Hadoop 2 and YARN
  • Introduction to Spark Core and Spark SQL
  • Defining Workflow with Oozie
LABS
  • Advanced Apache Hive Programming
  • Running a YARN Application
  • Getting Started with Apache Spark
  • Exploring Apache Spark SQL
  • Defining an Apache Oozie Workflow
Exclusive Videos Included With This Course:
  • How to Load Ambari from Scratch
  • Configuring Local Repositories
  • HDPCD - Big Data Certified Developer Exam Prep
  • HDPCA - Big Data Certified Administrator Exam Prep
  • Free Open Source Components to Solve Big/"ANY" Data Problems
  • Deep Dive: Kafka