
HDP Developer: Apache Spark 2.3

ALL SLI DATES ARE GUARANTEED TO RUN!

Check out our full list of training locations and learning formats. Please note that the location you choose may be an Established HD-ILT location.

What's Included With This Class?

365 Day neXT Learning Membership

Video Reference Library

Online Discussion Forums

Tech Talk Webinars

Goal-Based Learning Paths

Your neXT membership includes…

  • A 365 Day neXT Learning Membership is included with the class, giving you access to the resources below. Join thousands of other neXT members in your learning journey!

  • Video Reference Library: Thousands of recorded topics, many of which relate to the official technology curriculum, broken down into short, consumable videos. These videos are all on-demand and searchable by subject or course name. Get access to content and recordings from the entire technology stack, not just this class!

  • Online Discussion Forums: Technical discussion boards are available for you to interact with SLI instructors, SMEs, and other neXT Learning members. You can leave questions and expect quick responses, as the discussion boards are monitored daily.

  • Tech Talk Webinars: SLI hosts a series of technical webinars each quarter. These are virtual, interactive sessions in which customers, instructors, and SMEs engage on a variety of topics driven by our members. Sessions are recorded and archived for future viewing. Session types: Delta & New Featured Topics, Open Q&A Workshops, Exam Prep & Guidance, Lab Demos. We are always open to new ideas and topics!

  • Goal-Based Learning Paths: Learning paths are available for members who have a specific end goal in sight. SLI instructors have developed these paths, which may combine videos, blogs, articles, and quizzes to help learners meet specific objectives. Example learning paths: CCNA Exam Prep, Scripting for Beginners.


Overview

This course introduces the Apache Spark distributed computing engine and is suitable for developers, data analysts, architects, technical managers, and anyone who needs to use Spark in a hands-on manner. It is based on the Spark 2.x release. The course provides a solid technical introduction to the Spark architecture and how Spark works. It covers the basic building blocks of Spark (e.g., RDDs and the distributed compute engine), as well as higher-level constructs that provide a simpler and more capable interface. It includes in-depth coverage of Spark SQL, DataFrames, and DataSets, which are now the preferred programming API, including possible performance issues and strategies for optimization. The course also covers more advanced capabilities, such as using Spark Streaming to process streaming data and integrating with Kafka.

Target Audience

Software engineers who want to develop in-memory applications for time-sensitive and highly iterative workloads in an enterprise HDP environment.

Prerequisites

Students should be familiar with programming principles and have previous experience in software development using Scala. Previous experience with data streaming, SQL, and HDP is also helpful, but not required.

Course Objectives

  • Acquire and Install Spark
  • Identify Supported Data Formats
  • Use Broadcast Variables and Accumulators
  • Configure and Create a SparkSession


Full Course Outline

DAY 1: Scala Ramp Up, Introduction to Spark

OBJECTIVES

  • Scala Introduction
  • Working with: Variables, Data Types, and Control Flow
  • The Scala Interpreter
  • Collections and their Standard Methods (e.g. map())
  • Working with: Functions, Methods, and Function Literals
  • Define the Following as they Relate to Scala: Class, Object, and Case Class
  • Overview, Motivations, Spark Systems
  • Spark Ecosystem
  • Spark vs. Hadoop
  • Acquiring and Installing Spark
  • The Spark Shell, SparkContext
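
For readers gauging the Scala ramp-up topics listed above, here is a minimal sketch of a case class, a function literal, and the map() collection method. It is only an illustration and is not taken from the course materials; the names Course, ScalaRampUp, doubled, and describe are hypothetical.

```scala
// Hypothetical example type; not part of the course materials.
final case class Course(title: String, days: Int)

object ScalaRampUp {
  // A function literal (anonymous function) bound to a value.
  val double: Int => Int = n => n * 2

  // A method that passes the function literal to the collection's map().
  def doubled(xs: List[Int]): List[Int] = xs.map(double)

  // Control flow as an expression: if/else yields a value.
  def describe(c: Course): String =
    if (c.days > 1) s"${c.title} (${c.days} days)" else s"${c.title} (1 day)"

  def main(args: Array[String]): Unit = {
    println(doubled(List(1, 2, 3)))
    println(describe(Course("Spark", 4)))
  }
}
```

The same map() call shape carries over to Spark's distributed collections, which is presumably why the outline places the Scala material ahead of the Spark introduction.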

LABS

  • Setting Up the Lab Environment
  • Starting the Scala Interpreter
  • A First Look at Spark
  • A First Look at the Spark Shell

 

DAY 2: RDDs and Spark Architecture, Spark SQL, DataFrames and DataSets

OBJECTIVES

  • RDD Concepts, Lifecycle, Lazy Evaluation
  • RDD Partitioning and Transformations
  • Working with RDDs Including: Creating and Transforming
  • An Overview of RDDs
  • SparkSession, Loading/Saving Data, Data Formats
  • Introducing DataFrames and DataSets
  • Identify Supported Data Formats
  • Working with the DataFrame (untyped) Query DSL
  • SQL-based Queries
  • Working with the DataSet (typed) API
  • Mapping and Splitting
  • DataSets vs. DataFrames vs. RDDs

LABS

  • RDD Basics
  • Operations on Multiple RDDs
  • Data Formats
  • Spark SQL Basics
  • DataFrame Transformations
  • The DataSet Typed API
  • Splitting Up Data

 

DAY 3: Shuffling, Transformations and Performance, Performance Tuning

OBJECTIVES

  • Working with: Grouping, Reducing, Joining
  • Shuffling, Narrow vs. Wide Dependencies, and Performance Implications
  • Exploring the Catalyst Query Optimizer
  • The Tungsten Optimizer
  • Discuss Caching, Including: Concepts, Storage Type, Guidelines
  • Minimizing Shuffling for Increased Performance
  • Using Broadcast Variables and Accumulators
  • General Performance Guidelines

LABS

  • Exploring Group Shuffling
  • Seeing Catalyst at Work
  • Seeing Tungsten at Work
  • Working with Caching, Joins, Shuffles, Broadcasts, Accumulators
  • Broadcast General Guidelines

 

DAY 4: Creating Standalone Applications and Spark Streaming

OBJECTIVES

  • Core API, SparkSession.Builder
  • Configuring and Creating a SparkSession
  • Building and Running Applications
  • Application Lifecycle (Driver, Executors, and Tasks)
  • Cluster Managers (Standalone, YARN, Mesos)
  • Logging and Debugging
  • Introduction and Streaming Basics
  • Spark Streaming (Spark 1.0+)
  • Structured Streaming (Spark 2+)
  • Consuming Kafka Data

LABS

  • Spark Job Submission
  • Additional Spark Capabilities
  • Spark Streaming
  • Spark Structured Streaming
  • Spark Structured Streaming with Kafka

Exclusive Videos Included With This Course:

  • How to Load Ambari from Scratch
  • Configuring Local Repositories
  • HDPCD - Big Data Certified Developer Exam Prep
  • HDPCA - Big Data Certified Administrator Exam Prep
  • Free Open Source Components to Solve Big/“ANY” Data Problems
  • Deep Dive: Kafka