HDP Overview: Apache Hadoop Essentials

Course Overview

This course provides a technical overview of Apache Hadoop. It covers, at a high level, the concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course serves as an optional primer for those who plan to attend a hands-on, instructor-led course.

Target Audience

Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure teams, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.

Prerequisites

No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.

Course Objectives

  • Describe the use case for Hadoop
  • Identify Hadoop Ecosystem architectural categories
    • Data Management
    • Data Access
    • Data Governance and Integration
    • Security
    • Operations
  • Detail the HDFS architecture
  • Describe data ingestion options and frameworks for batch and real-time streaming
  • Explain the fundamentals of parallel processing
  • See popular data transformation and processing engines in action
    • Apache Hive
    • Apache Pig
    • Apache Spark
  • Detail the architecture and features of YARN
  • Describe how to secure Hadoop

Course Outline

  • Operational overview with Ambari
  • Loading data into HDFS
  • Data manipulation with Hive
  • Risk Analysis with Pig
  • Risk Analysis with Spark and Zeppelin
  • Securing Hive with Ranger
