This course provides a technical overview of Apache Hadoop. It includes high-level information about the concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course also serves as an optional primer for those who plan to attend a hands-on, instructor-led course.
This course is intended for data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure teams, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.
No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.
- Describe the use case for Hadoop
- Identify Hadoop ecosystem architectural categories
  - Data Management
  - Data Access
  - Data Governance and Integration
- Detail the HDFS architecture
- Describe data ingestion options and frameworks for batch and real-time streaming
- Explain the fundamentals of parallel processing
- See popular data transformation and processing engines in action
  - Apache Hive
  - Apache Pig
  - Apache Spark
- Detail the architecture and features of YARN
- Describe how to secure Hadoop
- Operational overview with Ambari
- Loading data into HDFS
- Data manipulation with Hive
- Risk Analysis with Pig
- Risk Analysis with Spark and Zeppelin
- Securing Hive with Ranger