INSTRUCTOR-LED COURSE

Data Science Primer – Technologies, Tools & Modern Roles in the Data-Driven Enterprise

Course Information

Duration: 1 day

Version: Data Science Overview

Price: $895.00

ALL DATES GUARANTEED

Check out our full list of training locations and learning formats. Please note that the location you choose may be an Established HD-ILT location with a virtual live instructor.

COURSE DELIVERY OPTIONS

Train face-to-face with the live instructor.

Interact with a live, remote instructor from a specialized, HD-equipped classroom near you.

Attend the live class from the comfort of your home or office.


OVERVIEW

The Data Science Overview | Technologies, Tools & Modern Roles in the Data-Driven Enterprise is an introductory-level course that familiarizes the entire multi-disciplinary Data Science team with the many evolving and related terms in the field, with a focus on Big Data, Data Science, Predictive Analytics, Artificial Intelligence, Data Mining, and Data Warehousing. The overview explores the current state of the art and science, the major components of a modern data science infrastructure, and team roles and responsibilities, and it sets realistic expectations for the outcomes of your investment.


The goal of this course is to provide students with a baseline understanding of core concepts that can serve as a platform of knowledge for more in-depth training and real-world practice.

Prerequisites:

Attendees should have prior exposure to Enterprise Information Technology, as well as familiarity with Relational Databases.

Target Audience:

This introductory-level primer is an overview intended for Business Analysts, Data Analysts, Data Architects, DBAs, Network (Grid) Administrators, Developers, or anyone else in the data science realm who needs a baseline understanding of some of the core areas of modern Data Science technologies, practices, and available tools.

Course Objectives:

This course provides a high-level view of a variety of core, current data science-related technologies, strategies, skillsets, initiatives, and supporting tools in common business enterprise practices. This list covers a general range of topics current to the time of course distribution. We will collaborate with your team to refine the depth of coverage, identify the areas of greatest importance to you, and decide where you would like to add demos.


Students will explore:

Foundations: Grids & Virtualization; SOA, ESB / EMB, The Cloud

The Hadoop Ecosystem: HDFS, Resource Negotiators, MapReduce, Spark, Distributions

Big Data, NOSQL, and ETL

ETL: Extract, Transform, Load

Handling Data & a Survey of Useful tools

Enterprise Integration Patterns and Message Busses

Developing in the Hadoop Ecosystem: R, Python, Java, Scala, Pig, and BPMN

Artificial Intelligence and Business Systems

Who’s on the Team? Evolving Roles and Functions in Data Science

Growing your Infrastructure

Course Outline:

Please note that this list of topics is based on our standard course offering, evolved from typical industry uses and trends. We’ll work with you to tune this course and level of coverage to target the skills you need most. Topics, agenda, and labs are subject to change and may be adjusted during live delivery based on audience needs and skill level.


Foundations

  • Grids and Virtualization
  • Service-Oriented Architecture
  • Enterprise Service Bus
  • Enterprise Message Bus
  • The Cloud

The Hadoop Ecosystem

  • HDFS: Hadoop Distributed File System
  • Resource Negotiators: YARN, Mesos, and Spark; ZooKeeper
  • Hadoop MapReduce
  • Spark
  • Hadoop Ecosystem Distributions: Cloudera, Hortonworks, Open Source

Big Data, NOSQL, and ETL

  • Big Data vs. RDBMS
  • NOSQL: Not Only SQL
  • Relational Databases: Oracle, MariaDB, Db2, SQL Server, PostgreSQL
  • Key/Value Databases: JBoss Infinispan, Terracotta, Dynamo, Voldemort
  • Columnar Databases: Cassandra, HBase, BigTable
  • Document Databases: MongoDB, CouchDB/Couchbase
  • Graph Databases: Giraph, Neo4j, GraphX
  • Apache Hive
  • Common Data Formats
  • Leveraging SQL and SQL variants

ETL: Extract, Transform, Load

  • Data Ingestion, Transformation, and Loading
  • Exporting Data
  • Sqoop, Flume, Informatica, and other tools

Enterprise Integration Patterns and Message Busses

  • Enterprise Integration Patterns: Apache Camel and Spring Integration
  • Enterprise Message Busses: Apache Kafka, ActiveMQ, and other tools

Developing in the Hadoop Ecosystem

  • Languages: R, Python, Java, Scala, Pig, and BPMN
  • Libraries and Frameworks
  • Development, Testing, and Deployment

Artificial Intelligence and Business Systems

  • Artificial Intelligence: Myths, Legends, and Reality
  • The Math
  • Statistics
  • Probability
  • Clustering Algorithms, Mahout, MLlib, scikit-learn, and MADlib
  • Business Rule Systems: Drools, JRules, Pegasus

The Team

  • Agile Data Science
  • NOSQL Data Architects and Administrators
  • Developers
  • Grid Administrators
  • Business and Data Analysts
  • Management
  • Evolving your Team
  • Growing your Infrastructure