ClearML Infrastructure Engineer (CLEARML-IE)
Price: $3,995.00
Duration: 5 days
This class prepares students for the ClearML Infrastructure Engineer (ClearML-IE) certification. ClearML is an open-source MLOps platform that lets teams track, orchestrate, and scale machine learning workloads across Kubernetes, cloud, and hybrid environments. By the end of this hands-on training, you will have the skills to deploy, secure, and operate a full ClearML environment, from experiment tracking to GPU-powered model serving.
Throughout the course, you will learn to use Helm, Kubernetes, and cloud-native tools to manage ClearML at scale. You’ll configure external data stores, automate agent scaling, integrate with Hugging Face and vLLM, and practice troubleshooting real-world ClearML incidents. The curriculum combines scenario-based labs with production-focused simulations.
Upcoming Class Dates and Times
All Sunset Learning courses are guaranteed to run
- Please contact us to request a class date or to speak with someone about scheduling options.
Course Outline and Details
Prerequisites
- ClearML Pre-Course Exam
- Linux for Absolute Beginners
- Certified Kubernetes Systems Administrator
- Experience in Enterprise Cloud Platforms
Note: This course requires strong foundations and multidisciplinary training. It is an advanced course that relies on the student's prior experience.
Target Audience
- Anyone who plans to install, manage, and operate ClearML Server, Agents, and SDKs
- Any company or individual seeking Certification in ClearML Infrastructure Engineering
- MLOps Product Managers and Systems Administrators
- Any company or individual who wants to advance their knowledge of MLOps
Course Objectives
- Deploy and configure a production-ready ClearML stack using Docker Compose and Helm
- Integrate external data stores (MongoDB, Redis, Elasticsearch, File Server) with secure access
- Configure TLS, ingress routing, and Kubernetes Secrets for ClearML services
- Design and operate scalable ClearML architectures using agents, queues, and autoscaling (HPA)
- Enable GPU-accelerated workloads with NVIDIA or cloud GPU instances (A100, G5, NC-series)
- Implement multi-tenant ClearML environments with RBAC and SSO (OIDC/SAML)
Course Outline
Install ClearML on Kubernetes with Helm
- The Role of ClearML in the MLOps Ecosystem
- Kubernetes Requirements and Ingress Setup
- ClearML Helm Chart Overview and Customization
- Connecting External Stores and Managing Secrets
- Install Kubernetes using Ansible
- Persistent Storage for ClearML
- Prepare Ingress Controller for ClearML
- Deploy ClearML via Helm
Use ClearML for Tracking & Orchestration
- ClearML Architecture: Clients, Server, Agents
- Tasks, Projects, and Artifacts Explained
- ClearML Quickstart: Log a Training Run
- Explore Tasks, Projects, and Artifacts in ClearML
Design ClearML Topology (Stores, Queues, Security)
- ClearML Core Components and Data Flow
- External Data Stores: MongoDB, Elasticsearch, Redis
- Queues, Agents, and Cache Management
- Security Foundations: TLS, OIDC, RBAC
- Explore ClearML Architecture Topology
- ClearML Agent: Kubernetes Glue
- Enable TLS and Validate Secure Endpoints
Deploy Tenant Services (Multi-Tenancy & Web Authentication)
- Understanding Multi-Tenancy in ClearML
- Web Login and Authentication Options (SSO / OIDC / SAML)
- Enable Multi-Tenant Mode via Helm
- Create and Manage Tenants via ClearML SDK
- Map Tenants to Kubernetes Namespaces & Apply Policy
Dynamic GPU Fractions – CDMO
- Concepts & Mental Models (MIG, MPS, Fractions, Profiles)
- Architecture – GPU Operator, Device Plugin, and CDMO Integration
- Installation and Version Pinning
- Scheduling, Validation, and Observability
- Troubleshooting and Recovery
- Prepare GPU Nodes for the ClearML Dynamic MIG Operator (CDMO)
- Deploy the NVIDIA GPU Operator
- Enable CDMO for MIG-Based Fractional GPU Scheduling
ClearML on AWS
- Set up an AWS Organization
- Install Terraform
- Write Terraform IaC
- Launch ClearML with IaC
ClearML on Azure
- Managing Azure with Terraform
- Terraform HCL Syntax
- Initialize Terraform/Azure Integration
- Configure, Deploy, and Control AKS Cluster
- Install ClearML-Compatible Storage Class
- Install ClearML Server on Azure