ClearML Infrastructure Engineer (CLEARML-IE)

Price: $3,995.00
Duration: 5 days

This class prepares students for the ClearML Infrastructure Engineer (ClearML-IE) certification. ClearML is an open-source MLOps platform that lets teams track, orchestrate, and scale machine learning workloads across Kubernetes, cloud, and hybrid environments. By the end of this hands-on training, you will have the skills to deploy, secure, and operate a full ClearML environment, from experiment tracking to GPU-powered model serving.


Throughout the course, you will learn to use Helm, Kubernetes, and cloud-native tools to manage ClearML at scale. You’ll configure external data stores, automate agent scaling, integrate with Hugging Face and vLLM, and practice troubleshooting real-world ClearML incidents. The curriculum combines scenario-based labs with production-focused simulations.

All Sunset Learning courses are guaranteed to run

Course Outline and Details

Prerequisites

  • ClearML Pre-Course Exam
  • Linux for Absolute Beginners
  • Certified Kubernetes Systems Administrator
  • Experience with Enterprise Cloud Platforms

Note: This is an advanced course. It requires strong foundations across multiple disciplines and relies on the student's prior experience.

Target Audience

  • Anyone who plans to install, manage, and operate ClearML Server, Agents, and SDKs
  • Any company or individual seeking certification in ClearML Infrastructure Engineering
  • MLOps Product Managers and Systems Administrators
  • Any company or individual who wants to advance their knowledge of MLOps

Course Objectives

  • Deploy and configure a production-ready ClearML stack using Docker Compose and Helm
  • Integrate external data stores (MongoDB, Redis, Elasticsearch, File Server) with secure access
  • Configure TLS, ingress routing, and Kubernetes Secrets for ClearML services
  • Design and operate scalable ClearML architectures using agents, queues, and autoscaling (HPA)
  • Enable GPU-accelerated workloads with NVIDIA or cloud GPU instances (A100, G5, NC-series)
  • Implement multi-tenant ClearML environments with RBAC and SSO (OIDC/SAML)

Install ClearML on Kubernetes with Helm

  • The Role of ClearML in the MLOps Ecosystem
  • Kubernetes Requirements and Ingress Setup
  • ClearML Helm Chart Overview and Customization
  • Connecting External Stores and Managing Secrets
  • Install Kubernetes using Ansible
  • Persistent Storage for ClearML
  • Prepare Ingress Controller for ClearML
  • Deploy ClearML via Helm

Use ClearML for Tracking & Orchestration

  • ClearML Architecture: Clients, Server, Agents
  • Tasks, Projects, and Artifacts Explained
  • ClearML Quickstart: Log a Training Run
  • Explore Tasks, Projects, and Artifacts in ClearML

Design ClearML Topology (Stores, Queues, Security)

  • ClearML Core Components and Data Flow
  • External Data Stores: MongoDB, Elasticsearch, Redis
  • Queues, Agents, and Cache Management
  • Security Foundations: TLS, OIDC, RBAC
  • Explore ClearML Architecture Topology
  • ClearML Agent: Kubernetes Glue
  • Enable TLS and Validate Secure Endpoints

Deploy Tenant Services (Multi-Tenancy & Web Authentication)

  • Understanding Multi-Tenancy in ClearML
  • Web Login and Authentication Options (SSO / OIDC / SAML)
  • Enable Multi-Tenant Mode via Helm
  • Create and Manage Tenants via ClearML SDK
  • Map Tenants to Kubernetes Namespaces & Apply Policy

Dynamic GPU Fractions – CDMO

  • Concepts & Mental Models (MIG, MPS, Fractions, Profiles)
  • Architecture – GPU Operator, Device Plugin, and CDMO Integration
  • Installation and Version Pinning
  • Scheduling, Validation, and Observability
  • Troubleshooting and Recovery
  • Prepare GPU Nodes for the ClearML Dynamic MIG Operator (CDMO)
  • Deploy the NVIDIA GPU Operator
  • Enable CDMO for MIG-Based Fractional GPU Scheduling

ClearML on AWS

  • Set up an AWS Organization
  • Install Terraform
  • Write Terraform IaC
  • Launch ClearML with IaC

ClearML on Azure

  • Managing Azure with Terraform
  • Terraform HCL Syntax
  • Initialize Terraform/Azure Integration
  • Configure, Deploy, and Control AKS Cluster
  • Install ClearML-Compatible Storage Class
  • Install ClearML Server on Azure


Course Delivery Options

Train face-to-face with a live instructor. (Please note: not all classes offer this option.)
Access on-demand training content anytime, anywhere. (Please note: not all classes offer this option.)
Attend the live class from the comfort of your home or office.
Interact with a live, remote instructor from a specialized, HD-equipped classroom near you. An SLI sales rep will confirm location availability prior to registration confirmation.