ClearML Systems Administrator (CLEARML-SA)

Price: $2,995.00
Duration: 3 days

The ClearML Systems Administrator course at Alta3 Research Labs is a hands-on training program for Systems Administrators and DevOps Engineers who deploy and secure enterprise-grade MLOps environments. Moving beyond basic usage, students work in the Alta3 lab environment, assisted by VIRGIL, our AI Lab Coach, to master the underlying topology of ClearML, including the relationships between Servers, Agents, and Kubernetes. You will learn to architect robust data flows, manage external data stores such as MongoDB and Elasticsearch, and handle secrets securely, so that you understand not just how to use the platform but how to own the infrastructure that powers it.


A critical focus is placed on transforming standard deployments into secure, multi-tenant ecosystems suitable for large organizations. Participants will master identity integration via OIDC and SSO, implement strict Role-Based Access Control (RBAC), and govern isolated tenants mapped to specific resources. The training culminates in a rigorous Capstone Lab covering "Day-2" operations, where you will deploy Prometheus and Grafana for observability, troubleshoot agent queues, and provision a new tenant end-to-end. By the end of this course, you will possess the specialized skills to architect and maintain a resilient, scalable ClearML platform that satisfies enterprise security and operational standards.

All Sunset Learning courses are guaranteed to run.

Course Outline and Details

  • Systems Administrators
  • ClearML Architecture Topology
  • Secure Multi-Tenancy Implementation
  • Identity & Access Governance
  • Kubernetes & Agent Orchestration
  • Observability & System Monitoring
  • Enterprise Operational Troubleshooting

Welcome to ClearML Systems Administrator

  • Exploring Your Lab Environment
  • Meet VIRGIL: Your AI Lab Coach
  • Register for Polls

Core ClearML Platform Operations

  • The Role of ClearML in the MLOps Ecosystem
  • Kubernetes Requirements and Ingress Setup
  • Install Kubernetes using Ansible
  • Persistent Storage for ClearML
  • Prepare Ingress Controller for ClearML
  • ClearML Helm Chart Overview and Customization
  • Deploy ClearML via Helm
  • Connecting External Stores and Managing Secrets

Use ClearML for Tracking & Orchestration

  • ClearML Architecture: Clients, Server, Agents
  • Tasks, Projects, and Artifacts Explained
  • ClearML Quickstart: Log a Training Run
  • Explore Tasks, Projects, and Artifacts in ClearML

Understand ClearML Topology (Stores, Queues, Security)

  • ClearML Core Components and Data Flow
  • External Data Stores: MongoDB, Elasticsearch, Redis
  • Explore ClearML Architecture Topology
  • Queues, Agents, and Cache Management
  • ClearML Agent: Kubernetes Glue
  • Security Foundations: TLS, OIDC, RBAC
  • Enable TLS and Validate Secure Endpoints

Deploy Tenant Services (Multi-Tenancy & Web Authentication)

  • Understanding Multi-Tenancy in ClearML
  • Web Login and Authentication Options (SSO / OIDC / SAML)
  • Enable Multi-Tenant Mode via Helm
  • Create and Manage Tenants via ClearML SDK
  • Map Tenants to Kubernetes Namespaces & Apply Policy

Identity & Authentication (OIDC / SSO / SAML)

  • Identity Provider Integration: OIDC, SAML, and Authentication Flows
  • ClearML Authentication Modes: Internal, External, Hybrid
  • Deploy and Initialize Keycloak
  • Configure ClearML with OIDC (Keycloak / Azure AD Example)
  • Troubleshooting Authentication & Login Failures

Tenant Administration & Governance

  • Tenant Governance: RBAC, Groups, Permissions, Separation of Data
  • User & Group Lifecycle: Provisioning, Access, Deactivation
  • Inspect Tenant Activity, Audit Logs & Usage Patterns
  • Tenant Lifecycle Best Practices (Provision → Operation → Decommission)

Queue & Agent Operations for Tenants

  • How ClearML Agents Work
  • Queue Design: Shared Queues, Tenant Queues, GPU Queues
  • Create/Modify Queues for Tenants

Dynamic GPU Fractions – CDMO

  • Concepts & Mental Models (MIG, MPS, Fractions, Profiles)
  • Architecture – GPU Operator, Device Plugin, and CDMO Integration
  • Installation and Version Pinning
  • Prepare GPU Nodes for the ClearML Dynamic MIG Operator (CDMO)
  • Deploy the NVIDIA GPU Operator
  • Scheduling, Validation, and Observability
  • Enable CDMO for MIG-Based Fractional GPU Scheduling
  • Troubleshooting and Recovery

ClearML Pipelines

  • Initializing Pipelines with the Python SDK
  • Pipelines: Build a Simple ML Workflow
  • Pipelines: Run Tasks in Parallel
  • Pipelines: Reuse Results with Caching
  • Pipelines: Execute Remote Workloads from GitHub

Course Delivery Options

Train face-to-face with the live instructor. (Please note: not all classes offer this option.)
Access on-demand training content anytime, anywhere. (Please note: not all classes offer this option.)
Attend the live class from the comfort of your home or office.
Interact with a live, remote instructor from a specialized, HD-equipped classroom near you. An SLI sales rep will confirm location availability prior to registration confirmation.