This curriculum covers enterprise containerization and virtualization at the depth and breadth of a multi-workshop program, comparable to an internal capability build-out for standardizing cloud-native infrastructure across hybrid environments.
Module 1: Foundations of Virtualization in Enterprise Infrastructure
- Selecting between full virtualization, paravirtualization, and hardware-assisted virtualization based on guest OS compatibility and performance requirements.
- Configuring CPU and memory overcommit ratios in hypervisors while maintaining SLA compliance for critical workloads.
- Implementing NUMA-aware VM placement to avoid remote memory access penalties in multi-socket hosts.
- Designing storage backends for VMs using thin vs. thick provisioning based on IOPS, capacity planning, and snapshot needs.
- Integrating VMs with existing identity providers for console access and audit logging.
- Establishing VM lifecycle policies for patching, decommissioning, and image version control.
- Evaluating Type 1 vs. Type 2 hypervisors in regulated environments with strict isolation requirements.
- Managing VM sprawl through automated tagging, resource quotas, and chargeback mechanisms.
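
As a starting point for the overcommit topic above, the following sketch computes CPU and memory overcommit ratios for a single host and compares them to a policy ceiling. The `Host` and `VM` types and the default ceilings (4.0 for CPU, 1.0 for memory) are illustrative assumptions, not recommendations; appropriate values depend on the hypervisor and the SLAs in question.

```python
from dataclasses import dataclass


@dataclass
class Host:
    physical_cores: int
    memory_gib: float


@dataclass
class VM:
    vcpus: int
    memory_gib: float


def overcommit_ratios(host: Host, vms: list[VM]) -> tuple[float, float]:
    """Return (cpu_ratio, memory_ratio): allocated resources divided by physical ones."""
    cpu_ratio = sum(vm.vcpus for vm in vms) / host.physical_cores
    mem_ratio = sum(vm.memory_gib for vm in vms) / host.memory_gib
    return cpu_ratio, mem_ratio


def within_policy(host: Host, vms: list[VM],
                  max_cpu_ratio: float = 4.0,   # hypothetical ceiling
                  max_mem_ratio: float = 1.0) -> bool:  # hypothetical ceiling
    """True when both ratios stay at or below the configured ceilings."""
    cpu_ratio, mem_ratio = overcommit_ratios(host, vms)
    return cpu_ratio <= max_cpu_ratio and mem_ratio <= max_mem_ratio
```

A check like this could run before VM placement so that critical hosts are held to a tighter ceiling than general-purpose ones.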
Module 2: Container Architecture and Runtime Design
- Choosing between container runtimes (runc, gVisor, Kata Containers) based on security, performance, and compatibility needs.
- Defining resource limits and requests for CPU and memory in container manifests to prevent noisy neighbor issues.
- Implementing init containers for pre-start dependency checks and configuration validation.
- Configuring container health checks using liveness, readiness, and startup probes with appropriate thresholds.
- Designing multi-stage Dockerfiles to minimize image size and reduce attack surface.
- Managing container UID/GID mappings to prevent privilege escalation on host systems.
- Enforcing seccomp, AppArmor, and SELinux profiles at runtime for defense-in-depth.
- Handling PID 1 responsibilities (signal forwarding and reaping orphaned processes) in long-running containerized services.
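
To make the resource limits and requests topic above concrete, here is a minimal sketch that parses Kubernetes-style quantity strings and verifies that each limit is at least as large as the matching request. It handles only a subset of suffixes (no exponent forms such as `1e3`), and the helper names `parse_quantity` and `limits_cover_requests` are hypothetical.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; exponent notation is not handled.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert strings such as '500m' or '256Mi' to a plain number."""
    # Try two-character suffixes first so 'Mi' is not mistaken for another unit.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def limits_cover_requests(resources: dict) -> bool:
    """True when every resource that has both a request and a limit satisfies limit >= request."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    return all(
        parse_quantity(limits[name]) >= parse_quantity(value)
        for name, value in requests.items()
        if name in limits
    )
```

The `resources` argument mirrors the `resources` block of a container spec, so the check could run against manifests in a pre-merge lint step.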
Module 3: Image Management and Registry Operations
- Designing a multi-tenant container registry hierarchy with project-based access controls and retention policies.
- Implementing image signing using Cosign or Notary to enforce supply chain integrity.
- Automating vulnerability scanning in CI pipelines with tools like Trivy or Clair and defining severity thresholds for blocking.
- Syncing images across geographically distributed registries to reduce pull latency and improve resiliency.
- Creating base image governance policies that mandate patching cadence and owner accountability.
- Managing image metadata through annotations for compliance, ownership, and deployment constraints.
- Configuring registry garbage collection and storage cleanup to avoid disk exhaustion.
- Integrating image promotion workflows with GitOps pipelines using semantic versioning.
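
The severity-threshold idea from the vulnerability scanning topic above can be sketched as a small gate function. The finding format (a dict with `id` and `severity` keys) and the identifiers used are hypothetical; real scanners such as Trivy or Clair emit their own report schemas, which would need to be mapped into this shape first.

```python
# Ordered from least to most severe.
SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def blocking_findings(findings: list[dict], threshold: str = "HIGH") -> list[dict]:
    """Return the findings whose severity is at or above the threshold."""
    floor = SEVERITY_ORDER.index(threshold)
    return [f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= floor]


def should_block(findings: list[dict], threshold: str = "HIGH") -> bool:
    """True when at least one finding meets the blocking threshold."""
    return bool(blocking_findings(findings, threshold))
```

A pipeline step could exit non-zero when `should_block` returns `True`, with the threshold set per environment (for example, stricter for production promotion than for development builds).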
Module 4: Orchestration with Kubernetes in Production
- Designing node pools with taints and tolerations to isolate workloads by security level or hardware type.
- Implementing PodDisruptionBudgets to maintain availability during node maintenance or cluster upgrades.
- Configuring custom resource definitions (CRDs) with validation schemas and admission controllers.
- Setting up horizontal and vertical pod autoscaling with metrics from custom Prometheus exporters.
- Managing stateful applications using StatefulSets with persistent volume claims and storage classes.
- Implementing network policies to restrict pod-to-pod communication based on zero-trust principles.
- Using init containers to enforce preconditions before application startup in multi-container pods.
- Planning for etcd backup and restore procedures with regular snapshot testing.
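
For the horizontal autoscaling topic above, the core scaling rule documented for the Horizontal Pod Autoscaler is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that arithmetic; the real controller also accounts for pod readiness, missing metrics, and stabilization windows, and the `min_replicas` and `max_replicas` defaults here are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1,
                     min_replicas: int = 1,      # illustrative default
                     max_replicas: int = 10) -> int:  # illustrative default
    """Apply the basic HPA scaling formula, clamped to the replica bounds."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

Working a few values through this function by hand is a useful way to sanity-check target values before wiring in metrics from a custom exporter.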
Module 5: Networking Models and Service Connectivity
- Selecting CNI plugins (Calico, Cilium, Flannel) based on network policy enforcement and performance needs.
- Designing service mesh integration using sidecar injection and mTLS for inter-service encryption.
- Configuring ingress controllers with rate limiting, WAF integration, and TLS termination.
- Implementing multi-cluster service discovery using federated DNS or service mesh gateways.
- Managing external access through NodePort or LoadBalancer Services, using MetalLB to provide LoadBalancer support in on-prem environments.
- Resolving DNS latency issues by tuning CoreDNS cache settings and upstream resolvers.
- Isolating development, staging, and production traffic using namespace-level network policies.
- Debugging hairpinning and SNAT issues in NAT-heavy environments with custom iptables rules.
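
One common baseline for the namespace isolation topic above is a default-deny NetworkPolicy in each namespace, with explicit allow rules layered on top. The sketch below builds such manifests as Python dictionaries; the namespace names and the policy name `default-deny-all` are assumptions for illustration.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches all pods in the namespace.
            "podSelector": {},
            # Listing both types with no rules denies ingress and egress.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace names.
    for ns in ("development", "staging", "production"):
        print(json.dumps(default_deny_policy(ns), indent=2))
```

Two caveats apply: denying all egress also blocks DNS lookups until an allow rule is added, and the policies only take effect when the installed CNI plugin enforces NetworkPolicy.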
Module 6: Persistent Storage and Data Management
- Selecting storage classes backed by the appropriate media (SSD, HDD, NVMe) based on application I/O patterns and cost constraints.
- Implementing dynamic provisioning with CSI drivers for cloud and on-prem storage systems.
- Designing backup and restore workflows for stateful applications using Velero with application consistency hooks.
- Managing access modes (ReadWriteOnce, ReadWriteMany) for shared filesystems in clustered applications.
- Handling volume resizing operations with minimal downtime and application impact.
- Monitoring storage utilization and IOPS to detect misconfigured PVCs or runaway processes.
- Integrating with enterprise storage solutions (NetApp, Pure Storage) using vendor-specific CSI plugins.
- Enforcing data retention and encryption policies at the storage layer for compliance.
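
The storage monitoring topic above often reduces to two questions: which volumes are nearly full, and how long until a given volume fills. The following sketch answers both with simple arithmetic, assuming usage figures have already been collected from a metrics source; the PVC names, the 80% threshold, and the linear-growth assumption are all illustrative.

```python
from typing import Optional

GIB = 2 ** 30


def nearly_full(pvcs: dict[str, tuple[int, int]], threshold: float = 0.8) -> list[str]:
    """Return names of PVCs whose used/capacity ratio meets the threshold.

    `pvcs` maps a PVC name to (used_bytes, capacity_bytes).
    """
    return sorted(
        name for name, (used, capacity) in pvcs.items()
        if used / capacity >= threshold
    )


def days_until_full(used_bytes: int, capacity_bytes: int,
                    daily_growth_bytes: int) -> Optional[float]:
    """Linear projection of days remaining; None when usage is flat or shrinking."""
    if daily_growth_bytes <= 0:
        return None
    return (capacity_bytes - used_bytes) / daily_growth_bytes
```

A sudden drop in the projected days remaining is one signal of a runaway process, while a volume that never exceeds a small fraction of its capacity may point to an oversized PVC.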
Module 7: Security, Compliance, and Runtime Enforcement
- Implementing admission controllers (OPA Gatekeeper, Kyverno) to enforce organizational policies on resource creation.
- Conducting regular node hardening audits using CIS benchmarks and automated scanning tools.
- Managing secrets using external vaults (HashiCorp Vault) with short-lived tokens and rotation policies.
- Enabling audit logging for the Kubernetes API server and filtering events based on sensitivity.
- Configuring pod security standards (restricted, baseline, privileged) across namespaces.
- Performing runtime threat detection using Falco or Sysdig to monitor for anomalous process execution.
- Integrating container security into CI/CD with pre-commit hooks and policy-as-code checks.
- Responding to container breakout incidents with host-level containment and forensic collection.
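
The policy-as-code topic above can be illustrated with a small check that flags container settings conflicting with a few of the controls in the restricted Pod Security Standard. This is a deliberately partial sketch: it covers only `privileged`, `allowPrivilegeEscalation`, `runAsNonRoot`, and dropped capabilities, ignores init and ephemeral containers, and skips seccomp. The function name is hypothetical, and a production setup would rely on Pod Security Admission, OPA Gatekeeper, or Kyverno instead.

```python
def pod_security_violations(pod_spec: dict) -> list[str]:
    """Return human-readable violations for a subset of restricted-profile controls."""
    violations = []
    pod_sc = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        if sc.get("privileged", False):
            violations.append(f"{name}: privileged must not be true")
        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation must be false")
        # The container-level setting overrides the pod-level one when present.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            violations.append(f"{name}: runAsNonRoot must be true")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            violations.append(f"{name}: capabilities must drop ALL")
    return violations
```

The `pod_spec` argument mirrors the `spec` field of a Pod manifest, so the same function could run in a CI step against rendered manifests before they reach the cluster.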
Module 8: Observability and Day 2 Operations
- Deploying distributed tracing for microservices using OpenTelemetry and backend collectors.
- Configuring structured logging pipelines with Fluentd or Vector and enforcing JSON schema compliance.
- Setting up SLOs and error budgets using Prometheus metrics and alerting via Alertmanager.
- Managing log retention and indexing costs by filtering low-value logs at the source.
- Diagnosing performance bottlenecks using container-level CPU, memory, and network profiling.
- Implementing cluster health dashboards with Grafana for infrastructure and application metrics.
- Automating routine operations (node rotation, certificate renewal) using operators and CronJobs.
- Conducting chaos engineering experiments to validate resilience of containerized systems.
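
The SLO and error budget topic above rests on simple arithmetic that is worth working through before writing any alert rules. The sketch below computes the fraction of the error budget remaining and the burn rate for a request-based SLO; the function names are hypothetical, and in practice the request counts would come from Prometheus queries over the SLO window.

```python
def _check_target(slo_target: float) -> None:
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")


def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget left; negative once the budget is overspent."""
    _check_target(slo_target)
    if total == 0:
        return 1.0
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


def burn_rate(slo_target: float, total: int, failed: int) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    _check_target(slo_target)
    if total == 0:
        return 0.0
    return (failed / total) / (1.0 - slo_target)
```

For example, with a 99.9% target and 1,000,000 requests, 1,000 failures are allowed, so 250 failures leave roughly 75% of the budget. A burn rate above 1 means the budget will be exhausted before the window ends if the current error rate persists.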
Module 9: Hybrid and Multi-Cloud Deployment Strategies
- Designing cluster federation models for workload portability across AWS, Azure, and on-prem environments.
- Managing configuration drift using GitOps tools (ArgoCD, Flux) with environment-specific overlays.
- Implementing hybrid DNS and service discovery to bridge cloud and data center workloads.
- Optimizing cross-cloud data transfer costs using caching, compression, and scheduling.
- Enforcing consistent security policies across clusters using centralized policy engines.
- Handling cloud provider-specific IAM roles and service accounts in multi-cloud Kubernetes deployments.
- Planning for disaster recovery using active-passive cluster configurations and data replication.
- Monitoring cloud spending by namespace and team using cost allocation tools like Kubecost.
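
Configuration drift, as referenced in the GitOps topic above, is the difference between the declared state in Git and the live state in a cluster. The sketch below shows the idea with a naive recursive comparison of two nested dictionaries. It is not how ArgoCD or Flux compute diffs: those tools handle server-populated fields, defaults, and lists far more carefully, whereas this version reports any difference it sees.

```python
def find_drift(desired: dict, live: dict, path: str = "") -> list[str]:
    """Return a list of dotted paths where the live state differs from the desired state."""
    drift = []
    for key in sorted(set(desired) | set(live)):
        here = f"{path}.{key}" if path else str(key)
        if key not in live:
            drift.append(f"{here}: missing in live")
        elif key not in desired:
            drift.append(f"{here}: unexpected in live")
        elif isinstance(desired[key], dict) and isinstance(live[key], dict):
            # Recurse into nested mappings and keep building the dotted path.
            drift.extend(find_drift(desired[key], live[key], here))
        elif desired[key] != live[key]:
            drift.append(f"{here}: desired={desired[key]!r} live={live[key]!r}")
    return drift
```

In this naive form a server-populated field such as `status` is reported as drift, which illustrates why real GitOps tools let operators configure fields to ignore.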