Description

This curriculum covers the technical, governance, and operational practices used in multi-workshop cloud transformation programs, at the depth of advisory engagements that establish cloud-native operations across security, delivery, and cost management.

Module 1: Strategic Cloud Adoption Frameworks

Define workload eligibility criteria for migration by evaluating legacy system dependencies, compliance constraints, and business continuity requirements.
Select among rehost, refactor, rearchitect, and replace migration patterns based on each application's technical debt and long-term operational cost projections.
Negotiate cloud service level agreements (SLAs) with providers, balancing uptime guarantees against penalty enforcement mechanisms and exit clauses.
Establish cloud center of excellence (CCoE) governance with cross-functional representation from security, operations, and finance teams.
Implement cloud financial management controls by assigning cost accountability to business units using tagging and chargeback models.
Conduct cloud readiness assessments that evaluate organizational maturity across people, process, and technology dimensions.
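The tag-based chargeback model in this module reduces to grouping billed cost by an ownership tag. The following is a minimal Python sketch, assuming billing line items have already been exported into simple records; the LineItem shape, the cost-center tag key, and the UNALLOCATED bucket are hypothetical conventions, not any provider's billing format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LineItem:
    """One billing line item (hypothetical shape, not a provider export format)."""
    resource_id: str
    cost: float
    tags: dict[str, str] = field(default_factory=dict)


def chargeback(items: list[LineItem], tag_key: str = "cost-center") -> dict[str, float]:
    """Sum costs per business unit using a cost-allocation tag.

    Untagged spend is charged to an UNALLOCATED bucket so it stays visible
    in the report instead of silently disappearing.
    """
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        owner = item.tags.get(tag_key, "UNALLOCATED")
        totals[owner] += item.cost
    return dict(totals)


if __name__ == "__main__":
    bill = [
        LineItem("i-001", 120.0, {"cost-center": "payments"}),
        LineItem("i-002", 80.0, {"cost-center": "search"}),
        LineItem("vol-003", 15.5),  # missing the cost-center tag
        LineItem("i-004", 30.0, {"cost-center": "payments"}),
    ]
    print(chargeback(bill))
    # {'payments': 150.0, 'search': 80.0, 'UNALLOCATED': 15.5}
```

Tracking the UNALLOCATED total over time is also a direct measure of tagging compliance.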

Module 2: Cloud-Native Architecture Design

Decompose monolithic applications into bounded-context microservices using domain-driven design (DDD) event storming workshops.
Choose among serverless functions, containers, and managed services based on workload burst patterns and cold-start tolerance.
Design resilient inter-service communication with circuit breakers, retry policies, and distributed tracing integration.
Implement multi-region active-active data replication strategies while managing consistency trade-offs in globally distributed systems.
Select data management patterns (e.g., CQRS, event sourcing) based on query complexity and audit requirements.
Integrate API gateways with rate limiting, authentication, and request transformation to standardize external access.
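A circuit breaker, one of the inter-service resilience patterns listed above, can be sketched as a small state machine. This Python version is illustrative only: it assumes single-threaded, synchronous calls, counts consecutive failures, and takes an injectable clock for testing. The class and parameter names are hypothetical; production services would typically rely on a resilience library or a service-mesh feature instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without invoking the dependency."""


class CircuitBreaker:
    """Minimal (hypothetical) breaker: CLOSED -> OPEN after N consecutive
    failures, OPEN -> HALF_OPEN once reset_timeout elapses, and back to
    CLOSED after one successful trial call."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial in HALF_OPEN reopens at once; otherwise open at the threshold.
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Retry policies belong outside the breaker and should use backoff, so that an open breaker ends the retry loop quickly instead of amplifying load on an unhealthy dependency.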

Module 3: Infrastructure as Code and Automation

Standardize Terraform module interfaces across environments to enforce consistent resource tagging and network configuration.
Configure remote state backends with state locking in shared environments so that concurrent runs cannot overwrite or corrupt Terraform state.
Implement policy-as-code using Open Policy Agent (OPA) or HashiCorp Sentinel to enforce security and compliance guardrails.
Automate drift detection and remediation workflows using scheduled plan execution and approval gates.
Orchestrate complex deployment topologies across multiple accounts and regions using CI/CD pipelines with dependency mapping.
Version and test infrastructure modules using unit and integration testing frameworks such as Terratest.
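Policy-as-code guardrails are normally written in Rego (OPA) or Sentinel. To show the shape of such a rule without either engine, here is a Python sketch that checks a Terraform JSON plan for required tags. It assumes the plan was rendered with terraform show -json, reads the plan's resource_changes entries, inspects only the tags attribute, and treats the REQUIRED_TAGS set as a hypothetical organizational standard.

```python
from typing import Any

# Hypothetical organizational standard: taggable resources must carry these keys.
REQUIRED_TAGS = {"owner", "environment", "cost-center"}


def missing_tag_violations(plan: dict[str, Any], required: set[str] = REQUIRED_TAGS) -> list[str]:
    """Return one message per planned create/update that lacks required tags.

    Reads the resource_changes array of a Terraform JSON plan. Resources whose
    planned state has no "tags" attribute are treated as untaggable and skipped.
    Only "tags" is checked; provider-level default tags are not considered.
    """
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if "create" not in actions and "update" not in actions:
            continue  # deletes and no-ops are not checked
        after = change.get("after") or {}
        if "tags" not in after:
            continue
        tags = after.get("tags") or {}  # a null tags value counts as no tags
        missing = sorted(required - set(tags))
        if missing:
            violations.append(f"{rc.get('address')}: missing tags {', '.join(missing)}")
    return violations
```

In a pipeline, a wrapper could load that JSON, print each violation, and fail the job when the list is non-empty, which gives the approval gate something concrete to block on.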

Module 4: Cloud Security and Identity Governance

Implement least-privilege IAM roles with just-in-time (JIT) access and time-bound permissions for production environments.
Centralize security logging by forwarding VPC Flow Logs, CloudTrail, and other audit logs to secure, immutable central storage.
Enforce encryption at rest using customer-managed keys (CMKs) with key rotation policies and audit trails, and enforce encryption in transit with TLS.
Configure network segmentation using VPC peering, transit gateways, and security group rules with automated compliance checks.
Integrate identity providers (IdPs) with SSO for cloud consoles and APIs using SAML 2.0 or OIDC standards.
Conduct regular permission reviews and access certifications for privileged roles across cloud platforms.
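The just-in-time, time-bound access objective can be illustrated with an in-memory grant store in which no access is standing and every grant expires. This Python sketch assumes a four-hour ceiling and a no-self-approval rule as hypothetical policy; it does not call any cloud IAM API, and a real implementation would issue and revoke provider credentials rather than list entries.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    principal: str
    role: str
    expires_at: datetime


class JitAccess:
    """In-memory sketch of just-in-time, time-bound role grants (hypothetical)."""

    MAX_DURATION = timedelta(hours=4)  # hypothetical policy ceiling

    def __init__(self) -> None:
        self._grants: list[Grant] = []

    def request(self, principal: str, role: str, duration: timedelta,
                approver: str, now: datetime) -> Grant:
        if approver == principal:
            raise PermissionError("self-approval is not allowed")
        if duration <= timedelta(0) or duration > self.MAX_DURATION:
            raise ValueError(f"duration must be within (0, {self.MAX_DURATION}]")
        grant = Grant(principal, role, now + duration)
        self._grants.append(grant)
        return grant

    def is_allowed(self, principal: str, role: str, now: datetime) -> bool:
        # Access exists only while an unexpired grant is present.
        return any(g.principal == principal and g.role == role and now < g.expires_at
                   for g in self._grants)

    def purge_expired(self, now: datetime) -> int:
        """Drop expired grants and return how many were removed."""
        before = len(self._grants)
        self._grants = [g for g in self._grants if now < g.expires_at]
        return before - len(self._grants)
```

The purged grants are also the natural input for the periodic access certifications listed above, since every elevation has a recorded approver and expiry.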

Module 5: DevOps and Continuous Delivery at Scale

Design blue-green or canary deployment pipelines with automated rollback triggers based on health check and metric thresholds.
Manage secrets in CI/CD workflows using short-lived credentials from vaults instead of static API keys.
Standardize container image builds with reproducible pipelines that include vulnerability scanning and SBOM generation.
Enforce deployment windows and change advisory board (CAB) approvals through pipeline policy gates.
Scale build agents dynamically in response to pipeline queue depth while maintaining network egress cost controls.
Integrate performance and load testing into release pipelines for critical transaction paths.
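Automated rollback triggers for a canary release come down to comparing the canary's metrics against the baseline fleet over an analysis window. This Python sketch assumes request counts, error counts, and p99 latency are already collected per window; the threshold defaults and the promote/rollback/wait verdicts are hypothetical and would in practice be derived from the service's SLOs.

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline: WindowMetrics, canary: WindowMetrics,
                   max_error_rate_delta: float = 0.01,
                   max_latency_ratio: float = 1.2,
                   min_requests: int = 500) -> str:
    """Return "promote", "rollback", or "wait" for one analysis window.

    Threshold defaults are hypothetical placeholders, not recommendations.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    if canary.error_rate - baseline.error_rate > max_error_rate_delta:
        return "rollback"
    if (baseline.p99_latency_ms > 0
            and canary.p99_latency_ms / baseline.p99_latency_ms > max_latency_ratio):
        return "rollback"
    return "promote"
```

Comparing against the concurrent baseline rather than fixed absolute values keeps the trigger from firing on traffic-wide shifts that affect both versions equally.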

Module 6: Observability and Operational Resilience

Correlate logs, metrics, and traces using a unified tagging schema (e.g., service, environment) and propagated request or trace IDs across distributed systems.
Define service level objectives (SLOs) and error budgets to guide incident response and feature development trade-offs.
Configure alerting thresholds using dynamic baselines instead of static values to reduce false positives.
Implement synthetic monitoring for critical user journeys to detect degradation before real users are impacted.
Conduct blameless postmortems with root cause analysis and track remediation items to closure.
Design chaos engineering experiments to validate recovery procedures for zone and region failures.
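SLOs and error budgets are usually tracked through the burn rate: the observed error ratio divided by the error ratio the SLO allows. This Python sketch assumes a request-based SLO and counts gathered over the traffic observed so far; BudgetStatus and error_budget_status are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetStatus:
    error_ratio: float
    burn_rate: float
    budget_remaining: float  # share of the budget left for the traffic observed so far


def error_budget_status(slo_target: float, total: int, failed: int) -> BudgetStatus:
    """Error-budget status for a request-based SLO.

    burn_rate = observed error ratio / allowed error ratio (1 - SLO).
    A burn rate of 1.0 spends exactly the budget by the end of the window;
    above 1.0 the budget runs out early.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("no traffic observed")
    allowed_ratio = 1 - slo_target
    error_ratio = failed / total
    burn_rate = error_ratio / allowed_ratio
    return BudgetStatus(error_ratio, burn_rate, max(0.0, 1 - burn_rate))
```

For example, at a 99.9% SLO, 50 failures in 100,000 requests is a burn rate of about 0.5, leaving roughly half of the budget for that traffic; a sustained burn rate above 1 is the signal to shift effort from features to reliability.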

Module 7: Cost Optimization and Resource Governance

Right-size compute instances using performance telemetry and utilization trends from monitoring tools.
Size reserved instance and savings plan commitments based on historical usage and forecasted growth.
Automate start-stop schedules for non-production workloads using tagging and lifecycle policies.
Identify and decommission orphaned resources such as unattached disks, idle load balancers, and unused snapshots.
Compare total cost of ownership (TCO) between on-premises and cloud for specific workloads using detailed unit economics.
Implement auto-scaling policies that balance performance requirements with cost constraints during traffic spikes.
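Right-sizing from utilization telemetry can be sketched as projecting peak CPU demand onto smaller instance sizes. This Python example assumes a hypothetical four-step size ladder, uses the nearest-rank 95th percentile of CPU samples as the peak, and assumes CPU demand scales linearly with vCPU count; memory, network, and burstable-credit behavior are deliberately ignored.

```python
import math

# Hypothetical instance ladder (name, vCPUs), smallest first.
SIZES = [("small", 2), ("medium", 4), ("large", 8), ("xlarge", 16)]


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_size(current: str, cpu_util_pct: list[float],
                   target_peak_pct: float = 70.0) -> str:
    """Pick the smallest size whose projected p95 CPU stays under the target.

    Assumes CPU demand scales linearly with vCPU count (a simplification).
    """
    vcpus = dict(SIZES)
    used_vcpus = percentile(cpu_util_pct, 95) / 100 * vcpus[current]
    for name, count in SIZES:
        if used_vcpus / count * 100 <= target_peak_pct:
            return name
    return SIZES[-1][0]  # even the largest size would exceed the target
```

Using a high percentile rather than the mean keeps short spikes from being averaged away, while the target ceiling leaves headroom for auto-scaling to react.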

Module 8: Cloud Operations and Service Management

Integrate cloud operations with existing ITSM tools for incident, problem, and change management workflows.
Define escalation paths and on-call rotations for cloud-native incidents with clear ownership across teams.
Standardize runbooks for common failure scenarios including DNS outages, certificate expirations, and quota breaches.
Automate compliance reporting for regulatory frameworks (e.g., SOC 2, HIPAA) using configuration audit tools.
Manage third-party SaaS integrations with centralized access reviews and API usage monitoring.
Conduct regular cloud architecture reviews to assess alignment with evolving business requirements and technology standards.