This curriculum spans the technical and operational breadth of cloud adoption in large-scale software organizations, comparable in scope to a multi-workshop architecture immersion or an internal cloud center of excellence program.
Module 1: Cloud Infrastructure Selection and Sizing
- Selecting between on-demand, reserved, and spot instances based on workload predictability and cost tolerance.
- Right-sizing virtual machine configurations by analyzing CPU, memory, and I/O utilization patterns from production telemetry.
- Evaluating regional versus availability zone placement for compliance, latency, and fault isolation requirements.
- Implementing storage tiering strategies using object, block, and file storage based on access frequency and durability needs.
- Designing network topology with VPCs, subnets, and routing tables to support multi-tier application architectures.
- Assessing egress costs and data transfer implications when integrating with third-party SaaS platforms.
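As a rough illustration of the on-demand versus reserved decision above, the sketch below computes the utilization at which a reservation starts to pay off. The function names and the hourly rates are hypothetical, and the model assumes a reservation is billed for every hour while on-demand capacity is billed only for hours actually used.

```python
def break_even_utilization(on_demand_hourly: float, reserved_hourly: float) -> float:
    """Fraction of hours an instance must run before a reservation is cheaper.

    Assumption: reserved capacity is paid for every hour, used or not,
    while on-demand capacity is paid only for the hours it runs.
    """
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_hourly / on_demand_hourly


def cheaper_option(expected_utilization: float,
                   on_demand_hourly: float,
                   reserved_hourly: float) -> str:
    """Return 'reserved' or 'on-demand' for a workload's expected utilization (0..1)."""
    threshold = break_even_utilization(on_demand_hourly, reserved_hourly)
    return "reserved" if expected_utilization > threshold else "on-demand"
```

With illustrative rates of 0.10 on-demand and 0.06 reserved, the break-even point is about 60% utilization, so a steady workload running 80% of the time would favor the reservation while a bursty one running 30% of the time would not.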
Module 2: Cloud-Native Application Architecture
- Decomposing monolithic applications into microservices with bounded contexts and independent deployment pipelines.
- Choosing between serverless functions and containerized services based on cold start sensitivity and execution duration.
- Implementing circuit breakers and retry policies in inter-service communication to handle transient failures.
- Designing stateless services with externalized session storage to support horizontal scaling.
- Integrating service mesh components for observability, mTLS, and traffic control in multi-service environments.
- Managing configuration drift by externalizing environment-specific settings into centralized configuration stores.
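The circuit breaker topic above can be sketched in a few lines. This is a minimal, single-threaded illustration rather than a production implementation: the class name, thresholds, and injectable clock are all assumptions made for the example, and a real service would more likely rely on a resilience library or a service mesh policy.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of piling load onto a struggling dependency.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: fall through and allow one trial (half-open) call.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Retry policies would typically wrap `call` with bounded attempts and backoff, and treat `CircuitOpenError` as non-retryable so that retries do not defeat the breaker.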
Module 3: Identity, Access, and Security Governance
- Enforcing least-privilege access using IAM roles and policies tied to service identities rather than long-lived credentials.
- Implementing multi-factor authentication and conditional access policies for administrative console access.
- Rotating secrets and API keys using automated secret management tools integrated into deployment workflows.
- Configuring audit logging and monitoring for unauthorized access attempts across cloud resources.
- Applying security group and network ACL rules to restrict inter-service communication to required ports and protocols.
- Managing cross-account access for shared services using role assumption and organizational units.
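One way to make least-privilege review concrete is a small lint pass over policy documents. The sketch below assumes an AWS-style JSON policy layout (`Statement`, `Effect`, `Action`, `Resource`); the function name and the flagging rules are illustrative only, and other providers use different policy schemas.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """Policy fields may hold a single string or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Allow statements that grant wildcard actions or wildcard resources."""
    flagged = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        wildcard_action = any(a == "*" or str(a).endswith(":*") for a in actions)
        wildcard_resource = "*" in resources
        if wildcard_action or wildcard_resource:
            flagged.append(stmt)
    return flagged
```

A check like this could run in a CI pipeline against version-controlled policies, so that broad grants are caught in review rather than after deployment.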
Module 4: Data Management and Persistence in the Cloud
- Selecting managed database services based on consistency, scalability, and operational overhead trade-offs.
- Designing backup and point-in-time recovery strategies for databases with regulatory retention requirements.
- Implementing read replicas to offload read-heavy workloads and sharding to distribute write-heavy workloads.
- Encrypting data at rest and in transit using customer-managed or cloud provider key management systems.
- Handling data residency and sovereignty by restricting storage and processing to approved geographic regions.
- Migrating large datasets between cloud environments using offline transfer appliances or optimized bulk services.
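To illustrate the sharding topic above, the following sketch routes a record to a shard by hashing its key. The function name and the key format are hypothetical; a stable digest is used because Python's built-in `hash()` is randomized per process for strings and would route the same key differently across restarts.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a shard index in the range [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # SHA-256 gives the same digest on every host and every run.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Simple modulo routing like this remaps most keys whenever the shard count changes, which is why schemes such as consistent hashing or directory-based lookups are often preferred when shards are expected to be added over time.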
Module 5: CI/CD and DevOps Automation
- Designing immutable infrastructure pipelines that rebuild and redeploy artifacts instead of applying in-place updates.
- Integrating security scanning tools into CI pipelines to detect vulnerabilities before deployment.
- Implementing blue-green or canary deployments with automated rollback triggers based on health metrics.
- Managing infrastructure as code using version-controlled templates with peer review and drift detection.
- Orchestrating cross-environment promotions with manual approval gates for production changes.
- Enforcing pipeline concurrency limits to prevent resource contention during parallel deployments.
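An automated rollback trigger for a canary deployment, as mentioned above, often reduces to comparing the canary's error rate with the baseline. The sketch below is one possible rule under stated assumptions: the function name, the 2x ratio, the 1% floor, and the minimum sample size are all illustrative defaults rather than recommended values.

```python
def should_rollback(baseline_error_rate: float,
                    canary_errors: int,
                    canary_requests: int,
                    max_ratio: float = 2.0,
                    min_error_rate: float = 0.01,
                    min_requests: int = 100) -> bool:
    """Decide whether canary health metrics justify an automatic rollback."""
    # Too little traffic to judge: keep observing rather than react to noise.
    if canary_requests < min_requests:
        return False
    canary_rate = canary_errors / canary_requests
    # The floor avoids rolling back on a single error when the baseline is near zero.
    threshold = max(baseline_error_rate * max_ratio, min_error_rate)
    return canary_rate > threshold
```

In practice the same check would usually cover latency and saturation metrics as well, and would be evaluated over a sliding window rather than on cumulative counts.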
Module 6: Monitoring, Observability, and Incident Response
- Instrumenting applications with structured logging and distributed tracing to diagnose latency bottlenecks.
- Defining service level objectives and error budgets to guide incident prioritization and release pacing.
- Configuring alerting thresholds to minimize noise while ensuring critical failures trigger immediate response.
- Correlating metrics, logs, and traces across services to identify root causes during outages.
- Simulating failure scenarios using chaos engineering to validate system resilience.
- Integrating monitoring data with incident management platforms for escalation and post-mortem tracking.
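The relationship between a service level objective and its error budget can be shown with a short calculation. The sketch assumes a request-based availability SLO; the function name and the example figures are hypothetical.

```python
def error_budget_remaining(slo_target: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means no budget consumed, 0.0 means fully consumed,
    and a negative value means the SLO has been breached.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("SLO target and request volume leave no error budget")
    return 1.0 - failed_requests / allowed_failures
```

For example, a 99.9% target over 1,000,000 requests allows roughly 1,000 failures, so 250 observed failures leave about 75% of the budget. A team might slow releases as the remaining fraction approaches zero and prioritize reliability work once it goes negative.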
Module 7: Cost Management and Resource Optimization
- Tagging resources with cost centers, environments, and owners to enable granular chargeback reporting.
- Identifying underutilized instances and idle resources using cloud-native cost analysis tools.
- Negotiating enterprise discount plans after establishing baseline usage and forecasting growth.
- Implementing auto-scaling policies that balance performance SLAs with cost efficiency.
- Archiving cold data to lower-cost storage tiers, weighing the savings against slower retrieval times.
- Conducting regular cost reviews with engineering teams to align spending with business value.
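Tag-based chargeback reporting, as described above, amounts to grouping spend by a tag value and surfacing whatever is untagged. The record layout (`tags`, `monthly_cost`) and the `cost-center` tag key in this sketch are assumptions for illustration; real billing exports differ by provider.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def cost_by_tag(resources: Iterable[Dict[str, Any]],
                tag_key: str = "cost-center") -> Dict[str, float]:
    """Sum monthly cost per tag value; resources missing the tag land in 'untagged'."""
    totals: Dict[str, float] = defaultdict(float)
    for resource in resources:
        owner = resource.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += resource["monthly_cost"]
    return dict(totals)
```

Reporting the `untagged` bucket explicitly is a deliberate choice here: it makes gaps in tagging coverage visible in the same review where spending is discussed.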
Module 8: Hybrid and Multi-Cloud Integration Patterns
- Establishing secure, low-latency connectivity between on-premises data centers and cloud VPCs using dedicated links.
- Synchronizing identity directories across cloud and on-premises environments using federation protocols.
- Designing data replication strategies for hybrid databases with conflict resolution mechanisms.
- Standardizing deployment tooling across multiple cloud providers to reduce operational complexity.
- Managing vendor lock-in risks by abstracting cloud-specific services behind façade interfaces.
- Enforcing consistent security policies across cloud environments using centralized policy-as-code frameworks.
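The façade approach to limiting vendor lock-in can be sketched as a narrow interface that application code depends on, with one adapter per provider behind it. All names below (`ObjectStore`, `InMemoryObjectStore`, `archive_report`) are hypothetical; the in-memory adapter stands in for a provider-specific one and doubles as a test fake.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ObjectStore(ABC):
    """Provider-neutral interface; application code depends only on this."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryObjectStore(ObjectStore):
    """Stand-in adapter; a real adapter would wrap one provider's SDK."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def archive_report(store: ObjectStore, report_id: str, body: str) -> str:
    """Business logic written against the façade, not a specific cloud SDK."""
    key = f"reports/{report_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key
```

Keeping the interface small is the main trade-off: it eases migration between providers, but features that only one provider offers stay out of reach unless the façade is widened.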