This curriculum reflects the scope typically addressed across a full consulting engagement or multi-phase internal transformation initiative.
Module 1: Strategic Assessment of AWS IaaS Adoption
- Evaluate total cost of ownership (TCO) trade-offs between on-premises, colocation, and AWS IaaS across multi-year horizons, factoring in refresh cycles and hidden operational costs.
- Assess organizational readiness for IaaS migration, including skill gaps, change management capacity, and legacy system dependencies.
- Define workload eligibility criteria for AWS based on compliance, data sovereignty, performance, and integration requirements.
- Compare AWS IaaS against alternative cloud models (PaaS, SaaS, hybrid) for specific business units or applications.
- Map business continuity requirements to AWS service capabilities, identifying gaps in RTO/RPO alignment.
- Establish decision frameworks for workload placement: public cloud vs. on-premises based on data gravity, latency, and regulatory constraints.
- Quantify risk exposure from vendor lock-in and design mitigation strategies including multi-cloud abstraction layers.
- Develop governance thresholds for initiating IaaS pilots, including executive sponsorship, budget caps, and exit criteria.
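The multi-year TCO comparison in this module can be sketched as a small model. This is a minimal illustration only: the parameter names (`capex`, `annual_opex`, `refresh_every_years`, `monthly_cost`, `one_time_migration`) and all figures are hypothetical and do not come from AWS pricing.

```python
def on_prem_tco(capex: float, annual_opex: float, years: int,
                refresh_every_years: int) -> float:
    """On-premises cost over the horizon, including hardware refresh cycles."""
    # Hardware is bought in year 0 and again at each refresh that falls
    # inside the planning horizon.
    purchases = len(range(0, years, refresh_every_years))
    return capex * purchases + annual_opex * years


def iaas_tco(monthly_cost: float, years: int,
             one_time_migration: float = 0.0) -> float:
    """IaaS cost over the horizon: recurring spend plus a one-time migration."""
    return monthly_cost * 12 * years + one_time_migration


# Hypothetical inputs: a 5-year horizon with a 4-year refresh cycle.
on_prem = on_prem_tco(capex=100_000, annual_opex=20_000, years=5,
                      refresh_every_years=4)
cloud = iaas_tco(monthly_cost=4_000, years=5, one_time_migration=30_000)
```

A real assessment would add the hidden operational costs named above (staffing, facilities, licensing) as further terms; the point here is that the refresh cycle changes the answer depending on where the horizon ends.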
Module 2: Account and Identity Governance at Scale
- Design multi-account AWS Organizations structures aligned with business units, environments, and compliance domains using SCPs and OU hierarchies.
- Implement least-privilege IAM policies, using permissions boundaries and condition keys where applicable, for roles, service-linked roles, and cross-account access.
- Integrate AWS IAM with enterprise identity providers using SAML 2.0 or OpenID Connect, including federation failure fallbacks.
- Enforce MFA and session duration policies across roles, considering usability trade-offs for developers and operators.
- Establish audit trails for IAM changes using AWS CloudTrail with log integrity validation and external monitoring.
- Define lifecycle policies for credential rotation, role deactivation, and access reviews across thousands of identities.
- Implement just-in-time (JIT) access for privileged operations using short-lived AWS STS role sessions, temporary elevated access through IAM Identity Center, or third-party PAM integrations.
- Assess risks of wildcard permissions and service role proliferation using IAM Access Analyzer and automated remediation.
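The wildcard-permission review above can be illustrated with a simplified local check over an IAM policy document. This sketch is not a substitute for IAM Access Analyzer: the function name `find_wildcard_statements` is hypothetical, and the check ignores `NotAction`, `NotResource`, and `Condition` elements.

```python
def find_wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements that use '*' or 'service:*' actions, or a '*' resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Action and Resource may each be a string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources

        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = "*" in resources
        if broad_action or broad_resource:
            flagged.append(stmt)
    return flagged


# Example policy; the bucket name is a placeholder.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ReadOneBucket", "Effect": "Allow",
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "TooBroad", "Effect": "Allow",
         "Action": ["ec2:*"], "Resource": "*"},
        {"Sid": "DenyAll", "Effect": "Deny",
         "Action": "*", "Resource": "*"},
    ],
}
flagged_sids = [s["Sid"] for s in find_wildcard_statements(example_policy)]
```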
Module 3: Network Architecture and Connectivity Planning
- Design VPC topologies with CIDR planning, subnets, and route tables to support multi-tier applications and segmentation.
- Implement hybrid connectivity using AWS Direct Connect or Site-to-Site VPN with redundancy, failover, and bandwidth SLAs.
- Configure DNS resolution between on-premises and AWS using Route 53 Resolver inbound/outbound endpoints.
- Enforce network segmentation using security groups, NACLs, and AWS Network Firewall with performance impact analysis.
- Deploy transit gateways for hub-and-spoke architectures, evaluating cost versus peering models at scale.
- Optimize inter-region data transfer costs and latency using AWS Global Accelerator, CloudFront, or Route 53 latency-based routing policies.
- Plan for IPv6 adoption in VPCs, assessing application compatibility and dual-stack requirements.
- Monitor and troubleshoot network paths using VPC Flow Logs, Amazon CloudWatch, and AWS Network Manager.
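The CIDR planning step in this module can be sketched with Python's standard `ipaddress` module. The helper names (`plan_subnets`, `usable_hosts`), the tier names, and the address ranges are illustrative assumptions; the only AWS-specific fact used is that each IPv4 subnet reserves five addresses.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, tiers: list[str], az_count: int,
                 new_prefix: int) -> dict:
    """Assign one non-overlapping subnet per (tier, AZ index) from the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    candidates = vpc.subnets(new_prefix=new_prefix)  # generator of equal-size blocks
    plan = {}
    for tier in tiers:
        for az_index in range(az_count):
            try:
                plan[(tier, az_index)] = next(candidates)
            except StopIteration:
                raise ValueError("VPC CIDR is too small for the requested layout") from None
    return plan


def usable_hosts(subnet) -> int:
    """Addresses left after the five that AWS reserves in every IPv4 subnet."""
    return subnet.num_addresses - 5


# Hypothetical three-tier layout across two Availability Zones.
plan = plan_subnets("10.0.0.0/16", ["public", "app", "data"],
                    az_count=2, new_prefix=24)
```

Allocating sequentially keeps the example short; a production plan would usually leave gaps between tiers so each can grow without renumbering.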
Module 4: Compute Strategy and Instance Optimization
- Select EC2 instance families based on workload characteristics: compute, memory, storage, or GPU-intensive demands.
- Evaluate trade-offs between On-Demand, Reserved Instances, and Spot Instances for cost and availability requirements.
- Implement Auto Scaling groups with predictive and dynamic scaling policies tied to business KPIs.
- Design fault-tolerant workloads across Availability Zones, factoring in stateful versus stateless architecture patterns.
- Integrate EC2 with Systems Manager for patch compliance, inventory tracking, and operational automation.
- Assess containerization readiness and migration paths from EC2 to ECS or EKS for microservices.
- Implement instance hibernation and lifecycle hooks for state preservation during scale-in events.
- Monitor and remediate underutilized or orphaned instances using AWS Cost Explorer and Compute Optimizer.
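The trade-off between committed and On-Demand capacity can be made concrete with a small cost model. This is a sketch under stated assumptions: the hourly rates and the demand profile are invented for illustration, Spot capacity is left out, and `blended_cost` and `cheapest_commitment` are hypothetical helper names.

```python
def blended_cost(demand: list[int], committed: int,
                 committed_rate: float, on_demand_rate: float) -> float:
    """Cost of serving hourly instance demand with a fixed committed baseline."""
    total = 0.0
    for needed in demand:
        total += committed * committed_rate  # paid every hour, used or not
        total += max(0, needed - committed) * on_demand_rate  # overflow at On-Demand
    return total


def cheapest_commitment(demand: list[int], committed_rate: float,
                        on_demand_rate: float) -> int:
    """Commitment level (in instances) that minimizes cost for this demand profile."""
    options = range(0, max(demand) + 1)
    return min(options, key=lambda c: blended_cost(demand, c, committed_rate,
                                                   on_demand_rate))


# Hypothetical demand (instances needed per hour) and hourly rates.
hourly_demand = [2, 4, 6, 4]
best = cheapest_commitment(hourly_demand, committed_rate=0.06, on_demand_rate=0.10)
```

Committing to peak demand is rarely optimal: beyond some level the unused committed hours cost more than the On-Demand overflow they replace.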
Module 5: Storage Architecture and Data Management
- Map data access patterns to AWS storage tiers: EBS, EFS, S3, or FSx based on throughput, latency, and durability needs.
- Select EBS volume types (gp3, io2) based on provisioned IOPS and throughput for database workloads, and account for burst balance on any remaining gp2 volumes.
- Implement S3 lifecycle policies to transition objects between storage classes and enforce deletion rules.
- Configure cross-region replication and versioning for disaster recovery and ransomware protection.
- Enforce encryption at rest using AWS KMS with customer-managed keys and audit key usage patterns.
- Design backup strategies using AWS Backup with retention, compliance, and restore testing requirements.
- Reduce data egress costs through caching strategies, data locality, and transfer acceleration decisions.
- Implement data governance with S3 Object Lock, access points, and bucket policies aligned with regulatory frameworks.
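The S3 lifecycle policy item above can be illustrated by building a rule as plain data and validating its ordering before it is submitted. The dictionary is intended to follow the shape of an S3 lifecycle configuration (`Rules`, `Filter`, `Transitions`, `Expiration`), but treat that as an assumption to verify against the S3 API reference; `build_lifecycle_rule`, the rule ID, and the prefix are hypothetical.

```python
def build_lifecycle_rule(rule_id: str, prefix: str,
                         transitions: list[tuple[int, str]],
                         expire_after_days: int) -> dict:
    """Build one lifecycle rule, rejecting out-of-order transitions."""
    days = [d for d, _ in transitions]
    if days != sorted(days) or len(set(days)) != len(days):
        raise ValueError("transition days must be strictly increasing")
    if days and expire_after_days <= days[-1]:
        raise ValueError("expiration must come after the last transition")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": d, "StorageClass": sc} for d, sc in transitions],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical rule: move logs to colder classes, then delete after a year.
lifecycle_configuration = {
    "Rules": [
        build_lifecycle_rule("archive-logs", "logs/",
                             [(30, "STANDARD_IA"), (90, "GLACIER")], 365)
    ]
}
```

Keeping the rule as data makes it easy to review in version control and to reuse the same definition across buckets.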
Module 6: Security, Compliance, and Threat Mitigation
- Deploy AWS WAF and Shield Advanced to protect internet-facing applications from DDoS and OWASP Top 10 threats.
- Implement threat detection and vulnerability management using Amazon GuardDuty, Amazon Inspector, and custom log analysis for anomaly detection.
- Enforce encryption in transit using TLS 1.2+ and manage certificates via AWS Certificate Manager with renewal automation.
- Configure Security Hub to aggregate findings and prioritize remediation based on business criticality.
- Implement compliance-as-code using AWS Config rules and automated remediation via AWS Systems Manager.
- Design secure API access patterns using API Gateway with throttling, authentication, and request validation.
- Conduct red-team exercises using AWS services to validate detection and response capabilities.
- Establish incident response workflows integrating AWS CloudTrail, VPC Flow Logs, and SIEM tools.
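The TLS 1.2+ requirement in this module is usually enforced on the server side (load balancers, API endpoints), but a client can also refuse older protocol versions. The following sketch uses Python's standard `ssl` module; `build_client_context` is a hypothetical helper name.

```python
import ssl


def build_client_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses protocol versions below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = build_client_context()
```

The context can then be passed to standard-library clients, for example `urllib.request.urlopen(url, context=context)`.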
Module 7: Cost Management and Financial Governance
- Implement cost allocation tags across resources and enforce tagging standards using AWS Organizations tag policies and service control policies (SCPs).
- Break down AWS spend by department, project, and application using Cost and Usage Reports (CUR) and custom dimensions.
- Forecast monthly expenditures using historical trends and anomaly detection in Cost Explorer.
- Select commitment-based pricing models (Savings Plans, Reserved Instances) based on utilization forecasts and commitment risk.
- Identify and eliminate idle or underutilized resources using AWS Trusted Advisor and custom scripts.
- Establish chargeback or showback models using AWS Organizations and third-party tools for internal accountability.
- Monitor reserved instance utilization and coverage gaps to avoid over-procurement or under-reservation.
- Integrate AWS cost data into enterprise financial systems for consolidated reporting and budget control.
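A showback report is, at its core, a roll-up of cost line items by an allocation tag. The sketch below assumes a simplified record shape (`cost` plus a `tags` dictionary) that is invented for illustration and is not the Cost and Usage Report schema; `showback` and the tag key `team` are hypothetical.

```python
from collections import defaultdict


def showback(line_items: list[dict], tag_key: str) -> dict[str, float]:
    """Sum cost per value of an allocation tag; untagged spend is reported separately."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key) or "untagged"
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical line items.
items = [
    {"resource": "i-aaa", "cost": 10.0, "tags": {"team": "payments"}},
    {"resource": "i-bbb", "cost": 5.5, "tags": {"team": "search"}},
    {"resource": "vol-ccc", "cost": 2.0, "tags": {}},
    {"resource": "i-ddd", "cost": 4.5, "tags": {"team": "payments"}},
]
report = showback(items, "team")
```

Surfacing the `untagged` bucket explicitly is a deliberate choice: its size is a direct measure of how well the tagging policy is being followed.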
Module 8: Operational Resilience and Disaster Recovery
- Define recovery strategies (backup/restore, pilot light, warm standby, multi-site active/active) based on RTO/RPO.
- Test disaster recovery plans using AWS Fault Injection Service (formerly Fault Injection Simulator) to validate failover and data consistency.
- Implement automated failover for databases using Amazon RDS Multi-AZ or Aurora Global Database.
- Design immutable backups with write-once-read-many (WORM) policies using S3 Object Lock or AWS Backup.
- Validate data integrity across regions using checksums and automated reconciliation jobs.
- Establish operational runbooks using AWS Systems Manager Automation for common failure scenarios.
- Monitor the health of critical workloads using CloudWatch Synthetics canaries and custom metrics.
- Conduct post-incident reviews using AWS incident management tools and update runbooks based on findings.
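Choosing among the four recovery strategies listed in this module comes down to comparing RTO/RPO targets against what each strategy can deliver. The thresholds in this sketch are illustrative assumptions, not AWS guidance, and should be replaced with figures from the organization's own recovery testing; `choose_dr_strategy` is a hypothetical helper.

```python
def choose_dr_strategy(rto_minutes: float, rpo_minutes: float) -> str:
    """Pick the least expensive strategy whose assumed capability meets both targets."""
    # Simplification: the tighter of the two targets drives the choice.
    tightest = min(rto_minutes, rpo_minutes)
    if tightest < 1:        # assumed: near-zero tolerance
        return "multi-site active/active"
    if tightest < 15:       # assumed: minutes
        return "warm standby"
    if tightest < 60:       # assumed: tens of minutes
        return "pilot light"
    return "backup/restore"  # assumed: hours


# Hypothetical workload targets.
strategy = choose_dr_strategy(rto_minutes=45, rpo_minutes=30)
```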
Module 9: Monitoring, Observability, and Performance Tuning
- Design CloudWatch metric and log retention policies balancing cost, compliance, and troubleshooting needs.
- Implement structured logging with ingestion filters and partitioning for high-volume applications.
- Configure custom dashboards with business-relevant KPIs alongside technical metrics.
- Set intelligent alarms using anomaly detection and composite alarms to reduce false positives.
- Trace distributed workloads using AWS X-Ray, analyzing latency bottlenecks and service dependencies.
- Correlate infrastructure metrics with application performance to isolate root causes.
- Optimize log storage costs using S3 lifecycle policies and partitioning strategies.
- Integrate AWS observability tools with third-party APM solutions for centralized visibility.
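The idea behind anomaly-based alarms, and behind combining signals to cut false positives, can be shown with a simple statistical band. This is a simplified stand-in and not the algorithm CloudWatch uses; `is_anomalous`, `should_alert`, and the sample values are hypothetical.

```python
import statistics


def is_anomalous(history: list[float], latest: float,
                 band_width: float = 2.0) -> bool:
    """True if `latest` falls outside mean +/- band_width * stdev of `history`."""
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)  # needs at least two data points
    return abs(latest - mean) > band_width * spread


def should_alert(latency_anomalous: bool, errors_anomalous: bool) -> bool:
    """Combine two signals: alert only when both deviate at the same time."""
    return latency_anomalous and errors_anomalous


# Hypothetical recent latency samples in milliseconds.
latency_history = [100.0, 102.0, 98.0, 101.0, 99.0]
spike = is_anomalous(latency_history, 110.0)
normal = is_anomalous(latency_history, 103.0)
```

Requiring two independent signals trades some detection speed for fewer pages; whether that trade is acceptable depends on the workload's criticality.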
Module 10: Change Management and IaaS Lifecycle Governance
- Establish change advisory boards (CAB) for high-impact AWS infrastructure modifications.
- Implement infrastructure-as-code (IaC) using AWS CloudFormation or Terraform with peer review and CI/CD pipelines.
- Enforce configuration drift detection and remediation using AWS Config and automated compliance checks.
- Define lifecycle stages for environments (dev, test, prod) with access, retention, and cost controls.
- Manage technical debt in cloud infrastructure by scheduling refactoring and modernization cycles.
- Conduct quarterly architecture reviews using AWS Well-Architected Framework pillars.
- Retire legacy workloads with data migration, access revocation, and cost reallocation planning.
- Document and socialize architectural decision records (ADRs) for key IaaS design choices.
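Configuration drift detection reduces to comparing the declared state against the observed state. The sketch below shows that comparison on flat dictionaries; it is not how AWS Config or CloudFormation implements drift detection, and `detect_drift` and the sample settings are hypothetical.

```python
def detect_drift(desired: dict, actual: dict) -> dict[str, tuple]:
    """Map each differing setting to a (desired, actual) pair.

    A key missing on one side is reported with None for that side.
    """
    drift = {}
    for key in desired.keys() | actual.keys():
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical declared versus observed instance settings.
declared = {"instance_type": "m5.large", "monitoring": True, "ebs_optimized": True}
observed = {"instance_type": "m5.xlarge", "monitoring": True, "public_ip": True}
drift = detect_drift(declared, observed)
```

Reporting settings that exist only in the observed state matters as much as reporting changed values, since manual additions are a common source of drift.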