This curriculum covers the technical and operational rigor required for a multi-workshop cloud migration advisory engagement, addressing capacity planning with the granularity expected in enterprise-wide infrastructure transformations.
Module 1: Assessing Current State Workloads and Dependencies
- Inventorying on-premises applications by transaction volume, peak utilization, and inter-service dependencies using automated discovery tools.
- Classifying workloads by criticality, data sensitivity, and migration readiness to prioritize sequencing.
- Measuring baseline performance metrics including CPU, memory, disk I/O, and network throughput during peak business cycles.
- Identifying monolithic applications with tight coupling that require refactoring before cloud deployment.
- Documenting service-level agreements (SLAs) and uptime requirements for each workload to inform cloud architecture decisions.
- Validating accuracy of dependency mapping by cross-referencing CMDB data with network flow logs and application performance monitoring tools.
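The cross-referencing step above can be sketched as a simple set comparison: dependencies observed in network flow logs but absent from the CMDB are undocumented, and CMDB entries never seen on the wire are likely stale. The service names and edge format below are illustrative assumptions, not any specific tool's output.

```python
# Hypothetical sketch: validate CMDB dependency records against observed
# network flows. Each edge is a (source, target) service pair.

def validate_dependencies(cmdb_edges, flow_edges):
    """Compare declared (CMDB) and observed (flow log) dependencies.

    Returns dependencies missing from the CMDB and stale CMDB entries
    never observed in actual traffic.
    """
    declared = set(cmdb_edges)
    observed = set(flow_edges)
    return {
        "undocumented": sorted(observed - declared),  # seen, not recorded
        "stale": sorted(declared - observed),         # recorded, not seen
    }

cmdb = [("web", "app"), ("app", "db"), ("app", "legacy-ftp")]
flows = [("web", "app"), ("app", "db"), ("app", "cache")]
report = validate_dependencies(cmdb, flows)
```

In practice both edge lists would be enriched with observation timestamps, since a dependency exercised only during month-end processing can look stale in a short sampling window.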
Module 2: Defining Cloud Sizing and Performance Benchmarks
- Selecting appropriate cloud instance families based on workload profiles (e.g., compute-optimized vs. memory-optimized).
- Establishing performance baselines using cloud-native load testing tools to simulate production traffic patterns.
- Adjusting virtual machine configurations based on benchmark results to avoid overprovisioning.
- Accounting for cloud-specific performance variables such as burstable instances, network latency between availability zones, and storage IOPS limits.
- Comparing on-premises throughput to cloud-equivalent performance using standardized application workloads.
- Documenting assumptions and constraints in sizing models to support audit and review by infrastructure teams.
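A sizing model of the kind described above can be reduced to a small, auditable function whose inputs are exactly the documented assumptions: observed peak utilization, a benchmark-derived performance ratio, and a headroom buffer. The parameter values below are illustrative, not vendor figures.

```python
import math

def required_vcpus(onprem_cores, peak_cpu_pct, perf_ratio=1.0, headroom=0.3):
    """Estimate cloud vCPUs needed for a workload.

    onprem_cores: physical cores on the current host
    peak_cpu_pct: observed peak utilization (0-100) during business peaks
    perf_ratio:   assumed ratio of one cloud vCPU to one on-prem core,
                  derived from load-test benchmarks
    headroom:     fractional buffer for growth and bursts
    """
    effective = onprem_cores * (peak_cpu_pct / 100) / perf_ratio
    return math.ceil(effective * (1 + headroom))

# 16-core host peaking at 60%, cloud vCPU assumed ~0.8x a physical core
vcpus = required_vcpus(16, 60, perf_ratio=0.8, headroom=0.3)  # -> 16
```

Keeping the model this explicit makes the audit trail trivial: reviewers can challenge `perf_ratio` or `headroom` individually rather than a single opaque recommendation.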
Module 3: Right-Sizing and Elasticity Design
- Implementing auto-scaling policies based on custom metrics such as queue depth or request latency, not just CPU utilization.
- Configuring minimum and maximum instance limits to prevent runaway scaling during traffic anomalies.
- Designing scaling schedules for predictable workloads (e.g., month-end reporting) to reduce cold-start delays.
- Integrating predictive analytics with historical usage data to anticipate scaling needs ahead of demand spikes.
- Setting thresholds for scale-in events to avoid flapping due to transient load reductions.
- Validating elasticity behavior under failure conditions, such as AZ outages, to ensure capacity rebalancing works as intended.
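The anti-flapping threshold logic above can be illustrated with a sustained-breach guard: scale-in is only recommended once load has stayed below the threshold for several consecutive samples, so a transient dip cannot trigger it. This is a minimal sketch of the pattern, not any provider's cooldown implementation.

```python
from collections import deque

class ScaleInGuard:
    """Recommend scale-in only after CPU stays below the threshold for
    `sustain` consecutive samples, damping transient load reductions."""

    def __init__(self, threshold_pct, sustain):
        self.threshold = threshold_pct
        self.window = deque(maxlen=sustain)  # rolling sample window

    def observe(self, cpu_pct):
        self.window.append(cpu_pct)
        return (len(self.window) == self.window.maxlen
                and all(v < self.threshold for v in self.window))

guard = ScaleInGuard(threshold_pct=30, sustain=3)
decisions = [guard.observe(v) for v in [25, 28, 45, 22, 20, 18]]
# the 45% spike resets the streak; only the final three consecutive
# low samples permit scale-in
```

The same structure works for scale-out with the inequality reversed, typically with a shorter `sustain` so capacity is added faster than it is removed.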
Module 4: Storage and Data Tiering Strategy
- Mapping database workloads to appropriate storage classes based on access frequency, durability, and throughput requirements.
- Estimating growth rates for structured and unstructured data to project storage capacity needs over 12–24 months.
- Implementing lifecycle policies to automatically transition cold data to lower-cost storage tiers.
- Designing backup and snapshot retention schedules that align with recovery point objectives (RPOs) while minimizing storage spend.
- Assessing the impact of eventual consistency in distributed storage systems on application logic.
- Validating storage performance under concurrent access patterns typical of the application environment.
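The 12–24 month capacity projection above is usually modeled as compound monthly growth. A minimal sketch, assuming a constant growth rate (the figures below are illustrative):

```python
def projected_storage_gb(current_gb, monthly_growth_pct, months):
    """Project storage capacity under compound monthly growth.

    monthly_growth_pct is an assumed constant rate; in practice it is
    re-fitted each quarter from observed growth.
    """
    return current_gb * (1 + monthly_growth_pct / 100) ** months

# 10 TB today growing 4% per month over a 24-month horizon
projection = projected_storage_gb(10_000, 4, 24)  # roughly 25,600 GB
```

Running the same projection per storage tier (hot, warm, archive) shows where lifecycle transitions yield the largest savings, since cold data typically grows fastest.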
Module 5: Network Capacity and Latency Management
- Calculating bandwidth requirements for data migration phases, including initial sync and incremental replication.
- Provisioning Direct Connect or ExpressRoute circuits with sufficient capacity to handle peak data transfer loads.
- Modeling cross-AZ data transfer costs and performance impacts for multi-tier applications.
- Configuring DNS routing policies to minimize latency for globally distributed users.
- Implementing CDN caching strategies for static assets to reduce origin server load and improve response times.
- Monitoring network jitter and packet loss in hybrid environments to identify performance bottlenecks.
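The bandwidth calculation for the migration phases above reduces to dividing the data volume by the transfer window, with an allowance for protocol overhead. The overhead fraction here is an assumption to be refined with real transfer tests, not a measured value.

```python
def required_mbps(data_tb, window_hours, protocol_overhead=0.10):
    """Minimum sustained throughput to move `data_tb` within the window.

    protocol_overhead: assumed fraction consumed by TCP/TLS/replication
    framing; validate against a pilot transfer before circuit sizing.
    """
    bits = data_tb * 1e12 * 8 * (1 + protocol_overhead)
    return bits / (window_hours * 3600) / 1e6

# 50 TB initial sync over a 72-hour weekend window
mbps = required_mbps(50, 72)  # ~1,700 Mbps sustained
```

A result like this drives the Direct Connect or ExpressRoute circuit decision: a 1 Gbps circuit clearly cannot meet the window, while a 10 Gbps circuit leaves headroom for incremental replication alongside production traffic.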
Module 6: Cost-Aware Capacity Governance
- Implementing tagging policies to allocate cloud resource costs by department, project, and environment.
- Using reserved instance and savings plan commitments strategically based on long-term workload stability.
- Setting up budget alerts and automated shutdowns for non-production environments during off-hours.
- Conducting monthly reviews of idle or underutilized resources for decommissioning or resizing.
- Enforcing approval workflows for provisioning high-cost instance types or persistent storage volumes.
- Integrating cloud financial management tools with existing IT financial management (ITFM) processes for cross-platform reporting.
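The monthly idle-resource review above can start from a simple utilization sweep: any instance whose peak CPU never cleared a floor over the review period is a resize or decommission candidate. The fleet data and floor value are hypothetical.

```python
def review_utilization(metrics, cpu_floor_pct=10):
    """Classify each resource for the monthly governance review.

    metrics: mapping of resource name to CPU% samples over the period.
    A resource whose peak never exceeds the floor is flagged for
    decommissioning or resizing; everything else is kept.
    """
    return {name: ("decommission" if max(samples) < cpu_floor_pct else "keep")
            for name, samples in metrics.items()}

fleet = {
    "batch-worker-1": [2, 3, 1, 4],     # idle all period
    "app-server-1": [35, 80, 60, 45],   # genuinely busy
}
review = review_utilization(fleet)
```

Real reviews would also weigh memory, I/O, and tag metadata (owner, environment) so that flagged resources route to the right approval workflow rather than being terminated blindly.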
Module 7: Monitoring, Alerting, and Feedback Loops
- Deploying monitoring agents across all migrated workloads to collect granular performance telemetry.
- Defining alert thresholds that balance sensitivity with operational noise to prevent alert fatigue.
- Correlating infrastructure metrics with business KPIs (e.g., transaction success rate) to assess real-world impact.
- Establishing runbooks for common capacity-related incidents, such as disk space exhaustion or CPU throttling.
- Conducting post-mortems after scaling events to refine capacity models and alerting rules.
- Feeding operational data back into capacity planning cycles to improve forecasting accuracy.
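One of the runbook scenarios above, disk space exhaustion, lends itself to a forecasting alert rather than a static threshold: fit a linear trend to daily usage and alert on projected days-to-full. A minimal least-squares sketch with illustrative numbers:

```python
def days_until_full(capacity_gb, used_samples_gb):
    """Linear-trend forecast of days until disk exhaustion.

    used_samples_gb: one usage reading per day, oldest first.
    Returns None when usage is flat or shrinking (no exhaustion trend).
    """
    n = len(used_samples_gb)
    mean_x = (n - 1) / 2
    mean_y = sum(used_samples_gb) / n
    # least-squares slope in GB per day
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in enumerate(used_samples_gb))
             / sum((x - mean_x) ** 2 for x in range(n)))
    if slope <= 0:
        return None
    return (capacity_gb - used_samples_gb[-1]) / slope

# 500 GB volume growing ~10 GB/day
forecast = days_until_full(500, [400, 410, 420, 430])  # -> 7.0 days
```

Alerting on "fewer than N days remaining" instead of "more than X% used" cuts noise on large, slow-growing volumes while catching fast-growing ones early, which is exactly the sensitivity-versus-fatigue trade-off described above.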
Module 8: Capacity Planning for Multi-Cloud and Hybrid Environments
- Developing unified monitoring views across cloud providers to track capacity utilization holistically.
- Standardizing instance sizing nomenclature and performance benchmarks across different cloud platforms.
- Designing failover capacity in secondary clouds with consideration for provisioning delays and data synchronization lag.
- Managing licensing constraints that affect where and how workloads can be deployed across environments.
- Coordinating capacity planning cycles with provider-specific maintenance windows and service updates.
- Implementing policy-driven placement rules to align workload placement with cost, compliance, and performance objectives.
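The policy-driven placement rules above typically reduce to a filter-then-rank evaluation: eliminate candidates that violate compliance or performance constraints, then pick the cheapest survivor. All field names, regions, and prices below are hypothetical.

```python
def place_workload(workload, candidates):
    """Pick the cheapest placement that satisfies compliance and
    performance constraints; returns None if nothing qualifies."""
    eligible = [c for c in candidates
                if workload["data_residency"] in c["regions_compliant"]
                and c["p99_latency_ms"] <= workload["max_latency_ms"]]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["hourly_cost"])["name"]

workload = {"data_residency": "EU", "max_latency_ms": 50}
candidates = [
    {"name": "cloud-a/eu-west", "regions_compliant": {"EU"},
     "p99_latency_ms": 40, "hourly_cost": 0.12},
    {"name": "cloud-b/us-east", "regions_compliant": {"US"},
     "p99_latency_ms": 30, "hourly_cost": 0.08},
    {"name": "cloud-b/eu-central", "regions_compliant": {"EU"},
     "p99_latency_ms": 45, "hourly_cost": 0.10},
]
choice = place_workload(workload, candidates)
```

Ordering the evaluation this way keeps compliance and latency as hard constraints and cost as the tie-breaker; note how the cheapest overall option is rejected on data residency despite its lower price.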