This curriculum covers the technical and operational rigor required for a multi-workshop cloud migration advisory engagement, addressing capacity planning with the granularity expected in enterprise-wide infrastructure transformations.
Module 1: Assessing Current State Workloads and Dependencies
- Inventorying on-premises applications by transaction volume, peak utilization, and inter-service dependencies using automated discovery tools.
- Classifying workloads by criticality, data sensitivity, and migration readiness to prioritize sequencing.
- Measuring baseline performance metrics including CPU, memory, disk I/O, and network throughput during peak business cycles.
- Identifying monolithic applications with tight coupling that require refactoring before cloud deployment.
- Documenting service-level agreements (SLAs) and uptime requirements for each workload to inform cloud architecture decisions.
- Validating accuracy of dependency mapping by cross-referencing CMDB data with network flow logs and application performance monitoring tools.
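The cross-referencing step above can be sketched as a simple set comparison: dependencies observed in network flow logs but absent from the CMDB are undocumented, and CMDB entries never seen on the wire are likely stale. The service names and edge format below are illustrative assumptions, not any specific tool's output.

```python
# Hypothetical sketch: validate CMDB dependency records against observed
# network flows. Each edge is a (source, target) service pair.

def validate_dependencies(cmdb_edges, flow_edges):
    """Compare declared (CMDB) and observed (flow log) dependencies.

    Returns dependencies missing from the CMDB and stale CMDB entries
    never observed in actual traffic.
    """
    declared = set(cmdb_edges)
    observed = set(flow_edges)
    return {
        "undocumented": sorted(observed - declared),  # seen, not recorded
        "stale": sorted(declared - observed),         # recorded, not seen
    }

cmdb = [("web", "app"), ("app", "db"), ("app", "legacy-ftp")]
flows = [("web", "app"), ("app", "db"), ("app", "cache")]
report = validate_dependencies(cmdb, flows)
```

In practice both edge lists would be enriched with observation timestamps, since a dependency exercised only during month-end processing can look stale in a short sampling window.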
Module 2: Defining Cloud Sizing and Performance Benchmarks
- Selecting appropriate cloud instance families based on workload profiles (e.g., compute-optimized vs. memory-optimized).
- Establishing performance baselines using cloud-native load testing tools to simulate production traffic patterns.
- Adjusting virtual machine configurations based on benchmark results to avoid overprovisioning.
- Accounting for cloud-specific performance variables such as burstable instances, network latency between availability zones, and storage IOPS limits.
- Comparing on-premises throughput to cloud-equivalent performance using standardized application workloads.
- Documenting assumptions and constraints in sizing models to support audit and review by infrastructure teams.
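A sizing model of the kind described above can be reduced to a small, auditable function whose inputs are exactly the documented assumptions: observed peak utilization, a benchmark-derived performance ratio, and a headroom buffer. The parameter values below are illustrative, not vendor figures.

```python
import math

def required_vcpus(onprem_cores, peak_cpu_pct, perf_ratio=1.0, headroom=0.3):
    """Estimate cloud vCPUs needed for a workload.

    onprem_cores: physical cores on the current host
    peak_cpu_pct: observed peak utilization (0-100) during business peaks
    perf_ratio:   assumed ratio of one cloud vCPU to one on-prem core,
                  derived from load-test benchmarks
    headroom:     fractional buffer for growth and bursts
    """
    effective = onprem_cores * (peak_cpu_pct / 100) / perf_ratio
    return math.ceil(effective * (1 + headroom))

# 16-core host peaking at 60%, cloud vCPU assumed ~0.8x a physical core
vcpus = required_vcpus(16, 60, perf_ratio=0.8, headroom=0.3)  # -> 16
```

Keeping the model this explicit makes the audit trail trivial: reviewers can challenge `perf_ratio` or `headroom` individually rather than a single opaque recommendation.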
Module 3: Right-Sizing and Elasticity Design
- Implementing auto-scaling policies based on custom metrics such as queue depth or request latency, not just CPU utilization.
- Configuring minimum and maximum instance limits to prevent runaway scaling during traffic anomalies.
- Designing scaling schedules for predictable workloads (e.g., month-end reporting) to reduce cold-start delays.
- Integrating predictive analytics with historical usage data to anticipate scaling needs ahead of demand spikes.
- Setting thresholds for scale-in events to avoid flapping due to transient load reductions.
- Validating elasticity behavior under failure conditions, such as AZ outages, to ensure capacity rebalancing works as intended.
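The anti-flapping threshold logic above can be illustrated with a sustained-breach guard: scale-in is only recommended once load has stayed below the threshold for several consecutive samples, so a transient dip cannot trigger it. This is a minimal sketch of the pattern, not any provider's cooldown implementation.

```python
from collections import deque

class ScaleInGuard:
    """Recommend scale-in only after CPU stays below the threshold for
    `sustain` consecutive samples, damping transient load reductions."""

    def __init__(self, threshold_pct, sustain):
        self.threshold = threshold_pct
        self.window = deque(maxlen=sustain)  # rolling sample window

    def observe(self, cpu_pct):
        self.window.append(cpu_pct)
        return (len(self.window) == self.window.maxlen
                and all(v < self.threshold for v in self.window))

guard = ScaleInGuard(threshold_pct=30, sustain=3)
decisions = [guard.observe(v) for v in [25, 28, 45, 22, 20, 18]]
# the 45% spike resets the streak; only the final three consecutive
# low samples permit scale-in
```

The same structure works for scale-out with the inequality reversed, typically with a shorter `sustain` so capacity is added faster than it is removed.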
Module 4: Storage and Data Tiering Strategy
- Mapping database workloads to appropriate storage classes based on access frequency, durability, and throughput requirements.
- Estimating growth rates for structured and unstructured data to project storage capacity needs over 12–24 months.
- Implementing lifecycle policies to automatically transition cold data to lower-cost storage tiers.
- Designing backup and snapshot retention schedules that align with recovery point objectives (RPOs) while minimizing storage spend.
- Assessing the impact of eventual consistency in distributed storage systems on application logic.
- Validating storage performance under concurrent access patterns typical of the application environment.
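The 12–24 month capacity projection above is usually modeled as compound monthly growth. A minimal sketch, assuming a constant growth rate (the figures below are illustrative):

```python
def projected_storage_gb(current_gb, monthly_growth_pct, months):
    """Project storage capacity under compound monthly growth.

    monthly_growth_pct is an assumed constant rate; in practice it is
    re-fitted each quarter from observed growth.
    """
    return current_gb * (1 + monthly_growth_pct / 100) ** months

# 10 TB today growing 4% per month over a 24-month horizon
projection = projected_storage_gb(10_000, 4, 24)  # roughly 25,600 GB
```

Running the same projection per storage tier (hot, warm, archive) shows where lifecycle transitions yield the largest savings, since cold data typically grows fastest.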
Module 5: Network Capacity and Latency Management
- Calculating bandwidth requirements for data migration phases, including initial sync and incremental replication.
- Provisioning Direct Connect or ExpressRoute circuits with sufficient capacity to handle peak data transfer loads.
- Modeling cross-AZ data transfer costs and performance impacts for multi-tier applications.
- Configuring DNS routing policies to minimize latency for globally distributed users.
- Implementing CDN caching strategies for static assets to reduce origin server load and improve response times.
- Monitoring network jitter and packet loss in hybrid environments to identify performance bottlenecks.
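The bandwidth calculation for the migration phases above reduces to dividing the data volume by the transfer window, with an allowance for protocol overhead. The overhead fraction here is an assumption to be refined with real transfer tests, not a measured value.

```python
def required_mbps(data_tb, window_hours, protocol_overhead=0.10):
    """Minimum sustained throughput to move `data_tb` within the window.

    protocol_overhead: assumed fraction consumed by TCP/TLS/replication
    framing; validate against a pilot transfer before circuit sizing.
    """
    bits = data_tb * 1e12 * 8 * (1 + protocol_overhead)
    return bits / (window_hours * 3600) / 1e6

# 50 TB initial sync over a 72-hour weekend window
mbps = required_mbps(50, 72)  # ~1,700 Mbps sustained
```

A result like this drives the Direct Connect or ExpressRoute circuit decision: a 1 Gbps circuit clearly cannot meet the window, while a 10 Gbps circuit leaves headroom for incremental replication alongside production traffic.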
Module 6: Cost-Aware Capacity Governance
- Implementing tagging policies to allocate cloud resource costs by department, project, and environment.
- Using reserved instance and savings plan commitments strategically based on long-term workload stability.
- Setting up budget alerts and automated shutdowns for non-production environments during off-hours.
- Conducting monthly reviews of idle or underutilized resources for decommissioning or resizing.
- Enforcing approval workflows for provisioning high-cost instance types or persistent storage volumes.
- Integrating cloud financial management tools with existing IT financial management (ITFM) processes for cross-platform reporting.
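The monthly idle-resource review above can start from a simple utilization sweep: any instance whose peak CPU never cleared a floor over the review period is a resize or decommission candidate. The fleet data and floor value are hypothetical.

```python
def review_utilization(metrics, cpu_floor_pct=10):
    """Classify each resource for the monthly governance review.

    metrics: mapping of resource name to CPU% samples over the period.
    A resource whose peak never exceeds the floor is flagged for
    decommissioning or resizing; everything else is kept.
    """
    return {name: ("decommission" if max(samples) < cpu_floor_pct else "keep")
            for name, samples in metrics.items()}

fleet = {
    "batch-worker-1": [2, 3, 1, 4],     # idle all period
    "app-server-1": [35, 80, 60, 45],   # genuinely busy
}
review = review_utilization(fleet)
```

Real reviews would also weigh memory, I/O, and tag metadata (owner, environment) so that flagged resources route to the right approval workflow rather than being terminated blindly.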
Module 7: Monitoring, Alerting, and Feedback Loops
- Deploying monitoring agents across all migrated workloads to collect granular performance telemetry.
- Defining alert thresholds that balance sensitivity with operational noise to prevent alert fatigue.
- Correlating infrastructure metrics with business KPIs (e.g., transaction success rate) to assess real-world impact.
- Establishing runbooks for common capacity-related incidents, such as disk space exhaustion or CPU throttling.
- Conducting post-mortems after scaling events to refine capacity models and alerting rules.
- Feeding operational data back into capacity planning cycles to improve forecasting accuracy.
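One of the runbook scenarios above, disk space exhaustion, lends itself to a forecasting alert rather than a static threshold: fit a linear trend to daily usage and alert on projected days-to-full. A minimal least-squares sketch with illustrative numbers:

```python
def days_until_full(capacity_gb, used_samples_gb):
    """Linear-trend forecast of days until disk exhaustion.

    used_samples_gb: one usage reading per day, oldest first.
    Returns None when usage is flat or shrinking (no exhaustion trend).
    """
    n = len(used_samples_gb)
    mean_x = (n - 1) / 2
    mean_y = sum(used_samples_gb) / n
    # least-squares slope in GB per day
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in enumerate(used_samples_gb))
             / sum((x - mean_x) ** 2 for x in range(n)))
    if slope <= 0:
        return None
    return (capacity_gb - used_samples_gb[-1]) / slope

# 500 GB volume growing ~10 GB/day
forecast = days_until_full(500, [400, 410, 420, 430])  # -> 7.0 days
```

Alerting on "fewer than N days remaining" instead of "more than X% used" cuts noise on large, slow-growing volumes while catching fast-growing ones early, which is exactly the sensitivity-versus-fatigue trade-off described above.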
Module 8: Capacity Planning for Multi-Cloud and Hybrid Environments
- Developing unified monitoring views across cloud providers to track capacity utilization holistically.
- Standardizing instance sizing nomenclature and performance benchmarks across different cloud platforms.
- Designing failover capacity in secondary clouds with consideration for provisioning delays and data synchronization lag.
- Managing licensing constraints that affect where and how workloads can be deployed across environments.
- Coordinating capacity planning cycles with provider-specific maintenance windows and service updates.
- Implementing policy-driven placement rules to align workload placement with cost, compliance, and performance objectives.
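The policy-driven placement rules above typically reduce to a filter-then-rank evaluation: eliminate candidates that violate compliance or performance constraints, then pick the cheapest survivor. All field names, regions, and prices below are hypothetical.

```python
def place_workload(workload, candidates):
    """Pick the cheapest placement that satisfies compliance and
    performance constraints; returns None if nothing qualifies."""
    eligible = [c for c in candidates
                if workload["data_residency"] in c["regions_compliant"]
                and c["p99_latency_ms"] <= workload["max_latency_ms"]]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["hourly_cost"])["name"]

workload = {"data_residency": "EU", "max_latency_ms": 50}
candidates = [
    {"name": "cloud-a/eu-west", "regions_compliant": {"EU"},
     "p99_latency_ms": 40, "hourly_cost": 0.12},
    {"name": "cloud-b/us-east", "regions_compliant": {"US"},
     "p99_latency_ms": 30, "hourly_cost": 0.08},
    {"name": "cloud-b/eu-central", "regions_compliant": {"EU"},
     "p99_latency_ms": 45, "hourly_cost": 0.10},
]
choice = place_workload(workload, candidates)
```

Ordering the evaluation this way keeps compliance and latency as hard constraints and cost as the tie-breaker; note how the cheapest overall option is rejected on data residency despite its lower price.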