This curriculum reflects the technical and operational depth of a multi-workshop capacity advisory engagement, covering the modeling, monitoring, and governance practices used in enterprise infrastructure and cloud environments.
Module 1: Foundations of Capacity Planning and Demand Forecasting
- Selecting between time-series forecasting models (e.g., exponential smoothing vs. ARIMA) based on data availability and historical volatility in resource consumption.
- Defining service-level thresholds for performance metrics (e.g., sustained CPU utilization of 75%) to trigger capacity review cycles.
- Integrating business workload calendars (e.g., fiscal closing, marketing campaigns) into seasonal demand models.
- Deciding whether to use peak, average, or percentile-based baselines (e.g., 95th percentile) for capacity projections.
- Establishing data collection intervals (e.g., 5-minute vs. 15-minute polling) to balance monitoring overhead with forecasting accuracy.
- Documenting assumptions in growth models (e.g., 10% YoY increase) and defining triggers for model revalidation.
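The following is a minimal sketch of how a percentile baseline, a simple forecast, and a review threshold can interact. It assumes utilization samples in percent, uses nearest-rank for the 95th percentile, and uses simple exponential smoothing (no trend or seasonal terms) as the forecasting model. The function names, the smoothing factor `alpha=0.3`, and the sample values are illustrative assumptions, not a prescribed method.

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile of utilization samples (percent)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

def exp_smooth_forecast(series, alpha=0.3):
    """Simple exponential smoothing: returns the one-step-ahead forecast.

    Only appropriate for series without strong trend or seasonality; trended
    or seasonal series call for Holt-Winters or ARIMA instead."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def needs_review(samples, threshold=75.0):
    """Hypothetical trigger: open a capacity review when either the p95
    baseline or the smoothed forecast reaches the utilization threshold."""
    return p95(samples) >= threshold or exp_smooth_forecast(samples) >= threshold

cpu_samples = [52, 55, 61, 58, 63, 70, 68, 72, 74, 77]  # illustrative values
print(sum(cpu_samples) / len(cpu_samples))  # mean baseline: 65.0
print(p95(cpu_samples))                     # p95 baseline: 77
print(needs_review(cpu_samples))            # True
```

With these sample values the mean baseline (65%) sits well under a 75% threshold while the p95 baseline (77%) exceeds it, which is why the choice of baseline matters for when reviews fire.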
Module 2: Infrastructure Capacity Modeling and Simulation
- Configuring synthetic workload generators (e.g., JMeter, LoadRunner) to mirror real user transaction patterns across tiers.
- Mapping application transaction paths to underlying infrastructure components for end-to-end capacity tracing.
- Choosing between analytical modeling (e.g., queuing theory) and simulation-based approaches based on system complexity.
- Calibrating simulation models against production performance data to reduce the error between predicted and observed behavior.
- Modeling the impact of virtualization overhead (e.g., hypervisor CPU steal time) on effective capacity.
- Assessing the scalability limits of stateful vs. stateless components under increasing concurrency.
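As an illustration of the analytical (queuing theory) approach, here is a sketch of a single-server M/M/1 model. Treating hypervisor steal time as a proportional reduction in the service rate is a simplifying assumption made for this example, and `mm1_metrics` is a hypothetical helper name.

```python
def mm1_metrics(arrival_rate, service_rate, steal_fraction=0.0):
    """Analytical M/M/1 model of one server.

    arrival_rate:   lambda, requests per second
    service_rate:   mu, requests per second the server completes with no
                    virtualization overhead
    steal_fraction: assumed share of CPU time lost to hypervisor steal,
                    modeled here as a proportional cut to mu (a simplification)
    Returns (utilization, mean response time in seconds)."""
    effective_mu = service_rate * (1.0 - steal_fraction)
    if arrival_rate >= effective_mu:
        raise ValueError("unstable: arrival rate meets or exceeds effective service rate")
    utilization = arrival_rate / effective_mu
    mean_response = 1.0 / (effective_mu - arrival_rate)  # waiting + service time
    return utilization, mean_response

print(mm1_metrics(80, 100))                      # (0.8, 0.05)
print(mm1_metrics(80, 100, steal_fraction=0.1))  # utilization ~0.89, response ~0.1 s
```

Under these assumptions, losing 10% of the service rate to steal time raises utilization from 80% to about 89% and doubles mean response time, because response time grows as 1/(mu - lambda) and the remaining headroom halves.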
Module 3: Cloud and Hybrid Resource Sizing Strategies
- Comparing reserved, on-demand, and spot instances based on workload predictability and cost-risk tolerance.
- Designing auto-scaling policies that incorporate both utilization thresholds and predictive scaling triggers.
- Accounting for network egress costs and bandwidth constraints when projecting cloud capacity needs.
- Defining scaling boundaries to prevent runaway provisioning due to monitoring anomalies or application bugs.
- Aligning cloud burst strategies with on-premises capacity limits and data residency requirements.
- Implementing tagging and allocation models to attribute cloud spend and capacity usage by business unit.
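A scaling policy that combines a utilization threshold, a predictive input, and hard boundaries can be sketched as below. This is a hypothetical rule written for illustration, not any cloud provider's auto-scaling API; the function name, the default bounds of 2 and 20 instances, and the percentage inputs are all assumptions.

```python
import math

def desired_instances(current, avg_util, target_util, predicted_util=None,
                      min_instances=2, max_instances=20):
    """Hypothetical scaling rule combining a reactive utilization target,
    an optional predictive input, and hard fleet-size bounds.

    Utilization values are percentages averaged across the current fleet."""
    # Reactive: size the fleet so the observed load lands at the target.
    desired = math.ceil(current * avg_util / target_util)
    # Predictive: apply the same sizing to forecast utilization, keeping the larger.
    if predicted_util is not None:
        desired = max(desired, math.ceil(current * predicted_util / target_util))
    # Bounds stop runaway provisioning when a metric spikes because of a
    # monitoring fault or an application bug, and keep a floor for availability.
    return max(min_instances, min(max_instances, desired))

print(desired_instances(4, avg_util=90, target_util=60))                      # 6
print(desired_instances(4, avg_util=90, target_util=60, predicted_util=120))  # 8
print(desired_instances(10, avg_util=1000, target_util=60))                   # 20 (bogus reading, capped)
```

The last call mimics a faulty agent reporting 1000% utilization: without the upper bound the rule would request 167 instances.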
Module 4: Database and Storage Capacity Engineering
- Estimating growth in transaction logs and tempdb usage under peak OLTP workloads to plan storage headroom.
- Projecting storage IOPS requirements based on query patterns and indexing strategies.
- Planning for index fragmentation and its impact on storage overhead and performance over time.
- Designing retention and archiving policies for historical data to control database size growth.
- Assessing the impact of compression (row/page, backup) on storage needs and CPU utilization trade-offs.
- Right-sizing SAN/NAS LUNs with consideration for thin vs. thick provisioning and over-subscription ratios.
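For thin provisioning, two numbers usually frame the discussion: how far the pool is over-subscribed and how long the physical headroom lasts. The sketch below assumes linear growth and an 85% alert threshold; both the threshold and the sample sizes are illustrative assumptions, and the helper names are hypothetical.

```python
def oversubscription_ratio(provisioned_gb, physical_gb):
    """Logical capacity promised to hosts divided by physical pool capacity."""
    return sum(provisioned_gb) / physical_gb

def days_until_alert(used_gb, physical_gb, daily_growth_gb, alert_fraction=0.85):
    """Days until consumption crosses alert_fraction of the physical pool,
    assuming linear growth. alert_fraction is an illustrative policy value."""
    headroom_gb = physical_gb * alert_fraction - used_gb
    if headroom_gb <= 0:
        return 0.0
    return headroom_gb / daily_growth_gb

luns_gb = [4000, 4000, 2000, 2000]            # thin-provisioned LUN sizes
print(oversubscription_ratio(luns_gb, 8000))  # 1.5
print(days_until_alert(5000, 8000, 25))       # ~72 days of headroom
```

The days-of-headroom figure is most useful when compared against the lead time for adding physical capacity to the pool.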
Module 5: Capacity Monitoring and Performance Data Analysis
- Selecting key performance indicators (KPIs) per tier (e.g., queue depth for storage, response time for app servers).
- Configuring baselining tools to detect anomalies while filtering out scheduled batch processing spikes.
- Correlating infrastructure metrics with application logs to isolate bottlenecks during contention events.
- Managing retention periods for performance data based on compliance, troubleshooting, and modeling needs.
- Normalizing performance data across heterogeneous environments for comparative analysis.
- Validating monitoring agent overhead to ensure data collection does not skew capacity measurements.
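One simple way to baseline while filtering scheduled batch spikes is to exclude batch-window samples both when building the baseline and when flagging anomalies. The sketch below uses a mean plus k standard deviations rule with hour-of-day batch windows; the batch window, `k=3.0`, and the sample data are assumptions chosen for illustration, and real baselining tools typically use richer seasonal models.

```python
import statistics

def build_baseline(history, batch_hours):
    """history: (hour_of_day, value) pairs. Samples taken during scheduled
    batch hours are excluded so known spikes do not inflate the baseline."""
    values = [v for h, v in history if h not in batch_hours]
    return statistics.mean(values), statistics.pstdev(values)

def flag_anomalies(current, baseline, batch_hours, k=3.0):
    """Flag non-batch samples more than k standard deviations above the mean."""
    mean, stdev = baseline
    limit = mean + k * stdev
    return [(h, v) for h, v in current if h not in batch_hours and v > limit]

batch_hours = {2}  # assumed nightly batch window at 02:00
history = [(h, 95 if h == 2 else (40 if h % 2 == 0 else 50)) for h in range(24)]
baseline = build_baseline(history, batch_hours)
print(flag_anomalies([(2, 97), (14, 90), (15, 52)], baseline, batch_hours))  # [(14, 90)]
```

The 97 reading at 02:00 is ignored as expected batch activity, while the 90 reading at 14:00 exceeds the limit of roughly 60 derived from non-batch history.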
Module 6: Capacity Governance and Change Integration
- Enforcing capacity sign-off as part of the change advisory board (CAB) process for major deployments.
- Defining thresholds for capacity exceptions that require formal risk acceptance by stakeholders.
- Integrating capacity impact assessments into project lifecycle documentation for new applications.
- Establishing ownership for capacity reviews across infrastructure, application, and business teams.
- Documenting capacity assumptions in runbooks and handover materials for operational continuity.
- Conducting post-incident reviews to update capacity models after unplanned resource exhaustion.
Module 7: Scalability Testing and Benchmarking
- Designing load test scenarios that reflect real-world user concurrency and data volume growth.
- Isolating database contention during scalability tests by controlling connection pool sizes.
- Measuring diminishing returns in throughput as resources are added (e.g., identifying knee points).
- Validating failover capacity by simulating node loss during peak load conditions.
- Using benchmark results to negotiate SLAs with vendors or cloud providers.
- Archiving test configurations and results for regression comparison across infrastructure upgrades.
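Knee points can be located in many ways; the sketch below uses a deliberately simple marginal-gain heuristic rather than a formal knee-detection algorithm. It reports the resource level after which each added unit yields less than half the per-unit throughput gain of the first step. The 0.5 ratio, the function name, and the throughput figures are illustrative assumptions, not measured results.

```python
def find_knee(resources, throughput, min_gain_ratio=0.5):
    """Return the resource level beyond which each added unit yields less than
    min_gain_ratio of the first step's per-unit throughput gain, or None.

    A simple marginal-gain heuristic, not a formal knee-detection algorithm."""
    first_gain = (throughput[1] - throughput[0]) / (resources[1] - resources[0])
    for i in range(1, len(resources) - 1):
        gain = (throughput[i + 1] - throughput[i]) / (resources[i + 1] - resources[i])
        if gain < min_gain_ratio * first_gain:
            return resources[i]
    return None

nodes = [1, 2, 4, 8, 16]
tps = [1000, 1950, 3700, 5200, 5600]  # illustrative load-test results
print(find_knee(nodes, tps))          # 4
```

Here the first node added yields 950 transactions per second per node, while going from 4 to 8 nodes yields only 375 per added node, so the heuristic places the knee at 4 nodes.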
Module 8: Long-Term Capacity Roadmapping and Financial Alignment
- Aligning multi-year capacity forecasts with technology refresh cycles and depreciation schedules.
- Presenting capacity options (scale-up vs. scale-out) with TCO implications to financial stakeholders.
- Factoring in lead times for hardware procurement and data center provisioning in expansion plans.
- Negotiating vendor contracts with scalability clauses to accommodate unforeseen demand spikes.
- Modeling the impact of software licensing models (per-core, per-socket, subscription) on capacity decisions.
- Updating capacity roadmaps quarterly based on actual consumption trends and business pivots.
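Licensing models can reverse the cost ranking of scale-up and scale-out designs that have the same core count. The sketch below compares two hypothetical 128-core configurations under per-core and per-socket pricing. The prices are invented, subscription pricing is modeled as a flat per-server fee (an assumption), and real vendor terms such as per-processor core minimums and core factors are ignored.

```python
from dataclasses import dataclass

@dataclass
class ServerConfig:
    name: str
    servers: int
    sockets_per_server: int
    cores_per_socket: int

def license_cost(cfg, model, unit_price):
    """Licensing cost under a simplified model: 'per_core', 'per_socket', or
    'per_server' (a flat per-host subscription). Ignores vendor core minimums
    and core factors."""
    if model == "per_core":
        units = cfg.servers * cfg.sockets_per_server * cfg.cores_per_socket
    elif model == "per_socket":
        units = cfg.servers * cfg.sockets_per_server
    elif model == "per_server":
        units = cfg.servers
    else:
        raise ValueError(f"unknown licensing model: {model}")
    return units * unit_price

scale_up = ServerConfig("scale-up", servers=2, sockets_per_server=2, cores_per_socket=32)
scale_out = ServerConfig("scale-out", servers=8, sockets_per_server=1, cores_per_socket=16)
for cfg in (scale_up, scale_out):
    # Both designs have 128 cores; prices per licensing unit are hypothetical.
    print(cfg.name, license_cost(cfg, "per_core", 100), license_cost(cfg, "per_socket", 2000))
# scale-up 12800 8000
# scale-out 12800 16000
```

Per-core pricing is indifferent between the two designs here, while per-socket pricing makes the scale-out option twice as expensive, which is the kind of trade-off that belongs in the TCO comparison presented to financial stakeholders.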