This curriculum covers the infrastructure management practices needed to align, govern, and optimize shared technology resources across a service portfolio. Its scope is comparable to a multi-workshop operational transformation program for a large IT organization.
Module 1: Strategic Alignment of Infrastructure with Service Portfolios
- Conducting service dependency mapping to identify critical infrastructure components supporting high-impact business services.
- Establishing service-level objectives (SLOs) for infrastructure based on business-criticality tiers defined in the service portfolio.
- Aligning infrastructure refresh cycles with service lifecycle phases (e.g., growth, retirement) to avoid over-provisioning or under-provisioning.
- Defining ownership boundaries between infrastructure teams and service owners during service onboarding into the portfolio.
- Integrating infrastructure constraints into service design approvals to prevent non-viable service commitments.
- Developing a scoring model to prioritize infrastructure investments based on service revenue contribution and operational risk.
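
The scoring model in the last item can be sketched as a weighted sum. This is a minimal illustration, not a prescribed method: the `Investment` fields, the 0-to-1 normalization, and the 0.6/0.4 weights are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Investment:
    name: str
    revenue_share: float     # assumed: fraction of portfolio revenue supported, 0..1
    operational_risk: float  # assumed: normalized risk score, 0..1


def priority_score(inv: Investment,
                   revenue_weight: float = 0.6,
                   risk_weight: float = 0.4) -> float:
    """Weighted sum of revenue contribution and operational risk (weights are illustrative)."""
    return revenue_weight * inv.revenue_share + risk_weight * inv.operational_risk


def rank_investments(investments: list[Investment]) -> list[Investment]:
    """Return investments ordered from highest to lowest priority score."""
    return sorted(investments, key=priority_score, reverse=True)


candidates = [
    Investment("storage-refresh", revenue_share=0.2, operational_risk=0.9),
    Investment("core-network", revenue_share=0.7, operational_risk=0.5),
]
ranked = rank_investments(candidates)
```

In practice the weights would be agreed with portfolio governance, and the inputs would come from the service portfolio's revenue and risk records.
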
Module 2: Infrastructure Capacity and Demand Management Integration
- Implementing telemetry pipelines from infrastructure monitoring tools into service demand forecasting models.
- Setting thresholds for auto-scaling groups based on projected service usage patterns and SLA requirements.
- Managing contention between shared infrastructure resources across multiple services during peak demand periods.
- Allocating reserved capacity for regulated or compliance-bound services with fixed infrastructure requirements.
- Reconciling actual infrastructure utilization against service-level capacity plans during quarterly portfolio reviews.
- Enforcing chargeback or showback mechanisms to influence service teams’ infrastructure consumption behaviors.
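
As a rough sketch of how a scaling threshold might be derived from projected usage, the function below sizes an auto-scaling group from a forecast peak. The parameter names, the 70% utilization target, and the minimum of two instances are assumptions for illustration only.

```python
import math


def required_instances(projected_peak_rps: float,
                       rps_per_instance: float,
                       target_utilization: float = 0.7,
                       min_instances: int = 2) -> int:
    """Instances needed to serve the projected peak while staying at or below the target utilization."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Each instance is only loaded up to the target utilization, leaving headroom for spikes.
    usable_rps = rps_per_instance * target_utilization
    needed = math.ceil(projected_peak_rps / usable_rps)
    # Never go below the floor required for basic redundancy.
    return max(min_instances, needed)
```

A lower `target_utilization` buys more headroom against SLA breaches at the cost of more idle capacity.
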
Module 3: Resilience and Availability Planning Across Services
- Designing infrastructure redundancy levels (e.g., multi-AZ, geo-redundancy) according to service recovery time objectives (RTOs).
- Coordinating failover testing schedules across interdependent services to minimize business disruption.
- Documenting infrastructure dependencies in runbooks accessible to service operations teams during incident response.
- Balancing cost of high-availability configurations against the financial impact of service downtime.
- Enforcing infrastructure standardization to reduce configuration drift that undermines resilience claims.
- Validating backup integrity and restore procedures annually for databases supporting mission-critical services.
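
The trade-off between high-availability cost and downtime impact can be approximated with simple arithmetic. The sketch below assumes downtime cost is linear in hours of outage and that availability is a steady-state fraction of an 8,760-hour year. Both are simplifications, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def expected_downtime_cost(availability: float, cost_per_hour_down: float) -> float:
    """Expected annual downtime cost for a given availability fraction (e.g., 0.999)."""
    return (1 - availability) * HOURS_PER_YEAR * cost_per_hour_down


def ha_upgrade_pays_off(current_availability: float,
                        upgraded_availability: float,
                        cost_per_hour_down: float,
                        annual_upgrade_cost: float) -> bool:
    """True if the avoided downtime cost exceeds the yearly cost of the upgrade."""
    savings = (expected_downtime_cost(current_availability, cost_per_hour_down)
               - expected_downtime_cost(upgraded_availability, cost_per_hour_down))
    return savings > annual_upgrade_cost
```

For example, moving from 99.9% to 99.99% availability removes roughly 7.9 hours of expected downtime per year, so the upgrade is justified only if those hours cost more than the added redundancy.
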
Module 4: Change and Configuration Governance
- Requiring impact assessments for infrastructure changes that affect multiple services in the portfolio.
- Enforcing change freeze periods aligned with peak business cycles for key services (e.g., retail holiday season).
- Integrating CMDB updates into deployment pipelines to maintain accurate service-to-infrastructure configuration records.
- Approving emergency changes through a time-bound review process while preserving audit trails.
- Automating drift detection on production infrastructure to enforce configuration baselines defined per service.
- Coordinating change advisory board (CAB) meetings with service owners to prioritize infrastructure change windows.
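
Drift detection against a per-service baseline reduces, at its simplest, to comparing two key-value maps. The following sketch assumes configurations are available as flat dictionaries; the setting names and the `ABSENT` marker are hypothetical.

```python
ABSENT = "<absent>"  # marker for a setting that exists on only one side


def detect_drift(baseline: dict, actual: dict) -> dict:
    """Return {setting: (expected, observed)} for every setting that differs from the baseline."""
    drift = {}
    for key in baseline.keys() | actual.keys():
        expected = baseline.get(key, ABSENT)
        observed = actual.get(key, ABSENT)
        if expected != observed:
            drift[key] = (expected, observed)
    return drift


baseline = {"tls_min_version": "1.2", "ssh_root_login": "no"}
actual = {"tls_min_version": "1.0", "ssh_root_login": "no", "debug_port": "8000"}
findings = detect_drift(baseline, actual)
```

A real pipeline would pull `actual` from the live environment and `baseline` from version-controlled configuration, then raise a change record or auto-remediate on any finding.
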
Module 5: Cost Management and Resource Optimization
- Allocating infrastructure costs to services using usage-based metrics (e.g., vCPU-hours, storage GB-month).
- Identifying underutilized infrastructure instances for decommissioning during service rationalization initiatives.
- Applying reserved instance purchasing strategies based on long-term service demand projections.
- Setting budget alerts tied to service-level cost centers to trigger infrastructure optimization reviews.
- Comparing TCO of on-premises vs. cloud-hosted infrastructure for specific service workloads.
- Implementing tagging policies to ensure accurate attribution of cloud spend to service portfolios.
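
Usage-based allocation and tagging work together: costs are attributed to whichever service a resource is tagged with, and untagged usage surfaces as a gap. This is a minimal sketch; the record layout, the `service` tag key, and the unit rates are assumptions made for the example.

```python
from collections import defaultdict


def allocate_costs(usage_records: list[dict], rates: dict[str, float]) -> dict[str, float]:
    """Sum quantity * unit rate per service; usage without a 'service' tag goes to 'unattributed'."""
    totals: dict[str, float] = defaultdict(float)
    for record in usage_records:
        service = record.get("tags", {}).get("service", "unattributed")
        totals[service] += record["quantity"] * rates[record["metric"]]
    return dict(totals)


rates = {"vcpu_hours": 0.05, "storage_gb_month": 0.02}  # illustrative unit prices
records = [
    {"metric": "vcpu_hours", "quantity": 100, "tags": {"service": "checkout"}},
    {"metric": "storage_gb_month", "quantity": 500, "tags": {"service": "checkout"}},
    {"metric": "vcpu_hours", "quantity": 40, "tags": {}},
]
allocation = allocate_costs(records, rates)
```

Tracking the size of the `unattributed` bucket over time gives a direct measure of how well the tagging policy is being followed.
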
Module 6: Security and Compliance Integration
- Embedding infrastructure security baselines (e.g., CIS benchmarks) into service onboarding checklists.
- Mapping infrastructure controls to regulatory requirements (e.g., PCI, HIPAA) applicable to specific services.
- Conducting infrastructure vulnerability scans on a schedule aligned with service release cycles.
- Restricting administrative access to infrastructure based on service ownership and least privilege.
- Generating audit reports that correlate infrastructure configurations with service compliance obligations.
- Responding to infrastructure-related findings in third-party audits with remediation plans tied to service timelines.
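
Mapping controls to regulatory requirements can be expressed as a set difference between what a service's regulations require and what its infrastructure implements. The control names below are hypothetical placeholders and do not reproduce actual PCI or HIPAA requirement text.

```python
def missing_controls(service_regulations: list[str],
                     required_by_regulation: dict[str, set[str]],
                     implemented_controls: set[str]) -> set[str]:
    """Controls required by any applicable regulation that are not yet implemented."""
    required: set[str] = set()
    for regulation in service_regulations:
        required |= required_by_regulation.get(regulation, set())
    return required - implemented_controls


# Hypothetical mapping maintained by the compliance team.
required_by_regulation = {
    "PCI": {"encrypt-at-rest", "access-logging", "network-segmentation"},
    "HIPAA": {"encrypt-at-rest", "access-logging", "backup-retention"},
}
gaps = missing_controls(["PCI", "HIPAA"], required_by_regulation,
                        {"encrypt-at-rest", "access-logging"})
```

The resulting gap list is a natural input for the audit reports and remediation plans described above.
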
Module 7: Performance Monitoring and Service Feedback Loops
- Defining infrastructure KPIs (e.g., latency, error rates) that directly inform service health dashboards.
- Correlating infrastructure incidents with service degradation events in post-mortem analyses.
- Setting up alerting rules that trigger only when infrastructure issues impact service-level metrics.
- Integrating infrastructure telemetry into service reliability reports for executive review.
- Adjusting monitoring granularity based on service criticality and user impact thresholds.
- Using infrastructure performance trends to influence service capacity planning and architecture changes.
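
An alerting rule that fires only on service impact can be modeled as a conjunction of an infrastructure symptom and a service-level breach. The sketch below uses error rate as the service-level metric, which is an assumption; latency or availability could be substituted.

```python
def should_alert(infra_signal_breached: bool,
                 service_error_rate: float,
                 slo_error_rate: float) -> bool:
    """Alert only when an infrastructure symptom coincides with a service-level breach."""
    return infra_signal_breached and service_error_rate > slo_error_rate
```

Infrastructure symptoms that do not affect service metrics can still be logged or ticketed at lower priority, which keeps paging volume tied to user impact.
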
Module 8: Lifecycle Management and Technology Rationalization
- Establishing end-of-life (EOL) policies for infrastructure platforms based on vendor support timelines.
- Coordinating infrastructure migration projects with service deprecation or modernization schedules.
- Assessing technical debt in shared infrastructure components used across multiple services.
- Retiring legacy infrastructure only after confirming all dependent services have been migrated.
- Standardizing on a reduced set of infrastructure technologies to lower operational complexity across the portfolio.
- Documenting migration playbooks for recurring infrastructure upgrade scenarios (e.g., OS patching at scale).
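
The rule of retiring legacy infrastructure only after all dependents have migrated can be checked mechanically against a dependency map. This sketch assumes such a map exists (for example, exported from the CMDB); the function and platform names are hypothetical.

```python
def blocking_services(platform: str,
                      dependencies: dict[str, set[str]],
                      migrated: set[str]) -> list[str]:
    """Services that still depend on the platform and have not been migrated."""
    return sorted(
        service
        for service, platforms in dependencies.items()
        if platform in platforms and service not in migrated
    )


def can_retire(platform: str,
               dependencies: dict[str, set[str]],
               migrated: set[str]) -> bool:
    """A platform is safe to retire only when no unmigrated service depends on it."""
    return not blocking_services(platform, dependencies, migrated)


dependencies = {
    "billing": {"legacy-vm-cluster", "object-store"},
    "reporting": {"legacy-vm-cluster"},
    "search": {"k8s-prod"},
}
blockers = blocking_services("legacy-vm-cluster", dependencies, migrated={"billing"})
```

Running this check as a gate in the decommissioning workflow makes the retirement decision auditable rather than dependent on manual confirmation.
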