This curriculum is equivalent in scope to a multi-workshop program on integrating virtualization into core ITSM processes. It covers the level of operational detail typically found in advisory engagements for service transformation and internal capability building.
Module 1: Defining Virtualization Scope and Alignment with ITSM Strategy
- Select whether to virtualize compute, storage, or network components first based on existing service bottlenecks and change request volume.
- Map virtualization initiatives to specific ITIL practices such as Incident, Change, and Configuration Management to ensure process integration.
- Determine ownership of virtual assets between infrastructure teams and service owners to prevent accountability gaps in service catalogs.
- Assess compatibility of current CMDB schema with virtual configuration items, including dynamic naming and lifecycle tracking.
- Establish criteria for retiring physical assets post-migration without disrupting service level agreements.
- Define service boundaries for virtual environments that align with business service portfolios, not just technical domains.
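
The CMDB compatibility assessment above can start as a simple gap check between the attributes a virtual configuration item needs and the attributes the current schema already has. The sketch below is illustrative only: the field names are hypothetical and should be replaced with whatever your CMDB actually defines.

```python
# Hypothetical attribute names; substitute the fields your CMDB actually uses.
REQUIRED_VIRTUAL_CI_FIELDS = {
    "instance_uuid",     # stable identifier that survives renames
    "host_cluster",      # current placement; changes on live migration
    "lifecycle_state",   # e.g. provisioned, running, suspended, retired
    "template_version",  # golden image the VM was built from
    "service_owner",     # accountable owner in the service catalog
}


def schema_gaps(existing_fields):
    """Return required virtual-CI fields missing from the current schema, sorted."""
    return sorted(REQUIRED_VIRTUAL_CI_FIELDS - set(existing_fields))


# Example: a schema designed for physical assets only.
physical_schema = ["name", "serial_number", "service_owner", "lifecycle_state"]
print(schema_gaps(physical_schema))
```

An empty result suggests the schema can already represent virtual CIs; any listed field is a candidate schema change to raise before migration begins.
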
Module 2: Virtual Infrastructure Design and Capacity Governance
- Size host clusters based on peak workload demands and anticipated service growth over 18 months, including a buffer for unplanned spikes.
- Implement right-sizing policies for virtual machines to prevent over-provisioning and license overruns.
- Allocate shared storage with performance tiering to balance IOPS requirements across critical and non-critical services.
- Design network segmentation for virtual environments using VLANs or micro-segmentation to meet security and compliance mandates.
- Integrate capacity models with service request workflows to auto-validate resource requests against available pools.
- Define thresholds for automated scaling actions and ensure they trigger appropriate event records in the monitoring system.
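
Auto-validating resource requests against available pools, as described above, can be sketched as a headroom check. This is a minimal illustration under stated assumptions: the `Pool` and `Request` types, the 20% default buffer, and the choice of vCPU and memory as the only dimensions are all hypothetical, not taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    total_vcpu: int
    used_vcpu: int
    total_mem_gb: int
    used_mem_gb: int


@dataclass
class Request:
    vcpu: int
    mem_gb: int


def validate_request(pool, req, buffer_pct=20):
    """Return the resources that would breach the buffer; an empty list means approve.

    buffer_pct is the share of each resource held back for unplanned spikes
    (an assumed policy value).
    """
    breaches = []
    # Integer arithmetic avoids floating-point surprises at the boundary.
    usable_vcpu = pool.total_vcpu * (100 - buffer_pct) // 100
    usable_mem = pool.total_mem_gb * (100 - buffer_pct) // 100
    if pool.used_vcpu + req.vcpu > usable_vcpu:
        breaches.append("vcpu")
    if pool.used_mem_gb + req.mem_gb > usable_mem:
        breaches.append("memory")
    return breaches


pool = Pool(total_vcpu=100, used_vcpu=70, total_mem_gb=400, used_mem_gb=200)
print(validate_request(pool, Request(vcpu=8, mem_gb=32)))   # fits within the buffer
print(validate_request(pool, Request(vcpu=16, mem_gb=32)))  # would exceed usable vCPU
```

In a service request workflow, a non-empty result would typically route the request to manual capacity review rather than reject it outright.
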
Module 3: Integration with Change and Release Management
- Classify virtual machine provisioning as standard, normal, or emergency change based on impact and automation level.
- Embed automated configuration checks into change workflows to validate compliance before deployment.
- Require change records for template updates, even when automated, to maintain audit continuity.
- Coordinate release windows for hypervisor patches with application teams to minimize service disruption.
- Enforce peer review for scripts used in infrastructure-as-code deployments to reduce rollback frequency.
- Track rollback success rates for virtual deployments to identify recurring configuration drift issues.
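
Classifying provisioning changes by impact and automation level can be expressed as a small rule set. The rules below are an assumed example policy, not an ITIL-mandated mapping; each organization's change authority would define its own criteria.

```python
def classify_change(impact, fully_automated, urgent=False):
    """Map a provisioning request to a change type (illustrative rules only).

    impact: "low", "medium", or "high" (hypothetical scale).
    fully_automated: True when the request runs through an approved, tested workflow.
    urgent: True when the change is needed to restore or protect a live service.
    """
    if impact not in {"low", "medium", "high"}:
        raise ValueError(f"unknown impact level: {impact!r}")
    if urgent:
        return "emergency"  # cannot wait for the normal approval cycle
    if impact == "low" and fully_automated:
        return "standard"   # pre-authorized, repeatable, low risk
    return "normal"         # needs assessment and approval


print(classify_change("low", fully_automated=True))
print(classify_change("medium", fully_automated=True))
print(classify_change("high", fully_automated=False, urgent=True))
```

Keeping the rules in code makes the classification auditable and lets the change workflow apply it consistently at request time.
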
Module 4: Configuration and Asset Management for Dynamic Environments
- Synchronize CMDB updates with provisioning tools using event-driven APIs to reduce stale configuration records.
- Define lifecycle states for virtual machines that include transient states like "paused" or "suspended" for accurate tracking.
- Implement automated discovery tooling with reconciliation rules to handle duplicate or ghost CIs.
- Assign ownership of templates and golden images to designated teams to ensure version control.
- Track software license usage across cloned VMs to avoid compliance exposure during audits.
- Enforce naming conventions that encode environment, function, and owner to support incident triage and reporting.
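
Reconciliation rules for duplicate and ghost CIs can be sketched as a comparison between CMDB records and the latest discovery sweep. The example assumes each VM has a stable UUID and that CMDB records expose hypothetical `ci_id` and `uuid` keys; real discovery tools use their own identifiers and matching logic.

```python
from collections import defaultdict


def reconcile(cmdb_records, discovered_uuids):
    """Compare CMDB records with a discovery sweep, keyed by VM UUID.

    cmdb_records: list of dicts with hypothetical keys "ci_id" and "uuid".
    discovered_uuids: UUIDs the discovery tool saw in its last sweep.
    """
    by_uuid = defaultdict(list)
    for rec in cmdb_records:
        by_uuid[rec["uuid"]].append(rec["ci_id"])
    discovered = set(discovered_uuids)
    return {
        # More than one CI claims the same VM.
        "duplicates": {u: ids for u, ids in by_uuid.items() if len(ids) > 1},
        # A CI exists but the VM was not seen: candidate ghost CI.
        "ghosts": sorted(u for u in by_uuid if u not in discovered),
        # The VM exists but has no CI yet.
        "missing": sorted(discovered - set(by_uuid)),
    }


records = [
    {"ci_id": "CI1", "uuid": "u1"},
    {"ci_id": "CI2", "uuid": "u1"},
    {"ci_id": "CI3", "uuid": "u2"},
]
print(reconcile(records, ["u1", "u3"]))
```

A ghost flagged by a single sweep may simply be a powered-off VM, so a production rule would usually require several consecutive misses before retiring the CI.
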
Module 5: Incident and Problem Management in Virtualized Systems
- Correlate hypervisor-level alerts with application incidents to distinguish infrastructure from service faults.
- Document known errors for recurring VM snapshot failures and link them to resolution knowledge articles.
- Establish escalation paths for resource contention issues that span multiple service teams.
- Configure monitoring tools to suppress redundant alerts during planned host maintenance.
- Use dependency mapping to assess blast radius before initiating live migrations or host reboots.
- Review incident post-mortems to identify patterns in VM sprawl or misconfigured resource pools.
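
Assessing blast radius from a dependency map amounts to a graph traversal: starting from the host to be rebooted or evacuated, collect everything that depends on it directly or transitively. The sketch below uses a breadth-first search over a hypothetical adjacency map; the node names are invented for illustration.

```python
from collections import deque


def blast_radius(dependents, start):
    """Return every node that directly or transitively depends on start.

    dependents: dict mapping a node to the nodes that depend on it.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in seen:  # guards against cycles in the map
                seen.add(dep)
                queue.append(dep)
    return seen - {start}


# Hypothetical dependency map: host -> VMs -> technical services -> business service.
dependents = {
    "host-01": ["vm-a", "vm-b"],
    "vm-a": ["svc-billing"],
    "vm-b": ["svc-billing", "svc-portal"],
    "svc-billing": ["biz-invoicing"],
}
print(sorted(blast_radius(dependents, "host-01")))
```

The result is only as accurate as the dependency data behind it, which is one reason the CMDB synchronization practices in Module 4 matter for incident work.
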
Module 6: Performance Monitoring and Service Level Reporting
- Define service level indicators (SLIs) for virtual environments such as VM boot time, host uptime, and storage latency.
- Aggregate performance metrics by business service rather than by host to align reporting with customer impact.
- Set baselines for normal VM behavior to reduce false positives in anomaly detection systems.
- Integrate monitoring data with service dashboards used by service owners and business stakeholders.
- Adjust sampling rates for performance data to balance storage costs with diagnostic resolution.
- Report on resource utilization trends to inform capacity planning and budget requests.
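
Aggregating metrics by business service rather than by host can be sketched as a group-by over a VM-to-service mapping. The sample data, the `unmapped` bucket, and the choice of storage latency as the metric are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_service(samples, vm_to_service):
    """Group per-VM latency samples by the business service each VM supports.

    samples: iterable of (vm_name, latency_ms) pairs.
    vm_to_service: dict mapping VM name to business service name.
    VMs without a mapping are reported under "unmapped" so gaps stay visible.
    """
    grouped = defaultdict(list)
    for vm, latency_ms in samples:
        grouped[vm_to_service.get(vm, "unmapped")].append(latency_ms)
    return {
        service: {
            "mean_ms": mean(values),
            "max_ms": max(values),
            "sample_count": len(values),
        }
        for service, values in grouped.items()
    }


samples = [("vm-a", 4.0), ("vm-b", 6.0), ("vm-c", 20.0), ("vm-x", 9.0)]
vm_to_service = {"vm-a": "billing", "vm-b": "billing", "vm-c": "portal"}
for service, stats in aggregate_by_service(samples, vm_to_service).items():
    print(service, stats)
```

Reporting the unmapped bucket alongside named services gives service owners a direct signal that the VM-to-service mapping needs maintenance.
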
Module 7: Security, Compliance, and Audit Readiness
- Enforce role-based access controls for hypervisor management consoles aligned with least privilege principles.
- Conduct regular access reviews for VM console and snapshot privileges to prevent privilege creep.
- Implement encrypted live migration (for example, encrypted vMotion) and secure boot policies where regulatory standards require data-in-transit protection or boot integrity.
- Generate audit trails for VM cloning and snapshot export activities to detect potential data exfiltration.
- Validate that backup and replication jobs meet the recovery point objectives (RPOs) defined in business continuity plans.
- Coordinate vulnerability scans across virtual and physical layers to avoid coverage gaps in compliance reports.
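
Validating backup jobs against RPOs can be approximated by comparing the age of each service's most recent successful recovery point with its target. This is a simplified sketch: it treats recovery point age as a proxy for potential data loss, and the service names and targets are hypothetical.

```python
from datetime import datetime, timedelta


def rpo_violations(last_backup, rpo, now):
    """Return services whose latest recovery point is older than their RPO.

    last_backup: dict mapping service to the datetime of its last successful backup.
    rpo: dict mapping service to its RPO as a timedelta.
    now: evaluation time, passed in explicitly so results are reproducible.
    The value for each violating service is the recovery point age, or None
    when no successful backup is recorded at all.
    """
    violations = {}
    for service, target in rpo.items():
        taken_at = last_backup.get(service)
        if taken_at is None:
            violations[service] = None
        elif now - taken_at > target:
            violations[service] = now - taken_at
    return violations


now = datetime(2024, 1, 2, 12, 0)
last_backup = {
    "billing": datetime(2024, 1, 2, 9, 0),
    "portal": datetime(2024, 1, 1, 6, 0),
}
rpo = {
    "billing": timedelta(hours=4),
    "portal": timedelta(hours=24),
    "archive": timedelta(hours=48),
}
print(rpo_violations(last_backup, rpo, now))
```

Driving the check from the RPO list rather than the backup list means a service with a defined target but no backup job is reported instead of silently skipped.
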
Module 8: Continuous Improvement and Automation Governance
- Measure time-to-provision for virtual machines and target reductions through workflow automation.
- Establish a review board for approving new automation scripts that impact production environments.
- Retire unused VMs based on utilization thresholds and notify owners through automated workflows.
- Integrate feedback from service reviews into template updates and provisioning standards.
- Track automation failure rates and correlate them with change-related incidents.
- Update runbooks to reflect automated recovery procedures and ensure operations teams are trained on override protocols.
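
Identifying retirement candidates from utilization thresholds can be sketched as a filter over recent daily averages. The 5% CPU threshold, the 30-day window, and the use of CPU as the sole signal are assumed defaults for illustration; a real policy would likely also weigh network, disk, and login activity.

```python
def retirement_candidates(daily_cpu_pct, threshold_pct=5.0, window_days=30):
    """Return VM names whose CPU stayed below the threshold for the whole window.

    daily_cpu_pct: dict mapping VM name to daily average CPU %, oldest first.
    VMs with fewer than window_days samples are skipped as too new to judge.
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    candidates = []
    for vm, samples in daily_cpu_pct.items():
        recent = samples[-window_days:]
        if len(recent) == window_days and max(recent) < threshold_pct:
            candidates.append(vm)
    return sorted(candidates)


usage = {
    "vm-idle": [1.0, 2.0, 1.5],
    "vm-busy": [1.0, 40.0, 2.0],
    "vm-new": [0.5],
}
print(retirement_candidates(usage, threshold_pct=5.0, window_days=3))
```

The output is a candidate list for owner notification, not an automatic deletion list; the notification and approval step stays in the workflow.
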