This curriculum spans the full lifecycle of IT service management. Its depth is comparable to a multi-workshop advisory engagement on designing, operating, and improving core ITSM processes across strategy, incident response, change control, and tooling integration.
Module 1: Service Strategy and Portfolio Management
- Define service portfolio boundaries by evaluating existing services against business unit SLAs and decommissioning underutilized offerings based on cost-per-transaction analysis.
- Align service pricing models with chargeback requirements, balancing transparency with internal political constraints in shared-cost environments.
- Conduct demand forecasting for new services using historical usage trends and business roadmap inputs, adjusting capacity plans quarterly.
- Establish a formal service retirement process that includes stakeholder sign-off, data migration validation, and technical deprovisioning steps.
- Implement a service valuation framework to prioritize investment in high-impact services using customer satisfaction and operational cost metrics.
- Integrate portfolio management with enterprise architecture governance to ensure compliance with technology standards and lifecycle alignment.
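The cost-per-transaction analysis mentioned in the first bullet of this module can be sketched as follows. This is a minimal illustration, not a prescribed method: the `Service` fields, the sample figures, and the `max_cost_per_txn` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    annual_cost: float   # total yearly operating cost (hypothetical figures)
    transactions: int    # yearly transaction count


def cost_per_transaction(svc: Service) -> float:
    """Return the unit cost; infinite when the service saw no usage at all."""
    if svc.transactions == 0:
        return float("inf")
    return svc.annual_cost / svc.transactions


def retirement_candidates(services, max_cost_per_txn):
    """Names of services whose unit cost exceeds the assumed threshold."""
    return [s.name for s in services if cost_per_transaction(s) > max_cost_per_txn]


portfolio = [
    Service("payroll-portal", 120_000.0, 480_000),
    Service("legacy-fax-gateway", 30_000.0, 1_500),
    Service("archive-viewer", 8_000.0, 0),
]
candidates = retirement_candidates(portfolio, max_cost_per_txn=5.0)
print(candidates)
```

A list like this would only be an input to the retirement process: stakeholder sign-off and SLA obligations still decide whether a flagged service is actually decommissioned.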
Module 2: Service Design and SLA Architecture
- Negotiate SLA terms with business units by translating technical availability metrics into business-impact scenarios, such as revenue loss per downtime hour.
- Design multi-tiered SLAs that differentiate service levels for gold, silver, and bronze customer segments based on contractual obligations.
- Map service design specifications to underlying infrastructure components, ensuring monitoring coverage for all critical path dependencies.
- Validate service continuity requirements by conducting tabletop exercises with operations and disaster recovery teams.
- Document service level requirements in a standardized template that includes escalation paths, breach notification procedures, and review cycles.
- Coordinate with legal and compliance teams to ensure SLAs meet regulatory requirements for data residency and auditability.
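To illustrate how an availability target translates into a business-impact scenario, the sketch below converts an availability percentage into permitted downtime and the revenue exposed during it. The gold, silver, and bronze targets, the 30-day month, and the revenue-per-hour figure are assumptions chosen for the example.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day measurement period


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_MONTH):
    """Hours of downtime a service may accrue and still meet its target."""
    return period_hours * (1 - availability_pct / 100)


def revenue_at_risk(availability_pct, revenue_per_hour, period_hours=HOURS_PER_MONTH):
    """Revenue exposed if the full downtime allowance is consumed."""
    return allowed_downtime_hours(availability_pct, period_hours) * revenue_per_hour


# Hypothetical tier targets; real values come from the negotiated SLA.
tiers = {"gold": 99.9, "silver": 99.5, "bronze": 99.0}

for tier, target in tiers.items():
    hours = allowed_downtime_hours(target)
    loss = revenue_at_risk(target, revenue_per_hour=1_000)
    print(f"{tier}: {hours:.2f} h/month allowed, up to {loss:,.0f} at risk")
```

Framing each tier as "hours per month" and "revenue at risk" tends to be easier for business units to weigh than a bare percentage.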
Module 3: Service Transition and Change Management
- Classify changes using a risk-based model that considers impact, complexity, and rollback feasibility, assigning appropriate approval authorities.
- Enforce a mandatory change advisory board (CAB) process for high-risk changes, with pre-read documentation including backout plans and test results.
- Integrate change management with CI/CD pipelines by requiring change records for production deployments exceeding predefined scope thresholds.
- Conduct post-implementation reviews for failed changes to update risk assessment models and refine approval criteria.
- Manage emergency change windows with strict time limits, mandatory post-mortems, and retroactive CAB reporting.
- Maintain a configuration management database (CMDB) that reflects actual production state through automated discovery and reconciliation processes.
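A risk-based change classification of the kind described in the first bullet of this module might look like the sketch below. The 1 to 3 scoring scale, the penalty for changes that cannot be rolled back, the score thresholds, and the approver mapping are all hypothetical; an organization would calibrate them from its own post-implementation reviews.

```python
def classify_change(impact: int, complexity: int, rollback_feasible: bool) -> str:
    """Classify a change as low, medium, or high risk.

    impact and complexity are scored 1 (minor) to 3 (major).
    """
    if not (1 <= impact <= 3 and 1 <= complexity <= 3):
        raise ValueError("impact and complexity must be between 1 and 3")
    score = impact * complexity
    if not rollback_feasible:
        score += 3  # assumed penalty when no backout path exists
    if score <= 2:
        return "low"
    if score <= 5:
        return "medium"
    return "high"


# Hypothetical approval authorities per risk class.
APPROVERS = {
    "low": "team lead",
    "medium": "change manager",
    "high": "change advisory board",
}

risk = classify_change(impact=3, complexity=2, rollback_feasible=True)
print(risk, "->", APPROVERS[risk])
```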
Module 4: Incident Management and Major Event Response
- Implement event correlation rules to suppress noise and identify root incidents from cascading alerts across monitoring tools.
- Define major incident criteria based on business impact, customer count affected, and SLA breach thresholds, so that escalation protocols trigger automatically when the criteria are met.
- Coordinate war rooms with a designated incident manager, a documented communication plan, and a real-time status dashboard.
- Enforce incident categorization and prioritization using a standardized taxonomy aligned with support team skill sets and response SLAs.
- Integrate incident records with problem management by requiring a root cause analysis (RCA) within 48 hours for every P1 incident.
- Measure incident resolution effectiveness using mean time to acknowledge (MTTA), mean time to resolve (MTTR), and recurrence rates.
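The three measures named in the last bullet can be computed from incident timestamps as sketched below. The record layout (`opened`, `acknowledged`, `resolved`, `is_repeat`) is an assumption for illustration, and MTTR is measured here from the time the incident was opened; some teams measure from acknowledgement instead.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average elapsed minutes between two timestamps across incidents."""
    return mean(
        [(i[end_key] - i[start_key]).total_seconds() / 60 for i in incidents]
    )


def recurrence_rate(incidents):
    """Fraction of incidents flagged as a repeat of an earlier incident."""
    return sum(1 for i in incidents if i["is_repeat"]) / len(incidents)


t0 = datetime(2024, 1, 1, 9, 0)
incidents = [
    {"opened": t0, "acknowledged": t0 + timedelta(minutes=5),
     "resolved": t0 + timedelta(minutes=65), "is_repeat": False},
    {"opened": t0, "acknowledged": t0 + timedelta(minutes=15),
     "resolved": t0 + timedelta(minutes=125), "is_repeat": True},
]

mtta = mean_minutes(incidents, "opened", "acknowledged")
mttr = mean_minutes(incidents, "opened", "resolved")
print(f"MTTA {mtta:.0f} min, MTTR {mttr:.0f} min, "
      f"recurrence {recurrence_rate(incidents):.0%}")
```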
Module 5: Problem Management and Root Cause Analysis
- Initiate problem records proactively based on incident trend analysis, such as repeated failures in a specific application module.
- Apply root cause analysis techniques like 5 Whys or fishbone diagrams during cross-functional workshops with engineering and operations teams.
- Prioritize known errors based on frequency, business impact, and feasibility of permanent fixes, feeding outcomes into the change pipeline.
- Track workaround effectiveness and expiration dates to prevent long-term reliance on temporary solutions.
- Integrate problem records with knowledge management by publishing resolutions in a searchable internal knowledge base with version control.
- Conduct monthly problem review meetings to assess backlog aging, resolution rates, and recurring themes across service lines.
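Proactive problem identification from incident trends, as described in the first bullet of this module, can be approximated with a simple frequency count. The sketch below assumes each incident record carries `application` and `module` fields and uses an arbitrary threshold of three occurrences; both are illustrative choices.

```python
from collections import Counter


def problem_candidates(incidents, min_occurrences=3):
    """Return (application, module) pairs that recur often enough to
    justify opening a problem record."""
    counts = Counter((i["application"], i["module"]) for i in incidents)
    return sorted(key for key, n in counts.items() if n >= min_occurrences)


incident_log = [
    {"application": "billing", "module": "invoice-export"},
    {"application": "billing", "module": "invoice-export"},
    {"application": "crm", "module": "login"},
    {"application": "billing", "module": "invoice-export"},
    {"application": "crm", "module": "search"},
]

trending = problem_candidates(incident_log)
print(trending)
```

In practice the count would be restricted to a time window, such as the trailing 30 days, so that old incidents do not keep triggering new problem records.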
Module 6: Configuration Management and CMDB Governance
- Define configuration item (CI) ownership roles and update responsibilities for each business unit and technical domain.
- Implement automated discovery tools with scheduled scans and exception reporting for unauthorized or undocumented CIs.
- Establish data reconciliation processes between discovery tools, change records, and manual inputs to maintain CMDB accuracy.
- Enforce CI lifecycle management by requiring decommission records and audit trails for retired infrastructure.
- Design CI relationships and dependencies to support impact analysis for changes and incidents, and validate them against the impact actually observed during major outages.
- Conduct quarterly CMDB health audits measuring completeness, accuracy, and timeliness against a sample of critical services.
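Reconciliation between discovery output and CMDB records reduces to a set comparison, sketched below. The dictionary layout (CI identifier mapped to attributes) and the attribute names are assumptions; real discovery tools expose their own schemas.

```python
def reconcile(cmdb: dict, discovered: dict) -> dict:
    """Compare CMDB records with discovery results, keyed by CI identifier."""
    cmdb_ids, found_ids = set(cmdb), set(discovered)
    shared = cmdb_ids & found_ids
    return {
        # Present on the network but absent from the CMDB.
        "undocumented": sorted(found_ids - cmdb_ids),
        # Recorded in the CMDB but not seen by the scan.
        "missing_from_scan": sorted(cmdb_ids - found_ids),
        # Seen in both, with differing attributes.
        "mismatched": sorted(ci for ci in shared if cmdb[ci] != discovered[ci]),
    }


cmdb = {
    "srv-01": {"os": "linux", "ip": "10.0.0.1"},
    "srv-02": {"os": "linux", "ip": "10.0.0.2"},
    "db-01": {"os": "linux", "ip": "10.0.1.1"},
}
discovered = {
    "srv-01": {"os": "linux", "ip": "10.0.0.1"},
    "srv-02": {"os": "windows", "ip": "10.0.0.2"},
    "srv-99": {"os": "linux", "ip": "10.0.9.9"},
}

report = reconcile(cmdb, discovered)
print(report)
```

Each bucket maps to a different follow-up: undocumented CIs go to exception reporting, CIs missing from the scan are checked against decommission records, and mismatches are compared with recent change records.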
Module 7: Service Operation and Continuous Improvement
- Standardize shift handover procedures for 24/7 operations teams using structured checklists and summaries of open incidents and pending changes.
- Implement event management workflows that route alerts based on severity, CI criticality, and on-call schedules with escalation timeouts.
- Conduct service reviews with business stakeholders using balanced scorecards that include availability, incident volume, and change success rates.
- Apply continual service improvement (CSI) methodologies by identifying improvement opportunities from incident, problem, and change data trends.
- Measure ITSM process effectiveness using KPIs such as first call resolution rate, change failure rate, and SLA compliance percentage.
- Integrate feedback loops from user satisfaction surveys into process refinement cycles, prioritizing actions based on impact and feasibility.
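The KPIs listed in this module are simple ratios over process records. The sketch below computes them from raw counts; the counts themselves are invented for illustration, and the guard against an empty denominator is a design choice rather than a standard.

```python
def percentage(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, or 0.0 when there is no data."""
    if denominator == 0:
        return 0.0
    return 100 * numerator / denominator


# Hypothetical monthly counts pulled from the ITSM tool.
changes_total, changes_failed = 40, 3
tickets_total, tickets_first_call = 200, 150
sla_measured, sla_met = 200, 190

kpis = {
    "change_failure_rate": percentage(changes_failed, changes_total),
    "first_call_resolution_rate": percentage(tickets_first_call, tickets_total),
    "sla_compliance": percentage(sla_met, sla_measured),
}

for name, value in kpis.items():
    print(f"{name}: {value:.1f}%")
```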
Module 8: ITSM Tooling and Process Integration
- Select ITSM platforms based on API capabilities, scalability requirements, and integration needs with monitoring, directory, and DevOps tools.
- Design workflow automation for routine tasks such as incident assignment, change approval routing, and SLA timer enforcement.
- Map business service views in the ITSM tool to reflect end-to-end dependencies across applications, databases, and infrastructure layers.
- Implement role-based access controls in the ITSM system to enforce data privacy and segregation of duties for sensitive operations.
- Establish data retention and archiving policies for ITSM records in compliance with legal and audit requirements.
- Conduct integration testing between ITSM and external systems using mock payloads and failure scenario simulations before production rollout.
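Integration testing with mock payloads and simulated failures, as described in the last bullet, can be sketched with Python's `unittest.mock`. The `forward_alert` function, the `create_incident` client method, and the payload fields are hypothetical stand-ins for whatever interface the chosen ITSM platform actually exposes.

```python
from unittest.mock import Mock


def forward_alert(alert: dict, client, max_attempts: int = 3) -> str:
    """Create an incident from a monitoring alert, retrying on connection errors.

    `client` is any object with a create_incident(payload) method that
    returns a dict containing an "id" key (an assumed interface).
    """
    payload = {
        "short_description": alert["summary"],
        "severity": alert["severity"],
        "ci": alert["host"],
    }
    last_error = None
    for _ in range(max_attempts):
        try:
            return client.create_incident(payload)["id"]
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("ITSM endpoint unreachable") from last_error


mock_alert = {"summary": "Disk 95% full", "severity": "high", "host": "srv-01"}

# Failure scenario 1: the first call times out, the retry succeeds.
flaky_client = Mock()
flaky_client.create_incident.side_effect = [
    ConnectionError("timeout"),
    {"id": "INC0001"},
]
incident_id = forward_alert(mock_alert, flaky_client)

# Failure scenario 2: the endpoint stays down for every attempt.
down_client = Mock()
down_client.create_incident.side_effect = ConnectionError("down")
try:
    forward_alert(mock_alert, down_client)
    outage_handled = False
except RuntimeError:
    outage_handled = True
```

Because the client is injected, the same function can be exercised against a sandbox instance of the real platform before production rollout.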