Description

This curriculum matches the technical and organizational scope of a multi-workshop internal capability program. It addresses the pipeline optimization, cross-team coordination, and platform governance challenges seen in large-scale DevOps transformations.

Module 1: Accelerating Code Integration and Build Pipelines

Selecting between monorepo and polyrepo strategies based on team autonomy, dependency management, and CI pipeline scalability.
Implementing incremental builds using artifact caching and dependency version pinning to reduce redundant compilation steps.
Configuring parallel test execution across multiple environments while managing test data isolation and resource contention.
Enforcing pre-commit hooks and automated linting rules to prevent integration failures without blocking developer velocity.
Integrating build health metrics into developer dashboards to provide real-time feedback on pipeline performance.
Optimizing container image layering and registry pull strategies to minimize build time on distributed CI agents.
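
The incremental-build item above can be illustrated with a minimal sketch. It assumes a content-addressed cache keyed on source contents plus pinned dependency versions; the names build_cache_key and needs_rebuild are hypothetical and not tied to any particular build tool.

```python
import hashlib


def build_cache_key(sources: dict[str, bytes], pinned_deps: dict[str, str]) -> str:
    """Derive a deterministic cache key from source contents and pinned versions.

    Hypothetical helper: real build tools compute keys over more inputs
    (compiler flags, toolchain version, environment).
    """
    hasher = hashlib.sha256()
    # Sort so the key does not depend on dict insertion order.
    for path in sorted(sources):
        hasher.update(path.encode("utf-8") + b"\0")
        hasher.update(hashlib.sha256(sources[path]).digest())
    for name in sorted(pinned_deps):
        hasher.update(f"{name}=={pinned_deps[name]}\n".encode("utf-8"))
    return hasher.hexdigest()


def needs_rebuild(key: str, artifact_cache: dict[str, bytes]) -> bool:
    """A target is rebuilt only when no artifact is cached under its key."""
    return key not in artifact_cache
```

Because the key changes whenever a source file or a pinned version changes, an unchanged target hits the cache and its compilation step can be skipped.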

Module 2: Streamlining Deployment Automation at Scale

Choosing between blue-green, canary, and rolling deployments based on risk tolerance, monitoring capabilities, and rollback requirements.
Designing idempotent deployment scripts to ensure consistent results across repeated executions in complex environments.
Managing stateful service deployments (e.g., databases) alongside stateless components without introducing deployment bottlenecks.
Implementing deployment gates using automated policy checks (security, compliance, performance) without creating manual approval delays.
Coordinating multi-region deployment sequencing to maintain service availability during global rollouts.
Versioning and tracking infrastructure-as-code changes in alignment with application release cycles to prevent configuration drift.
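
As a rough illustration of the idempotency item above, the sketch below applies a desired state only where it differs from the current one, so a second run makes no changes. The function name and the flat key/value state model are assumptions made for the example.

```python
def apply_desired_state(
    current: dict[str, str], desired: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Return the updated state and the list of keys that actually changed.

    Hypothetical model: state is a flat mapping such as {"image": "app:1.1"}.
    """
    updated = dict(current)  # never mutate the caller's state
    changed = []
    for key, value in desired.items():
        # Only touch settings that differ; this is what makes reruns safe.
        if key not in updated or updated[key] != value:
            updated[key] = value
            changed.append(key)
    return updated, changed
```

Running the function twice with the same desired state yields an empty change list on the second run, which is the property a deployment script needs in order to be retried safely.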

Module 3: Optimizing Infrastructure Provisioning and Management

Pre-provisioning and reusing infrastructure pools (e.g., Kubernetes clusters, VMs) to reduce provisioning latency during deployments.
Implementing infrastructure drift detection and automated remediation without disrupting running workloads.
Choosing between serverless, containers, and VMs based on cold-start latency, cost, and operational overhead.
Designing modular, reusable Terraform or Pulumi modules to reduce provisioning time and increase consistency.
Integrating infrastructure provisioning into CI/CD pipelines while enforcing role-based access and change approval workflows.
Managing secrets injection at provisioning time using secure, auditable mechanisms like HashiCorp Vault or cloud-native secret managers.
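
The drift detection item above can be sketched as a comparison between declared and observed state. This is a simplified model that assumes both states are flat dictionaries; the function name detect_drift is hypothetical, and real tools such as Terraform compare far richer resource graphs.

```python
def detect_drift(declared: dict, actual: dict) -> dict[str, dict]:
    """Classify differences between declared (IaC) and actual (live) state."""
    drift = {"missing": {}, "unmanaged": {}, "changed": {}}
    for key, value in declared.items():
        if key not in actual:
            # Declared in code but absent from the live environment.
            drift["missing"][key] = value
        elif actual[key] != value:
            drift["changed"][key] = {"declared": value, "actual": actual[key]}
    for key, value in actual.items():
        if key not in declared:
            # Present in the live environment but not tracked in code.
            drift["unmanaged"][key] = value
    return drift
```

A remediation step could then act only on the "changed" and "missing" entries and report "unmanaged" resources for review instead of deleting them, which reduces the risk of disrupting running workloads.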

Module 4: Enhancing Monitoring, Observability, and Feedback Loops

Instrumenting services with structured logging and distributed tracing to reduce mean time to diagnose (MTTD) in production.
Setting dynamic alert thresholds based on historical performance patterns to reduce noise and false positives.
Correlating deployment events with performance metrics to automatically detect regressions in production.
Reducing observability data volume through intelligent sampling and retention policies without losing diagnostic value.
Integrating observability data into developer workflows via pull request annotations and CI status checks.
Standardizing metric and log schemas across services to enable cross-team troubleshooting and automation.
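
One simple way to approach the dynamic alert threshold item above is a mean-plus-k-standard-deviations rule over recent history. The sketch below assumes that rule and a default of k = 3; both are illustrative choices, and production systems often also account for seasonality.

```python
import statistics


def dynamic_threshold(history: list[float], k: float = 3.0) -> float:
    """Threshold = mean of recent samples plus k population standard deviations."""
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean + k * spread


def should_alert(value: float, history: list[float], k: float = 3.0) -> bool:
    """Alert only when the new sample exceeds the history-derived threshold."""
    return value > dynamic_threshold(history, k)
```

With latency samples clustered around 100 ms, a reading of 103 ms stays below the threshold while 110 ms exceeds it, so normal jitter does not page anyone.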

Module 5: Governing Security and Compliance in High-Velocity Pipelines

Embedding SAST and dependency scanning into pull request pipelines without increasing feedback latency beyond developer tolerance.
Automating compliance checks (e.g., CIS benchmarks, policy-as-code) in pre-production environments with clear failure criteria.
Managing secrets rotation and revocation workflows in automated pipelines without disrupting active deployments.
Implementing least-privilege access for CI/CD service accounts across cloud and on-prem environments.
Enabling security exception workflows with audit trails for time-bound overrides during critical releases.
Integrating vulnerability databases with real-time feed updates to ensure scanning tools reflect current threat intelligence.
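
The time-bound exception workflow above might look like the following sketch. It assumes findings are identified by rule ID and that an audit log is a plain list of strings; PolicyException and blocking_findings are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PolicyException:
    rule_id: str
    approved_by: str
    expires_at: datetime  # the override stops applying at this instant


def blocking_findings(
    findings: list[str],
    exceptions: list[PolicyException],
    now: datetime,
    audit_log: list[str],
) -> list[str]:
    """Return findings that still block the release; log every override used."""
    active = {e.rule_id: e for e in exceptions if now < e.expires_at}
    blocking = []
    for rule_id in findings:
        exc = active.get(rule_id)
        if exc is None:
            blocking.append(rule_id)
        else:
            audit_log.append(
                f"{now.isoformat()} override {rule_id} approved_by={exc.approved_by}"
            )
    return blocking
```

Because expiry is checked at evaluation time, an override granted for a critical release lapses on its own and the finding blocks the pipeline again without any manual cleanup.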

Module 6: Orchestrating Cross-Team Collaboration and Dependency Management

Mapping service ownership and deployment dependencies using a service catalog to reduce coordination overhead.
Establishing contract testing between teams to enable independent deployment of interdependent services.
Managing API versioning and deprecation timelines to prevent breaking changes in fast-moving ecosystems.
Implementing feature flags with targeted rollouts to decouple deployment from release without increasing technical debt.
Coordinating release trains across multiple teams while allowing opt-in/opt-out based on readiness.
Resolving environment contention by implementing ephemeral environments tied to feature branches.
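
For the feature flag item above, a common technique for targeted rollouts is deterministic hash bucketing. The sketch below assumes a percentage-based rollout keyed on flag name and user ID; in_rollout is a hypothetical helper, not the API of any feature flag product.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Place each user in a stable bucket 0..99 and enable the flag below `percent`."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

The same user always lands in the same bucket for a given flag, so raising the percentage only adds users and never removes anyone who already has the feature.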

Module 7: Driving Continuous Improvement Through Metrics and Feedback

Defining and tracking lead time for changes, deployment frequency, and change failure rate as core DevOps KPIs.
Automating root cause analysis of failed deployments using linked data from CI, monitoring, and version control.
Conducting blameless postmortems with action item tracking to close feedback loops on systemic issues.
Using A/B testing infrastructure to validate performance and user impact before full rollout.
Calibrating feedback mechanisms (e.g., developer surveys, tool usage telemetry) to identify process bottlenecks.
Iterating on pipeline design based on performance profiling of each stage (e.g., test duration, provisioning time).
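
The three KPIs named in this module can be computed from deployment records roughly as follows. The record shape (commit time, deploy time, failure flag) and the use of the median for lead time are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool  # True if the change caused a failure in production


def summarize(deployments: list[Deployment], window_days: int) -> dict[str, float]:
    """Compute lead time, deployment frequency, and change failure rate."""
    lead_seconds = [
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    ]
    return {
        "median_lead_time_hours": median(lead_seconds) / 3600,
        "deployments_per_day": len(deployments) / window_days,
        "change_failure_rate": sum(d.failed for d in deployments) / len(deployments),
    }
```

The function expects at least one deployment in the window; an empty list would need separate handling before these ratios are meaningful.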

Module 8: Scaling DevOps Practices Across Large and Distributed Organizations

Standardizing toolchains and interfaces across business units while allowing controlled customization for specific needs.
Implementing centralized observability and audit logging without creating single points of failure or adding latency.
Designing self-service platforms that reduce dependency on central DevOps teams for routine operations.
Managing global configuration consistency using hierarchical configuration management (e.g., Kustomize overlays, Helm values).
Enforcing platform guardrails through policy engines (e.g., OPA, Sentinel) without stifling innovation.
Aligning release calendars and working hours across geographically distributed teams in different time zones to maintain coordination.
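
The hierarchical configuration item above can be approximated by a layered deep merge, where later layers override earlier ones. This is only a conceptual sketch: merge_layers is a hypothetical function, and it does not reproduce the actual merge semantics of Kustomize or Helm.

```python
def merge_layers(*layers: dict) -> dict:
    """Merge configuration layers left to right; later layers win on conflict."""

    def merge(base: dict, override: dict) -> dict:
        out = dict(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(out.get(key), dict):
                # Recurse so sibling keys in nested sections are preserved.
                out[key] = merge(out[key], value)
            else:
                out[key] = value
        return out

    result: dict = {}
    for layer in layers:
        result = merge(result, layer)
    return result
```

A typical layering would be global defaults, then region, then environment, so a team overrides only the keys that differ and inherits everything else.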