This curriculum matches the technical and operational rigor of a multi-workshop serverless adoption program. It addresses the architectural trade-offs, security controls, and operational patterns that arise when redesigning enterprise workloads for event-driven, scale-to-zero environments.
Module 1: Defining Serverless Scope and Service Boundaries
- Selecting between Function-as-a-Service (e.g., AWS Lambda, Azure Functions) and Backend-as-a-Service (e.g., Firebase, Auth0) based on control requirements and integration complexity.
- Deciding on function granularity: balancing single-responsibility functions against invocation overhead and monitoring sprawl.
- Establishing ownership models for serverless components across distributed teams to prevent operational ambiguity.
- Defining which workloads are appropriate for serverless (event-driven, sporadic) versus those better suited for containers or VMs (long-running, predictable).
- Mapping legacy monolith capabilities to serverless functions while identifying data and state dependencies that impede decomposition.
- Setting thresholds for cold start tolerance based on user experience requirements and geographic distribution needs.
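The cold start threshold in the last item can be expressed as a simple check. The sketch below is illustrative only: `ColdStartProfile`, `within_tolerance`, and all numeric inputs are hypothetical, and it assumes cold starts add a fixed penalty to an otherwise stable warm latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColdStartProfile:
    """Hypothetical latency profile measured for one function in one region."""
    warm_ms: float          # typical latency of a warm invocation
    cold_penalty_ms: float  # extra latency added when a cold start occurs
    cold_fraction: float    # share of invocations that hit a cold start (0..1)


def within_tolerance(profile: ColdStartProfile,
                     budget_ms: float,
                     max_slow_fraction: float) -> bool:
    """Return True if the share of requests over budget_ms is acceptable."""
    if profile.warm_ms > budget_ms:
        # Even warm invocations break the budget; cold starts are not the issue.
        return False
    if profile.warm_ms + profile.cold_penalty_ms <= budget_ms:
        # Cold invocations still fit the budget, so any cold fraction is fine.
        return True
    # Only cold invocations exceed the budget; compare their share to the limit.
    return profile.cold_fraction <= max_slow_fraction


if __name__ == "__main__":
    checkout = ColdStartProfile(warm_ms=80, cold_penalty_ms=900, cold_fraction=0.02)
    print(within_tolerance(checkout, budget_ms=300, max_slow_fraction=0.01))
```

A team would typically run a check like this per region, since cold start frequency depends on local traffic volume.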
Module 2: Infrastructure as Code for Serverless Deployments
- Choosing between framework-based tooling (e.g., Serverless Framework, AWS SAM) and general-purpose IaC (e.g., Terraform, Pulumi) for managing function configurations.
- Designing versioned deployment pipelines that support atomic updates of function code and associated IAM roles.
- Managing environment-specific configurations (dev, staging, prod) without hardcoding or exposing secrets in source control.
- Implementing rollback strategies for failed deployments when serverless platforms lack native version rollback triggers.
- Enforcing tagging policies across functions to support cost allocation, compliance, and resource discovery.
- Automating dependency validation (e.g., correct runtime versions, layer compatibility) before deployment to prevent runtime failures.
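Tagging policies and runtime checks from the items above can be enforced as a pre-deployment gate. The following is a minimal sketch, assuming function configurations have already been parsed into dictionaries; `REQUIRED_TAGS`, `ALLOWED_RUNTIMES`, and the `config` layout are hypothetical and do not reflect the schema of any specific IaC tool.

```python
# Hypothetical organizational policy; adjust to the team's own standards.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}
ALLOWED_RUNTIMES = {"python3.12", "nodejs20.x"}


def validate_function(config: dict) -> list[str]:
    """Return a list of policy violations for one parsed function config."""
    errors = []
    missing = REQUIRED_TAGS - set(config.get("tags", {}))
    if missing:
        errors.append(f"missing tags: {', '.join(sorted(missing))}")
    runtime = config.get("runtime")
    if runtime not in ALLOWED_RUNTIMES:
        errors.append(f"unsupported runtime: {runtime}")
    return errors


if __name__ == "__main__":
    candidate = {"runtime": "python3.8", "tags": {"owner": "payments"}}
    for problem in validate_function(candidate):
        print(problem)
```

Running the check in the pipeline before the deploy step lets a violation fail the build instead of surfacing as a runtime error.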
Module 3: Identity, Permissions, and Least Privilege Enforcement
- Constructing IAM roles with minimal permissions for each function, avoiding wildcard policies even during development.
- Managing cross-account function invocations securely using role assumption and resource-based policies.
- Integrating short-lived credentials via OIDC or federated identity for functions accessing external SaaS APIs.
- Rotating and auditing access keys used by functions that interact with legacy systems lacking IAM integration.
- Implementing permission boundaries to constrain developer-deployed roles within organizational guardrails.
- Monitoring CloudTrail or equivalent audit logs for privilege escalation attempts, such as functions that modify IAM policies.
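The rule against wildcard policies can be checked automatically. The sketch below scans an IAM-style policy document, already loaded as a dictionary, for wildcard actions and resources in `Allow` statements. The helper names are hypothetical, and the check is deliberately narrow: it ignores `NotAction`, `NotResource`, conditions, and partial resource wildcards.

```python
def _as_list(value):
    """IAM policies allow a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcards(policy: dict) -> list[str]:
    """Return findings for Allow statements that use wildcard actions or resources."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue  # Deny statements with wildcards only tighten access
        for action in _as_list(statement.get("Action")):
            if "*" in action:
                findings.append(f"statement {index}: wildcard action {action!r}")
        for resource in _as_list(statement.get("Resource")):
            if resource == "*":
                findings.append(f"statement {index}: wildcard resource")
    return findings


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
    }
    for finding in find_wildcards(policy):
        print(finding)
```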
Module 4: Observability and Distributed Tracing
- Correlating logs across fragmented function invocations using trace IDs propagated through event sources and APIs.
- Configuring structured logging formats to ensure compatibility with centralized log aggregation systems (e.g., ELK, Splunk).
- Instrumenting custom metrics for business-critical operations not captured by platform-native monitoring.
- Setting up distributed tracing across serverless and non-serverless components using OpenTelemetry or vendor SDKs.
- Filtering and sampling high-volume logs to control cost without losing diagnostic fidelity for error conditions.
- Diagnosing performance bottlenecks in chained function calls by analyzing inter-function latency and payload size.
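Trace ID propagation and structured logging, as described above, might look like the following sketch. The `trace_id` event key and the record fields are assumptions chosen for illustration; a real deployment would more likely rely on OpenTelemetry or a vendor SDK for context propagation.

```python
import json
import time
import uuid


def get_trace_id(event: dict) -> str:
    """Reuse the caller's trace ID if present; otherwise start a new trace."""
    return event.get("trace_id") or uuid.uuid4().hex


def format_log(trace_id: str, level: str, message: str, **fields) -> str:
    """Render one log record as a single JSON line for log aggregation."""
    record = {
        "timestamp": time.time(),
        "level": level,
        "trace_id": trace_id,
        "message": message,
        **fields,
    }
    return json.dumps(record, sort_keys=True)


def handler(event: dict) -> dict:
    """Hypothetical function handler that logs and forwards the trace ID."""
    trace_id = get_trace_id(event)
    print(format_log(trace_id, "INFO", "order received", order_id=event.get("order_id")))
    # Pass the same trace ID downstream so later invocations can be correlated.
    return {"trace_id": trace_id, "order_id": event.get("order_id")}


if __name__ == "__main__":
    handler({"trace_id": "abc123", "order_id": "42"})
```

Because every record carries the same `trace_id`, a log aggregation query on that field reconstructs the full chain of invocations.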
Module 5: Event-Driven Design and Integration Patterns
- Selecting event sources (e.g., S3, SQS, EventBridge) based on delivery guarantees, throughput, and retry semantics.
- Designing idempotent functions to handle duplicate events from message queues or retry mechanisms.
- Implementing dead-letter queues (DLQs) for failed event processing with alerting and reprocessing workflows.
- Decoupling producers and consumers using event buses while managing schema evolution and backward compatibility.
- Throttling function concurrency to prevent downstream system overload during traffic spikes.
- Orchestrating complex workflows with managed state machines (e.g., AWS Step Functions) instead of chaining synchronous function calls.
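Idempotent processing and dead-letter handling from the items above can be combined in one small wrapper. This is a sketch under stated assumptions: `IdempotentProcessor` is a hypothetical class, each event carries a unique `id`, and the in-memory set and list stand in for a durable deduplication store and a real dead-letter queue.

```python
class IdempotentProcessor:
    """Wrap a handler so duplicates are skipped and repeated failures are parked."""

    def __init__(self, handler, max_attempts: int = 3):
        self._handler = handler
        self._max_attempts = max_attempts
        self._processed = set()  # stand-in for a durable store of processed IDs
        self._attempts = {}      # failure count per event ID
        self.dead_letters = []   # stand-in for a dead-letter queue

    def deliver(self, event: dict) -> str:
        event_id = event["id"]
        if event_id in self._processed:
            return "duplicate"  # already handled; safe to acknowledge again
        try:
            self._handler(event)
        except Exception:
            attempts = self._attempts.get(event_id, 0) + 1
            self._attempts[event_id] = attempts
            if attempts >= self._max_attempts:
                # Park the event for alerting and later reprocessing.
                self.dead_letters.append(event)
                return "dead-lettered"
            return "retry"
        self._processed.add(event_id)
        return "processed"


if __name__ == "__main__":
    processor = IdempotentProcessor(lambda event: print("handled", event["id"]))
    print(processor.deliver({"id": "e1"}))
    print(processor.deliver({"id": "e1"}))
```

Recording the event ID only after the handler succeeds means a crash mid-handler leads to a retry, so the handler's own side effects still need to tolerate being repeated.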
Module 6: Security, Compliance, and Data Protection
- Encrypting function environment variables at rest using KMS or an equivalent key management service, and protecting secrets in transit with TLS.
- Scanning function packages for vulnerabilities and embedded secrets during CI/CD pipeline execution.
- Enforcing data residency requirements by restricting function deployment regions and data egress points.
- Implementing input validation and sanitization to prevent injection attacks via event payloads.
- Auditing function configuration changes using configuration drift detection tools and alerting on unauthorized modifications.
- Meeting compliance requirements (e.g., SOC 2, HIPAA) by documenting serverless control implementations and evidence collection processes.
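Input validation for event payloads is easiest to reason about as an allow-list. The sketch below assumes a hypothetical order event with two fields; the field names, the `ORDER_ID` format, and the quantity range are illustrative and not taken from any real schema.

```python
import re

# Hypothetical format: exactly eight uppercase letters or digits.
ORDER_ID = re.compile(r"[A-Z0-9]{8}")
ALLOWED_FIELDS = {"order_id", "quantity"}


def validate_order_event(payload) -> dict:
    """Return a cleaned copy of the payload or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    unexpected = set(payload) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    order_id = payload.get("order_id")
    if not isinstance(order_id, str) or not ORDER_ID.fullmatch(order_id):
        raise ValueError("order_id has an invalid format")
    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise ValueError("quantity must be an integer")
    if not 1 <= quantity <= 1000:
        raise ValueError("quantity out of range")
    return {"order_id": order_id, "quantity": quantity}


if __name__ == "__main__":
    print(validate_order_event({"order_id": "AB12CD34", "quantity": 2}))
```

Rejecting anything that does not match the expected shape keeps injection strings out of downstream queries without having to enumerate every dangerous pattern.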
Module 7: Performance Optimization and Cost Management
- Tuning function memory and timeout settings based on profiling data to balance performance and cost.
- Using provisioned concurrency to mitigate cold starts in latency-sensitive applications, weighing cost implications.
- Monitoring invocation patterns to identify and eliminate idle or underutilized functions.
- Optimizing package size by removing unused dependencies and leveraging layers for shared code.
- Forecasting and budgeting for variable serverless costs based on usage trends and scaling behavior.
- Implementing circuit breakers and bulkheads in function-to-function communication to prevent cascading failures under load.
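Memory tuning and cost forecasting both rest on the same arithmetic: platforms such as AWS Lambda bill compute in GB-seconds plus a per-request fee. The sketch below assumes that billing model; the function names are hypothetical, and the prices and profiling figures are placeholders rather than real rates.

```python
def monthly_cost(invocations: int,
                 avg_duration_ms: float,
                 memory_mb: int,
                 price_per_gb_second: float,
                 price_per_request: float) -> float:
    """Estimate cost as GB-seconds consumed plus a flat per-request charge."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second + invocations * price_per_request


def cheapest_memory(profile: dict,
                    invocations: int,
                    price_per_gb_second: float,
                    price_per_request: float) -> int:
    """Pick the memory size with the lowest estimated cost.

    profile maps memory size in MB to the average duration in ms measured at
    that size, for example from a profiling run.
    """
    return min(
        profile,
        key=lambda mb: monthly_cost(invocations, profile[mb], mb,
                                    price_per_gb_second, price_per_request),
    )


if __name__ == "__main__":
    # Placeholder measurements and prices, for illustration only.
    measured = {512: 400.0, 1024: 150.0, 2048: 100.0}
    print(cheapest_memory(measured, 1_000_000, 0.00001, 0.0000002))
```

More memory often shortens duration, so the cheapest setting is not always the smallest one; latency requirements may still justify a larger size than the cost minimum.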
Module 8: Disaster Recovery and Operational Resilience
- Designing multi-region failover strategies for critical serverless APIs using DNS routing and replicated event sources.
- Backing up function code, configuration, and environment variables to version-controlled repositories or artifact stores.
- Testing recovery procedures by simulating region outages and measuring RTO/RPO for serverless workloads.
- Managing dependencies on managed services (e.g., API Gateway, DynamoDB) that may not support cross-region replication by default.
- Documenting manual intervention steps for incidents involving platform-level outages beyond organizational control.
- Establishing incident response playbooks specific to serverless failures, including log access, tracing, and rollback procedures.
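Measuring RTO and RPO during a recovery test reduces to comparing three timestamps against the agreed targets. The sketch below is one way to record a drill; `DrillResult` and its field names are hypothetical, and it assumes the last replicated write happened at or before the outage began.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrillResult:
    """Timestamps captured during a simulated region outage."""
    outage_start: datetime
    service_restored: datetime
    last_replicated_write: datetime

    @property
    def recovery_time(self) -> timedelta:
        """Achieved recovery time: how long the service was unavailable."""
        return self.service_restored - self.outage_start

    @property
    def data_loss_window(self) -> timedelta:
        """Achieved recovery point: writes after this window were lost."""
        return self.outage_start - self.last_replicated_write


def meets_objectives(result: DrillResult, rto: timedelta, rpo: timedelta) -> bool:
    """Return True if the drill stayed within both recovery targets."""
    return result.recovery_time <= rto and result.data_loss_window <= rpo


if __name__ == "__main__":
    drill = DrillResult(
        outage_start=datetime(2024, 1, 1, 12, 0, 0),
        service_restored=datetime(2024, 1, 1, 12, 12, 0),
        last_replicated_write=datetime(2024, 1, 1, 11, 59, 30),
    )
    print(drill.recovery_time, drill.data_loss_window)
    print(meets_objectives(drill, rto=timedelta(minutes=15), rpo=timedelta(minutes=1)))
```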