This curriculum brings the technical and operational rigor of a multi-workshop program to the service level challenges of large-scale event deployments, from pre-event planning and vendor coordination to on-site incident response and post-event compliance audits.
Module 1: Defining Event Service Level Objectives (SLOs)
- Selecting appropriate SLOs for ticketing system availability during peak registration windows, balancing user expectations with infrastructure capacity.
- Establishing measurable thresholds for mobile app response time during live event check-in, based on historical load testing data.
- Deciding between uptime percentage and transaction success rate as the primary SLO for payment processing during high-volume sales periods.
- Negotiating SLOs with third-party vendors for badge printing services, including acceptable turnaround time and error rate tolerances.
- Determining recovery time objectives (RTO) for streaming platforms after broadcast interruptions during virtual keynotes.
- Setting SLOs for Wi-Fi network performance in high-density attendee areas, factoring in device-per-person estimates and bandwidth allocation.
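The choice between uptime percentage and transaction success rate can be made concrete by computing both SLIs from the same telemetry. The Python sketch below assumes hypothetical per-minute payment samples; the `MinuteSample` record and helper names are illustrative and not taken from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinuteSample:
    """One minute of hypothetical payment telemetry."""
    available: bool  # did the health check pass during this minute?
    attempts: int    # payment transactions attempted in the minute
    successes: int   # payment transactions that completed


def uptime_ratio(samples: list[MinuteSample]) -> float:
    """Fraction of minutes in which the service was reachable (every minute weighs the same)."""
    if not samples:
        raise ValueError("no samples")
    return sum(s.available for s in samples) / len(samples)


def success_ratio(samples: list[MinuteSample]) -> float:
    """Fraction of attempted transactions that succeeded (busy minutes weigh more)."""
    attempts = sum(s.attempts for s in samples)
    if attempts == 0:
        return 1.0  # no traffic, so nothing failed
    return sum(s.successes for s in samples) / attempts


def meets_slo(measured: float, target: float) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return measured >= target
```

With 55 quiet minutes (10 attempts each, all successful) plus a five-minute outage at sales open (200 attempts per minute, none successful), uptime is about 91.7% while transaction success is only about 35.5%. For high-volume sales periods, the success-rate SLI reflects what buyers actually experienced.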
Module 2: Event Infrastructure Monitoring and Observability
- Deploying distributed tracing across microservices handling registration, scheduling, and session tracking to isolate latency bottlenecks.
- Configuring synthetic transaction monitoring for critical user journeys, such as group registration and agenda syncing.
- Integrating monitoring tools with on-site networking equipment to detect rogue access points or bandwidth saturation during multi-day events.
- Implementing log aggregation from mobile event apps, backend APIs, and kiosk systems to create a unified observability dashboard.
- Setting dynamic alert thresholds for API error rates during phased event rollouts to avoid alert fatigue from expected traffic spikes.
- Mapping monitoring coverage across hybrid environments, including cloud-hosted services and temporary on-premise infrastructure.
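One way to set dynamic alert thresholds for API error rates is to compare each interval's error rate against a rolling baseline (mean plus k standard deviations) with a fixed floor. The sketch below is a hypothetical helper; the window size, `k`, and floor values are assumptions to tune per service. Because it alerts on a rate rather than a raw error count, a traffic spike at a normal error rate does not fire.

```python
import statistics
from collections import deque


class DynamicErrorRateAlert:
    """Hypothetical helper: flags an error rate well above its own recent baseline."""

    def __init__(self, window: int = 30, k: float = 3.0, floor: float = 0.01, min_samples: int = 5):
        self.history = deque(maxlen=window)  # recent non-alerting error rates
        self.k = k                           # standard deviations above the mean
        self.floor = floor                   # never alert below this error rate
        self.min_samples = min_samples       # use only the floor until enough history exists

    def threshold(self) -> float:
        if len(self.history) < self.min_samples:
            return self.floor
        mean = statistics.fmean(self.history)
        std = statistics.pstdev(self.history)
        return max(self.floor, mean + self.k * std)

    def observe(self, errors: int, requests: int) -> bool:
        """Record one interval and return True if it should raise an alert."""
        rate = errors / requests if requests else 0.0
        fire = rate > self.threshold()
        if not fire:
            # Only normal intervals update the baseline, so an incident cannot raise its own threshold.
            self.history.append(rate)
        return fire
```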
Module 3: Incident Response and On-Site Coordination
- Establishing escalation paths between venue IT staff, cloud operations teams, and third-party AV providers during service degradation.
- Pre-defining communication protocols for incident status updates to event organizers when SLO breaches occur.
- Conducting tabletop exercises with cross-functional teams to simulate failure scenarios, such as registration database outages.
- Deploying portable failover networks at critical access points when primary Wi-Fi fails during keynote sessions.
- Assigning dedicated incident commanders for different service domains (e.g., networking, registration, streaming) during large-scale events.
- Documenting post-incident timelines to identify gaps in detection, response, and resolution during post-mortem analysis.
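Post-incident timelines become comparable across events once each phase gap is computed the same way. The sketch below assumes a hypothetical timeline with four named milestones (`started`, `detected`, `acknowledged`, `resolved`); real incident tooling may record different or additional milestones.

```python
from datetime import datetime, timedelta

# Hypothetical milestone names, in the order they must occur.
PHASES = ("started", "detected", "acknowledged", "resolved")


def phase_durations(timeline: dict[str, datetime]) -> dict[str, timedelta]:
    """Split an incident timeline into detection, response, and resolution gaps."""
    missing = [p for p in PHASES if p not in timeline]
    if missing:
        raise ValueError(f"timeline missing phases: {missing}")
    stamps = [timeline[p] for p in PHASES]
    if any(later < earlier for earlier, later in zip(stamps, stamps[1:])):
        raise ValueError("timeline phases are out of order")
    return {
        "time_to_detect": timeline["detected"] - timeline["started"],
        "time_to_acknowledge": timeline["acknowledged"] - timeline["detected"],
        "time_to_resolve": timeline["resolved"] - timeline["started"],
    }
```

A long `time_to_detect` points at monitoring gaps, while a long `time_to_acknowledge` points at escalation paths or on-call coverage.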
Module 4: Vendor and Third-Party Service Integration
- Enforcing SLA compliance through contractual clauses with AV integrators, including penalties for broadcast latency exceeding 5 seconds.
- Validating API rate limits and retry mechanisms when integrating third-party gamification platforms into the event app.
- Requiring monitoring access from vendors providing RFID tracking systems to ensure end-to-end visibility of attendee movement data.
- Assessing the impact of vendor-specific downtime windows on overall event SLOs, particularly during concurrent sessions.
- Implementing circuit breakers in integrations with external registration partners to prevent cascading failures.
- Conducting pre-event security and performance audits of vendor-hosted services that handle attendee PII.
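A circuit breaker around an external registration partner stops calls after repeated failures, fails fast while open, and lets a trial request through once a cooldown has passed. The following is a minimal single-threaded sketch; `CircuitBreaker`, its thresholds, and the injectable `clock` are hypothetical and not a specific library's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the partner while the circuit is open."""


class CircuitBreaker:
    """Minimal, single-threaded circuit breaker (hypothetical helper)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"  # cooldown elapsed: allow a trial request through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("partner circuit open; serve a fallback instead")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial in half-open, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast keeps threads and connections from piling up behind a slow partner, which is how one partner's outage would otherwise cascade into the event's own registration flow.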
Module 5: Capacity Planning and Load Testing
- Simulating concurrent user loads on the session reservation system based on peak attendance projections for popular breakout sessions.
- Adjusting auto-scaling policies for cloud-hosted services based on load test results derived from the previous year's event data.
- Staging load tests during off-peak business hours to avoid impacting production systems used for ongoing event planning.
- Validating database connection pool sizing under stress conditions to prevent exhaustion during flash registration periods.
- Coordinating with venue facilities to ensure power and cooling capacity align with temporary server and networking deployments.
- Testing failover of content delivery network (CDN) configurations to ensure streaming continuity during regional outages.
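A first-order estimate for database connection pool sizing comes from Little's law: connections in use at once roughly equal the request arrival rate times how long each request holds a connection. The sketch below applies that estimate with a headroom factor and checks it against the database's connection limit. The headroom of 1.5 and the 3 reserved admin connections are assumptions, and load tests should confirm the result.

```python
import math


def required_connections(peak_rps: float, mean_hold_seconds: float, headroom: float = 1.5) -> int:
    """Little's law estimate: in-flight connections = arrival rate x hold time, plus headroom."""
    if peak_rps < 0 or mean_hold_seconds < 0 or headroom < 1:
        raise ValueError("rates and times must be non-negative and headroom >= 1")
    in_flight = peak_rps * mean_hold_seconds * headroom
    return math.ceil(round(in_flight, 9))  # round first so float noise cannot add a connection


def per_instance_pool_size(total: int, instances: int) -> int:
    """Spread the total across application instances, rounding up."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return math.ceil(total / instances)


def fits_database(per_instance: int, instances: int, db_max_connections: int, reserved: int = 3) -> bool:
    """Check that all pools together stay under the DB limit, keeping a few connections for admins."""
    return per_instance * instances <= db_max_connections - reserved
```

For example, 80 registrations per second each holding a connection for 0.125 s needs about 10 connections, or 15 with headroom; across 4 instances that is a pool of 4 each.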
Module 6: Event-Specific Change and Configuration Management
- Implementing a freeze on non-critical configuration changes 72 hours prior to event kickoff to reduce risk of unintended disruptions.
- Using feature flags to enable or disable real-time polling and Q&A functions during sessions based on system performance.
- Version-controlling all infrastructure-as-code templates used to deploy temporary event environments to ensure repeatability.
- Validating DNS and routing changes for event-specific domains before redirecting live traffic from registration portals.
- Documenting rollback procedures for mobile app updates deployed during multi-day events when new features introduce instability.
- Coordinating configuration updates across time zones when managing global virtual event platforms with regional data centers.
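The 72-hour change freeze can be enforced mechanically in a deployment pipeline. The sketch below is a hypothetical gate: the curriculum does not say when the freeze ends, so `freeze_end` (for example, the event's close) is an assumption supplied by the caller. Requiring timezone-aware timestamps avoids mistakes when teams in different regions schedule changes.

```python
from datetime import datetime, timedelta, timezone

FREEZE_LEAD = timedelta(hours=72)  # freeze begins 72 hours before kickoff


def change_allowed(proposed_at: datetime, kickoff: datetime, freeze_end: datetime, critical: bool) -> bool:
    """Return True if a change may ship at `proposed_at`.

    Non-critical changes are blocked from 72 hours before kickoff until `freeze_end`
    (an assumed parameter, e.g. the event's close).
    """
    for ts in (proposed_at, kickoff, freeze_end):
        if ts.tzinfo is None or ts.utcoffset() is None:
            raise ValueError("use timezone-aware datetimes so every region compares the same instant")
    freeze_start = kickoff - FREEZE_LEAD
    in_freeze = freeze_start <= proposed_at < freeze_end
    return critical or not in_freeze


if __name__ == "__main__":
    kickoff = datetime(2025, 6, 10, 9, 0, tzinfo=timezone.utc)
    close = datetime(2025, 6, 12, 18, 0, tzinfo=timezone.utc)
    # 10:30 at UTC+2 is 08:30 UTC, just before the freeze starts at 09:00 UTC on 7 June.
    print(change_allowed(datetime(2025, 6, 7, 10, 30, tzinfo=timezone(timedelta(hours=2))), kickoff, close, False))
```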
Module 7: Post-Event Analysis and Continuous Improvement
- Correlating SLO performance data with attendee feedback to identify service gaps that did not trigger technical alerts.
- Archiving monitoring data and incident logs for compliance review and future forensic analysis.
- Calculating burn rates for error budgets during the event to assess operational risk exposure and team responsiveness.
- Updating runbooks based on observed failure patterns, such as recurring delays in badge re-printing processes.
- Revising SLOs for the next event cycle based on actual performance trends and changing business requirements.
- Conducting blameless retrospectives with technical and event operations teams to refine cross-functional workflows.
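An error budget burn rate is commonly defined as the observed error ratio divided by the error ratio the SLO allows. A rate of 1.0 spends the budget exactly by the end of the SLO window, and anything higher spends it sooner. The sketch below assumes a constant burn rate when projecting how much budget a period consumed; the function names are illustrative.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows (1 - target)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0 <= bad_events <= total_events:
        raise ValueError("need 0 <= bad_events <= total_events")
    if total_events == 0:
        return 0.0  # no traffic, no budget spent
    return (bad_events / total_events) / (1.0 - slo_target)


def budget_consumed(rate: float, elapsed_hours: float, window_hours: float) -> float:
    """Fraction of the window's error budget spent if `rate` held for `elapsed_hours`."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return rate * elapsed_hours / window_hours
```

For example, 20 failed check-ins out of 1,000 against a 99% target is a burn rate of about 2. If the SLO window is assumed to be a 72-hour event, sustaining that rate for the first 12 hours would consume roughly a third of the budget.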
Module 8: Regulatory Compliance and Data Residency in Global Events
- Mapping data flows across event systems to ensure GDPR compliance when collecting consent during session sign-ups in EU venues.
- Configuring data residency settings in cloud platforms to store attendee information within jurisdictional boundaries.
- Implementing audit logging for access to sensitive attendee data, such as dietary restrictions or accessibility requirements.
- Validating encryption standards for data in transit between on-site kiosks and central databases during hybrid events.
- Assessing vendor compliance with local privacy laws when using regional registration partners in APAC or LATAM markets.
- Designing post-event data retention and deletion workflows that meet legal requirements, including for backups and archives.
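A retention workflow typically starts by selecting which records have outlived their retention period. The sketch below uses placeholder retention periods per data category; real values must come from legal review for each jurisdiction. It raises on unknown categories so that every data type needs an explicit rule. `AttendeeRecord` and `RETENTION` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Placeholder retention periods, counted from the event's end; not legal guidance.
RETENTION = {
    "dietary": timedelta(days=30),
    "accessibility": timedelta(days=30),
    "contact": timedelta(days=365),
}


@dataclass(frozen=True)
class AttendeeRecord:
    record_id: str
    category: str        # must match a key in the retention table
    event_end: datetime  # timezone-aware end of the event the record belongs to


def due_for_deletion(records, now: datetime, retention=RETENTION) -> list[str]:
    """Return IDs of records whose retention period has expired at `now`."""
    due = []
    for r in records:
        period = retention.get(r.category)
        if period is None:
            raise KeyError(f"no retention rule for category {r.category!r}")
        if now >= r.event_end + period:
            due.append(r.record_id)
    return due


if __name__ == "__main__":
    end = datetime(2025, 6, 12, tzinfo=timezone.utc)
    recs = [AttendeeRecord("a", "dietary", end), AttendeeRecord("b", "contact", end)]
    print(due_for_deletion(recs, end + timedelta(days=31)))
```

The returned IDs would then drive deletion in primary stores, backups, and archives alike, so that copies outside the main database are not missed.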