This curriculum covers query optimization across the full data stack, from single-query rewriting to cross-system distributed workloads, with the technical depth and operational rigor of a multi-workshop performance engineering engagement.
Module 1: Foundations of Query Workload Analysis
- Profile query execution frequency and runtime distribution across production workloads to identify high-impact candidates for optimization.
- Classify queries by access patterns (e.g., point lookups, range scans, aggregations) to inform indexing and materialization strategies.
- Differentiate between OLTP, OLAP, and hybrid query types when selecting optimization techniques and performance metrics.
- Instrument query logs to capture execution plans, resource consumption, and user context without introducing measurable latency.
- Map queries to business processes to prioritize optimization efforts based on operational criticality.
- Establish thresholds for query performance degradation that trigger automated review or alerting.
- Normalize and parse SQL statements to detect recurring templates and parameterized variants.
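The normalization step above can be sketched as a literal-stripping pass over raw statements, a minimal illustration only; production normalizers also collapse IN-lists, strip comments, and handle dialect-specific literal syntax:

```python
import re

def normalize_sql(sql: str) -> str:
    """Collapse literals so recurring statements map to one template."""
    template = sql.strip()
    template = re.sub(r"'(?:[^']|'')*'", "?", template)     # string literals
    template = re.sub(r"\b\d+(?:\.\d+)?\b", "?", template)  # numeric literals
    template = re.sub(r"\s+", " ", template)                # squeeze whitespace
    # Case-folding keywords and identifiers together is a simplification;
    # it conflates case-sensitive identifiers in some dialects.
    return template.upper()

queries = [
    "SELECT * FROM orders WHERE id = 42",
    "select *  from orders where id = 7",
]
# Both statements reduce to the same parameterized template.
templates = {normalize_sql(q) for q in queries}
```

Grouping a day of query-log traffic by these templates typically reveals that a handful of templates dominate execution counts, which is exactly the prioritization signal Module 1 is after.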
Module 2: Index Design and Maintenance Strategy
- Select candidate columns for composite indexes based on predicate selectivity and query filter frequency.
- Balance index coverage against write amplification by measuring INSERT/UPDATE/DELETE overhead per added index.
- Implement partial (filtered) indexes to reduce footprint for queries targeting specific data subsets.
- Define automated index validation procedures to detect and remove stale or unused indexes.
- Use index advisor tools with caution, validating recommendations against actual execution plans and concurrency patterns.
- Plan index rebuilds during maintenance windows to avoid blocking active transactions.
- Monitor index fragmentation levels and configure reorganization thresholds based on page split rates.
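The partial-index idea above can be demonstrated end to end with SQLite (used here as a stand-in; filtered-index syntax and planner output differ on SQL Server and PostgreSQL). The table and index names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
con.executemany(
    "INSERT INTO orders (status, total) VALUES (?, ?)",
    [("open" if i % 100 == 0 else "closed", i * 1.0) for i in range(1000)],
)

# The partial index covers only the small 'open' subset the hot query targets,
# so its footprint is ~1% of a full index on (status, total).
con.execute("CREATE INDEX ix_open ON orders (total) WHERE status = 'open'")
con.execute("ANALYZE")

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM orders WHERE status = 'open' AND total > 500"
).fetchall()
detail = " ".join(row[3] for row in plan)  # planner's operator description
```

The query's predicate implies the index's WHERE clause, so the planner can use it; validating that implication against real plans is the same discipline the index-advisor bullet calls for.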
Module 3: Execution Plan Interpretation and Intervention
- Identify inefficient operators such as table scans, key lookups, or spools in execution plans for targeted correction.
- Differentiate between estimated and actual row counts to detect cardinality estimation errors.
- Analyze plan regressions after statistics updates or schema changes using plan history tools, applying plan forcing only where a known-good plan must be stabilized.
- Use query hints judiciously to override optimizer choices when evidence supports consistent plan degradation.
- Compare serial vs. parallel execution paths and set cost thresholds to control parallelism at the query level.
- Diagnose parameter sniffing issues by testing plan reuse with diverse parameter values.
- Document plan anomalies with execution context (e.g., memory pressure, tempdb contention) for root cause analysis.
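Spotting scan operators programmatically, the first bullet in this module, can be sketched against SQLite's plan output (production tooling would parse showplan XML or EXPLAIN ANALYZE output instead; the helper names are illustrative):

```python
import sqlite3

def plan_operators(con, sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

def has_full_scan(con, sql):
    # SQLite prefixes unindexed access with "SCAN", indexed access with "SEARCH".
    return any(d.startswith("SCAN") for d in plan_operators(con, sql))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")

before = has_full_scan(con, "SELECT * FROM t WHERE v = 1")  # no index on v yet
con.execute("CREATE INDEX ix_v ON t (v)")
after = has_full_scan(con, "SELECT * FROM t WHERE v = 1")   # now a SEARCH
```

A check like this can run in CI against representative queries, turning "identify inefficient operators" from a manual review into an automated gate.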
Module 4: Statistics Management and Cardinality Estimation
- Configure automatic statistics updates based on row modification thresholds tailored to table volatility.
- Supplement automatic updates with scheduled full-scan statistics on large, skewed tables where sampling fails.
- Use filtered statistics to improve cardinality estimates for queries with common WHERE clause filters.
- Monitor statistics age and freshness relative to data modification rates using system DMVs.
- Test the impact of legacy vs. new cardinality estimators on complex multi-join queries during version upgrades.
- Disable auto-update temporarily during bulk ETL loads to prevent mid-process plan changes.
- Validate histogram accuracy by comparing estimated vs. actual row counts in critical query segments.
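The histogram-accuracy check in the last bullet can be made concrete with a toy equi-height histogram over skewed data, purely illustrative of how engines derive estimates from boundary values; real engines store richer per-column structures:

```python
import random

random.seed(1)
# Heavy skew: 90% of rows share one value, the rest are spread out.
data = [1] * 9000 + [random.randint(2, 1000) for _ in range(1000)]

def build_histogram(values, buckets=10):
    # Equi-height: boundaries chosen so each bucket holds ~len/buckets rows.
    s = sorted(values)
    step = len(s) // buckets
    return [s[min(i * step, len(s) - 1)] for i in range(buckets)] + [s[-1]]

def estimate_le(hist, x, total):
    # Fraction of bucket boundaries <= x approximates P(value <= x).
    below = sum(1 for b in hist[1:] if b <= x)
    return total * below / (len(hist) - 1)

hist = build_histogram(data)
est = estimate_le(hist, 1, len(data))       # estimated rows with value <= 1
actual = sum(1 for v in data if v <= 1)     # true row count
```

Even with a full scan the bucketed estimate undercounts the heavy value here; with sampled statistics the error grows, which is why the module recommends full-scan statistics on large, skewed tables.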
Module 5: Query Rewriting and Transformation Techniques
- Replace correlated subqueries with equivalent JOINs or window functions to reduce iterative execution.
- Rewrite OR conditions in WHERE clauses using UNION ALL when index usage is otherwise inhibited.
- Flatten nested views to expose underlying predicates and enable better optimization.
- Convert inefficient LIKE patterns with leading wildcards into full-text search where applicable.
- Break up large DELETE/UPDATE statements into batches to reduce locking and transaction log pressure.
- Materialize intermediate results in temporary tables to simplify complex query trees and improve plan stability.
- Eliminate redundant calculations by moving expressions to computed columns with persisted storage.
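The first rewrite in this module, replacing a correlated subquery with a JOIN, can be verified for equivalence on a small dataset (SQLite here as a stand-in; schema and data are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
""")

# Correlated form: the subquery is conceptually re-evaluated per customer row.
correlated = con.execute("""
    SELECT c.id, (SELECT COALESCE(SUM(o.total), 0)
                  FROM orders o WHERE o.customer_id = c.id)
    FROM customers c ORDER BY c.id
""").fetchall()

# Rewritten form: one LEFT JOIN + GROUP BY pass over orders.
joined = con.execute("""
    SELECT c.id, COALESCE(SUM(o.total), 0)
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id ORDER BY c.id
""").fetchall()
```

Asserting result equality like this before and after a rewrite is cheap insurance; the LEFT JOIN (rather than INNER) is what preserves customers with no orders.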
Module 6: Materialized Structures and Precomputation
- Design materialized views (indexed views) to precompute expensive aggregations and joins with schema binding.
- Assess the trade-off between query speedup and increased storage and maintenance cost for each materialization.
- Implement incremental refresh mechanisms for materialized structures in high-velocity environments.
- Use summary tables for reporting workloads, aligning grain with common GROUP BY dimensions.
- Enforce referential integrity on source tables to ensure indexed views remain eligible for optimizer use.
- Monitor staleness of precomputed results in near-real-time systems and define acceptable latency SLAs.
- Partition materialized structures to align with query filters and enable efficient purging.
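The summary-table bullet can be sketched as follows, with the grain fixed at the (day, region) dimensions reports group by; table names and data are illustrative, and COUNT(*) is retained so averages stay derivable and incremental refresh remains possible:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (day TEXT, region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('2024-01-01', 'east', 100), ('2024-01-01', 'east', 50),
        ('2024-01-01', 'west', 70),  ('2024-01-02', 'east', 30);
    -- Summary table at the (day, region) grain used by most reports.
    CREATE TABLE sales_daily AS
        SELECT day, region, SUM(amount) AS amount, COUNT(*) AS n
        FROM sales GROUP BY day, region;
""")

# The report query reads the small summary instead of the fact table...
fast = con.execute(
    "SELECT SUM(amount) FROM sales_daily WHERE region = 'east'").fetchone()[0]
# ...and returns the same answer as scanning the base table.
slow = con.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'east'").fetchone()[0]
```

Queries at or above the summary's grain can be answered from it; anything finer-grained must fall back to the fact table, which is the trade-off the second bullet asks you to assess.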
Module 7: Concurrency and Resource Governance
- Configure resource governor or workload groups to isolate high-priority queries from ad hoc traffic.
- Set query cost thresholds to control parallelism and prevent runaway queries from consuming all CPU.
- Implement query timeouts at the application and connection level to terminate unresponsive executions.
- Use snapshot or read-committed snapshot isolation to reduce blocking in reporting queries.
- Monitor lock escalation events and adjust query or indexing to prevent table-level locks.
- Limit tempdb contention by eliminating unnecessary spool operators and avoiding sorts over excessively large intermediate datasets.
- Track memory grants and configure minimum/maximum memory per query to avoid resource monopolization.
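The query-timeout bullet can be sketched with a coarse per-connection deadline; server engines expose this natively as statement or lock timeouts, so SQLite's progress handler here is only a stand-in for the mechanism:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
deadline = [0.0]  # mutable cell so the handler sees updates

def abort_if_over():
    # Nonzero return value makes SQLite abort the running statement.
    return 1 if time.monotonic() > deadline[0] else 0

con.set_progress_handler(abort_if_over, 1000)  # check every 1000 VM ops

def run_with_timeout(sql, seconds):
    deadline[0] = time.monotonic() + seconds
    return con.execute(sql).fetchall()

# A deliberately expensive cross join gets cut off...
timed_out = False
try:
    run_with_timeout("""
        WITH RECURSIVE n(x) AS (SELECT 1 UNION ALL SELECT x+1 FROM n LIMIT 200000)
        SELECT COUNT(*) FROM n a, n b
    """, 0.05)
except sqlite3.OperationalError:
    timed_out = True

# ...while a cheap query completes normally under the same handler.
ok = run_with_timeout("SELECT 1", 1.0)
```

Layering a timeout like this at the connection level complements, rather than replaces, an application-level timeout, since the two catch different failure modes (slow execution vs. a stalled network).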
Module 8: Monitoring, Automation, and Continuous Optimization
- Deploy persistent query performance baselines to detect deviations during deployments or data growth.
- Automate index and statistics recommendations using historical plan analysis and missing index DMVs.
- Integrate query optimization into CI/CD pipelines by testing execution plans in pre-production environments.
- Use extended events to capture long-running queries without the overhead of SQL Trace.
- Build dashboards that correlate query performance with system metrics (I/O, CPU, memory).
- Implement a change control process for query and schema modifications to track optimization impact.
- Rotate and archive query performance data to maintain monitoring system efficiency over time.
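The baseline-deviation idea in the first bullet reduces to a comparison of runtime distributions; a minimal sketch follows, assuming durations have already been pulled from a query store or plan cache (the function name and thresholds are illustrative):

```python
import statistics

def regressed(baseline_ms, current_ms, factor=2.0, min_ms=10.0):
    """Flag when the current median exceeds the baseline median by `factor`,
    ignoring queries too fast for the comparison to be meaningful."""
    base = statistics.median(baseline_ms)
    cur = statistics.median(current_ms)
    return cur > max(base * factor, min_ms)

baseline = [12.0, 11.5, 13.1, 12.4]       # pre-deployment durations (ms)
after_deploy = [31.0, 29.4, 33.2, 30.8]   # post-deployment durations (ms)
flag = regressed(baseline, after_deploy)
```

Using medians rather than means keeps a single outlier execution from triggering false alarms; the `min_ms` floor avoids flagging sub-millisecond queries whose relative variance is naturally high.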
Module 9: Cross-System Optimization in Distributed Environments
- Push down filters and projections to source systems in federated queries to minimize data transfer.
- Design sharding keys that align with common query predicates to reduce cross-node communication.
- Cache reference data locally in distributed architectures to avoid repeated remote lookups.
- Evaluate the cost of data serialization and network latency when joining remote datasets.
- Use bulk data movement instead of row-by-row processing when synchronizing distributed sources.
- Implement query routing logic to direct read workloads to replicas based on freshness requirements.
- Monitor and limit the use of distributed transactions due to their impact on availability and latency.
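The filter-pushdown bullet at the top of this module can be sketched with a local SQLite database standing in for the remote source; row counts stand in for bytes moved over the network, and all names are illustrative:

```python
import sqlite3

remote = sqlite3.connect(":memory:")  # stand-in for a federated remote source
remote.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, region TEXT, payload TEXT)")
remote.executemany(
    "INSERT INTO events (region, payload) VALUES (?, ?)",
    [("east" if i % 10 == 0 else "west", "x" * 100) for i in range(1000)],
)

def fetch_no_pushdown(con, region):
    # Anti-pattern: pull every row across the wire, filter locally.
    rows = con.execute("SELECT id, region, payload FROM events").fetchall()
    return [r for r in rows if r[1] == region], len(rows)

def fetch_with_pushdown(con, region):
    # Pushdown: the remote system applies the predicate before transfer.
    rows = con.execute(
        "SELECT id, region, payload FROM events WHERE region = ?", (region,)
    ).fetchall()
    return rows, len(rows)

naive, moved_naive = fetch_no_pushdown(remote, "east")
pushed, moved_pushed = fetch_with_pushdown(remote, "east")
```

Both paths return identical results, but pushdown moves a tenth of the rows here; projection pushdown (selecting only needed columns) compounds the saving when payloads are wide.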