Description

This curriculum spans the full analytical lifecycle of DNA methylation studies, comparable in scope to a multi-phase bioinformatics consulting engagement supporting epigenetic discovery projects from experimental design through data sharing.

Module 1: Fundamentals of DNA Methylation Biology and Epigenetic Mechanisms

Select appropriate CpG island definitions based on genomic context (e.g., promoter vs. intergenic regions) when annotating methylation sites.
Determine the biological relevance of 5-methylcytosine (5mC) versus 5-hydroxymethylcytosine (5hmC) in tissue-specific gene regulation.
Evaluate the impact of methylation at different genomic elements (promoters, enhancers, gene bodies) on transcriptional outcomes.
Assess the role of DNMT and TET enzyme families in dynamic methylation changes during cellular differentiation.
Integrate histone modification data to interpret bivalent chromatin states in stem cell and cancer epigenomes.
Decide when to include non-CpG methylation (CpA, CpT, CpC) in analyses based on cell type (e.g., neurons, embryonic cells).
Interpret allele-specific methylation in the context of genomic imprinting and X-chromosome inactivation.
Account for age-related methylation drift in longitudinal study designs and control selection.
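
The choice of CpG island definition in the first objective above can be made concrete with a small calculation. The sketch below computes sequence length, GC content, and the observed/expected CpG ratio, then applies thresholds. The defaults follow the commonly cited Gardiner-Garden and Frommer criteria (at least 200 bp, GC content of at least 50%, observed/expected ratio of at least 0.6); the function names are illustrative assumptions, and real annotation pipelines scan windows along a genome rather than testing one sequence.

```python
def cpg_stats(seq: str):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return n, gc_fraction, obs_exp


def is_cpg_island(seq: str, min_len: int = 200,
                  min_gc: float = 0.50, min_obs_exp: float = 0.60) -> bool:
    """Apply length, GC, and obs/exp thresholds (defaults are assumptions, see text)."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    toy = "CG" * 100  # 200 bp toy sequence, not real genomic data
    print(cpg_stats(toy), is_cpg_island(toy))
```

Stricter definitions can be expressed by raising the thresholds (for example a longer minimum length), which is one way to compare how many candidate islands survive under each definition.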

Module 2: Experimental Design and Platform Selection for Methylation Profiling

Choose among whole-genome bisulfite sequencing (WGBS), reduced representation bisulfite sequencing (RRBS), and methylation arrays based on budget, coverage needs, and sample type.
Optimize input DNA quantity and quality thresholds for bisulfite conversion efficiency across degraded samples (e.g., FFPE).
Balance multiplexing capacity with per-sample depth when designing Illumina TruSeq Methyl Capture panels.
Implement spike-in controls (e.g., unmethylated lambda DNA) to monitor bisulfite conversion rates.
Design case-control or cohort studies with appropriate matching for age, sex, and cell composition.
Decide on batch processing strategies to minimize technical variation in multi-center studies.
Select between single-end and paired-end sequencing for RRBS based on insert size distribution and alignment accuracy.
Validate array-based findings (e.g., 450K/EPIC) with targeted bisulfite sequencing in follow-up experiments.

Module 3: Raw Data Preprocessing and Quality Control

Trim adapter sequences and low-quality bases from bisulfite-converted reads using tools like Trim Galore! with proper parameter tuning.
Assess bisulfite conversion efficiency by calculating C-to-T conversion rates in non-CpG contexts.
Filter out reads with low mapping quality, and flag samples with poor alignment rates, when aligning to bisulfite-converted reference genomes (e.g., Bismark with Bowtie 2).
Remove PCR duplicates using molecular barcodes (UMIs) or alignment-based methods depending on library prep.
Generate sample-level QC metrics (coverage depth, CpG coverage uniformity, mitochondrial read proportion) for outlier detection.
Compare beta-value distributions across samples to detect technical artifacts or batch effects.
Use control probes on methylation arrays to assess background signal and dye bias.
Apply sex checks using X/Y chromosome methylation patterns to verify sample identity.
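
The conversion-efficiency objective in this module reduces to a simple ratio. The sketch below assumes that counts of converted (read as T) and unconverted (read as C) cytosines in non-CpG contexts have already been tallied per sample; the function names and the 0.99 cutoff are illustrative assumptions rather than values prescribed by this curriculum.

```python
def conversion_rate(converted: int, unconverted: int) -> float:
    """Fraction of non-CpG cytosines that were read as T after bisulfite treatment."""
    total = converted + unconverted
    if total == 0:
        raise ValueError("no non-CpG cytosines observed")
    return converted / total


def flag_low_conversion(samples: dict, min_rate: float = 0.99) -> list:
    """Return sample IDs whose conversion rate falls below min_rate.

    samples maps sample ID -> (converted_count, unconverted_count).
    The 0.99 default is an assumed threshold for illustration only.
    """
    return [
        sample_id
        for sample_id, (conv, unconv) in samples.items()
        if conversion_rate(conv, unconv) < min_rate
    ]


if __name__ == "__main__":
    toy_counts = {"S1": (995, 5), "S2": (980, 20)}  # made-up counts
    print(flag_low_conversion(toy_counts))
```

In cell types with genuine non-CpG methylation (e.g., neurons, embryonic cells), this estimate is biased downward, which is one reason to prefer an unmethylated spike-in such as lambda DNA for the same calculation.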

Module 4: Alignment, Methylation Calling, and Data Normalization

Select alignment tools optimized for bisulfite data (e.g., Bismark, BSMAP) based on speed and sensitivity requirements.
Resolve ambiguous alignments in repetitive regions by adjusting seed length and mismatch tolerance.
Calculate beta and M-values from methylated and unmethylated read counts or signal intensities, choosing appropriate metrics for downstream analysis.
Apply functional normalization (FunNorm) to remove unwanted technical variation, or BMIQ to correct for type I versus type II probe design bias, in array data.
Use reference-based or reference-free methods (e.g., RefFreeEWAS) to adjust for cell type heterogeneity in whole blood samples.
Implement quantile normalization cautiously, preserving biological variation in heterogeneous tissue samples.
Handle missing methylation data using imputation methods (e.g., k-nearest-neighbor imputation with impute.knn) or exclusion based on missingness thresholds.
Integrate multiple batches using ComBat or SVA while preserving known biological covariates.
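
The beta and M-value objective above can be illustrated with the standard formulas: beta = M / (M + U + offset) and M-value = log2((M + alpha) / (U + alpha)), where M and U are methylated and unmethylated counts or intensities. This is a minimal sketch; the function names are assumptions, and the offset and alpha defaults are illustrative stabilizers (array software commonly adds an offset such as 100 to intensities, while sequencing counts typically use none).

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 0.0) -> float:
    """Proportion methylated; offset stabilizes low-intensity array probes."""
    denom = meth + unmeth + offset
    if denom <= 0:
        raise ValueError("no signal at this site")
    return meth / denom


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """Log2 ratio of methylated to unmethylated signal; alpha avoids log(0)."""
    return math.log2((meth + alpha) / (unmeth + alpha))


def beta_to_m(beta: float) -> float:
    """Logit (base 2) transform of a beta value strictly between 0 and 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


if __name__ == "__main__":
    print(beta_value(30, 10), m_value(31, 7), beta_to_m(0.5))
```

Beta values are easier to interpret biologically, while M-values are closer to homoscedastic and therefore often preferred as input to linear models.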

Module 5: Differential Methylation Analysis and Region-Based Detection

Choose between site-specific (e.g., limma, methylKit) and region-based (e.g., DMRcate, bumphunter, DSS) methods based on study hypothesis.
Define differentially methylated positions (DMPs) using thresholds for delta-beta, p-value, and FDR-adjusted significance.
Aggregate adjacent CpGs into differentially methylated regions (DMRs) using sliding windows or clustering algorithms.
Adjust statistical models for confounding variables such as age, batch, and estimated cell proportions.
Validate DMRs using permutation testing to assess significance under the null distribution.
Interpret directionality of methylation changes (hyper- vs. hypomethylation) in the context of gene regulatory elements.
Compare effect sizes across genomic contexts to prioritize functionally relevant DMRs.
Apply region-set enrichment analysis (e.g., LOLA, GREAT) to identify annotations and pathways enriched for methylation changes.
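
The DMP definition in this module combines an effect-size threshold with FDR control. The sketch below implements Benjamini-Hochberg adjustment in plain Python and then filters sites on adjusted p-value and absolute delta-beta. It assumes per-site p-values and delta-beta values have already been produced by an upstream test; the function names and the 0.05 and 0.10 defaults are illustrative assumptions.

```python
def benjamini_hochberg(pvalues: list) -> list:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_dmps(sites: list, max_fdr: float = 0.05,
              min_delta_beta: float = 0.10) -> list:
    """sites is a list of (cpg_id, delta_beta, p_value) tuples.

    Returns IDs passing both the FDR and the absolute delta-beta cutoffs.
    """
    qvalues = benjamini_hochberg([p for _, _, p in sites])
    return [
        cpg_id
        for (cpg_id, delta_beta, _), q in zip(sites, qvalues)
        if q <= max_fdr and abs(delta_beta) >= min_delta_beta
    ]


if __name__ == "__main__":
    toy_sites = [  # made-up values for illustration
        ("cg1", 0.20, 0.01),
        ("cg2", 0.02, 0.04),
        ("cg3", -0.15, 0.03),
        ("cg4", 0.05, 0.005),
    ]
    print(call_dmps(toy_sites))
```

Requiring both criteria guards against reporting sites that are statistically significant but have methylation differences too small to be biologically meaningful or technically reproducible.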

Module 6: Integration with Transcriptomic and Genomic Data

Correlate cis methylation with gene expression using matched RNA-seq and methylation data from the same samples.
Identify methylation quantitative trait loci (meQTLs) by integrating SNP genotypes with methylation levels.
Overlay DMRs with chromatin accessibility (ATAC-seq) peaks to infer regulatory potential.
Use promoter methylation to stratify expression outliers in cancer samples (e.g., TCGA).
Assess concordance between methylation silencing and copy number loss in tumor suppressor genes.
Construct multi-omic interaction networks using tools like MOFA or iCluster.
Validate predicted regulatory relationships using public databases (e.g., ENCODE, Roadmap Epigenomics).
Resolve discordant signals (e.g., hypermethylation with increased expression) by considering alternative promoters or enhancers.
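
The methylation-expression correlation in the first objective of this module can be sketched for a single CpG-gene pair as follows. The example assumes matched vectors (same samples, same order) of promoter beta values and normalized expression values; the function name and toy values are hypothetical, and a real analysis would repeat this across many pairs and correct for multiple testing.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 3:
        raise ValueError("need at least 3 matched samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    promoter_beta = [0.1, 0.3, 0.5, 0.7, 0.9]   # made-up beta values
    expression = [9.0, 7.0, 5.0, 3.0, 1.0]      # made-up log expression
    print(pearson_r(promoter_beta, expression))
```

A strongly negative coefficient is the pattern expected for promoter silencing; positive or near-zero values are the discordant cases that the last objective suggests examining for alternative promoters or enhancers.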

Module 7: Functional Annotation and Pathway Enrichment

Map DMRs to nearest genes while considering topologically associating domains (TADs) for distal regulatory effects.
Use GREAT or ChIP-Enrich to assign biological meaning to non-promoter DMRs based on regulatory domain models.
Perform gene ontology (GO) and KEGG pathway analysis with proper multiple testing correction.
Filter enriched terms based on specificity and avoid overinterpretation of broad categories (e.g., "cellular process").
Integrate transcription factor binding site (TFBS) databases (e.g., JASPAR) to identify potential regulatory drivers.
Assess enrichment of DMRs in known super-enhancers or disease-associated loci from GWAS.
Compare functional profiles across conditions (e.g., tumor vs. normal) to identify context-specific pathways.
Use tissue-specific regulatory annotations to prioritize findings in relevant biological systems.
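
The GO and KEGG enrichment objective above is usually an over-representation test. The sketch below computes a one-sided hypergeometric p-value for the overlap between DMR-associated genes and one pathway. The function and parameter names are assumptions, and the resulting p-values would still need multiple testing correction across all terms tested.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, n_selected: int,
                           n_pathway: int, n_background: int) -> float:
    """P(X >= overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    if not (0 <= n_pathway <= n_background and 0 <= n_selected <= n_background):
        raise ValueError("gene counts are inconsistent with the background size")
    upper = min(n_selected, n_pathway)
    total = comb(n_background, n_selected)
    # math.comb returns 0 when k > n, so impossible combinations drop out.
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 3 of 3 selected genes fall in a 3-gene pathway, 10-gene background.
    print(hypergeom_enrichment_p(3, 3, 3, 10))
```

This simple test treats every gene as equally likely to be selected. For array data that assumption is violated because genes differ in the number of CpGs measured, so methods that model this bias are generally preferable in practice.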

Module 8: Methylation Clocks and Biomarker Development

Select epigenetic clock algorithms (Horvath, Hannum, PhenoAge) based on tissue type and phenotypic focus.
Calculate epigenetic age acceleration and interpret its association with disease or environmental exposures.
Validate clock performance in non-European populations to assess generalizability.
Develop custom biomarkers using elastic net or random forest models trained on methylation data.
Assess biomarker robustness across batches, platforms, and sample collection methods.
Define clinically actionable thresholds for methylation-based classifiers (e.g., cancer detection).
Estimate the minimum sample size for biomarker validation using power calculations for AUC.
Implement cross-validation strategies to avoid overfitting in high-dimensional methylation datasets.
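
Epigenetic age acceleration, referenced in the second objective of this module, is commonly defined as the residual from regressing predicted epigenetic (DNAm) age on chronological age. The sketch below fits that simple linear regression by ordinary least squares. It assumes DNAm ages have already been computed by a clock; the function name and toy values are hypothetical.

```python
def age_acceleration(chron_age: list, dnam_age: list) -> list:
    """Residuals of DNAm age regressed on chronological age (OLS, one predictor).

    Positive values mean a sample looks epigenetically older than expected
    for its chronological age within this cohort.
    """
    n = len(chron_age)
    if n != len(dnam_age) or n < 3:
        raise ValueError("need at least 3 matched age pairs")
    mean_x = sum(chron_age) / n
    mean_y = sum(dnam_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chron_age)
    if sxx == 0:
        raise ValueError("chronological ages must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]


if __name__ == "__main__":
    chron = [30, 40, 50, 60]   # made-up chronological ages
    dnam = [32, 41, 55, 60]    # made-up clock predictions
    print(age_acceleration(chron, dnam))
```

Because residuals are uncorrelated with chronological age by construction, they can be tested against disease or exposure variables without age acting as a confounder of the acceleration measure itself.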

Module 9: Data Sharing, Reproducibility, and Ethical Considerations

Prepare methylation data for public deposition in GEO or dbGaP with complete metadata and experimental details.
Use standardized ontologies (e.g., OBI, EFO) to describe sample characteristics and protocols.
Document bioinformatics workflows using containers (Docker/Singularity) and workflow languages (Snakemake, Nextflow).
Archive intermediate files and version control scripts to ensure reproducibility.
Address privacy risks in methylation data due to potential identification from epigenetic signatures.
Implement data access controls for sensitive studies involving minors or stigmatized conditions.
Report sex and ancestry estimates derived from methylation arrays in compliance with ethical guidelines.
Disclose conflicts of interest when developing commercializable biomarkers or diagnostic tools.