This curriculum matches the technical and analytical depth of a multi-workshop bioinformatics program. It equips practitioners to build, analyze, and interpret biological networks using the methods applied by in-house omics teams and collaborative research pipelines.
Module 1: Foundations of Biological Networks and Graph Theory
- Select appropriate graph representations (directed, undirected, weighted, bipartite) based on the type of molecular interaction, such as protein-protein interactions, gene regulation, or metabolic pathways.
- Choose between adjacency-matrix and edge-list data structures, weighing memory efficiency against query performance for large-scale interactomes.
- Define node and edge semantics consistently across datasets to enable integration of heterogeneous sources like STRING, BioGRID, and KEGG.
- Resolve naming inconsistencies (e.g., gene symbols, isoforms) using authoritative identifiers such as Ensembl, UniProt, or HGNC during network construction.
- Evaluate the impact of self-loops and multi-edges in biological contexts, such as autoregulatory feedback (self-loops) or interactions duplicated across splice isoforms (multi-edges).
- Apply graph normalization techniques to correct for node degree bias arising from well-studied proteins or genes.
- Assess network density and sparsity to determine suitability for downstream analyses such as module detection or centrality scoring.
- Design metadata schemas to annotate network edges with evidence types, confidence scores, and experimental methods.
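The construction steps above (identifier resolution, multi-edge and self-loop handling, evidence metadata, and density) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline. The `ALIASES` table is hypothetical and stands in for a real HGNC, Ensembl, or UniProt mapping. The evidence labels and the 0 to 1 score range are also assumptions, and the edges are toy data. An adjacency list like the one built here uses memory proportional to nodes plus edges, whereas a dense adjacency matrix grows with the square of the node count.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    evidence: str  # illustrative labels, e.g. "experimental" or "curated"
    score: float   # assumed confidence score in [0, 1]


# Hypothetical alias table; a real pipeline would load HGNC/Ensembl/UniProt mappings.
ALIASES = {"P53": "TP53", "MDM-2": "MDM2"}


def canonical(symbol: str) -> str:
    """Normalize a gene symbol to an assumed canonical identifier."""
    s = symbol.strip().upper()
    return ALIASES.get(s, s)


def build_undirected(edges):
    """Collapse an edge list into an undirected graph.

    Multi-edges between the same pair are merged (maximum score, union of
    evidence types). Self-loops are returned separately for review instead
    of being silently dropped.
    """
    merged = {}
    self_loops = []
    for e in edges:
        u, v = canonical(e.source), canonical(e.target)
        if u == v:
            self_loops.append(u)
            continue
        key = (u, v) if u < v else (v, u)
        score, evidence = merged.get(key, (0.0, frozenset()))
        merged[key] = (max(score, e.score), evidence | {e.evidence})
    adjacency = defaultdict(set)
    for u, v in merged:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return merged, dict(adjacency), self_loops


def density(n_nodes: int, n_edges: int) -> float:
    """Fraction of possible undirected edges that are present."""
    return 0.0 if n_nodes < 2 else 2 * n_edges / (n_nodes * (n_nodes - 1))


# Toy edges for illustration only, not curated data.
edges = [
    Edge("p53", "MDM2", "experimental", 0.9),
    Edge("TP53", "MDM-2", "curated", 0.7),  # same pair under different names
    Edge("TP53", "TP53", "curated", 0.5),   # self-loop
    Edge("MDM2", "CDKN1A", "experimental", 0.4),
]
merged, adjacency, loops = build_undirected(edges)
print(merged[("MDM2", "TP53")])
print(density(len(adjacency), len(merged)))
```

Keeping the merged evidence set on each edge preserves the annotation that the metadata schema calls for, even after duplicate records are collapsed.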
Module 2: Data Acquisition and Integration from Multi-Omics Sources
- Construct automated pipelines to retrieve and version-control public datasets from repositories such as GEO, TCGA, and PRIDE using API-based access.
- Map omics data (RNA-seq, ChIP-seq, phosphoproteomics) to network nodes using consistent identifier mapping with tools like biomaRt or BridgeDb.
- Integrate quantitative data into network nodes or edges, choosing between overlay methods (e.g., correlation, mutual information) and constraint-based approaches.
- Handle batch effects and platform-specific biases when combining data from different studies or technologies.
- Define thresholds for including interactions based on statistical significance, fold change, or effect size.
- Balance comprehensiveness and reliability by combining high-throughput experimental data with curated interactions from literature.
- Use semantic web technologies (RDF, SPARQL) to query knowledge graphs such as Wikidata, and integrate the results with resources such as Open Targets.
- Document provenance and versioning of all integrated datasets to ensure reproducibility and auditability.
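The thresholding and provenance bullets above can be combined in one small step. The sketch below assumes that per-interaction p-values and an effect-size column are already computed. It applies Benjamini-Hochberg FDR correction before an effect-size cutoff, then records a SHA-256 checksum of the raw input. The record fields, the cutoffs, and the `example-repository` source name are illustrative assumptions, not a standard provenance schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_interactions(records, q_cutoff=0.05, min_abs_effect=1.0):
    """Keep interactions passing both an FDR cutoff and an effect-size cutoff."""
    qvalues = benjamini_hochberg([r["p"] for r in records])
    return [
        dict(r, q=q)
        for r, q in zip(records, qvalues)
        if q <= q_cutoff and abs(r["effect"]) >= min_abs_effect
    ]


def provenance_record(source, version, payload: bytes):
    """Minimal provenance entry: origin, version, checksum, retrieval time (UTC)."""
    return {
        "source": source,
        "version": version,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "retrieved_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


# Toy records; "effect" could be a log2 fold change or a correlation coefficient.
records = [
    {"pair": ("A", "B"), "p": 0.01, "effect": 1.5},
    {"pair": ("A", "C"), "p": 0.04, "effect": 0.3},
    {"pair": ("B", "C"), "p": 0.03, "effect": -2.0},
    {"pair": ("C", "D"), "p": 0.20, "effect": 3.0},
]
kept = select_interactions(records, q_cutoff=0.1)
print([r["pair"] for r in kept])  # → [('A', 'B'), ('B', 'C')]
print(json.dumps(provenance_record("example-repository", "v1", b"raw bytes"), indent=2))
```

The correction is applied before the effect-size filter so that the multiple-testing burden reflects every tested interaction, not only the ones with large effects.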
Module 4: Topological Analysis and Centrality Metrics
- Compute centrality measures (degree, betweenness, closeness, eigenvector) and interpret their biological relevance in context-specific networks.
- Compare centrality rankings across conditions (e.g., disease vs. control) to identify context-specific hub genes or proteins.
- Adjust closeness and betweenness centrality calculations for disconnected components, which are common in sparse biological networks.
- Evaluate the stability of centrality rankings under edge perturbation or subsampling to assess robustness.
- Use randomization techniques (e.g., degree-preserving rewiring) to establish null distributions for centrality significance testing.
- Integrate functional annotations to determine whether topologically central nodes are enriched for disease associations or essentiality.
- Apply k-core decomposition to identify densely connected regions and assess their functional coherence.
- Compare centrality profiles across species to study evolutionary conservation of network architecture.
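Betweenness and a degree-preserving null can be sketched without external libraries. The code below does three things:

- implements Brandes' algorithm for unweighted, undirected graphs;
- rewires the graph by double-edge swaps, which preserve every node's degree;
- estimates an empirical p-value for one node's betweenness against the rewired graphs.

It assumes that node labels are sortable strings and that graphs are small; for real interactomes a graph library would be the practical choice. Brandes' accumulation only sums over reachable pairs, so disconnected components do not break it, although how to normalize across components remains a separate decision.

```python
import random
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected).

    adj maps each node to a set of neighbours. Pairs in different components
    contribute nothing, so disconnected graphs need no special case.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends


def rewire(adj, n_swaps, rng):
    """Degree-preserving rewiring by double-edge swaps (a-b, c-d -> a-d, c-b)."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    edges = [(u, v) for u in g for v in g[u] if u < v]
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps and len(edges) >= 2:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        # Reject swaps that would create self-loops or duplicate edges.
        if len({a, b, c, d}) < 4 or d in g[a] or b in g[c]:
            continue
        for x, y in ((a, b), (c, d)):
            g[x].discard(y)
            g[y].discard(x)
        for x, y in ((a, d), (c, b)):
            g[x].add(y)
            g[y].add(x)
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return g


def empirical_pvalue(adj, node, n_null=100, seed=0):
    """Share of rewired graphs where `node` is at least as central as observed."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = betweenness(adj)[node]
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    hits = sum(
        betweenness(rewire(adj, 10 * n_edges, rng))[node] >= observed
        for _ in range(n_null)
    )
    return (hits + 1) / (n_null + 1)


star = {"hub": {"a", "b", "c", "d"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
print(betweenness(star)["hub"])                 # → 6.0
print(empirical_pvalue(star, "hub", n_null=20))  # → 1.0
```

A star cannot be rewired without creating duplicate edges, so every null graph equals the observed one and the p-value is 1.0. This shows why hub significance should be judged against degree-matched nulls rather than raw scores.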
Module 5: Community Detection and Functional Module Identification
- Select community detection algorithms (Louvain, Infomap, Leiden) based on resolution requirements and network size.
- Tune resolution parameters to avoid over- or under-partitioning, particularly in hierarchical biological systems.
- Validate detected modules using functional enrichment analysis (GO, Reactome) to assess biological coherence.
- Compare module stability across multiple algorithm runs or subsampled networks to evaluate reproducibility.
- Integrate expression or perturbation data to prioritize modules associated with phenotypic outcomes.
- Map modules to known pathways and assess overlap versus novelty in disease contexts.
- Use consensus clustering to combine results from multiple algorithms and reduce method-specific bias.
- Track module dynamics across conditions (e.g., time series, drug response) to identify responsive subnetworks.
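Three quantities recur across the module bullets above:

- the modularity score that Louvain and Leiden are commonly run to optimize;
- an enrichment p-value for validating a module against an annotation set;
- an overlap score for judging module stability across runs.

The sketch below computes each one for small graphs held as Python dictionaries. It does not implement Louvain or Leiden themselves. The one-sided hypergeometric test is one common choice for GO-style enrichment, not the only one, and the two-triangle graph is toy data.

```python
from math import comb


def modularity(adj, communities, resolution=1.0):
    """Newman-Girvan modularity with a resolution parameter (undirected, unweighted).

    Q = sum over communities c of  L_c / m - resolution * (d_c / 2m)^2,
    where L_c is the number of internal edges and d_c the total degree of c.
    Larger resolution values favour smaller communities.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    label = {v: i for i, c in enumerate(communities) for v in c}
    q = 0.0
    for i, c in enumerate(communities):
        internal = sum(1 for u in c for v in adj[u] if label[v] == i) / 2
        degree_sum = sum(len(adj[u]) for u in c)
        q += internal / m - resolution * (degree_sum / (2 * m)) ** 2
    return q


def enrichment_pvalue(k, n_background, n_annotated, module_size):
    """One-sided hypergeometric P(X >= k) for k annotated genes in a module."""
    upper = min(n_annotated, module_size)
    hits = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, module_size - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_background, module_size)


def best_match_jaccard(partition_a, partition_b):
    """For each module in partition_a, its best Jaccard overlap with partition_b."""
    return [max(len(a & b) / len(a | b) for b in partition_b) for a in partition_a]


# Two triangles joined by a single bridge edge (c-d).
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
split = [{"a", "b", "c"}, {"d", "e", "f"}]
print(round(modularity(adj, split), 4))       # → 0.3571
print(round(modularity(adj, [set(adj)]), 4))  # → 0.0
print(best_match_jaccard(split, [{"a", "b"}, {"c", "d", "e", "f"}]))
```

Re-running the modularity function at several resolution values on candidate partitions is a quick way to see how strongly a given split depends on that parameter.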
Module 6: Network Inference from High-Throughput Data
- Choose inference methods (GENIE3, ARACNe, CLR) based on data type (e.g., scRNA-seq vs. bulk) and regulatory assumptions.
- Preprocess expression data using normalization, filtering, and transformation appropriate for the inference algorithm.
- Control for confounding factors such as batch, cell cycle, or technical noise during network inference.
- Set significance thresholds using permutation testing or FDR correction to limit false-positive edges.
- Validate inferred networks against gold-standard interactomes or perturbation data (e.g., CRISPR screens).
- Assess scalability and memory usage when inferring networks from single-cell datasets with thousands of cells.
- Combine multiple inference methods using ensemble approaches to improve accuracy and robustness.
- Document parameter settings and random seeds to ensure reproducibility of inferred topologies.
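Rather than reimplementing GENIE3, ARACNe, or CLR, the sketch below uses a simpler stand-in: a Pearson correlation relevance network whose edge threshold comes from a permutation null. The seed and settings are returned alongside the edges so that the run can be reproduced. Taking the maximum |r| per permutation (a max-statistic approach) controls the family-wise error rate, which is stricter than FDR control. The toy expression data and the parameter defaults are assumptions for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def infer_network(expr, n_perm=200, alpha=0.05, seed=1):
    """Correlation network with a permutation (max-statistic) edge threshold.

    expr maps gene -> expression values over the same ordered samples.
    Returns (edges, threshold, settings) so the run can be reproduced.
    """
    rng = random.Random(seed)
    genes = sorted(expr)
    pairs = [(g, h) for i, g in enumerate(genes) for h in genes[i + 1:]]
    observed = {p: pearson(expr[p[0]], expr[p[1]]) for p in pairs}
    null_max = []
    for _ in range(n_perm):
        # Shuffling each gene independently destroys gene-gene dependence.
        shuffled = {g: rng.sample(v, len(v)) for g, v in expr.items()}
        null_max.append(max(abs(pearson(shuffled[g], shuffled[h])) for g, h in pairs))
    null_max.sort()
    threshold = null_max[min(int((1 - alpha) * n_perm), n_perm - 1)]
    edges = {p: r for p, r in observed.items() if abs(r) > threshold}
    settings = {"method": "pearson, max-statistic permutation",
                "n_perm": n_perm, "alpha": alpha, "seed": seed}
    return edges, threshold, settings


a = [float(i) for i in range(1, 11)]
expr = {
    "A": a,
    "B": [2 * x + e for x, e in zip(a, [0.1, -0.1] * 5)],  # tracks A closely
    "C": [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],                    # uncorrelated with A
}
edges, threshold, settings = infer_network(expr)
print(sorted(edges), round(threshold, 3), settings)
```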
Module 7: Dynamic and Temporal Network Modeling
- Construct time-series networks using sliding windows or state-specific data segmentation.
- Apply Granger causality or dynamic Bayesian networks to infer directional interactions from longitudinal data.
- Model network rewiring by comparing topological metrics across time points or disease stages.
- Incorporate delay parameters in edge inference to capture transcriptional or signaling lag.
- Use differential network analysis to detect significant edge gains or losses between conditions.
- Visualize temporal changes using animation or small multiples while maintaining node correspondence.
- Validate dynamic predictions with perturbation experiments or independent time-course datasets.
- Handle missing or irregularly sampled time points using interpolation or state-space modeling.
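Sliding-window networks, edge gains and losses, and lag estimation can be illustrated with plain correlations. The sketch below rests on several assumptions:

- the series are evenly sampled and gap-free, so interpolation and state-space handling of missing points are out of scope;
- edges are kept by a fixed |r| cutoff rather than a significance test;
- the data are toys.

It shows the bookkeeping only and does not replace Granger causality or dynamic Bayesian network methods.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def window_networks(series, window, step, threshold):
    """One edge set per window; an edge is kept when |r| >= threshold in that window."""
    n_points = len(next(iter(series.values())))
    genes = sorted(series)
    networks = []
    for start in range(0, n_points - window + 1, step):
        seg = {g: v[start:start + window] for g, v in series.items()}
        networks.append({
            (g, h)
            for i, g in enumerate(genes) for h in genes[i + 1:]
            if abs(pearson(seg[g], seg[h])) >= threshold
        })
    return networks


def edge_changes(before, after):
    """Edges gained and lost between two network snapshots."""
    return {"gained": after - before, "lost": before - after}


def best_lag(x, y, max_lag):
    """Delay (in time points) maximizing |corr(x[t], y[t + lag])|."""
    return max(range(max_lag + 1),
               key=lambda lag: abs(pearson(x[:len(x) - lag], y[lag:])))


series = {
    "A": [1, 2, 3, 4, 1, 2, 3, 4],
    "B": [2, 4, 6, 8, 3, 1, 4, 2],  # follows A only in the first window
    "C": [5, 3, 4, 1, 2, 4, 6, 8],  # follows A only in the second window
}
nets = window_networks(series, window=4, step=4, threshold=0.9)
print(edge_changes(nets[0], nets[1]))  # → {'gained': {('A', 'C')}, 'lost': {('A', 'B')}}

x = [0, 1, 0, 2, 0, 3, 0, 1, 2, 0, 1, 3]
y = [0, 0] + x[:-2]                    # y repeats x two steps later
print(best_lag(x, y, max_lag=3))       # → 2
```

Because genes are sorted before pairs are formed, edge tuples keep the same orientation in every window, which is what keeps node correspondence intact when comparing snapshots.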
Module 8: Network-Based Biomarker and Drug Target Discovery
- Prioritize candidate biomarkers using network proximity to known disease modules or differentially expressed genes.
- Apply network diffusion methods (e.g., random walk with restart) to propagate disease signals from seed genes.
- Evaluate target druggability by overlaying network centrality with pharmacological data (e.g., ChEMBL, DrugBank).
- Assess polypharmacology risks by analyzing off-target connectivity in protein interaction networks.
- Use network resilience metrics (e.g., fragmentation after node removal) to predict essentiality of candidate targets.
- Integrate side effect profiles by mapping drug targets to shared network neighborhoods.
- Validate candidate targets using CRISPR/Cas9 knockout screens or siRNA datasets.
- Balance novelty and tractability when selecting targets from poorly characterized network regions.
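Random walk with restart and a node-removal fragmentation check can be sketched on a dictionary-based graph. The version below iterates p <- (1 - r) * W * p + r * p0. Here W is the column-normalized adjacency matrix, meaning each node splits its score evenly among its neighbours, and p0 spreads the restart probability uniformly over the seeds. Several choices are assumptions for illustration: the restart probability of 0.3, the toy path graph, and the use of largest-component size as the resilience metric. In this simple form, nodes with no neighbours would leak probability mass.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state RWR scores by power iteration on a dict-of-sets graph."""
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if adj[u]:
                share = (1 - restart) * p[u] / len(adj[u])  # split evenly among neighbours
                for v in adj[u]:
                    nxt[v] += share
        converged = sum(abs(nxt[v] - p[v]) for v in nodes) < tol
        p = nxt
        if converged:
            break
    return p


def largest_component_after_removal(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    remaining = set(adj) - set(removed)
    seen, best = set(), 0
    for start in remaining:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in remaining and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Toy path graph S - A - B - C with S as the disease seed.
path = {"S": {"A"}, "A": {"S", "B"}, "B": {"A", "C"}, "C": {"B"}}
scores = random_walk_with_restart(path, seeds={"S"})
ranking = sorted((v for v in scores if v != "S"), key=scores.get, reverse=True)
print(ranking)                                       # → ['A', 'B', 'C']
print(largest_component_after_removal(path, {"A"}))  # → 2
```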
Module 9: Visualization, Interpretation, and Reporting of Network Findings
- Select layout algorithms (force-directed, circular, hierarchical) based on network size and biological context.
- Apply edge bundling or filtering to reduce visual clutter in dense networks without losing critical connections.
- Encode biological attributes (expression, mutation status, subcellular localization) using color, size, and shape.
- Generate publication-ready figures with consistent styling, resolution, and annotation using tools like Cytoscape or Gephi.
- Implement interactive dashboards for exploratory analysis with filtering, search, and module highlighting.
- Produce summary reports that link topological findings to functional and clinical interpretations.
- Ensure accessibility by using colorblind-safe palettes and providing text descriptions, legends, and alternative formats.
- Archive and share interactive network views using web-based platforms like NDEx or CyNetShare.
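For small networks, a force-directed layout and a colorblind-safe module palette can be sketched directly. The code below is a simplified Fruchterman-Reingold layout:

- every pair of nodes repels with force k^2/d;
- nodes joined by an edge attract with force d^2/k;
- each move is capped by a linearly cooling temperature;
- positions are clamped to a square frame.

Module colours come from the Okabe-Ito palette. The parameter defaults and the frame clamping are assumptions; tools such as Cytoscape or Gephi provide tuned layouts for publication figures.

```python
import math
import random

# Okabe-Ito colorblind-safe palette (black omitted so it stays free for labels).
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7"]


def fruchterman_reingold(adj, iterations=200, size=1.0, seed=42):
    """Simplified Fruchterman-Reingold layout for a small dict-of-sets graph."""
    rng = random.Random(seed)  # fixed seed: same input gives the same layout
    nodes = sorted(adj)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size / math.sqrt(len(nodes))  # ideal edge length
    t0 = size / 10                    # initial temperature (maximum step)
    for it in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):  # repulsion between every pair
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        for u in nodes:  # attraction along each edge, counted once
            for v in sorted(adj[u]):
                if u < v:
                    dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                    d = max(math.hypot(dx, dy), 1e-9)
                    f = d * d / k
                    disp[u][0] -= dx / d * f
                    disp[u][1] -= dy / d * f
                    disp[v][0] += dx / d * f
                    disp[v][1] += dy / d * f
        temp = t0 * (1 - it / iterations)
        for v in nodes:  # move at most `temp`, staying inside the frame
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] = min(size, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(size, max(0.0, pos[v][1] + dy / d * step))
    return {v: (x, y) for v, (x, y) in pos.items()}


def module_colors(modules):
    """Assign each module a palette colour; colours repeat after seven modules."""
    return {v: OKABE_ITO[i % len(OKABE_ITO)] for i, m in enumerate(modules) for v in m}


adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
layout = fruchterman_reingold(adj)
colors = module_colors([{"a", "b", "c"}, {"d", "e", "f"}])
for v in sorted(adj):
    print(v, tuple(round(c, 2) for c in layout[v]), colors[v])
```

Colour alone should not carry the module assignment. Pairing it with shape or a legend keeps the figure readable in greyscale and for readers with colour-vision deficiencies.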