
Phylogenetic Analysis in Bioinformatics - From Data to Discovery

$299.00
Toolkit Included:
Includes a practical, ready-to-use toolkit containing implementation templates, worksheets, checklists, and decision-support materials used to accelerate real-world application and reduce setup time.
Who trusts this:
Trusted by professionals in 160+ countries
How you learn:
Self-paced • Lifetime updates
When you get access:
Course access is prepared after purchase and delivered via email
Your guarantee:
30-day money-back guarantee — no questions asked

This curriculum spans the full workflow of a multi-investigator phylogenomics project, comparable in scope to an internal bioinformatics capability program that supports study design, data curation, model-based inference, and reproducible workflow automation across distributed research teams.

Module 1: Study Design and Data Acquisition in Phylogenomics

  • Select appropriate sequencing strategies (e.g., whole-genome, targeted capture, transcriptome) based on taxon sampling and evolutionary divergence.
  • Determine inclusion criteria for operational taxonomic units (OTUs) to balance phylogenetic breadth with data quality.
  • Evaluate trade-offs between sequencing depth and number of taxa when constrained by budget and computational resources.
  • Establish metadata standards for sample provenance, sequencing platform, and library preparation to ensure reproducibility.
  • Assess contamination risks in environmental or ancient DNA samples prior to alignment and orthology detection.
  • Implement data versioning and access protocols for multi-investigator collaborations involving distributed datasets.
  • Navigate ethical and legal considerations for using sequence data from protected species or indigenous biota.

Module 2: Sequence Alignment and Orthology Inference

  • Choose between de novo and reference-based assembly methods for non-model organisms with limited genomic resources.
  • Configure alignment parameters in MAFFT or MUSCLE to balance speed and accuracy for large multi-sequence datasets.
  • Apply domain-aware masking (e.g., delimiting homologous domains with HMMER profile searches) to remove low-complexity or non-homologous regions from alignments.
  • Compare orthology inference methods (e.g., OrthoFinder, OrthoMCL) based on scalability and sensitivity for gene family clustering.
  • Resolve paralogy through gene tree-species tree reconciliation when constructing species-level phylogenies.
  • Integrate synteny information to validate ortholog calls in closely related species with recent duplications.
  • Document alignment curation steps to maintain auditability in regulatory or publication contexts.
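Reciprocal best hits (RBH), a common first-pass orthology heuristic, make the paralogy problem concrete. This is not how OrthoFinder or OrthoMCL work, since both cluster genes across many genomes. It is a minimal sketch assuming hits are (query, subject, bitscore) tuples already parsed from a similarity search such as BLAST or DIAMOND tabular output.

```python
from collections.abc import Iterable

Hit = tuple[str, str, float]  # (query, subject, bitscore)


def best_hits(hits: Iterable[Hit]) -> dict[str, str]:
    """Map each query to its highest-scoring subject (ties keep the first seen)."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> list[tuple[str, str]]:
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


a_vs_b = [("a1", "b1", 100.0), ("a1", "b2", 80.0), ("a2", "b2", 90.0)]
b_vs_a = [("b1", "a1", 95.0), ("b2", "a1", 85.0), ("b2", "a2", 70.0)]
```

In this toy input, `a2` finds `b2`, but `b2`'s best hit is `a1`, so `a2` is left without an ortholog. A recent duplication produces exactly this pattern, and it is why tree reconciliation or synteny is needed beyond pairwise hits.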

Module 3: Alignment Curation and Data Filtering

  • Apply Gblocks or BMGE to remove ambiguously aligned regions while preserving phylogenetically informative sites.
  • Quantify missing data per site and per taxon, and filter positions or taxa with excessive gaps to avoid topological artifacts.
  • Assess compositional heterogeneity across taxa using Chi-square or posterior predictive checks in PhyloBayes.
  • Decide on inclusion/exclusion of fast-evolving sites based on site-rate heterogeneity models.
  • Implement partition schemes based on gene, codon position, or functional domain prior to model selection.
  • Use RogueNaRok to identify unstable taxa that degrade tree resolution and support values.
  • Balance data retention with signal-to-noise ratio when filtering low-information partitions.
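A simple gap-threshold filter illustrates the column-level and taxon-level missing-data checks. It is not Gblocks or BMGE, which also weigh conservation or entropy. The missing-data characters assume nucleotide data (for proteins, replace `N` with `X`, since `N` is asparagine), and the 50% threshold is an arbitrary default to tune per dataset.

```python
MISSING = set("-?Nn")  # assumption: nucleotide alignment


def _check_aligned(aln: dict[str, str]) -> int:
    """Return the shared sequence length, or raise if rows differ."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return lengths.pop()


def column_missing_fraction(aln: dict[str, str]) -> list[float]:
    """Fraction of missing characters in each alignment column."""
    length = _check_aligned(aln)
    seqs = list(aln.values())
    return [sum(seq[i] in MISSING for seq in seqs) / len(seqs) for i in range(length)]


def taxon_missing_fraction(aln: dict[str, str]) -> dict[str, float]:
    """Fraction of missing characters in each taxon's row."""
    _check_aligned(aln)
    return {name: sum(c in MISSING for c in seq) / len(seq) for name, seq in aln.items()}


def filter_columns(aln: dict[str, str], max_missing: float = 0.5) -> dict[str, str]:
    """Drop columns whose missing fraction exceeds `max_missing`."""
    keep = [i for i, f in enumerate(column_missing_fraction(aln)) if f <= max_missing]
    return {name: "".join(seq[i] for i in keep) for name, seq in aln.items()}
```

Reporting `taxon_missing_fraction` before and after column filtering is a cheap way to spot taxa that become almost empty, which are frequent rogue-taxon candidates.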

Module 4: Substitution Model Selection and Partitioning

  • Run ModelFinder (nucleotide or amino acid data) or jModelTest2 (nucleotide data only) to identify best-fit substitution models per partition.
  • Compare BIC, AIC, and AICc for model selection under different dataset sizes and parameter counts.
  • Decide between linked and unlinked branch length models across partitions based on empirical fit.
  • Test for site-rate heterogeneity using gamma distributions or invariant sites models.
  • Justify the choice between codon and amino acid models for protein-coding sequences, noting that only codon models estimate dN/dS and can therefore test for selection.
  • Validate model adequacy using posterior predictive simulations in Bayesian frameworks.
  • Document model decisions for peer review and reproducibility in collaborative phylogenies.
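The three criteria differ only in their penalty term. Let lnL be the log-likelihood, k the number of free parameters, and n the sample size (commonly taken as the number of alignment sites, a convention worth documenting). Then AIC = 2k - 2 lnL, AICc = AIC + 2k(k + 1)/(n - k - 1), and BIC = k ln(n) - 2 lnL. The model fits below are hypothetical.

```python
import math


def aic(lnl: float, k: int) -> float:
    return 2 * k - 2 * lnl


def aicc(lnl: float, k: int, n: int) -> float:
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(lnl, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    return k * math.log(n) - 2 * lnl


def best_model(models: dict[str, tuple[float, int]], n: int, criterion: str = "BIC") -> str:
    """Return the model name with the lowest score under `criterion`."""
    scorers = {
        "AIC": lambda lnl, k: aic(lnl, k),
        "AICc": lambda lnl, k: aicc(lnl, k, n),
        "BIC": lambda lnl, k: bic(lnl, k, n),
    }
    score = scorers[criterion]
    return min(models, key=lambda name: score(*models[name]))


# Hypothetical fits: name -> (log-likelihood, free parameters)
models = {"M1": (-1000.0, 10), "M2": (-985.0, 20)}
```

With n = 1000 sites, AIC and AICc prefer the richer M2. BIC's per-parameter penalty of ln(1000), roughly 6.9, favors M1, which is why the criterion used should be reported alongside the chosen model.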

Module 5: Phylogenetic Inference Using Maximum Likelihood and Bayesian Methods

  • Configure RAxML-NG or IQ-TREE for parallel execution on high-performance computing clusters.
  • Set bootstrap replicates and thoroughness criteria to achieve convergence in support values.
  • Monitor MCMC chain convergence in MrBayes or PhyloBayes using ESS values and trace plots.
  • Adjust heating parameters and chain length in Bayesian analyses to avoid trapping in local optima.
  • Compare topology outputs from ML and Bayesian methods to assess robustness under different assumptions.
  • Manage memory and runtime constraints when analyzing large supermatrices (>10,000 sites, >100 taxa).
  • Implement checkpointing and job resubmission workflows for long-running tree searches.
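Effective sample size (ESS) can be estimated from a single parameter trace as N / (1 + 2 Σ ρ_k), truncating the autocorrelation sum at the first non-positive lag. Tracer and other tools use related but more refined estimators, so treat this quadratic-time sketch as illustrative. The often-quoted ESS threshold of 200 is a rule of thumb, not a guarantee of convergence.

```python
def effective_sample_size(samples: list[float]) -> float:
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is <= 0.
    Runtime is quadratic in len(samples): fine for a sketch, slow for long traces.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("trace has zero variance; ESS is undefined")
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau
```

A well-mixing trace has ESS close to its length, while a chain that sits in one state for half the run and another for the rest yields an ESS of only a few samples, even though its trace plot looks "stable" within each half.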

Module 6: Species Tree Estimation and Gene Tree Discordance

  • Apply ASTRAL or ASTRID to infer species trees from gene trees while accounting for incomplete lineage sorting.
  • Quantify gene tree discordance using internode certainty or quartet similarity measures.
  • Diagnose sources of discordance (e.g., ILS, hybridization, HGT) using PhyParts or DiscoVista.
  • Integrate coalescent-based methods when population-level sampling is available within species.
  • Assess impact of gene tree estimation error on species tree accuracy using simulation benchmarks.
  • Use D-statistics (ABBA-BABA) to test for introgression between non-sister lineages.
  • Balance computational cost with model realism when choosing between concatenation and coalescent frameworks.
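Patterson's D counts two discordant biallelic site patterns in a four-taxon alignment (((P1, P2), P3), O): ABBA, where P2 shares the derived allele with P3, and BABA, where P1 shares it, with D = (ABBA - BABA) / (ABBA + BABA). The sketch assumes one aligned sequence per lineage and an outgroup carrying the ancestral state. It omits the block jackknife normally used to assess significance.

```python
MISSING = set("-?N")


def d_statistic(p1: str, p2: str, p3: str, outgroup: str) -> tuple[float, int, int]:
    """Return (D, n_ABBA, n_BABA) for aligned sequences of equal length."""
    if not len(p1) == len(p2) == len(p3) == len(outgroup):
        raise ValueError("sequences must be aligned to equal length")
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        states = {a, b, c, o}
        # Skip missing data, non-biallelic sites, and sites where P3 is ancestral.
        if states & MISSING or len(states) != 2 or c == o:
            continue
        if a == o and b == c:
            abba += 1
        elif b == o and a == c:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / (abba + baba), abba, baba
```

Under this convention a positive D indicates excess allele sharing between P2 and P3, and a negative D excess sharing between P1 and P3. Under incomplete lineage sorting alone, the expected value is zero.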

Module 7: Phylogenetic Comparative Methods and Trait Evolution

  • Reconstruct ancestral states for discrete traits using stochastic mapping in phytools or corHMM.
  • Fit evolutionary models (Brownian, OU, early burst) to continuous traits using maximum likelihood.
  • Test for phylogenetic signal using Blomberg’s K or Pagel’s λ across different clades.
  • Control for phylogenetic non-independence in regression models using phylogenetic generalized least squares (PGLS).
  • Identify shifts in evolutionary rates with BAMM (with careful prior specification) or shifts in adaptive optima with l1ou.
  • Validate model assumptions (e.g., normality of residuals, tree calibration) before inference.
  • Interpret clade-specific results in light of fossil calibration and biogeographic context.
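Felsenstein's phylogenetically independent contrasts (PIC) show how phylogenetic non-independence is removed. Under Brownian motion, regressing one trait's contrasts on another's through the origin gives the same slope as PGLS with a Brownian correlation structure. The sketch assumes a strictly bifurcating tree with positive branch lengths, encoded (hypothetically) as nested `(label, branch_length)` tuples where a label is either a tip name or a `(left, right)` pair. In practice a tested implementation such as `pic()` in the R package ape would be used.

```python
import math

# A node is (tip_name, branch_length) or ((left, right), branch_length).
Node = tuple


def _contrasts(node: Node, traits: dict[str, float], out: list[float]) -> tuple[float, float]:
    """Post-order pass: return (node value, adjusted branch length), appending contrasts."""
    label, length = node
    if isinstance(label, str):
        return traits[label], length
    left, right = label
    xi, vi = _contrasts(left, traits, out)
    xj, vj = _contrasts(right, traits, out)
    out.append((xi - xj) / math.sqrt(vi + vj))    # standardized contrast
    xk = (xi / vi + xj / vj) / (1 / vi + 1 / vj)  # weighted ancestral estimate
    return xk, length + vi * vj / (vi + vj)       # lengthen branch for estimation error


def pic(tree: Node, traits: dict[str, float]) -> list[float]:
    out: list[float] = []
    _contrasts(tree, traits, out)
    return out


def pic_slope(tree: Node, x: dict[str, float], y: dict[str, float]) -> float:
    """Slope of the regression through the origin of y-contrasts on x-contrasts."""
    cx, cy = pic(tree, x), pic(tree, y)
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)


# ((A:1,B:1):1,C:2); as nested tuples (the root branch length is unused)
tree = ((((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0)), 0.0)
```

A tree with n tips yields n - 1 contrasts. The regression is forced through the origin because the sign of each contrast is arbitrary, as long as both traits use the same orientation, which the shared traversal order guarantees.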

Module 8: Visualization, Annotation, and Data Sharing

  • Generate publication-quality tree figures using ggtree or iTOL with consistent color and label schemes.
  • Annotate trees with metadata (e.g., geography, phenotype, divergence times) for exploratory analysis.
  • Export trees in NHX or NeXML formats to preserve annotations and support interoperability.
  • Submit final phylogenies and alignments to curated repositories (e.g., TreeBASE) and underlying sequences to GenBank, following MIAPA (Minimum Information About a Phylogenetic Analysis) reporting guidelines.
  • Use FigTree or Dendroscope for interactive exploration of large or complex topologies.
  • Version-control tree files and analysis scripts using Git or similar systems for audit trails.
  • Design scalable visualization strategies for consensus networks or phylogenetic placement results.
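NHX stores per-node annotations as `[&&NHX:key=value:...]` comments placed after the branch length. This is what lets annotations survive exchange between tools that understand the extension. A minimal writer, assuming a hypothetical dict-based node layout and labels that need no Newick quoting (no spaces, colons, commas, brackets, or parentheses):

```python
def to_nhx(node: dict) -> str:
    """Serialize one node (and its subtree) as Newick with NHX annotations."""
    text = ""
    children = node.get("children", [])
    if children:
        text += "(" + ",".join(to_nhx(child) for child in children) + ")"
    text += node.get("name", "")
    if node.get("length") is not None:
        text += f":{node['length']}"
    attrs = node.get("attrs")
    if attrs:
        # NHX: key=value pairs joined by ':' inside a [&&NHX:...] comment
        text += "[&&NHX:" + ":".join(f"{k}={v}" for k, v in attrs.items()) + "]"
    return text


def write_nhx(tree: dict) -> str:
    return to_nhx(tree) + ";"


tree = {
    "children": [
        {"name": "A", "length": 0.1, "attrs": {"S": "human"}},
        {"name": "B", "length": 0.2},
    ]
}
```

Here `write_nhx(tree)` produces `(A:0.1[&&NHX:S=human],B:0.2);`. Keeping the writer this small makes it easy to unit-test, but real taxon labels usually need quoting or sanitizing first.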

Module 9: Scalability, Reproducibility, and Workflow Automation

  • Containerize analysis pipelines using Docker or Singularity for environment consistency.
  • Orchestrate multi-step workflows using Snakemake or Nextflow with error handling and logging.
  • Optimize job scheduling on HPC systems using SLURM or PBS for memory and CPU-intensive steps.
  • Implement checksums and data integrity checks for large alignment and tree files.
  • Design modular pipeline components to enable reuse across projects with different taxon sets.
  • Add continuous integration (CI) tests for pipeline updates using synthetic or benchmark datasets.
  • Archive intermediate results and logs to support audit, debugging, and reanalysis.
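Checksums for large alignment and tree files can be recorded in a manifest and re-verified after transfers or reruns. The sketch below uses Python's `hashlib` with chunked reads so multi-gigabyte files are not loaded into memory. The manifest file name and the requirement that files live under the manifest's directory are assumptions of this example.

```python
import hashlib
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(files: list[Path], manifest: Path) -> None:
    """Write '<hash>  <relative path>' lines; files must sit under the manifest's directory."""
    manifest = Path(manifest)
    base = manifest.parent.resolve()
    with open(manifest, "w") as out:
        for f in files:
            rel = Path(f).resolve().relative_to(base).as_posix()
            out.write(f"{sha256sum(Path(f))}  {rel}\n")


def verify_manifest(manifest: Path) -> list[str]:
    """Return relative paths that are missing or whose hash no longer matches."""
    manifest = Path(manifest)
    failures = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = manifest.parent / rel
        if not target.is_file() or sha256sum(target) != expected:
            failures.append(rel)
    return failures
```

The two-space line format mirrors GNU coreutils `sha256sum` output, so `sha256sum -c` run from the manifest's directory should be able to check the same file, which is convenient inside a workflow rule or CI job.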