Next Generation Sequencing in Bioinformatics - From Data to Discovery

$299.00
Who trusts this:
Trusted by professionals in 160+ countries
How you learn:
Self-paced • Lifetime updates
When you get access:
Course access is prepared after purchase and delivered via email
Toolkit Included:
A practical, ready-to-use toolkit with implementation templates, worksheets, checklists, and decision-support materials designed to accelerate real-world application and reduce setup time.
Your guarantee:
30-day money-back guarantee — no questions asked

This curriculum spans the full lifecycle of NGS data analysis, from experimental design through cloud-scale deployment and governance. It is comparable in scope to a multi-phase bioinformatics capability program implemented across research and clinical sequencing teams.

Module 1: NGS Platform Selection and Experimental Design

  • Selecting between Illumina, PacBio, and Oxford Nanopore based on required read length, error profile, and throughput for targeted versus whole-genome applications.
  • Determining optimal sequencing depth for variant calling in tumor-normal paired samples, balancing sensitivity and cost.
  • Designing multiplexing strategies using dual indexing to minimize cross-contamination and index hopping in high-throughput runs.
  • Choosing between whole-exome and whole-genome sequencing based on clinical validity, coverage uniformity, and downstream analytical burden.
  • Integrating spike-in controls and technical replicates to assess batch effects and library preparation variability.
  • Aligning experimental goals with institutional IRB requirements, particularly when handling germline versus somatic variants.
  • Planning for data storage and transfer bottlenecks during high-volume sequencing runs, especially with long-read platforms.
  • Specifying RNA-seq library preparation protocols (e.g., poly-A selection vs. rRNA depletion) based on sample integrity and transcriptome complexity.
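The depth-planning topic above rests on the standard Lander-Waterman relation, depth = (read length × number of reads) / genome size. A minimal sketch, with an illustrative function name and example parameters (150 bp reads, a ~3.1 Gb human genome):

```python
def required_reads(genome_size_bp, read_length_bp, target_depth):
    """Lander-Waterman estimate: reads needed for a target mean depth.

    depth = (read_length * num_reads) / genome_size
    """
    return int(genome_size_bp * target_depth / read_length_bp)


# Example: 30x mean coverage of a ~3.1 Gb genome with 150 bp reads
# (paired-end reads counted individually).
reads_needed = required_reads(3_100_000_000, 150, 30)
```

In practice, tumor-normal designs target much deeper tumor coverage than this germline example, and duplication and mappability losses mean raw read counts should be padded above the theoretical estimate.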

Module 2: Raw Data Quality Control and Preprocessing

  • Interpreting FastQC reports to detect adapter contamination, overrepresented sequences, and per-base quality degradation.
  • Implementing Trimmomatic or Cutadapt with platform-specific parameters to remove adapters while preserving informative reads.
  • Deciding whether to apply quality-based trimming or hard clipping based on downstream variant sensitivity requirements.
  • Filtering low-complexity reads in AT/GC-rich genomes to prevent alignment artifacts in repetitive regions.
  • Assessing and correcting for PCR duplication rates using UMI-aware deduplication in amplicon-based panels.
  • Validating FASTQ integrity using checksums and metadata logging before initiating large-scale processing pipelines.
  • Handling mixed read lengths from degraded clinical samples in RNA-seq without introducing bias during trimming.
  • Configuring parallelized QC workflows using Snakemake or Nextflow to manage compute load across clusters.
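The quality-filtering decisions above ultimately reduce to per-base Phred scores decoded from the FASTQ quality line. A minimal sketch of that decoding and a mean-quality read filter (the threshold of Q20 is illustrative, not a universal cutoff):

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read's quality line (Phred+33 encoding)."""
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)


def passes_qc(quality_string, min_mean=20):
    """Simple per-read filter analogous to a quality-based trimming cutoff."""
    return mean_phred(quality_string) >= min_mean


# In Phred+33, 'I' encodes Q40 and '#' encodes Q2.
```

Tools like Trimmomatic use sliding-window rather than whole-read means, but the underlying score arithmetic is the same.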

Module 3: Read Alignment and Reference Genome Management

  • Selecting aligners (BWA-MEM, STAR, minimap2) based on data type (short-read, long-read, spliced RNA-seq).
  • Choosing between linear and graph-based reference genomes (e.g., GRCh38 vs. a PGGB-built pangenome graph) to improve alignment in structurally variable regions.
  • Managing reference genome versioning across teams to prevent reproducibility issues in longitudinal studies.
  • Indexing custom reference genomes with decoy sequences to reduce false alignments in HLA or KIR regions.
  • Optimizing alignment parameters for high-identity paralogous genes to minimize mis-mapping in disease-associated loci.
  • Validating alignment performance using spike-in controls or synthetic reads with known variants.
  • Handling multimapping reads in repetitive regions during ChIP-seq or methylation analysis with probabilistic assignment.
  • Precomputing alignment indices on shared storage to reduce redundant compute in multi-user environments.
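Assessing alignment output from any of the aligners above means reading CIGAR strings in SAM/BAM records. A small sketch of computing the reference span a read covers, using the SAM specification's rule that M, D, N, =, and X operations consume reference bases while I, S, H, and P do not:

```python
import re


def aligned_reference_span(cigar):
    """Reference bases consumed by a CIGAR string (M, D, N, =, X ops)."""
    span = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in "MDN=X":
            span += int(length)
    return span


# "50M2I48M": the 2 bp insertion consumes query only, so the
# reference span is 50 + 48 = 98.
```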

Module 4: Variant Calling and Genotype Refinement

  • Configuring GATK HaplotypeCaller for germline SNVs/indels with cohort-based recalibration in population studies.
  • Selecting between Mutect2, VarScan2, and Strelka2 for somatic variant calling based on tumor purity and ploidy assumptions.
  • Applying joint calling across cohorts to improve genotype consistency while managing computational scaling.
  • Filtering false positives in low-coverage regions using depth, strand bias, and mapping quality thresholds.
  • Integrating local reassembly to resolve complex indels in homopolymer regions from long-read data.
  • Validating CNV calls from exome data using off-target read depth and B-allele frequency from SNP arrays.
  • Adjusting VQSR (Variant Quality Score Recalibration) training sets when working with non-European populations.
  • Implementing ensemble calling strategies with cross-tool consensus to increase precision in clinical reporting.
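The false-positive filtering topic above combines depth, strand bias, and mapping quality thresholds. A hard-filter sketch; the field names and default cutoffs are illustrative, not values from any specific caller:

```python
def passes_filters(variant, min_depth=10, max_strand_frac=0.9, min_mapq=30):
    """Hard-filter sketch using depth, strand bias, and mapping quality.

    `variant` is a dict with 'depth', 'forward_frac' (fraction of
    supporting reads on the forward strand), and 'mapq' keys; the
    schema and thresholds are illustrative.
    """
    if variant["depth"] < min_depth:
        return False
    # Strand bias: reject calls whose support is almost entirely
    # one-stranded in either direction.
    if not (1 - max_strand_frac) <= variant["forward_frac"] <= max_strand_frac:
        return False
    return variant["mapq"] >= min_mapq
```

Production pipelines typically use the caller's own annotations (e.g., Fisher strand values) rather than raw fractions, but the thresholding logic is the same shape.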

Module 5: Functional Annotation and Interpretation

  • Selecting annotation databases (ClinVar, gnomAD, COSMIC, dbSNP) based on clinical actionability and population relevance.
  • Resolving conflicting interpretations of VUS (Variants of Uncertain Significance) using ACMG/AMP guidelines in diagnostic settings.
  • Integrating tissue-specific expression data from GTEx to prioritize non-coding variants in regulatory regions.
  • Using CADD, REVEL, and SIFT scores to rank missense variants when functional assays are unavailable.
  • Mapping splicing variants using SpliceAI or MaxEntScan to predict impact on canonical and cryptic splice sites.
  • Filtering population-specific polymorphisms using local allele frequency databases to reduce false positives.
  • Automating annotation pipelines with VEP or ANNOVAR while maintaining audit trails for clinical reporting.
  • Handling novel gene-disease associations in research contexts without overinterpreting preliminary evidence.
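The frequency-filtering and score-ranking topics above can be combined into one prioritization pass: drop common polymorphisms by population allele frequency, then rank the remainder by an in-silico score such as REVEL. The dict schema and 1% cutoff here are illustrative:

```python
def prioritize(variants, max_af=0.01):
    """Drop common polymorphisms, rank the rest by in-silico score.

    Each variant is a dict with 'id', 'af' (population allele
    frequency, e.g. from gnomAD), and 'revel' (REVEL score);
    the keys and the 1% threshold are illustrative assumptions.
    """
    rare = [v for v in variants if v["af"] < max_af]
    return sorted(rare, key=lambda v: v["revel"], reverse=True)
```

Clinical workflows would layer ACMG/AMP criteria on top of this ordering rather than reporting by score alone.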

Module 6: Transcriptomic and Epigenomic Analysis

  • Normalizing RNA-seq count data using TPM or DESeq2 methods depending on comparison scope (within vs. across samples).
  • Correcting for batch effects in large expression datasets using ComBat or RUV without removing biological signal.
  • Selecting differential expression tools (edgeR, limma-voom) based on count distribution and experimental design.
  • Validating alternative splicing events with junction-spanning reads in STAR or rMATS outputs.
  • Integrating ATAC-seq and ChIP-seq peaks with promoter/enhancer annotations to infer regulatory networks.
  • Calling methylation levels from bisulfite sequencing with Bismark or BSMAP, correcting for incomplete conversion.
  • Clustering single-cell RNA-seq data using Seurat or Scanpy while preserving batch-corrected biological variation.
  • Defining pseudotime trajectories in developmental datasets with Monocle3 or PAGA, validating with known markers.
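The normalization topic above distinguishes within-sample measures like TPM from cross-sample methods like DESeq2's. The TPM arithmetic itself is simple: divide counts by transcript length in kilobases, then scale the rates to sum to one million. A minimal sketch:

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million from raw counts and transcript lengths.

    1) rate_i = counts_i / length_in_kb_i   (length-normalize)
    2) TPM_i  = rate_i / sum(rates) * 1e6   (library-size normalize)
    """
    rates = [c / (l / 1000) for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]
```

Because TPM values always sum to 1e6 within a sample, they are comparable across genes within a sample but not directly across samples, which is why DESeq2-style size factors exist.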

Module 7: Data Integration and Multi-Omics Workflows

  • Mapping genomic variants to expression QTLs using GTEx or eQTL Catalogue to identify regulatory mechanisms.
  • Performing pathway enrichment analysis with GSEA or Enrichr while correcting for gene set size and overlap.
  • Integrating copy number, mutation, and expression data in cancer samples to identify driver events.
  • Using WGCNA to construct co-expression networks and link modules to clinical phenotypes.
  • Aligning proteomic abundance data with transcript levels to assess post-transcriptional regulation.
  • Implementing MOFA+ for unsupervised integration of heterogeneous omics layers in cohort studies.
  • Resolving discordance between DNA methylation and gene expression in imprinted regions.
  • Managing data harmonization across platforms (e.g., microarray vs. RNA-seq) in meta-analyses.
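The over-representation side of the enrichment topic above is usually a one-sided hypergeometric test: given a query list drawn from a gene universe, how surprising is the observed overlap with a pathway's gene set? A stdlib-only sketch (tools like Enrichr add multiple-testing correction on top):

```python
from math import comb


def enrichment_pvalue(hits, gene_set_size, query_size, universe_size):
    """One-sided hypergeometric p-value for over-representation.

    P(X >= hits), where X counts query genes landing in the gene set
    when `query_size` genes are drawn from a `universe_size` universe.
    """
    denom = comb(universe_size, query_size)
    upper = min(gene_set_size, query_size)
    p = 0.0
    for k in range(hits, upper + 1):
        p += (comb(gene_set_size, k)
              * comb(universe_size - gene_set_size, query_size - k)) / denom
    return p
```

Gene set size and overlap corrections mentioned above matter because large, overlapping sets inflate apparent significance under this null.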

Module 8: Data Governance, Reproducibility, and Compliance

  • Implementing audit trails for variant calling pipelines using version-controlled CWL or WDL workflows.
  • Enforcing data access controls in multi-institutional collaborations using DUOS or dbGaP compliance checks.
  • Applying de-identification protocols for genomic data under HIPAA and GDPR, including removal of quasi-identifiers.
  • Archiving raw and processed data in compliant repositories (e.g., SRA, EGA) with metadata in MINSEQE format.
  • Documenting pipeline parameters and software versions using RO-Crate or similar standards for reproducibility.
  • Establishing change control procedures for updating reference genomes or annotation databases in production.
  • Conducting periodic data integrity checks using checksum validation across storage tiers.
  • Designing disaster recovery plans for high-value genomic datasets with geographically distributed backups.
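The periodic integrity-check topic above comes down to streaming checksums so multi-gigabyte FASTQ/BAM files never need to fit in memory. A minimal sketch using Python's hashlib; the manifest-comparison helper is an illustrative wrapper:

```python
import hashlib


def sha256sum(path, chunk_size=1 << 20):
    """Stream a file's SHA-256 digest in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hexdigest):
    """Compare against a recorded checksum, e.g. from a stored manifest."""
    return sha256sum(path) == expected_hexdigest
```

Running this across storage tiers on a schedule, and alerting on mismatches, covers the periodic-validation requirement; the same digests also serve the FASTQ-integrity check described in Module 2.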

Module 9: Scalable Infrastructure and Cloud Deployment

  • Provisioning compute clusters with appropriate CPU, memory, and I/O profiles for alignment versus variant calling stages.
  • Migrating legacy pipelines to containerized environments (Docker, Singularity) for cross-platform consistency.
  • Configuring cloud bursting strategies using AWS Batch or Google Life Sciences API during peak loads.
  • Optimizing storage costs by tiering raw FASTQs to cold storage and retaining CRAMs for active analysis.
  • Implementing role-based access and encryption for data in transit and at rest on cloud platforms.
  • Monitoring pipeline performance with Prometheus and Grafana to detect bottlenecks in distributed systems.
  • Selecting between managed services (e.g., Terra, DNAnexus) and self-hosted solutions based on customization needs.
  • Estimating egress costs and transfer times when sharing large datasets across international collaborators.
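The egress-estimation topic above is back-of-envelope arithmetic worth making explicit, since the gigabyte/gigabit distinction is a common error. A sketch with illustrative numbers; substitute your provider's published egress rates and your measured sustained bandwidth:

```python
def transfer_estimate(dataset_gb, bandwidth_gbps, egress_usd_per_gb):
    """Rough egress cost and wall-clock time for sharing a dataset.

    dataset_gb is gigaBYTES; bandwidth_gbps is gigaBITS per second,
    hence the factor of 8. Pricing is an assumption, not a quote.
    """
    cost_usd = dataset_gb * egress_usd_per_gb
    hours = dataset_gb * 8 / (bandwidth_gbps * 3600)
    return cost_usd, hours


# Example: a 1 TB cohort over a sustained 1 Gbps link at a
# hypothetical $0.09/GB egress rate.
cost, hours = transfer_estimate(1000, 1, 0.09)
```

At real cohort scales this arithmetic often argues for analysis-in-place on managed platforms rather than shipping raw FASTQs across regions.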