Gene Fusion in Bioinformatics - From Data to Discovery

$299.00
Your guarantee:
30-day money-back guarantee — no questions asked
Who trusts this:
Trusted by professionals in 160+ countries
When you get access:
Course access is prepared after purchase and delivered via email
Toolkit Included:
Includes a practical, ready-to-use toolkit: implementation templates, worksheets, checklists, and decision-support materials that accelerate real-world application and reduce setup time.
How you learn:
Self-paced • Lifetime updates
This curriculum spans the full bioinformatics workflow for gene fusion analysis, from experimental design through detection and validation to deployment in production-grade, regulated environments. In scope, it is comparable to a multi-phase internal capability program for clinical genomics.

Module 1: Foundations of Gene Fusion Biology and Clinical Relevance

  • Select appropriate reference genomes (e.g., GRCh38 vs. GRCh37) based on alignment compatibility with fusion detection tools and availability of clinically annotated fusion databases.
  • Evaluate tissue-specific expression patterns to distinguish driver fusions from passenger events in oncogenic contexts.
  • Assess the impact of fusion breakpoints on protein domains using domain databases like Pfam or InterPro to predict functional consequences.
  • Integrate knowledge of known oncogenic fusions (e.g., BCR-ABL1, EML4-ALK) into assay design for targeted sequencing panels.
  • Determine whether to include intronic regions in sequencing capture design based on known fusion breakpoint distribution in target genes.
  • Map fusion events to clinical actionability using resources like OncoKB or CGI to prioritize variants for reporting.
  • Establish minimum expression thresholds below which a fusion's biological relevance in RNA-seq data is questionable.
  • Classify fusions by mechanism (e.g., translocation, read-through, retrotransposition) to inform downstream validation strategies.
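The lookup-and-tier workflow in the last two bullets can be sketched in a few lines. The table below is a tiny, hypothetical stand-in for a real knowledge base such as OncoKB or CGI exports; only the lookup pattern is the point.

```python
# Minimal sketch: classify detected fusions against a small, hypothetical
# known-fusion table (a production pipeline would query OncoKB/CGI exports).
KNOWN_FUSIONS = {
    ("BCR", "ABL1"): {"mechanism": "translocation", "tier": "I"},
    ("EML4", "ALK"): {"mechanism": "inversion", "tier": "I"},
}

def classify_fusion(gene5, gene3):
    """Return mechanism and evidence tier for a 5'-3' gene pair.

    Unannotated pairs default to Tier III for manual review; calling a
    read-through vs. translocation for novel events would additionally
    need genomic coordinates, which this sketch omits.
    """
    return KNOWN_FUSIONS.get((gene5, gene3),
                             {"mechanism": "unknown", "tier": "III"})

print(classify_fusion("BCR", "ABL1"))   # known oncogenic driver
print(classify_fusion("FOO1", "BAR2"))  # unannotated -> manual review
```

The same dictionary-keyed pattern extends naturally to mechanism-specific validation routing (Module 6).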

Module 2: Experimental Design and Sequencing Methodologies

  • Choose between whole-transcriptome RNA-seq and targeted panel sequencing based on sample input, cost constraints, and required sensitivity for low-expression fusions.
  • Optimize RNA integrity number (RIN) thresholds for sample inclusion, particularly in FFPE-derived samples with degraded RNA.
  • Decide on stranded vs. non-stranded library preparation based on ability to resolve antisense transcription and fusion orientation.
  • Set read length and depth requirements (e.g., ≥50M paired-end 150bp reads) to ensure sufficient spanning and split-read support for fusion detection.
  • Implement unique molecular identifiers (UMIs) in library prep to mitigate PCR duplication artifacts in low-input samples.
  • Select between poly-A selection and rRNA depletion based on sample type and potential for non-polyadenylated fusion transcripts.
  • Design hybridization probes for targeted panels to maximize coverage of known intronic breakpoint hotspots in fusion-prone genes.
  • Include positive control cell lines (e.g., K-562 for BCR-ABL1) in sequencing runs to monitor assay performance.
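The depth requirement in the read-length bullet can be sanity-checked with a small Poisson power calculation: given the fraction of junction-covering reads expected to support a fusion, find the depth at which a minimum number of supporting reads is seen with high probability. The 1% junction fraction used below is an illustrative assumption, not a recommendation.

```python
import math

def p_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i)
                     for i in range(k))

def min_depth(junction_fraction, min_support=3, power=0.95):
    """Smallest junction-covering read depth giving >= `power` probability
    of observing >= `min_support` fusion-supporting reads, when a fraction
    `junction_fraction` of reads at the locus carry the junction."""
    depth = 1
    while p_at_least(min_support, depth * junction_fraction) < power:
        depth += 1
    return depth

# A low-expression fusion where only ~1% of junction-spanning reads
# support the event needs several hundred-fold coverage at the locus.
print(min_depth(junction_fraction=0.01))
```

This kind of back-of-envelope estimate motivates the ≥50M-read guideline for whole-transcriptome data, where per-gene coverage is diluted across the transcriptome.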

Module 3: Preprocessing and Quality Control of NGS Data

  • Apply adapter trimming and quality filtering using tools like Trimmomatic or fastp with parameters tuned for RNA-seq data.
  • Assess sequencing saturation and duplication rates to determine if UMI-based deduplication is necessary.
  • Monitor batch effects across sequencing runs using PCA on gene expression profiles before fusion calling.
  • Exclude samples with high ribosomal RNA content post-rRNA depletion from downstream analysis.
  • Validate strand specificity using RSeQC to confirm library preparation fidelity.
  • Correct for GC bias in coverage, particularly in regions flanking fusion breakpoints, using normalization methods.
  • Align reads to both genomic and transcriptomic references to support split-read and discordant-pair detection.
  • Flag samples with low mappability due to high sequence divergence or contamination using Kraken or FastQ Screen.
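The batch-effect check above can be done without heavy dependencies: run PCA via SVD on the centered log-expression matrix and see whether batches separate along the first principal component. The matrix below is synthetic, with a deliberate shift injected into one batch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic log-expression: 6 samples x 50 genes, two sequencing batches,
# with a global shift on the first 10 genes mimicking a batch effect.
expr = rng.normal(5.0, 1.0, size=(6, 50))
expr[3:, :10] += 3.0  # batch B offset

# PCA via SVD on the centered matrix (no sklearn dependency needed).
centered = expr - expr.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pc_scores = U * S  # samples projected onto principal components

batch_a, batch_b = pc_scores[:3, 0], pc_scores[3:, 0]
# Clean separation along PC1 flags a confounder to correct for
# (or block on) before fusion calling.
separated = batch_a.max() < batch_b.min() or batch_b.max() < batch_a.min()
print("PC1 separates batches:", separated)
```

On real data, coloring the PC1/PC2 scatter by run, library-prep date, and operator usually localizes the source of the effect.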

Module 4: Fusion Detection Algorithms and Tool Integration

  • Run multiple fusion callers (e.g., STAR-Fusion, Arriba, FusionCatcher) in parallel to increase sensitivity and reduce false negatives.
  • Configure STAR aligner with outFilterMultimapScoreRange and alignSJoverhangMin to optimize splice junction detection for fusion calling.
  • Adjust minimum supporting read thresholds (e.g., ≥2 spanning reads, ≥1 split read) based on sequencing depth and background noise.
  • Filter fusions involving pseudogenes or paralogs using sequence homology databases to reduce false positives.
  • Integrate results from DNA-based structural variant callers (e.g., Manta) to validate RNA-observed fusions at the genomic level.
  • Exclude fusions with alignment artifacts caused by homopolymer regions or low-complexity sequences.
  • Use annotation databases (e.g., COSMIC, ChimerDB) to prioritize known pathogenic fusions during result filtering.
  • Implement a tiered classification system (Tier I–IV) for fusions based on evidence level and clinical relevance.
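The multi-caller strategy in the first bullet is typically resolved by a consensus step: merge per-tool calls keyed by gene pair and keep events reported by at least two callers. A minimal sketch, with invented call lists standing in for parsed STAR-Fusion/Arriba/FusionCatcher outputs:

```python
from collections import defaultdict

def consensus_fusions(calls_by_tool, min_tools=2):
    """Merge per-tool fusion calls keyed by (5' gene, 3' gene) and keep
    events reported by at least `min_tools` callers."""
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for pair in calls:
            support[pair].add(tool)
    return {pair: sorted(tools) for pair, tools in support.items()
            if len(tools) >= min_tools}

calls = {
    "STAR-Fusion":   [("BCR", "ABL1"), ("EML4", "ALK")],
    "Arriba":        [("BCR", "ABL1"), ("AAA1", "BBB2")],
    "FusionCatcher": [("BCR", "ABL1"), ("EML4", "ALK")],
}
consensus = consensus_fusions(calls)
print(consensus)
```

Real outputs report breakpoints at slightly different coordinates between tools, so a production merge keys on gene pair (as here) or on breakpoints within a small window, rather than on exact positions.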

Module 5: Annotation and Functional Interpretation of Fusions

  • Map fusion breakpoints to exon-intron boundaries to predict in-frame vs. out-of-frame transcripts using RefSeq or Ensembl annotations.
  • Determine whether the 5’ and 3’ partner genes retain functional domains post-fusion using protein domain databases.
  • Assess promoter swapping potential by analyzing expression levels of the 5’ partner gene in normal tissues.
  • Annotate kinase domain retention in fusion proteins to evaluate druggability (e.g., in ALK, ROS1, NTRK fusions).
  • Use gene ontology and pathway analysis (e.g., Reactome, KEGG) to infer disrupted biological processes.
  • Integrate expression data to determine if the fusion transcript is expressed at biologically relevant levels.
  • Flag fusions involving tumor suppressor genes where truncation may lead to loss of function.
  • Compare fusion isoforms across databases to identify novel splice variants with potential functional impact.
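The in-frame vs. out-of-frame prediction in the first bullet reduces to a codon-phase check: the junction preserves the reading frame when the coding bases retained from the 5' partner leave the ribosome in the same phase as the 3' breakpoint's position within its own CDS. In practice both coordinates come from RefSeq/Ensembl annotations; here they are passed in directly.

```python
def junction_in_frame(cds_bases_5p, cds_offset_3p):
    """True if a fusion junction preserves the reading frame.

    cds_bases_5p : coding bases retained from the 5' partner
                   (from its start codon to the breakpoint)
    cds_offset_3p: 0-based coding position of the breakpoint within
                   the 3' partner's own CDS

    The frame is preserved when both sit at the same codon phase.
    """
    return cds_bases_5p % 3 == cds_offset_3p % 3

print(junction_in_frame(300, 450))  # phases 0 and 0: in-frame
print(junction_in_frame(301, 450))  # phases 1 and 0: out-of-frame
```

Out-of-frame junctions usually imply a truncated 5' product (relevant for the tumor-suppressor bullet) rather than a chimeric protein.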

Module 6: Validation and Clinical Reporting

  • Select an orthogonal validation method (RT-PCR, Sanger sequencing, or FISH) based on fusion architecture and available sample material.
  • Design PCR primers spanning the fusion junction, with melting-temperature (Tm) balancing and specificity checks against the reference genome.
  • Establish minimum validation thresholds (e.g., ≥50% concordance across replicates) for reporting in clinical contexts.
  • Document bioinformatics pipeline versioning, parameters, and reference databases used for audit and reproducibility.
  • Define reporting thresholds for variant allele frequency and read support in clinical-grade fusion reports.
  • Include confidence levels (e.g., confirmed, probable, artifact) in reports based on supporting evidence tiers.
  • Redact incidental findings unrelated to the clinical indication unless they meet ACMG secondary findings criteria.
  • Implement structured reporting using standardized vocabularies (e.g., HGVS nomenclature, HUGO gene symbols).
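The Tm-balancing step in the primer-design bullet can be first-passed with the Wallace rule, which is coarse but adequate for screening candidate junction-flanking pairs before nearest-neighbor refinement and specificity checks. The two primer sequences below are hypothetical.

```python
def wallace_tm(primer):
    """Wallace-rule Tm estimate (degrees C) for short primers (<~25 nt):
    2*(A+T) + 4*(G+C). Coarse, but enough for first-pass pair balancing."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def gc_fraction(primer):
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

fwd = "ATGACCGTGCTAGCTAAGCT"  # hypothetical junction-flanking primers
rev = "GCTAGCTTAGCATCGGATCC"
# A common target: |delta Tm| <= 2 degC within the pair, 40-60% GC.
print(wallace_tm(fwd), wallace_tm(rev))  # -> 60 62
```

Specificity against the reference genome (e.g., an in-silico PCR or BLAST check) remains a separate, mandatory step; Tm balance alone does not guarantee a unique amplicon.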

Module 7: Data Integration and Multi-Omics Context

  • Correlate fusion status with copy number alterations (e.g., MYC amplification in fusion-positive cancers) using joint analysis.
  • Assess mutational burden and co-occurring SNVs/indels to determine if the fusion is part of a broader mutational signature.
  • Integrate methylation data to evaluate epigenetic silencing of the non-fused allele in tumor suppressor gene fusions.
  • Overlay fusion data with protein expression (e.g., RPPA or IHC) to confirm translation of fusion transcripts.
  • Use single-cell RNA-seq to resolve fusion heterogeneity within tumor subclones.
  • Compare fusion expression across tumor and normal compartments in spatial transcriptomics datasets.
  • Link fusion events to immune microenvironment profiles (e.g., T-cell infiltration) for immunotherapy relevance.
  • Aggregate fusion data with clinical outcomes in retrospective cohorts to assess prognostic significance.
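For the retrospective-cohort bullet, the simplest association test between fusion status and a binary outcome is a one-sided Fisher's exact test on the 2x2 table. The sketch below implements it from `math.comb` to stay dependency-free; the counts are invented for illustration.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test P(X >= a) for the 2x2 table
    [[a, b], [c, d]] (rows: fusion+/fusion-; columns: e.g.
    responder/non-responder), via the hypergeometric distribution."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, row1)
    k_max = min(row1, col1)
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, k_max + 1)) / denom

# Invented cohort: 8/10 fusion-positive vs 1/10 fusion-negative responders.
p = fisher_greater(8, 2, 1, 9)
print(f"one-sided p = {p:.4f}")
```

For anything beyond a quick screen, cohort analyses should also adjust for confounders (stage, treatment line), which pushes toward logistic or Cox models rather than a bare 2x2 test.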

Module 8: Governance, Reproducibility, and Regulatory Compliance

  • Implement containerization (e.g., Docker/Singularity) to ensure pipeline portability and version control.
  • Adopt workflow languages (e.g., Nextflow, Snakemake) to standardize and document analysis pipelines.
  • Establish audit trails for all data processing steps using metadata tracking systems (e.g., LIMS).
  • Define data retention policies for raw and processed files in compliance with CLIA or HIPAA requirements.
  • Conduct periodic bioinformatics pipeline validation using reference datasets (e.g., SEQC-2) to maintain accuracy.
  • Restrict access to sensitive genomic data using role-based access control and encryption at rest.
  • Document deviations from standard operating procedures during troubleshooting for regulatory review.
  • Participate in external quality assessment (EQA) programs for molecular pathology to benchmark fusion detection performance.
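The audit-trail and reproducibility bullets above come down to recording, for every run, exactly what went in and what produced the output. A minimal sketch of such a run manifest (input checksums, tool versions, parameters, timestamp); the file contents and version strings below are placeholders.

```python
import datetime
import hashlib
import json
import os
import sys
import tempfile

def run_manifest(input_paths, tool_versions, params):
    """Audit-trail record for one pipeline run: SHA-256 of each input
    file, tool versions, parameters, and a UTC timestamp, ready to store
    beside the results or push into a LIMS."""
    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()
    return {
        "timestamp_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "inputs": {p: sha256(p) for p in input_paths},
        "tools": tool_versions,
        "parameters": params,
    }

# Demo with a throwaway FASTQ-like file; versions/params are placeholders.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"@read1\nACGT\n+\nFFFF\n")
manifest = run_manifest([tmp.name], {"STAR-Fusion": "1.12.0"},
                        {"min_split_reads": 1})
print(json.dumps({k: manifest[k] for k in ("tools", "parameters")}))
os.unlink(tmp.name)
```

Containerization (Docker/Singularity) plus a manifest like this lets an auditor tie any clinical report back to a byte-identical rerun.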

Module 9: Scalability and Deployment in Production Environments

  • Design cloud-based analysis pipelines with autoscaling to handle variable sequencing batch loads.
  • Optimize I/O operations for large BAM files using parallel processing and distributed file systems.
  • Implement automated failure recovery for long-running fusion detection workflows.
  • Cache frequently accessed reference data (e.g., genome indices, annotation files) to reduce latency.
  • Monitor compute costs per sample and optimize resource allocation (CPU, memory) per tool.
  • Develop APIs to integrate fusion calling results into laboratory information systems (LIS).
  • Enable real-time status tracking for samples moving through the analysis pipeline.
  • Support multi-institutional data sharing using federated analysis frameworks while preserving data locality.
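The automated-failure-recovery bullet usually means retrying transient failures with backoff. Workflow managers handle this at the task level (e.g., Nextflow's `errorStrategy 'retry'` or Snakemake's `retries`); the sketch below shows the same idea for a single step, with a deliberately flaky function standing in for a real tool invocation.

```python
import time

def with_retries(step, max_attempts=3, base_delay=0.01):
    """Re-run a failing workflow step with exponential backoff; transient
    errors (node preemption, network blips) are retried, and the last
    failure is re-raised once attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except RuntimeError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

attempts = {"n": 0}
def flaky_fusion_step():
    """Stand-in for a fusion-calling task that fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient node failure")
    return "fusions.tsv"

result = with_retries(flaky_fusion_step)
print(result)
```

In production, pair retries with idempotent steps (write to a temp path, rename on success) so a half-finished attempt never poisons the next one.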