ISOFORMS: Genes with Clear Functional Differences Between Isoforms

Species: human, mouse, rat, DROME

Overview

This project explores genes where alternative splicing produces isoforms with demonstrably different biological functions. These cases are particularly important for GO annotation because:

  1. Standard annotation often conflates isoform functions - a single gene gets annotated with functions that may only apply to specific isoforms
  2. Isoform-specific annotations are rare in GO - most annotations are made at the gene level
  3. Understanding isoform biology is critical for precision medicine and accurate functional characterization

Selection Criteria

Genes selected for this project should have:
- Well-characterized alternative splicing events
- Clear functional differences between isoforms (not just expression pattern differences)
- Preferably opposing or divergent functions (e.g., pro-apoptotic vs anti-apoptotic)
- Strong experimental evidence for isoform-specific activities

Priority Genes

Tier 1: Classic Paradigms (Start Here)

Tier 2: Developmental/Tissue-Specific

Tier 3: Additional Cases to Explore

Special Cases: Polyproteins (Post-Translational Cleavage)

These are NOT alternative splicing but share the annotation challenge of multiple functional products from one gene:

Background: Key Concepts

Types of Functional Differences

  1. Antagonistic functions - Bcl-xL vs Bcl-xS, membrane vs soluble Fas
  2. Tissue-specific activity - Agrin Z+ (neuronal) vs Z- (muscle)
  3. Localization differences - VEGF isoforms (diffusible vs matrix-bound)
  4. Ligand specificity - FGFR2 IIIb vs IIIc
  5. Regulatory vs constitutive - PKM1 vs PKM2

Drosophila vs Human: DSCAM Case Study

Drosophila Dscam1 is a spectacular example of isoform diversity - 38,016 potential isoforms from mutually exclusive splicing of 4 exon clusters. Each neuron expresses a unique repertoire enabling self-avoidance. However, human DSCAM does not have this diversity - vertebrates achieve similar neuronal specificity through clustered protocadherins instead.
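
The 38,016 figure is the product of the number of alternative exons in each of the four mutually exclusive clusters. A minimal arithmetic check, assuming the commonly cited cluster sizes for Dscam1 (12, 48, 33, and 2 alternatives for exons 4, 6, 9, and 17; these counts come from the published gene structure, not from this project's data):

```python
from math import prod

# Assumed alternative-exon counts per mutually exclusive cluster in
# Drosophila Dscam1 (taken from the literature, not from this project).
cluster_sizes = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One exon is chosen from each cluster, so the counts multiply.
total_isoforms = prod(cluster_sizes.values())
print(total_isoforms)  # 38016
```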

GO Annotation Challenges

Current GO annotation practice typically:
- Annotates at the gene product level (canonical isoform)
- Uses PRO ontology IDs for isoform-specific annotations (but rarely)
- May conflate functions that are isoform-specific

This project should:
1. Document isoform-specific functions clearly
2. Recommend which functions apply to which isoforms
3. Suggest GO annotation improvements

Isoform Tracking in This Project

As of 2026-01-18, the data model and ETL have been extended to track isoform-specific annotations:

Important caveat: Just because an annotation was made on a specific isoform doesn't necessarily mean the function is isoform-specific. Researchers may have used a particular isoform for convenience. The isoform field indicates what was tested, not necessarily what is unique to that isoform.

References

Key reviews and resources:
- Alternative splicing - Wikipedia
- Alternative Splicing and Isoforms: From Mechanisms to Diseases (PMC)
- BCL-2 family isoforms in apoptosis and cancer (Nature Cell Death Dis)
- Neural Isoforms of Agrin (PMC)
- VEGF isoforms and angiogenesis (Nature)


STATUS

Progress Tracking

Tier 1 Genes

Tier 2 Genes

Tier 3 Genes

Summary Stats


NOTES

2026-01-19

New Data Model: functional_isoforms

Added a new curator-defined field to the schema for tracking functionally distinct gene products. Unlike alternative_products (ETL-seeded from UniProt), this is purely curated:

FunctionalIsoformTypeEnum:
  - SPLICE_VARIANT: Single splice isoform
  - SPLICE_CLASS: Group of related isoforms (e.g., [WT1](../../genes/human/WT1/WT1-ai-review.html) +KTS vs -KTS)
  - CLEAVAGE_PRODUCT: Post-translational proteolysis (e.g., [POMC](../../genes/human/POMC/POMC-ai-review.html) peptides)
  - MODIFICATION_STATE: PTM-dependent states
  - CONFORMATIONAL_STATE: Distinct functional conformations

FunctionalIsoformMappingTypeEnum:
  - UNIPROT_ISOFORM: e.g., P19544-1
  - UNIPROT_CHAIN: e.g., PRO_0000024969 (UniProt FT PEPTIDE)

IMPORTANT: UniProt chain/peptide feature IDs (PRO_ followed by ten digits, e.g., PRO_0000024969) have NOTHING to do with the PRO ontology (PR: namespace). They are different systems!
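
To make the distinction concrete, here is a minimal Python sketch that mirrors the two enums and sorts an identifier into a mapping type. It is an illustration only, not the project's schema code: the class names drop the `Enum` suffix, `classify_mapping_id` is a hypothetical helper, and the isoform pattern is deliberately loose.

```python
import re
from enum import Enum


class FunctionalIsoformType(Enum):
    SPLICE_VARIANT = "SPLICE_VARIANT"
    SPLICE_CLASS = "SPLICE_CLASS"
    CLEAVAGE_PRODUCT = "CLEAVAGE_PRODUCT"
    MODIFICATION_STATE = "MODIFICATION_STATE"
    CONFORMATIONAL_STATE = "CONFORMATIONAL_STATE"


class FunctionalIsoformMappingType(Enum):
    UNIPROT_ISOFORM = "UNIPROT_ISOFORM"
    UNIPROT_CHAIN = "UNIPROT_CHAIN"


# UniProt chain/peptide feature IDs: "PRO_" plus ten digits.
_CHAIN_RE = re.compile(r"^PRO_\d{10}$")
# Loose pattern for isoform IDs: accession, hyphen, isoform number.
_ISOFORM_RE = re.compile(r"^[A-Z][A-Z0-9]{5,9}-\d+$")


def classify_mapping_id(identifier: str) -> FunctionalIsoformMappingType:
    """Hypothetical helper: decide which mapping type an ID belongs to."""
    if identifier.startswith("PR:"):
        # Protein Ontology CURIEs are a different system; reject them.
        raise ValueError(f"{identifier} is a PRO ontology ID, not a UniProt ID")
    if _CHAIN_RE.match(identifier):
        return FunctionalIsoformMappingType.UNIPROT_CHAIN
    if _ISOFORM_RE.match(identifier):
        return FunctionalIsoformMappingType.UNIPROT_ISOFORM
    raise ValueError(f"Unrecognized identifier: {identifier}")


print(classify_mapping_id("PRO_0000024969").name)
print(classify_mapping_id("P19544-1").name)
```

Rejecting `PR:` identifiers outright, rather than guessing, keeps the two namespaces from being mixed silently.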


POMC Review Complete (Polyprotein Paradigm):

POMC is fundamentally different from alternative splicing - it's a polyprotein cleaved into multiple bioactive peptides:

| Peptide | UniProt Chain | Function | Effect on Appetite |
| --- | --- | --- | --- |
| ACTH | PRO_0000024969 | Cortisol release (HPA axis) | - |
| Alpha-MSH | PRO_0000024970 | MC4R binding, pigmentation | ANOREXIGENIC |
| Beta-endorphin | PRO_0000024975 | Opioid signaling | OREXIGENIC |
| Beta-MSH | PRO_0000024974 | MC1R binding, pigmentation | - |
| Gamma-MSH | PRO_0000024967 | Sodium regulation | - |

Key insight: Alpha-MSH and beta-endorphin are produced in the SAME hypothalamic neurons but have ANTAGONISTIC effects on appetite. This is a paradigm case of why gene-level annotation is insufficient.

55 annotations reviewed: 42 ACCEPT, 4 NON_CORE, 5 OVER_ANNOTATED, 1 MODIFY, 3 REMOVE


App (Mouse) Review Complete - Most Complex Case Yet:

App demonstrates BOTH alternative splicing AND proteolytic cleavage:

| Type | Variant | Function |
| --- | --- | --- |
| SPLICE | APP695 | Neuronal, no KPI domain |
| SPLICE | APP751/770 | Peripheral, has KPI (protease inhibitor) |
| CLEAVAGE | sAPPalpha | NEUROPROTECTIVE (non-amyloidogenic) |
| CLEAVAGE | Abeta42 | NEUROTOXIC (Alzheimer's) |
| CLEAVAGE | AICD | Nuclear signaling (controversial) |

Critical insight: The same gene produces both NEUROPROTECTIVE (sAPPalpha) and NEUROTOXIC (Abeta) products depending on whether alpha-secretase or beta-secretase cleaves first. This is a paradigm for how proteolytic processing balance determines cell fate.

342 annotations reviewed: 267 ACCEPT, 62 OVER_ANNOTATED (Abeta-specific), 8 NON_CORE, 1 MODIFY, 1 REMOVE


Template Updates Complete:

The Jinja2 template (gene_review.html.j2) has been fully updated to support isoform tracking:

  1. Alternative Products section - Renders isoform cards showing id, name, sequence_note, and description for each isoform (implemented earlier)

  2. Isoform badge on annotations - NEW: Annotations with isoform field now display a blue badge linking to the UniProt isoform (e.g., P19544-1)

  3. NOT badge on annotations - NEW: Negated annotations (negated: true) now display a red "NOT" badge

Tested with:
- WT1: Verified isoform-specific annotations (P19544-1) and NOT annotations display correctly
- BCL2L1: Verified Alternative Products section renders Bcl-xL, Bcl-xS, and Q07817-3 isoforms


Tier 3 Annotation Review Progress:

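| Gene | Total | ACCEPT | NON_CORE | OVER_ANNOTATED | MODIFY | Key Finding |
| --- | --- | --- | --- | --- | --- | --- |
| FGFR2 | 229 | 160 | 45 | 16 | 6 | IIIb vs IIIc ligand specificity |
| PKM | 79 | 31 | 35 | 6 | 7 | M1 vs M2 Warburg effect |
| STAT3 | 456 | 364 | 26 | 0 | 0 | Alpha vs beta dominant-negative |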

Key Tier 3 Findings:

  1. FGFR2: Classic epithelial/mesenchymal isoform switch. FGFR2IIIb (P21802-3) is epithelial-specific and binds FGF7/KGF and FGF10. FGFR2IIIc (P21802-1) is mesenchymal-specific and binds FGF2/FGF4. This enables paracrine signaling between tissue compartments. 16 annotations marked as potentially isoform-specific or over-annotated.

  2. PKM: The M1/M2 switch is fundamental to the Warburg effect. PKM1 (P14618-2) is constitutively active in adult differentiated tissues. PKM2 (P14618-1) has low activity and is allosterically regulated - this diverts glycolytic intermediates to biosynthesis in proliferating/cancer cells. PKM2 also has PKM1-absent functions: protein kinase activity, nuclear translocation, and transcription coactivation. 6 annotations identified as PKM2-specific.

  3. STAT3: Alpha (P40763-1) has full transactivation domain; beta (P40763-2) lacks TAD and can act as dominant-negative. Already reviewed - 456 annotations processed with isoform biology documented.

Tier 3 Progress: 3/5 genes complete (FGFR2, PKM, STAT3)

Remaining: PTBP1/2 (splicing regulators - different focus), RON/MST1R


Tier 2 Annotation Review Completed:

| Gene | Total | ACCEPT | NON_CORE | OVER_ANNOTATED | Key Finding |
| --- | --- | --- | --- | --- | --- |
| FN1 | 193 | 62 | 39 | 0 | EDA/EDB developmental splicing |
| TPM1 | 55 | 34 | 12 | 9 | Muscle vs cytoskeletal isoforms |
| TPM3 | 39 | 19 | 4 | 16 | Slow muscle vs TM30nm cytoskeletal |
| DSCAM | 50 | 42 | 3 | 4 | Only 2 isoforms (not like Drosophila!) |

Key Tier 2 Findings:

  1. FN1: 17 isoforms with EDA/EDB domain variation. Plasma FN (hepatocyte) vs cellular FN (fibroblast). EDA/EDB+ isoforms are oncofetal antigens used in cancer imaging.

  2. TPM1/TPM3: Tissue-specific tropomyosin isoforms - skeletal muscle, smooth muscle, cardiac, and cytoskeletal variants. Annotations for "muscle contraction" often apply only to muscle isoforms. TPM3 had highest over-annotation rate (16/39).

  3. DSCAM: CRITICAL CAVEAT - Human DSCAM has only 2 isoforms, NOT the 38,016 of Drosophila Dscam1! Vertebrates use protocadherins instead. Some IBA annotations may be inappropriately extrapolated from fly.

Tier 1 + Tier 2 now complete: 925 annotations reviewed across 10 genes
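
The 925 figure can be reproduced from the per-gene Total values recorded in the Tier 1 and Tier 2 tables of this entry. A minimal check in Python, with the totals copied by hand from those tables (the dictionary names are illustrative, and only the genes tabulated in this entry are included):

```python
# Per-gene "Total" values as listed in the Tier 1 and Tier 2 tables of this entry.
tier1_totals = {"BCL2L1": 110, "FAS": 96, "CASP9": 114, "VEGFA": 268}
tier2_totals = {"FN1": 193, "TPM1": 55, "TPM3": 39, "DSCAM": 50}

tier1_sum = sum(tier1_totals.values())
tier2_sum = sum(tier2_totals.values())
total_reviewed = tier1_sum + tier2_sum

print(tier1_sum, tier2_sum, total_reviewed)  # 588 337 925
```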


Major Annotation Review Completed:

Completed full annotation reviews for BCL2L1, FAS, and CASP9, and made significant progress on VEGFA:

| Gene | Total | ACCEPT | NON_CORE | OVER_ANNOTATED | MODIFY | Key Finding |
| --- | --- | --- | --- | --- | --- | --- |
| BCL2L1 | 110 | 41 | 9 | 8 | 50 | Pro/anti-apoptotic conflation |
| FAS | 96 | 62 | 10 | 0 | 23 | Membrane vs soluble antagonism |
| CASP9 | 114 | 70 | 41 | 0 | 2 | Dominant-negative isoform |
| VEGFA | 268 | 150 | 59 | 21 | 0 | Anti-angiogenic VEGF165B |

All Tier 1 genes now COMPLETE!

Key Isoform Biology Documented:

  1. BCL2L1: Bcl-xL (anti-apoptotic) vs Bcl-xS (pro-apoptotic) - classic antagonism. 8 annotations marked OVER_ANNOTATED where pro-apoptotic function was incorrectly applied to the whole gene (Bcl-xL is anti-apoptotic).

  2. FAS: Membrane isoform 1 triggers apoptosis via DISC; soluble isoforms 2-6 BLOCK apoptosis as decoy receptors. GOA has BOTH GO:0043065 (positive) AND GO:0043066 (negative) apoptosis regulation.

  3. CASP9: Caspase-9L (isoform 1) induces apoptosis; Caspase-9S (isoform 2) is dominant-negative inhibitor competing for Apaf-1 binding. The 9L/9S ratio determines apoptotic sensitivity.

  4. VEGFA: 17 isoforms including VEGF165B (P15692-8) which is ANTI-angiogenic while canonical isoforms are pro-angiogenic. 11 annotations marked OVER_ANNOTATED.

Isoform Conflation Pattern Confirmed:

All four genes show the same pattern: GO annotations conflate opposing functions of different splice isoforms. This is exactly the problem the ISOFORMS project was designed to identify.
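
One way to surface candidates for this pattern is to look for genes that carry both members of an opposing GO term pair. The sketch below assumes annotations are available as simple (gene, GO ID) tuples; `OPPOSING_PAIRS` and `find_conflated_genes` are hypothetical names, and the sample rows are toy data apart from the FAS pair noted above. A hit is only a prompt for manual review, since one isoform can legitimately have context-dependent effects.

```python
from collections import defaultdict

# Opposing term pairs to screen for. Only the apoptosis pair comes from
# these notes; extend the list as needed.
OPPOSING_PAIRS = [
    ("GO:0043065", "GO:0043066"),  # positive vs negative regulation of apoptotic process
]


def find_conflated_genes(annotations):
    """Return {gene: [opposing pairs present]} for gene-level annotations."""
    terms_by_gene = defaultdict(set)
    for gene, go_id in annotations:
        terms_by_gene[gene].add(go_id)

    flagged = {}
    for gene, terms in terms_by_gene.items():
        hits = [pair for pair in OPPOSING_PAIRS if pair[0] in terms and pair[1] in terms]
        if hits:
            flagged[gene] = hits
    return flagged


# Toy input: FAS carries both terms; GENE_X (made up) carries only one.
sample = [
    ("FAS", "GO:0043065"),
    ("FAS", "GO:0043066"),
    ("GENE_X", "GO:0043066"),
]
print(find_conflated_genes(sample))
```

A fuller version would also skip NOT-qualified annotations and group by isoform ID where one is recorded.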


2026-01-18

BCL2L1 Review Started:

BCL2L1 is a PARADIGM case of antagonistic isoform functions:

  1. Bcl-xL (Q07817-1, 233 AA): Anti-apoptotic, contains BH1-4 domains
  2. Bcl-xS (Q07817-2, 170 AA): Pro-apoptotic, lacks BH1/BH2 domains
  3. Bcl-xS can heterodimerize with Bcl-xL to inhibit its function

Isoform Conflation Identified:
- GO:0043065 "positive regulation of apoptotic process" (IBA) - MARKED AS OVER_ANNOTATED
- This annotation likely reflects Bcl-xS function but is applied to the whole gene
- The canonical isoform Bcl-xL is ANTI-apoptotic
- IBA evidence doesn't distinguish isoforms

Key Finding: The GOA has BOTH pro-apoptotic and anti-apoptotic annotations because it conflates the antagonistic isoforms. This is exactly the type of annotation error we aimed to identify in this project.


WT1 NOT Annotation Reviewed:

WT1 has isoform-specific annotations in GOA (P19544-1 = +KTS isoform). We reviewed and fixed:

  1. NOT annotation for GO:0045893 (positive regulation of DNA-templated transcription)
     - The +KTS isoform (P19544-1) does NOT activate transcription
     - The -KTS isoform DOES activate transcription (e.g., SRY promoter)
     - PMID:9815658 clearly shows this isoform-specific difference
     - Action: ACCEPT - this is a well-documented paradigm case

  2. Fixed duplicate annotation bug: There was an incorrect positive annotation claiming +KTS activates transcription. It was removed, since the GOA correctly records this as a NOT annotation.

  3. Key biology: The 3-amino acid KTS insertion between zinc fingers 3 and 4 changes WT1's DNA binding properties. +KTS localizes to nuclear speckles and associates with splicing factors; -KTS acts as a transcription factor.

Added WT1 to tracking - While not originally in the project gene list, WT1 is an excellent example of isoform-specific function with both:
- Positive annotations specific to -KTS isoforms
- NOT annotations specific to +KTS isoforms


AGRN Review Complete:

The AGRN gene review was already complete (105 annotations reviewed). AGRN serves as an excellent paradigm case for this project because the review captures:

  1. Two types of isoform variation:
     - N-terminal diversity: LN-agrin (isoform 1, secreted) vs SN-agrin (isoform 2, transmembrane)
     - C-terminal splice variants: z+ (neural, contain z8 insert) vs z0 (muscle, no insert)

  2. Functionally distinct activities:
     - Neural z+ isoforms: Active at NMJ, induce AChR clustering via LRP4-MuSK
     - Muscle z0 isoforms: Structural roles in basement membranes, no NMJ clustering activity

  3. The review correctly distinguishes:
     - "neuromuscular junction development" as a CORE function (z+ specific)
     - "basement membrane" as a CORE localization (both isoforms but LN-agrin dominant)
     - "nervous system development" as NON_CORE (too broad)

Key insight: While the GOA file doesn't contain explicit isoform-specific annotations (no O00468-1 or O00468-2), the review narrative correctly captures isoform-specific biology in the summaries and reasons.


Extended data model for isoform tracking:
- Added optional isoform field to ExistingAnnotation in gene_review.yaml schema
- Updated GOAAnnotation dataclass in goa_validator.py to extract isoform from gene product ID
- UniProt isoform IDs (e.g., P19544-1) are now automatically detected and stored
- Added comprehensive tests for isoform extraction and seeding
- All 29 GOA validator tests pass
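
As an illustration of the extraction step, the sketch below pulls an isoform ID out of a gene product ID string. It is not the code in goa_validator.py: `extract_isoform` is a hypothetical name, the optional `UniProtKB:` prefix is an assumption about how the IDs appear in the GOA files, and the accession pattern follows UniProt's published accession format.

```python
import re
from typing import Optional

# UniProt accession pattern, followed by a "-N" isoform suffix.
_ACCESSION = (
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)
_ISOFORM_RE = re.compile(rf"^(?:UniProtKB:)?({_ACCESSION}-\d+)$")


def extract_isoform(gene_product_id: str) -> Optional[str]:
    """Return the isoform ID (e.g. 'P19544-1'), or None if there is no isoform suffix."""
    match = _ISOFORM_RE.match(gene_product_id.strip())
    return match.group(1) if match else None


print(extract_isoform("UniProtKB:P19544-1"))  # P19544-1
print(extract_isoform("P19544"))              # None
```

Returning None for plain accessions keeps the isoform field optional, matching the schema change described above.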

Human genes with isoform-specific annotations identified in existing GOA files:
- BCL2L12 (Q9HB09-1)
- DAB2IP (Q5VWQ8-2)
- NFS1 (Q9Y697-2)
- PTEN (P60484-2)
- SGCA (Q16586-2)
- WT1 (P19544-1)

2026-01-16

Initial project creation. Research conducted on classic isoform cases:

Key findings from literature search:

  1. AGRN (Agrin) - Excellent paradigm case
     - Z+ isoforms (neuronal): contain 8-19 AA inserts, potent AChR clustering activity
     - Z- isoforms (muscle): lack inserts, no clustering activity
     - Regulated by NOVA1/2 and PTBP1 during neuronal differentiation
     - Clinical relevance: mis-splicing in SMA

  2. BCL2L1 (Bcl-x) - Classic antagonistic isoforms
     - Bcl-xL (long): anti-apoptotic, contains BH1-4 domains
     - Bcl-xS (short): pro-apoptotic, lacks BH1/BH2
     - Alternative 5' splice site selection in exon 2
     - Cancer relevance: Bcl-xL overexpression confers drug resistance

  3. VEGFA - Differential matrix binding
     - VEGF121: freely diffusible, no heparin binding
     - VEGF165: intermediate, binds heparin, NRP1 coreceptor
     - VEGF189: matrix-bound, densest vessel sprouting
     - Different vascular patterning outcomes

  4. FAS/CD95 - Membrane vs soluble antagonism
     - Membrane Fas: triggers apoptosis
     - Soluble Fas (lacks exon 6/TM domain): inhibits apoptosis
     - At least 7 isoforms identified

  5. FN1 (Fibronectin) - Embryonic/wound healing specific
     - EDA/EDB (EIIIA/EIIIB) domains: alternative exons
     - Plasma FN: lacks both, soluble
     - Cellular FN: contains EDA/EDB, embryonic, wound healing
     - Double-null embryos have cardiovascular defects

Decision on DSCAM: Include with caveats - human DSCAM does NOT have the 38,016 isoform diversity of Drosophila Dscam1. The vertebrate equivalent is clustered protocadherins. Still worth reviewing but with lower priority.

Next steps: Begin with AGRN as it has clear, well-documented isoform-specific functions with clinical relevance.