ISOFORMS: Genes with Clear Functional Differences Between Isoforms
Overview
This project explores genes where alternative splicing produces isoforms with demonstrably different biological functions. These cases are particularly important for GO annotation because:
- Standard annotation often conflates isoform functions - a single gene gets annotated with functions that may only apply to specific isoforms
- Isoform-specific annotations are rare in GO - most annotations are made at the gene level
- Understanding isoform biology is critical for precision medicine and accurate functional characterization
Selection Criteria
Genes selected for this project should have:
- Well-characterized alternative splicing events
- Clear functional differences between isoforms (not just expression pattern differences)
- Preferably opposing or divergent functions (e.g., pro-apoptotic vs anti-apoptotic)
- Strong experimental evidence for isoform-specific activities
Priority Genes
Tier 1: Classic Paradigms (Start Here)
- [ ] AGRN (human) - Agrin: neuronal Z+ isoforms cluster AChRs at NMJ; muscle Z- isoforms do not
- [ ] BCL2L1 (human) - Bcl-x: Bcl-xL is anti-apoptotic; Bcl-xS is pro-apoptotic (antagonistic!)
- [ ] VEGFA (human) - VEGF: isoforms differ in heparin binding, diffusibility, and vascular patterning
- [ ] FAS (human) - CD95: membrane isoform triggers apoptosis; soluble isoform inhibits it
- [ ] CASP9 (human) - Caspase-9: constitutive form induces apoptosis; short isoform inhibits it
Tier 2: Developmental/Tissue-Specific
- [ ] FN1 (human) - Fibronectin: EDA/EDB isoforms are embryonic/wound-healing specific
- [ ] TPM1 (human) - Tropomyosin: striated muscle vs smooth muscle vs non-muscle isoforms
- [ ] TPM3 (human) - Tropomyosin 3: slow muscle fiber specific isoform
- [ ] DSCAM (human) - Note: human DSCAM lacks the extreme diversity of Drosophila Dscam1
Tier 3: Additional Cases to Explore
- [ ] FGFR2 (human) - FGF receptor 2: IIIb vs IIIc isoforms have different ligand specificities
- [ ] PKM (human) - Pyruvate kinase: PKM1 (adult muscle) vs PKM2 (embryonic, cancer)
- [ ] PTBP1/2 (human) - Master splicing regulators that control many isoform switches
- [ ] RON/MST1R (human) - Proto-oncogene with motility-promoting splice variant in cancer
- [ ] STAT3 (human) - STAT3α vs STAT3β have distinct transcriptional activities
Special Cases: Polyproteins (Post-Translational Cleavage)
These are NOT alternative splicing but share the annotation challenge of multiple functional products from one gene:
- [x] POMC (human) - Pro-opiomelanocortin: cleaved into ACTH, alpha-MSH, beta-endorphin with ANTAGONISTIC functions (anorexigenic vs orexigenic)
- [x] App (mouse) - Amyloid precursor protein: BOTH splicing (APP695 vs APP751/770 KPI) AND cleavage (sAPPalpha NEUROPROTECTIVE vs Abeta NEUROTOXIC) - most complex case
- [x] APP (human) - Human ortholog, directly relevant to Alzheimer's disease with familial mutations
Background: Key Concepts
Types of Functional Differences
- Antagonistic functions - Bcl-xL vs Bcl-xS, membrane vs soluble Fas
- Tissue-specific activity - Agrin Z+ (neuronal) vs Z- (muscle)
- Localization differences - VEGF isoforms (diffusible vs matrix-bound)
- Ligand specificity - FGFR2 IIIb vs IIIc
- Regulatory vs constitutive - PKM1 vs PKM2
Drosophila vs Human: DSCAM Case Study
Drosophila Dscam1 is a spectacular example of isoform diversity - 38,016 potential isoforms from mutually exclusive splicing of 4 exon clusters. Each neuron expresses a unique repertoire enabling self-avoidance. However, human DSCAM does not have this diversity - vertebrates achieve similar neuronal specificity through clustered protocadherins instead.
GO Annotation Challenges
Current GO annotation practice typically:
- Annotates at the gene product level (canonical isoform)
- Uses PRO ontology IDs for isoform-specific annotations (but rarely)
- May conflate functions that are isoform-specific
This project should:
1. Document isoform-specific functions clearly
2. Recommend which functions apply to which isoforms
3. Suggest GO annotation improvements
Isoform Tracking in This Project
As of 2026-01-18, the data model and ETL have been extended to track isoform-specific annotations:
- The
ExistingAnnotationclass now has an optionalisoformfield - When GOA annotations use UniProt isoform IDs (e.g.,
P19544-1), the isoform is automatically extracted - The field stores the full UniProt isoform ID when present
- Canonical annotations (no isoform suffix) have this field omitted
Important caveat: Just because an annotation was made on a specific isoform doesn't necessarily mean the function is isoform-specific. Researchers may have used a particular isoform for convenience. The isoform field indicates what was tested, not necessarily what is unique to that isoform.
References
Key reviews and resources:
- Alternative splicing - Wikipedia
- Alternative Splicing and Isoforms: From Mechanisms to Diseases (PMC)
- BCL-2 family isoforms in apoptosis and cancer (Nature Cell Death Dis)
- Neural Isoforms of Agrin (PMC)
- VEGF isoforms and angiogenesis (Nature)
STATUS
Progress Tracking
Tier 1 Genes
- [x] AGRN - COMPLETE - Excellent paradigm case with detailed isoform discussion
- [x] WT1 - COMPLETE - +KTS vs -KTS isoforms, NOT annotation reviewed
- [x] BCL2L1 - COMPLETE - 110 annotations | 8 OVER_ANNOTATED (pro/anti-apoptotic conflation)
- [x] FAS - COMPLETE - 96 annotations | membrane vs soluble antagonism documented
- [x] VEGFA - COMPLETE - 268 annotations | 21 OVER_ANNOTATED (anti-angiogenic VEGF165B)
- [x] CASP9 - COMPLETE - 114 annotations | dominant-negative isoform 2 documented
Tier 2 Genes
- [x] FN1 - COMPLETE - 193 annotations | EDA/EDB domain developmental splicing
- [x] TPM1 - COMPLETE - 55 annotations | 9 OVER_ANNOTATED (muscle vs cytoskeletal isoforms)
- [x] TPM3 - COMPLETE - 39 annotations | 16 OVER_ANNOTATED (slow muscle vs TM30nm cytoskeletal)
- [x] DSCAM - COMPLETE - 50 annotations | 4 OVER_ANNOTATED (human has only 2 isoforms, not 38,016!)
Tier 3 Genes
- [x] FGFR2 - COMPLETE - 229 annotations | 16 OVER_ANNOTATED (IIIb vs IIIc ligand specificity)
- [x] PKM - COMPLETE - 79 annotations | 6 OVER_ANNOTATED (M1 vs M2 Warburg effect)
- [ ] PTBP1/2 - Not started (splicing regulators, different project focus)
- [ ] RON/MST1R - Not started
- [x] STAT3 - COMPLETE - 456 annotations | alpha vs beta dominant-negative documented
Summary Stats
- Total genes: 15
- Completed: 13 (Tier 1 + Tier 2 + 3 Tier 3)
- In progress: 0
- Tier 1 genes: 6/6 COMPLETE
- Tier 2 genes: 4/4 COMPLETE
- Tier 3 genes: 3/5 COMPLETE
- Total annotations reviewed: 1689 across completed genes
NOTES
2026-01-19
New Data Model: functional_isoforms
Added a new curator-defined field to the schema for tracking functionally distinct gene products. Unlike alternative_products (ETL-seeded from UniProt), this is purely curated:
FunctionalIsoformTypeEnum:
- SPLICE_VARIANT: Single splice isoform
- SPLICE_CLASS: Group of related isoforms (e.g., [WT1](../../genes/human/WT1/WT1-ai-review.html) +KTS vs -KTS)
- CLEAVAGE_PRODUCT: Post-translational proteolysis (e.g., [POMC](../../genes/human/POMC/POMC-ai-review.html) peptides)
- MODIFICATION_STATE: PTM-dependent states
- CONFORMATIONAL_STATE: Distinct functional conformations
FunctionalIsoformMappingTypeEnum:
- UNIPROT_ISOFORM: e.g., P19544-1
- UNIPROT_CHAIN: e.g., PRO_0000024969 (UniProt [FT](../../genes/ARATH/FT/FT-ai-review.html) PEPTIDE)
IMPORTANT: UniProt chain/peptide IDs (PRO_NNNNNNN) have NOTHING to do with the PRO ontology (PR:NNNNNNN namespace). Different systems!
POMC Review Complete (Polyprotein Paradigm):
POMC is fundamentally different from alternative splicing - it's a polyprotein cleaved into multiple bioactive peptides:
| Peptide | UniProt Chain | Function | Effect on Appetite |
|---|---|---|---|
| ACTH | PRO_0000024969 | Cortisol release (HPA axis) | - |
| Alpha-MSH | PRO_0000024970 | MC4R binding, pigmentation | ANOREXIGENIC |
| Beta-endorphin | PRO_0000024975 | Opioid signaling | OREXIGENIC |
| Beta-MSH | PRO_0000024974 | MC1R binding, pigmentation | - |
| Gamma-MSH | PRO_0000024967 | Sodium regulation | - |
Key insight: Alpha-MSH and beta-endorphin are produced in the SAME hypothalamic neurons but have ANTAGONISTIC effects on appetite. This is a paradigm case of why gene-level annotation is insufficient.
55 annotations reviewed: 42 ACCEPT, 4 NON_CORE, 5 OVER_ANNOTATED, 1 MODIFY, 3 REMOVE
App (Mouse) Review Complete - Most Complex Case Yet:
App demonstrates BOTH alternative splicing AND proteolytic cleavage:
| Type | Variant | Function |
|---|---|---|
| SPLICE | APP695 | Neuronal, no KPI domain |
| SPLICE | APP751/770 | Peripheral, has KPI (protease inhibitor) |
| CLEAVAGE | sAPPalpha | NEUROPROTECTIVE (non-amyloidogenic) |
| CLEAVAGE | Abeta42 | NEUROTOXIC (Alzheimer's) |
| CLEAVAGE | AICD | Nuclear signaling (controversial) |
Critical insight: The same gene produces both NEUROPROTECTIVE (sAPPalpha) and NEUROTOXIC (Abeta) products depending on whether alpha-secretase or beta-secretase cleaves first. This is a paradigm for how proteolytic processing balance determines cell fate.
342 annotations reviewed: 267 ACCEPT, 62 OVER_ANNOTATED (Abeta-specific), 8 NON_CORE, 1 MODIFY, 1 REMOVE
Template Updates Complete:
The Jinja2 template (gene_review.html.j2) has been fully updated to support isoform tracking:
-
Alternative Products section - Renders isoform cards showing id, name, sequence_note, and description for each isoform (implemented earlier)
-
Isoform badge on annotations - NEW: Annotations with
isoformfield now display a blue badge linking to the UniProt isoform (e.g., P19544-1) -
NOT badge on annotations - NEW: Negated annotations (
negated: true) now display a red "NOT" badge
Tested with:
- WT1: Verified isoform-specific annotations (P19544-1) and NOT annotations display correctly
- BCL2L1: Verified Alternative Products section renders Bcl-xL, Bcl-xS, and Q07817-3 isoforms
Tier 3 Annotation Review Progress:
| Gene | Total | ACCEPT | NON_CORE | OVER_ANNOTATED | MODIFY | Key Finding |
|---|---|---|---|---|---|---|
| FGFR2 | 229 | 160 | 45 | 16 | 6 | IIIb vs IIIc ligand specificity |
| PKM | 79 | 31 | 35 | 6 | 7 | M1 vs M2 Warburg effect |
| STAT3 | 456 | 364 | 26 | 0 | 0 | Alpha vs beta dominant-negative |
Key Tier 3 Findings:
-
FGFR2: Classic epithelial/mesenchymal isoform switch. FGFR2IIIb (P21802-3) is epithelial-specific and binds FGF7/KGF and FGF10. FGFR2IIIc (P21802-1) is mesenchymal-specific and binds FGF2/FGF4. This enables paracrine signaling between tissue compartments. 16 annotations marked as potentially isoform-specific or over-annotated.
-
PKM: The M1/M2 switch is fundamental to the Warburg effect. PKM1 (P14618-2) is constitutively active in adult differentiated tissues. PKM2 (P14618-1) has low activity and is allosterically regulated - this diverts glycolytic intermediates to biosynthesis in proliferating/cancer cells. PKM2 also has PKM1-absent functions: protein kinase activity, nuclear translocation, and transcription coactivation. 6 annotations identified as PKM2-specific.
-
STAT3: Alpha (P40763-1) has full transactivation domain; beta (P40763-2) lacks TAD and can act as dominant-negative. Already reviewed - 456 annotations processed with isoform biology documented.
Tier 3 Progress: 3/5 genes complete (FGFR2, PKM, STAT3)
Remaining: PTBP1/2 (splicing regulators - different focus), RON/MST1R
Tier 2 Annotation Review Completed:
| Gene | Total | ACCEPT | NON_CORE | OVER_ANNOTATED | Key Finding |
|---|---|---|---|---|---|
| FN1 | 193 | 62 | 39 | 0 | EDA/EDB developmental splicing |
| TPM1 | 55 | 34 | 12 | 9 | Muscle vs cytoskeletal isoforms |
| TPM3 | 39 | 19 | 4 | 16 | Slow muscle vs TM30nm cytoskeletal |
| DSCAM | 50 | 42 | 3 | 4 | Only 2 isoforms (not like Drosophila!) |
Key Tier 2 Findings:
-
FN1: 17 isoforms with EDA/EDB domain variation. Plasma FN (hepatocyte) vs cellular FN (fibroblast). EDA/EDB+ isoforms are oncofetal antigens used in cancer imaging.
-
TPM1/TPM3: Tissue-specific tropomyosin isoforms - skeletal muscle, smooth muscle, cardiac, and cytoskeletal variants. Annotations for "muscle contraction" often apply only to muscle isoforms. TPM3 had highest over-annotation rate (16/39).
-
DSCAM: CRITICAL CAVEAT - Human DSCAM has only 2 isoforms, NOT the 38,016 of Drosophila Dscam1! Vertebrates use protocadherins instead. Some IBA annotations may be inappropriately extrapolated from fly.
Tier 1 + Tier 2 now complete: 925 annotations reviewed across 10 genes
Major Annotation Review Completed:
Completed full annotation reviews for BCL2L1, FAS, CASP9, and significant progress on VEGFA:
| Gene | Total | ACCEPT | NON_CORE | OVER_ANNOTATED | MODIFY | Key Finding |
|---|---|---|---|---|---|---|
| BCL2L1 | 110 | 41 | 9 | 8 | 50 | Pro/anti-apoptotic conflation |
| FAS | 96 | 62 | 10 | 0 | 23 | Membrane vs soluble antagonism |
| CASP9 | 114 | 70 | 41 | 0 | 2 | Dominant-negative isoform |
| VEGFA | 268 | 150 | 59 | 21 | 0 | Anti-angiogenic VEGF165B |
All Tier 1 genes now COMPLETE!
Key Isoform Biology Documented:
-
BCL2L1: Bcl-xL (anti-apoptotic) vs Bcl-xS (pro-apoptotic) - classic antagonism. 8 annotations marked OVER_ANNOTATED where pro-apoptotic function was incorrectly applied to the whole gene (Bcl-xL is anti-apoptotic).
-
FAS: Membrane isoform 1 triggers apoptosis via DISC; soluble isoforms 2-6 BLOCK apoptosis as decoy receptors. GOA has BOTH GO:0043065 (positive) AND GO:0043066 (negative) apoptosis regulation.
-
CASP9: Caspase-9L (isoform 1) induces apoptosis; Caspase-9S (isoform 2) is dominant-negative inhibitor competing for Apaf-1 binding. The 9L/9S ratio determines apoptotic sensitivity.
-
VEGFA: 17 isoforms including VEGF165B (P15692-8) which is ANTI-angiogenic while canonical isoforms are pro-angiogenic. 11 annotations marked OVER_ANNOTATED.
Isoform Conflation Pattern Confirmed:
All four genes show the same pattern: GO annotations conflate opposing functions of different splice isoforms. This is exactly the problem the ISOFORMS project was designed to identify.
2026-01-18
BCL2L1 Review Started:
BCL2L1 is a PARADIGM case of antagonistic isoform functions:
- Bcl-xL (Q07817-1, 233 AA): Anti-apoptotic, contains BH1-4 domains
- Bcl-xS (Q07817-2, 166 AA): Pro-apoptotic, lacks BH1/BH2 domains
- Bcl-xS can heterodimerize with Bcl-xL to inhibit its function
Isoform Conflation Identified:
- GO:0043065 "positive regulation of apoptotic process" (IBA) - MARKED AS OVER_ANNOTATED
- This annotation likely reflects Bcl-xS function but is applied to the whole gene
- The canonical isoform Bcl-xL is ANTI-apoptotic
- IBA evidence doesn't distinguish isoforms
- GO:0043066 "negative regulation of apoptotic process" (IDA x2) - ACCEPTED
- Core function of Bcl-xL, supported by PMID:9388232 and PMID:7650367
Key Finding: The GOA has BOTH pro-apoptotic and anti-apoptotic annotations because it conflates the antagonistic isoforms. This is exactly the type of annotation error we aimed to identify in this project.
WT1 NOT Annotation Reviewed:
WT1 has isoform-specific annotations in GOA (P19544-1 = +KTS isoform). We reviewed and fixed:
- NOT annotation for GO:0045893 (positive regulation of DNA-templated transcription)
- The +KTS isoform (P19544-1) does NOT activate transcription
- The -KTS isoform DOES activate transcription (e.g., SRY promoter)
- PMID:9815658 clearly shows this isoform-specific difference
-
Action: ACCEPT - this is a well-documented paradigm case
-
Fixed duplicate annotation bug: There was an incorrect positive annotation claiming +KTS activates transcription - removed this as the GOA correctly has it as a NOT annotation.
-
Key biology: The 3-amino acid KTS insertion between zinc fingers 3 and 4 changes WT1's DNA binding properties. +KTS localizes to nuclear speckles and associates with splicing factors; -KTS acts as a transcription factor.
Added WT1 to tracking - While not originally in the project gene list, WT1 is an excellent example of isoform-specific function with both:
- Positive annotations specific to -KTS isoforms
- NOT annotations specific to +KTS isoforms
AGRN Review Complete:
The AGRN gene review was already complete (105 annotations reviewed). AGRN serves as an excellent paradigm case for this project because the review captures:
- Two types of isoform variation:
- N-terminal diversity: LN-agrin (isoform 1, secreted) vs SN-agrin (isoform 2, transmembrane)
-
C-terminal splice variants: z+ (neural, contain z8 insert) vs z0 (muscle, no insert)
-
Functionally distinct activities:
- Neural z+ isoforms: Active at NMJ, induce AChR clustering via LRP4-MuSK
-
Muscle z0 isoforms: Structural roles in basement membranes, no NMJ clustering activity
-
The review correctly distinguishes:
- "neuromuscular junction development" as a CORE function (z+ specific)
- "basement membrane" as a CORE localization (both isoforms but LN-agrin dominant)
- "nervous system development" as NON_CORE (too broad)
Key insight: While the GOA file doesn't contain explicit isoform-specific annotations (no O00468-1 or O00468-2), the review narrative correctly captures isoform-specific biology in the summaries and reasons.
Extended data model for isoform tracking:
- Added optional isoform field to ExistingAnnotation in gene_review.yaml schema
- Updated GOAAnnotation dataclass in goa_validator.py to extract isoform from gene product ID
- UniProt isoform IDs (e.g., P19544-1) are now automatically detected and stored
- Added comprehensive tests for isoform extraction and seeding
- All 29 GOA validator tests pass
Human genes with isoform-specific annotations identified in existing GOA files:
- BCL2L12 (Q9HB09-1)
- DAB2IP (Q5VWQ8-2)
- NFS1 (Q9Y697-2)
- PTEN (P60484-2)
- SGCA (Q16586-2)
- WT1 (P19544-1)
2026-01-16
Initial project creation. Research conducted on classic isoform cases:
Key findings from literature search:
- AGRN (Agrin) - Excellent paradigm case
- Z+ isoforms (neuronal): contain 8-19 AA inserts, potent AChR clustering activity
- Z- isoforms (muscle): lack inserts, no clustering activity
- Regulated by NOVA1/2 and PTBP1 during neuronal differentiation
-
Clinical relevance: mis-splicing in SMA
-
BCL2L1 (Bcl-x) - Classic antagonistic isoforms
- Bcl-xL (long): anti-apoptotic, contains BH1-4 domains
- Bcl-xS (short): pro-apoptotic, lacks BH1/BH2
- Alternative 5' splice site selection in exon 2
-
Cancer relevance: Bcl-xL overexpression confers drug resistance
-
VEGFA - Differential matrix binding
- VEGF121: freely diffusible, no heparin binding
- VEGF165: intermediate, binds heparin, NRP1 coreceptor
- VEGF189: matrix-bound, densest vessel sprouting
-
Different vascular patterning outcomes
-
FAS/CD95 - Membrane vs soluble antagonism
- Membrane Fas: triggers apoptosis
- Soluble Fas (lacks exon 6/TM domain): inhibits apoptosis
-
At least 7 isoforms identified
-
FN1 (Fibronectin) - Embryonic/wound healing specific
- EDA/EDB (EIIIA/EIIIB) domains: alternative exons
- Plasma FN: lacks both, soluble
- Cellular FN: contains EDA/EDB, embryonic, wound healing
- Double-null embryos have cardiovascular defects
Decision on DSCAM: Include with caveats - human DSCAM does NOT have the 38,016 isoform diversity of Drosophila Dscam1. The vertebrate equivalent is clustered protocadherins. Still worth reviewing but with lower priority.
Next steps: Begin with AGRN as it has clear, well-documented isoform-specific functions with clinical relevance.