PAINT Human No-IBA Gene Review Project
Overview
Review human genes that lack IBA (Inferred from Biological aspect of Ancestor) annotations. These genes are candidates for PAINT (Phylogenetic Annotation and INference Tool) but currently have no phylogenetically inferred annotations, meaning they may:
- Be poorly characterized
- Have unique or divergent functions
- Lack clear orthologs with experimental evidence
For each gene, the workflow generates at least two deep research reports (from different providers) and then completes an AI-assisted annotation review.
Source: ai4curation/ai-gene-review
Data Source
- Spreadsheet: https://docs.google.com/spreadsheets/d/12bR3FZ7XrUXL86IKJc__K6QbFBEsSlQeSdiVXwFsjEI/
- Local: projects/PAINT/human-no-IBA-simple.csv (format: species,uniprot_id,gene_symbol)
- Total genes: 7,594
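As a quick sanity check on the local gene list, the sketch below counts data rows and looks up a UniProt ID by gene symbol. It assumes a plain comma-separated file with the three columns named above and no quoted fields. Whether the file carries a header row is not stated here, so the counter skips a first line whose first field is `species`. The function names `count_genes` and `lookup_uniprot` are illustrative and not part of the repository.

```shell
# Count data rows in a species,uniprot_id,gene_symbol CSV.
# Assumption: a header row, if present, has "species" as its first field.
count_genes() {
  awk -F',' 'NF >= 3 && !(NR == 1 && $1 == "species") { n++ } END { print n + 0 }' "$1"
}

# Print the UniProt ID recorded for a gene symbol (exact match on column 3).
# A trailing carriage return is stripped in case the file uses CRLF line endings.
lookup_uniprot() {
  awk -F',' -v sym="$2" '{ sub(/\r$/, "", $3) } $3 == sym { print $2 }' "$1"
}
```

If the file matches the description above, `count_genes projects/PAINT/human-no-IBA-simple.csv` should report 7594.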
Model Species
Primary: Homo sapiens (human)
- UniProt species code: human
- Genes without IBA annotations are priority targets
Workflow
just fetch-gene human GENE
just deep-research human GENE --provider falcon
just deep-research human GENE --provider cyberian
# Review and complete ai-review.yaml
just validate human GENE
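The automated steps above can be chained for a single gene. This is a minimal sketch that only wraps the `just` recipes already listed in the workflow. The function name `prepare_gene` is hypothetical, and it deliberately stops before validation because completing ai-review.yaml is a manual review step.

```shell
# Run the automated steps for one gene: fetch, then deep research from both providers.
# Usage: prepare_gene SPECIES GENE
# The && chain stops at the first step that fails and returns its exit status.
prepare_gene() {
  just fetch-gene "$1" "$2" &&
  just deep-research "$1" "$2" --provider falcon &&
  just deep-research "$1" "$2" --provider cyberian
}
```

After reviewing and completing ai-review.yaml, run `just validate human GENE` as in the workflow.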
Completed Reviews
165 genes with COMPLETE status (as of 2026-01-25)
To list all completed PAINT genes:
comm -12 <(cut -d',' -f3 projects/PAINT/human-no-IBA-simple.csv | sort) \
<(grep -l "status: COMPLETE" genes/human/*/*.yaml | xargs dirname | xargs -I{} basename {} | sort)
Highlighted Reviews (Notable Findings)
| Gene | Function | Finding |
|---|---|---|
| PLD3/PLD4/PLD5 | 5'-3' exonucleases (PLD3/PLD4); pseudoenzyme (PLD5) | Misnamed: not phospholipase D enzymes |
| DAB2IP | GAP, tumor suppressor | Comprehensive multi-function review |
| RASA3 | Bifunctional RasGAP | Acts on both RAS and RAP1 |
| ICA1/ICA1L | BAR domain proteins | Membrane curvature sensing |
| IFIT2/IFIT3 | Antiviral effectors | Interferon-stimulated response |
| GADD45A/B/G | Stress response | MAPK pathway regulation |
Notable Findings
PLD3/PLD4/PLD5 Misannotation
These proteins are named "phospholipase D" but are actually:
- PLD3/PLD4: 5'-3' exonucleases with immune regulatory functions
- PLD5: Catalytically inactive pseudoenzyme
This is a prime example of misleading gene nomenclature that AI review can flag.
Batch Processing Infrastructure
The project has scaled to industrial batch processing:
- 50 batches prepared (2,500 gene capacity)
- Parallel architecture for continuous pipeline
- Can process 100+ genes per hour with deep research automation
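One simple way to prepare fixed-size batches from the gene list is sketched below. The batch size of 50 is an inference (2,500 gene capacity across 50 batches, assuming equal-size batches), and the `make_batches` name and `batch_` file prefix are hypothetical. The project's actual batch script may work differently.

```shell
# Split a gene list into fixed-size batch files (batch_aa, batch_ab, ...).
# Usage: make_batches GENE_LIST OUT_DIR [BATCH_SIZE]
# Assumption: one gene per line and no header row in GENE_LIST.
make_batches() {
  mkdir -p "$2" &&
  split -l "${3:-50}" "$1" "$2/batch_"
}
```

With the default size, a list of 2,500 genes yields 50 batch files. A longer list simply produces more files, the last of which may be shorter.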
Supplementary Files
See projects/PAINT/ folder:
- human-no-IBA-simple.csv - Gene list (species, uniprot_id, gene_symbol)
- human-no-IBA.tsv - Full annotation data
STATUS
Project Statistics (2026-02-04):
- Total genes in project: 7,594
- PAINT genes completed: 207 (2.7%)
- Total comprehensive reviews: 328
- Genes with gene folders: 424
- Genes with deep research: 339
- Ready for review (have deep research but not complete): 22
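The completion percentage above can be reproduced from the raw counts. A minimal sketch, with `percent` as an illustrative helper name:

```shell
# Print DONE as a percentage of TOTAL, rounded to one decimal place.
# Usage: percent DONE TOTAL
# A TOTAL of zero prints 0.0 instead of dividing by zero.
percent() {
  awk -v d="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (t > 0) ? 100 * d / t : 0 }'
}
```

`percent 207 7594` prints 2.7, matching the figure reported for 2026-02-04.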
Progress
- [x] Infrastructure setup (batch processing, parallel deep research)
- [x] 207 PAINT gene reviews completed
- [ ] Complete reviews for remaining 22 genes with deep research
- [ ] Scale deep research to all genes
- [ ] Full project completion (7,594 genes)
Last updated: 2026-02-04
NOTES
2026-02-04
Major batch annotation review session
- Reviewed 81 genes using annotation-reviewer agent
- PAINT-specific completions: 165 → 207 (+42)
- Total completions: 246 → 328 (+82)
- Remaining with deep research: 104 → 22 (-82)
Notable genes reviewed include:
- Fe-S cluster assembly pathway: HSCB, HSPA9, IBA57, ISCA1, ISCA2, ISCU, NFS1, NFU1, MMS19, CIAO1, BOLA3
- Apoptosis/autophagy: BCL2, BCL2L1, BECN1, CASP9, ATG4D, ATG5, ATG7, DRAM1, DRAM2
- Transcription factors: GATA3, FOXO1, IRF8, OLIG2, ASCL1
- Signaling: NOTCH1, LRRK2, AXIN1, FAS
- Disease-relevant: HTT (Huntington disease), FXN (Friedreich ataxia), CBS (homocystinuria)
Note: Subagent status updates weren't persisting; fixed manually.
2026-01-25
Project reorganization and stats update
- Created top-level PAINT.md project file
- Renamed folder from paint/ to PAINT/ for consistency
- Updated stats: 165 COMPLETE reviews (was showing 14 - massively out of date)
- Started cyberian server for deep research
- Identified 285 genes needing cyberian deep research
- Created batch processing script
- First gene (ABCB7) completed in 17 minutes
2025-12-18 (Massive Scaling Session)
Major Achievement: Transformed from manual workflow to industrialized batch processing
- Expanded from 297 to 1,806 gene folders (+508% growth)
- Submitted 1,500+ cyberian deep research jobs
- Created 50 batches (2,500 gene capacity)
- Established fully parallel, async pipeline
2025-12-18 (Review Session)
Completed reviews for:
- ICA1, ICA1L - BAR domain proteins
- SOCS4 - E3 ligase adaptor
- PLD3, PLD4, PLD5 - Discovered misannotation (NOT phospholipases)
- RASA3 - Bifunctional RasGAP
Key finding: the PLD3/PLD4/PLD5 nomenclature is misleading. PLD3 and PLD4 are 5'-3' exonucleases and PLD5 is a catalytically inactive pseudoenzyme; none of them is a phospholipase.