BIOREASON_COMPARISON / article
Paper drafts and supporting material for the BioReason-Pro / AI-AUGR manuscript, intended for ISMB 2026 Function-COSI.
Files
- manuscript.tex: canonical full manuscript source (title, abstract, introduction, background, methods, results, discussion, limitations, conclusions, figure/table callouts, and BibTeX references). Start here.
- references.bib: BibTeX bibliography for manuscript.tex.
- manuscript.md: lightweight website bridge that points readers to the LaTeX source and local PDF build command.
- supplemental-benchmark-details.md: source-availability and denominator details moved out of the main manuscript.
- abstract.md: 2-page long-form conference abstract (earlier draft; largely superseded by the manuscript but kept as a source for the short version).
- short-abstract.md: 250-word short-form abstract (based on Google-Doc edits).
- slides.md / slides.html: ISMB 2026 Function-COSI slide deck (Marp source + rendered HTML; embeds figures/). Render with: npx @marp-team/marp-cli@latest slides.md --html --allow-local-files
- README.md: this file.
- bioreason-pro-biorxiv.pdf (not committed): reference PDF of Fallahpour et al. (2026), BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning, bioRxiv 10.64898/2026.03.19.712954. Local reference only; excluded via .gitignore due to third-party rights.
Thesis
Annotation databases face a practical deployment question — when is a new function-prediction method good enough to trust in production? — that CAFA-style aggregate metrics ($F_{\max}$, $S_{\min}$) cannot fully answer. AI-AUGR (Assessment via Unified Gene-evidence Review) is an agentic curation pipeline that complements CAFA-style evaluation by:
- Reading the narrative. Modern agentic predictors such as BioReason-Pro emit free-text functional summaries and chain-of-thought reasoning traces that sit outside bag-of-GO-terms scoring.
- Surfacing systematic failure modes. Pseudoenzyme blind spots, localisation defaults, paralog indistinguishability, missing organism-specific biology, neo-functionalisation, narrative–GO disconnect, and cross-kingdom fold bias are not visible in aggregate scores but are decisive for deployment.
- Distinguishing novel insight from restatement. Most BioReason-Pro summaries narratively restate InterPro2GO. An aggregate score cannot see this; an agentic review can.
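For orientation, the two aggregate metrics named above are usually defined as follows in the CAFA evaluations. This is a reference sketch rather than the manuscript's own notation: the symbols $\tau$, $pr$, $rc$, $ru$, and $mi$ are assumed here and do not appear elsewhere in this README.

$$
F_{\max} = \max_{\tau} \frac{2\, pr(\tau)\, rc(\tau)}{pr(\tau) + rc(\tau)},
\qquad
S_{\min} = \min_{\tau} \sqrt{ru(\tau)^2 + mi(\tau)^2}
$$

Here $\tau$ is the decision threshold applied to prediction scores, $pr$ and $rc$ are precision and recall averaged over benchmark proteins, and $ru$ and $mi$ are the information-content-weighted remaining uncertainty and misinformation. Both quantities are optimised over a single threshold on sets of GO terms, which is why free-text summaries and reasoning traces fall outside them.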
Evidence base
- ARGO139/ARGO95 BioReason-Pro evaluation (see ../BIOREASON_COMPARISON.md): fixed 139-gene ARGO139 benchmark for RL narrative review, plus the 95-gene ARGO95 HF-catalogue subset for SFT GO-term review, with overall RL correctness 3.7/5, completeness 2.9/5, a seven-mode failure-mode taxonomy, and SFT term assessments.
- ESR-ECOLI-DET-Mini 7-gene E. coli positive control and recap against de Crécy-Lagard et al. (2025, G3) expert error taxonomy (see ../../VALIDATING_ECOLI_PREDICTIONS.md and ../recapitulation-experiment/claude-expt-1/; dataset ID 10.5281/zenodo.20751016): AI-AUGR reproduces all 7 classes when labels/rationales are present as a positive control. An answer-key-withheld, literature/bioinformatics-assisted recapitulation recovers 4/7 exact labels, enough for useful triage but not expert-equivalent.
- Supplemental SFT source checks on the public HuggingFace wanglab/protein_catalogue dataset: retained for reproducibility in supplemental-benchmark-details.md.
How to read this directory
For a reviewer coming in cold, read in this order:
1. manuscript.tex: the full story.
2. supplemental-benchmark-details.md: source availability and supplemental denominator checks.
3. ../BIOREASON_COMPARISON.md: the underlying experimental log with per-organism breakdown, top performers, critical failures, and full failure-mode taxonomy.
4. ../VALIDATING_ECOLI_PREDICTIONS.md: the de Crécy-Lagard positive-control experimental log.
5. ../recapitulation-experiment/claude-expt-1/README.md: archived ESR-ECOLI-DET-Mini answer-key-withheld recap results.
6. short-abstract.md: a 250-word pitch.
PDF build
Build the manuscript PDF directly in this project workspace:
cd projects/BIOREASON_COMPARISON
just pdf
The recipe runs latexmk in article/ and writes article/manuscript.pdf.
Generated PDFs and LaTeX build outputs are intentionally ignored here.