BioReason-Pro Comparison Project

Species: human, mouse, rat, worm, yeast, SCHPO, DROME, ARATH, ECOLI, BACSU, PSEPK, DANRE

Systematic evaluation of BioReason-Pro functional summaries and reasoning traces (Fallahpour et al. 2026, doi:10.64898/2026.03.19.712954) against expert-curated AIGR gene reviews.

Methods

We downloaded the reports for selected genes from https://app.bioreason.net/ (there is no API yet, so this cannot be done in bulk). We then assigned an AI agent to review each report and compare it with both existing pipelines (e.g. interpro2go) and the complete AI gene review.

Notes on browsing results

Each individual gene review page (e.g. aprE, SlyD) contains both the BioReason results and the detailed review of those results. To find them, search the page for "bioreason" or scroll down to the relevant sections.

Key findings (139 RL reviews)

  1. BioReason largely recapitulates interpro2go in narrative form. For most genes, the functional summary does not provide biological insight beyond what InterPro domain annotations already capture. Genuine value-add is modest and concentrated in proteins with distinctive, well-annotated domain architectures such as TOR1 (FRB domain enables pathway-level inference), PTEN, NOTCH1, and EGFR. For standard domain families like Src-family kinases, Fyn and Src receive essentially identical generic descriptions.

  2. Systematic localization errors. BioReason defaults to "cytosolic" or "cytoplasmic" when no transmembrane domains are detected, failing for periplasmic proteins (Skp, CpxP, Spy), vacuolar proteins (cps1), mitochondrial matrix proteins (alo1, HSP60, CAT2), and ER membrane proteins (IRE1, ETR1). Proteins where InterPro domain names explicitly mention the compartment (KAR2, PDI1) are handled correctly.

  3. Pseudo-enzymes are a blind spot. BioReason assumes catalytic activity from conserved but degenerate domains and cannot detect loss of catalytic residues. Epe1 (pseudo-demethylase), cts2 (pseudo-chitinase missing catalytic glutamate), and pmp20 (peroxiredoxin that has lost resolving cysteine and functions as a chaperone) are all incorrectly assigned the ancestral enzymatic activity.

  4. Paralogs get identical generic descriptions. Closely related paralogs such as Fyn/Src (mouse), sigF/sigG/sigK (B. subtilis sporulation sigma factors), and Hspa5/Hspa8 (rat Hsp70 family) receive interchangeable summaries with no gene-specific biology.

  5. Organism-specific biology is consistently absent. Dauer formation and insulin/IGF-1 signaling in C. elegans (daf-16, daf-2), UPRmt master regulation (atfs-1), sporulation compartment specificity in B. subtilis (sigF forespore, sigK mother cell), prion propagation in yeast (HSP104), and cytoophidium biology (ura7) are all missed.

  6. Mammalian genes score highest (mouse 4.7, rat 4.4, human 4.2 correctness) while S. pombe scores lowest (2.8). This likely reflects richer InterPro annotations and more informative family-level names for well-studied mammalian proteins.

  7. Overall correctness: 3.7/5 | Overall completeness: 2.9/5

  8. Only 1 gene scored 5/5 on both axes (Uggt1)

Background

Architecture:
- GO-GPT: autoregressive transformer (ESM2 embeddings + organism -> GO terms). Upstream predictor.
- BioReason-Pro: Qwen3-based multimodal reasoning LLM. Takes GO-GPT predictions + InterPro + PPI + organism context -> chain-of-thought reasoning trace + functional summary. Two variants: SFT (richer mechanistic depth) and RL (fewer hallucinations).

The GO term list in web exports is raw GO-GPT output (input to reasoning). BioReason-Pro produces its own GO terms after the reasoning step, but the current web app does not separately expose these.

Web app: app.bioreason.net | Code: bowang-lab/BioReason-Pro | Models: HuggingFace wanglab collection

Data

Per gene, the following files are available (example: ECOLI/[SlyD](../../genes/ECOLI/SlyD/SlyD-ai-review.html)):

| File | Description |
| --- | --- |
| {GENE}-deep-research-bioreason-rl.md | Raw BioReason-Pro RL web export (reasoning trace, functional summary, InterPro, GO-GPT terms) |
| {GENE}-bioreason-rl-review.md | Evaluation of reasoning trace vs curated review (correctness/completeness scores + interpro2go comparison) |
| {GENE}-bioreason-rl-predictions.yaml | GO-GPT leaf terms as PredictionReview YAML |
| {GENE}-ai-review.yaml | Expert-curated AIGR review (ground truth for comparison) |
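
The naming scheme above can be expressed as a small path helper. This is a minimal sketch: the `genes/<ORGANISM>/<GENE>/` layout is inferred from the example link, and the function name `gene_files`, the dictionary labels, and the `root` parameter are assumptions for illustration, not part of the project.

```python
from pathlib import PurePosixPath

# File suffixes copied from the table above; the keys are illustrative labels.
SUFFIXES = {
    "raw_export": "deep-research-bioreason-rl.md",
    "evaluation": "bioreason-rl-review.md",
    "predictions": "bioreason-rl-predictions.yaml",
    "ground_truth": "ai-review.yaml",
}


def gene_files(organism: str, gene: str, root: str = "genes") -> dict:
    """Return the expected per-gene file paths (assumed directory layout)."""
    base = PurePosixPath(root) / organism / gene
    return {label: base / f"{gene}-{suffix}" for label, suffix in SUFFIXES.items()}


files = gene_files("ECOLI", "SlyD")
print(files["evaluation"])
```

Keeping the suffixes in one mapping makes it easy to check that every reviewed gene has all four files before aggregating scores.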

Evaluation rubric

Correctness (1-5)

Completeness (1-5)

Each review includes a comparison with interpro2go (GO_REF:0000002) annotations to assess whether BioReason adds value beyond automated domain-based annotation.

Results (139 genes)

Score distribution

| Score | Correctness | Completeness |
| --- | --- | --- |
| 5 | 38 (27%) | 1 (1%) |
| 4 | 48 (35%) | 40 (29%) |
| 3 | 32 (23%) | 51 (37%) |
| 2 | 15 (11%) | 40 (29%) |
| 1 | 6 (4%) | 7 (5%) |
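
The overall means reported in the key findings can be recomputed from this distribution. The sketch below uses only the counts from the table; the helper names `mean_score` and `percentages` are illustrative.

```python
# Number of genes per score, copied from the score distribution table.
correctness = {5: 38, 4: 48, 3: 32, 2: 15, 1: 6}
completeness = {5: 1, 4: 40, 3: 51, 2: 40, 1: 7}


def mean_score(counts):
    """Count-weighted mean of a score distribution."""
    total = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total


def percentages(counts):
    """Share of genes at each score, rounded to whole percent."""
    total = sum(counts.values())
    return {score: round(100 * n / total) for score, n in counts.items()}


print(sum(correctness.values()))          # number of genes covered
print(round(mean_score(correctness), 1))   # overall correctness
print(round(mean_score(completeness), 1))  # overall completeness
```

Both distributions sum to 139 genes, and the rounded means agree with the reported 3.7/5 correctness and 2.9/5 completeness.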

By organism

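| Organism | n | Correctness | Completeness |
| --- | --- | --- | --- |
| mouse | 11 | 4.7 | 3.6 |
| rat | 12 | 4.4 | 3.6 |
| human | 19 | 4.2 | 3.4 |
| ARATH | 3 | 4.0 | 3.3 |
| yeast | 11 | 3.9 | 2.6 |
| BACSU | 13 | 3.8 | 2.9 |
| DROME | 8 | 3.8 | 2.8 |
| worm | 15 | 3.5 | 2.3 |
| PSEPK | 8 | 3.4 | 3.0 |
| ECOLI | 13 | 3.2 | 3.0 |
| SCHPO | 23 | 2.8 | 2.3 |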
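
As a rough consistency check, the per-organism means can be combined into an n-weighted mean. This is a sketch over the rows listed in the table only, and because the per-organism values are already rounded to one decimal, the result is approximate. The name `weighted_mean` is illustrative.

```python
# (organism, n, mean correctness, mean completeness), copied from the table above.
ROWS = [
    ("mouse", 11, 4.7, 3.6),
    ("rat", 12, 4.4, 3.6),
    ("human", 19, 4.2, 3.4),
    ("ARATH", 3, 4.0, 3.3),
    ("yeast", 11, 3.9, 2.6),
    ("BACSU", 13, 3.8, 2.9),
    ("DROME", 8, 3.8, 2.8),
    ("worm", 15, 3.5, 2.3),
    ("PSEPK", 8, 3.4, 3.0),
    ("ECOLI", 13, 3.2, 3.0),
    ("SCHPO", 23, 2.8, 2.3),
]


def weighted_mean(rows, column):
    """Mean of one score column, weighted by the number of genes per organism."""
    total_n = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total_n


print(round(weighted_mean(ROWS, 2), 1))  # correctness
print(round(weighted_mean(ROWS, 3), 1))  # completeness
```

Weighting by n matters here because SCHPO contributes 23 genes while ARATH contributes only 3, so an unweighted average of the rows would overstate the overall scores.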

Top performers (correctness 5/5)

| Gene | Organism | Completeness | Why it works |
| --- | --- | --- | --- |
| Uggt1 | rat | 5 | ER quality control enzyme with highly informative domain names |
| TP53 | human | 4 | Distinctive TAD-DBD-tetramerization architecture |
| PTEN | human | 4 | Dual-specificity phosphatase domains are unambiguous |
| EGFR | human | 4 | Canonical RTK architecture well-represented in InterPro |
| NOTCH1 | human | 4 | Proteolytic cascade well-encoded in domain layout |
| MYC | human | 4 | bHLH-LZ domains directly predict E-box binding |
| Akt1 | mouse | 4 | PH + AGC kinase architecture is diagnostic |
| Calm1 | mouse | 4 | EF-hand domains immediately predict calcium sensing |
| Pten | mouse | 4 | Same as human PTEN |
| Trp53 | mouse | 4 | Same as human TP53 |
| TOR1 | yeast | 4 | FRB domain enables pathway-level inference |
| ftsZ | BACSU | 4 | Tubulin-like GTPase domain is highly specific |
| spo0A | BACSU | 4 | Response regulator + DNA-binding domains clearly predict phosphorelay TF |
| GroEL | ECOLI | 4 | Chaperonin domains are unambiguous |
| lgg-1 | worm | 4 | Atg8/ubiquitin-like fold directly predicts autophagy adaptor |
| bst1 | SCHPO | 3 | GPI inositol-deacylase function nailed from specific family annotation |

Critical failures (correctness 1/5)

| Gene | Organism | Completeness | Failure mode |
| --- | --- | --- | --- |
| atg16 | SCHPO | 1 | No InterPro domains available; confabulated carbohydrate metabolism |
| pmp20 | SCHPO | 2 | Neo-functionalized peroxiredoxin -> chaperone; model assumes ancestral function |
| pol5 | SCHPO | 1 | Predicted cytokinesis scaffold; actual function is rDNA transcription |
| Shu1 | SCHPO | 1 | Predicted HECT ubiquitin ligase; actually a GPI-anchored heme receptor |
| csr-1 | worm | 1 | Wrong input sequence (nhr-47 instead of CSR-1 Argonaute) |
| pgl-1 | worm | 1 | Described as nuclear TF scaffold; actually cytoplasmic P granule component |

Failure mode taxonomy

1. Pseudo-enzyme blind spot

BioReason assumes catalytic activity from conserved domains without checking whether catalytic residues are intact. This is a systematic failure for proteins that retain an ancestral fold but have lost enzymatic activity.

Examples:
- Epe1 (SCHPO, 2/5): BioReason claims "JmjC catalytic center dictates a lysine demethylase mechanism" but Epe1 has degenerate active site residues (HVD instead of HXD, Tyr307 instead of catalytic His). No detectable demethylase activity in mass spec assays. Functions as anti-silencing factor through HP1/Swi6 binding.
- cts2 (SCHPO, 2/5): Called an active chitinase, but the protein lacks the essential catalytic glutamate and is likely catalytically dead.
- pmp20 (SCHPO, 1/5): Predicted as a peroxidase, but has lost its resolving cysteine and functions as a molecular chaperone.

Notably, the BioReason paper highlights CFAP61 as a correctly identified pseudoenzyme, but this success does not generalize to our test set.
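
The kind of residue-level check that would catch these cases can be sketched as follows. This is not something BioReason is known to do; it is a minimal illustration that assumes the expected catalytic positions and residues are supplied from curated knowledge. The function name `degenerate_sites` and the toy sequence are hypothetical.

```python
def degenerate_sites(sequence, expected):
    """Return {position: (expected_residues, observed)} for mismatching sites.

    `expected` maps 1-based positions to a string of acceptable residues,
    e.g. {307: "H"} for a catalytic histidine. Positions outside the
    sequence are reported with "-" as the observed residue.
    """
    mismatches = {}
    for pos, residues in expected.items():
        observed = sequence[pos - 1] if 0 < pos <= len(sequence) else "-"
        if observed not in residues:
            mismatches[pos] = (residues, observed)
    return mismatches


# Toy sequence (NOT the real Epe1 sequence): alanines with a tyrosine at
# position 307, mimicking a Tyr where a catalytic His would be expected.
toy = "A" * 306 + "Y" + "A" * 3
print(degenerate_sites(toy, {307: "H"}))
```

A non-empty result flags the protein as a candidate pseudo-enzyme, which is the signal a domain-name-only summary cannot provide.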

2. Localization defaults to cytoplasm

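When InterPro annotations lack transmembrane or signal peptide information, BioReason systematically defaults to cytoplasmic localization. This fails for:
- Periplasmic proteins (Skp, CpxP, Spy)
- Vacuolar proteins (cps1)
- Mitochondrial matrix proteins (alo1, HSP60, CAT2)
- ER membrane proteins (IRE1, ETR1)

Localization is predicted correctly when InterPro domain names explicitly contain the compartment (KAR2/BiP -> ER, PDI1 -> ER).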

3. Paralog indistinguishability

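Closely related family members receive essentially identical descriptions:
- Fyn/Src (mouse Src-family kinases)
- sigF/sigG/sigK (B. subtilis sporulation sigma factors)
- Hspa5/Hspa8 (rat Hsp70 family)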

4. Organism-specific biology absent

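BioReason's domain-to-function reasoning cannot capture biology that is specific to a lineage or organism:
- C. elegans: dauer formation and insulin/IGF-1 signaling (daf-16, daf-2); UPRmt master regulation (atfs-1)
- B. subtilis: sporulation compartment specificity (sigF forespore, sigK mother cell)
- Yeast: prion propagation (HSP104)
- Cytoophidium biology (ura7)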

5. Neo-functionalization and moonlighting

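When a protein has acquired a function different from its ancestral domain prediction, BioReason defaults to the ancestral/family-typical function:
- pmp20 (SCHPO): neo-functionalized peroxiredoxin that now acts as a chaperone; the model assumes the ancestral peroxidase function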

6. Narrative-GO prediction disconnect

In multiple cases, the GO term predictions from the upstream GO-GPT model (built on ESM2 embeddings) are more accurate than BioReason's narrative functional summary. The two outputs appear to be generated somewhat independently.

7. Cross-kingdom fold bias

Training data skewed toward well-studied organisms can bias predictions:

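8. Wrong input data

- csr-1 (worm, 1/5): the report was generated from the wrong input sequence (nhr-47 instead of the CSR-1 Argonaute).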

Comparison with interpro2go

A central question is whether BioReason provides value beyond the automated interpro2go pipeline (GO_REF:0000002). Our reviews assessed this for each gene.

In most cases, BioReason is a narrative restatement of interpro2go. The functional summary translates domain annotations into prose without adding new biological insight. Where interpro2go makes errors (e.g., assigning generic "protein binding" to CnoX, importing eukaryotic flagellar terms for B. subtilis divIVA), BioReason typically recapitulates and sometimes amplifies these errors.

BioReason adds value in specific cases:
- Proteins with distinctive multi-domain architectures where the combination is diagnostic (TOR1, NOTCH1, PTEN, EGFR, spo0A)
- Proteins where family-level InterPro names are highly informative (Uggt1, bst1, KAR2)

BioReason inherits interpro2go errors:
- KEAP1: interpro2go assigns BTB-Kelch to actin binding; BioReason amplifies this into "actin remodeling" instead of the correct NRF2 regulation
- Ctnnb1: Armadillo repeat -> cell adhesion dominates; transcriptional co-activator role (Wnt/TCF-LEF) underweighted
- SecB: Chaperone family -> "protein folding" assigned, but SecB is an anti-folding holdase

Paper case study proteins

Full reasoning traces in supplementary C.6-C.15:

| Protein | UniProt | Paper section | Key finding |
| --- | --- | --- | --- |
| eEFSec | P57772 | Fig. 5, S2.6 | De novo predicted SBP2 as binding partner, validated by cryo-EM |
| CFAP61 | Q8NHU2 | Fig. 6, S2.7 | Correctly identified pseudoenzyme scaffold despite catalytic domains |
| EvoAcr1 | synthetic | S2.8 | No homology/domains. Predictions varied by organism label. SFT fabricated InterPro. |
| EvoAcr2 | synthetic | S2.8 | RL predicted phage-encoded host modulator -- biologically coherent |

CFAP61 vs Epe1: same class, opposite results

Both are pseudoenzymes with catalytic domain signatures. BioReason correctly identifies CFAP61 as non-enzymatic (paper's featured result) but fails on Epe1, confidently calling it an active demethylase. This suggests pseudoenzyme detection is not systematic but case-dependent.

SFT vs RL comparison

Paper findings (27 evaluators, 162 proteins)

Our pilot (5 genes)

Paper evaluation sets