BioReason-Pro comparison — stats notebooks
Reproducible Jupyter notebooks that recompute the summary statistics quoted in
../../BIOREASON_COMPARISON.md and the
manuscript directly from the committed per-gene
files under genes/. Nothing is hard-coded — every table and figure is derived
on the fly, so the notebooks double as a reproducibility check for the paper's numbers.
Notebooks
| Notebook | Reproduces | Source files |
|---|---|---|
| 01_narrative_scores.ipynb | Overall means (RL 3.7/2.9, SFT 2.9/2.7), Table 1 (score distribution), Table 2 (per-organism means), top performers / critical failures, the per-organism figure | `genes/*/*/*bioreason-{rl,sft}-review.md` |
| 02_prediction_assessments.ipynb | Per-term SFT assessment distribution on ARGO95, plus supplemental mixed-source ARGO139 / all-source union / all-HF views | `genes/*/*/*-sft-predictions.yaml`, `../genes.csv` |
Shared parsing lives in bioreason_stats.py; the notebook
cell contents are generated by build_notebooks.py so they
are easy to review in a diff and stay in sync with the module.
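As a rough illustration of why generated notebooks diff cleanly, the sketch below writes a minimal nbformat-4 notebook as deterministic JSON using only the standard library. It is not the contents of build_notebooks.py: the helper names `make_notebook` and `write_notebook` and the example cell list are hypothetical, and the real builder may well use the `nbformat` package instead.

```python
import json
from pathlib import Path


def make_notebook(cells):
    """Build a minimal nbformat-4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells start unexecuted; nbconvert --execute fills these in.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(path, cells):
    """Serialise with fixed indentation and key order so rebuilds diff cleanly."""
    text = json.dumps(make_notebook(cells), indent=1, sort_keys=True) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


# Hypothetical cell list: the real cells are defined in build_notebooks.py.
EXAMPLE_CELLS = [
    ("markdown", "# Narrative scores"),
    ("code", "import bioreason_stats"),
]
```

Because the output depends only on the cell list, rebuilding from unchanged cells produces a byte-identical file, so a diff of the builder maps directly onto a diff of the notebook.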
Note on benchmark denominators.
../genes.csv is ARGO139, the fixed
139-gene benchmark used for the paper's BioReason-Pro RL analysis. ARGO95 is
the 95-gene ARGO139 subset with HF-catalogue SFT GO-term predictions and is
the primary SFT denominator. The 44 ARGO139 genes absent from HF have SFT web
exports, but those are retained as supplemental source diagnostics. The
all-source union and all-HF catalogue views are supplemental.
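The relationship between these denominators is plain set arithmetic, sketched below. The gene identifiers are placeholders rather than real ARGO139 entries, the function name `sft_denominators` is hypothetical, and the ten extra HF-only genes are an assumption made purely so the HF catalogue is not a strict subset of ARGO139.

```python
def sft_denominators(argo139_ids, hf_catalogue_ids):
    """Count the benchmark, its HF-covered subset, and the remainder."""
    argo139 = set(argo139_ids)
    hf = set(hf_catalogue_ids)
    argo95 = argo139 & hf    # ARGO139 genes with HF-catalogue SFT predictions
    web_only = argo139 - hf  # ARGO139 genes absent from HF (web exports only)
    return {
        "ARGO139": len(argo139),
        "ARGO95": len(argo95),
        "web_export_only": len(web_only),
    }


# Placeholder identifiers, chosen only to reproduce the stated counts.
argo139_ids = [f"gene_{i:03d}" for i in range(139)]
hf_catalogue_ids = argo139_ids[:95] + [f"other_{i:03d}" for i in range(10)]

counts = sft_denominators(argo139_ids, hf_catalogue_ids)
print(counts)  # {'ARGO139': 139, 'ARGO95': 95, 'web_export_only': 44}
```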
Running
Dependencies are pinned with uv (pyproject.toml
+ uv.lock).
```bash
cd projects/BIOREASON_COMPARISON/notebooks

# regenerate the .ipynb files from the builder and execute them in place
uv run python build_notebooks.py
uv run jupyter nbconvert --to notebook --execute --inplace *.ipynb

# or work interactively
uv run jupyter lab
```
Regenerated figures are written to figures/*.repro.png (kept out of git) and
should match the committed ../article/figures/ versions.
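One way to check that match is a byte-level comparison, sketched below. It assumes that `figures/NAME.repro.png` corresponds to `../article/figures/NAME.png`; that naming rule and the helper `compare_figures` are hypothetical. A byte difference does not necessarily mean a visible difference, since PNG output can vary with library versions and embedded metadata, so treat "differs" as a prompt to look at the images.

```python
import filecmp
from pathlib import Path


def compare_figures(repro_dir, committed_dir, suffix=".repro.png"):
    """Map each regenerated figure to 'identical', 'differs', or 'missing'."""
    results = {}
    for repro in sorted(Path(repro_dir).glob(f"*{suffix}")):
        # Assumed naming rule: strip ".repro" to get the committed file name.
        committed = Path(committed_dir) / (repro.name[: -len(suffix)] + ".png")
        if not committed.exists():
            results[repro.name] = "missing"
        elif filecmp.cmp(repro, committed, shallow=False):
            # shallow=False forces a content comparison, not just os.stat.
            results[repro.name] = "identical"
        else:
            results[repro.name] = "differs"
    return results
```

Run from the notebooks directory, a call such as `compare_figures("figures", "../article/figures")` returns one status per regenerated figure.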