BioReason-Pro comparison — stats notebooks

Reproducible Jupyter notebooks that recompute the summary statistics quoted in
../../BIOREASON_COMPARISON.md and the manuscript directly from the committed
per-gene files under genes/. Nothing is hard-coded: every table and figure is
derived on the fly, so the notebooks double as a reproducibility check for the
paper's numbers.

Notebooks

| Notebook | Reproduces | Source files |
| --- | --- | --- |
| `01_narrative_scores.ipynb` | Overall means (RL 3.7/2.9, SFT 2.9/2.7), Table 1 (score distribution), Table 2 (per-organism means), top performers / critical failures, the per-organism figure | `genes///*bioreason-{rl,sft}-review.md` |
| `02_prediction_assessments.ipynb` | Per-term SFT assessment distribution on ARGO95, plus supplemental mixed-source ARGO139 / all-source union / all-HF views | `genes///*-sft-predictions.yaml`, `../genes.csv` |

Shared parsing lives in bioreason_stats.py; the notebook cell contents are
generated by build_notebooks.py so they are easy to review in a diff and stay
in sync with the module.
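
As a rough illustration of the builder pattern described above, the sketch below writes a notebook file from plain Python strings using only the standard library. It is not the actual build_notebooks.py: the names `make_cell`, `write_notebook` and `EXAMPLE_CELLS`, the cell contents, and the `load_reviews` helper mentioned inside the cell text are all hypothetical, and the real builder may well use `nbformat` rather than raw JSON.

```python
import json
from pathlib import Path


def make_cell(cell_type: str, source: str) -> dict:
    """Return a minimal nbformat-4 cell dict (hypothetical helper)."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally carry an execution count and an outputs list.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def write_notebook(path: Path, cells: list[tuple[str, str]]) -> dict:
    """Write (cell_type, source) pairs to `path` as a notebook JSON file."""
    notebook = {
        "cells": [make_cell(kind, src) for kind, src in cells],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    # Sorted keys and fixed indentation keep diffs of the generated file small.
    text = json.dumps(notebook, indent=1, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")
    return notebook


# Illustrative cell text only; `load_reviews` is an assumed name, not a
# documented function of bioreason_stats.py.
EXAMPLE_CELLS = [
    ("markdown", "# Narrative scores"),
    ("code", "import bioreason_stats as bs\nreviews = bs.load_reviews()"),
]
```

Calling `write_notebook(Path("demo.ipynb"), EXAMPLE_CELLS)` would produce a two-cell notebook. Keeping the cell text as ordinary strings in a Python file is what makes changes reviewable in a plain diff, since the `.ipynb` JSON itself is only a build output.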

Note on benchmark denominators. ../genes.csv is ARGO139, the fixed
139-gene benchmark used for the paper's BioReason-Pro RL analysis. ARGO95 is
the 95-gene ARGO139 subset with HF-catalogue SFT GO-term predictions and is
the primary SFT denominator. The 44 ARGO139 genes absent from HF have SFT web
exports, but those are retained as supplemental source diagnostics. The
all-source union and all-HF catalogue views are supplemental.
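
The relationship between the three counts can be written as set arithmetic. The sketch below uses placeholder gene identifiers rather than the real contents of ../genes.csv (whose column layout is not described here), and the variable names are hypothetical; only the sizes 139, 95 and 44 come from the note above.

```python
# Placeholder identifiers standing in for the ARGO139 gene list; the real IDs
# live in ../genes.csv and are not reproduced here.
argo139 = {f"gene_{i:03d}" for i in range(139)}

# Assumption for illustration: the first 95 placeholders are the genes that
# have HF-catalogue SFT GO-term predictions (the ARGO95 subset).
argo95 = {f"gene_{i:03d}" for i in range(95)}

# Benchmark genes without HF-catalogue predictions; per the note, these have
# SFT web exports that are treated as supplemental diagnostics.
web_only = argo139 - argo95

# ARGO95 must be a subset of ARGO139, and the two parts must partition it.
assert argo95 <= argo139
assert len(argo139) == 139 and len(argo95) == 95 and len(web_only) == 44
assert argo95 | web_only == argo139
```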

Running

Dependencies are pinned with uv (pyproject.toml + uv.lock).

```bash
cd projects/BIOREASON_COMPARISON/notebooks

# regenerate the .ipynb files from the builder and execute them in place
uv run python build_notebooks.py
uv run jupyter nbconvert --to notebook --execute --inplace *.ipynb

# or work interactively
uv run jupyter lab
```

Regenerated figures are written to figures/*.repro.png (kept out of git) and
should match the committed ../article/figures/ versions.
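
One way to check that regenerated figures match the committed ones is to compare file hashes. The sketch below rests on assumptions: it supposes that `figures/NAME.repro.png` corresponds to `../article/figures/NAME.png`, which the text above does not state, and the function name `compare_figures` is hypothetical. Byte-level equality is also stricter than visual equality, since PNG output can vary with plotting-library versions or embedded metadata, so a "differs" result is a prompt to look at the images rather than proof of a discrepancy.

```python
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_figures(repro_dir: Path, committed_dir: Path) -> dict[str, str]:
    """Map each committed figure name to 'match', 'differs', or 'missing'.

    Assumes NAME.repro.png pairs with NAME.png in the committed directory.
    """
    status = {}
    for repro in sorted(repro_dir.glob("*.repro.png")):
        # Strip the ".repro.png" suffix to recover the assumed committed name.
        name = repro.name[: -len(".repro.png")] + ".png"
        committed = committed_dir / name
        if not committed.exists():
            status[name] = "missing"
        elif sha256(repro) == sha256(committed):
            status[name] = "match"
        else:
            status[name] = "differs"
    return status
```

From the notebooks directory, `compare_figures(Path("figures"), Path("../article/figures"))` would return one entry per regenerated figure.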