Metabolomics → GO Bridge Coverage Probe

A small, reproducible probe for the Metabolomics × GO/GO-CAM project. It
answers the first follow-up question: does the metabolite → Rhea → rhea2go → GO
bridge actually connect, and how much of it depends on normalizing protonation
state?
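
The chain in that question can be pictured as two joins. The following is a minimal sketch, not the probe's actual code: `parse_rhea2go` and `bridge` are hypothetical names, the sample lines only mimic the `external ID > GO:name ; GO:id` layout assumed for GO's external2go mapping files, and every ID in them is made up.

```python
def parse_rhea2go(lines):
    """Parse external2go-style lines into {rhea_id: {go_id, ...}}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # '!' marks header/comment lines
            continue
        left, _, right = line.partition(" > ")
        go_id = right.rpartition(" ; ")[2].strip()
        rhea_id = left.split()[0]
        if rhea_id.startswith("RHEA:") and go_id.startswith("GO:"):
            mapping.setdefault(rhea_id, set()).add(go_id)
    return mapping


def bridge(metabolite_to_rhea, rhea_to_go):
    """Join metabolite -> Rhea reactions -> GO terms."""
    return {
        met: set().union(*(rhea_to_go.get(r, set()) for r in reactions))
        for met, reactions in metabolite_to_rhea.items()
    }


# Made-up sample data, for illustration only.
sample = [
    "! made-up header line",
    "RHEA:00001 > GO:example activity ; GO:0000001",
    "RHEA:00002 > GO:another activity ; GO:0000002",
]
rhea_to_go = parse_rhea2go(sample)
hits = bridge(
    {"metabolite A": {"RHEA:00001", "RHEA:00002"}, "metabolite B": set()},
    rhea_to_go,
)
```

A metabolite "connects" in this picture when its set in `hits` is non-empty; coverage is the fraction of input metabolites for which that holds.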

Reproducibility. The engine is stdlib-only (no third-party dependencies, so
uv run needs no installs) and every input is fetched live from public APIs
(OLS4, Rhea REST, GO rhea2go/go-basic.obo, UniProt REST, KEGG REST,
MetaboLights) and cached under .cache/ (gitignored). No results are hardcoded;
the committed .md reports are regenerated by the scripts below. With no network
the scripts fail loudly rather than fabricate. To cold-reproduce, delete
.cache/ and re-run the commands in "Run it".
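
A fetch-and-cache layer of this kind can be written with the standard library alone. The sketch below is an assumption about the general shape, not the repository's code: `cached_fetch` and `_download` are hypothetical names, and keying cache files by a SHA-256 of the URL is one choice among several.

```python
import hashlib
import urllib.request
from pathlib import Path


def _download(url):
    """Fetch raw bytes; urllib raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def cached_fetch(url, cache_dir=".cache", fetch=_download):
    """Return the body for `url`, reading from disk when already cached."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    path = cache / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if path.exists():
        return path.read_bytes()
    data = fetch(url)  # any exception propagates: fail loudly, write nothing
    path.write_bytes(data)
    return data
```

Because a failed fetch raises before anything is written, an offline cold run stops with an error instead of leaving an empty or partial cache entry behind. The `fetch` parameter exists only so the function can be exercised without a network.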

The finding (see RESULTS.md)

On a 26-metabolite stand-in for a central-carbon / amino-acid / nucleotide /
cofactor metabolomics readout, reported by neutral name as a repository would:

Rhea writes participants in their major protonation state at pH 7.3
(citrate(3-), ATP(4-), succinyl-CoA(5-)…); repositories report the neutral
species. Without normalization the bridge is essentially empty; with it, almost
everything connects — and lands on real GO molecular functions (ATP → 492 GO MF
terms, NAD+ → 447, acetyl-CoA → 385).
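
One way to picture the normalization step is to treat conjugate acid/base links as an undirected graph and collect every species reachable from the reported one. The sketch below is an assumption about how such a step could work, not the probe's implementation: `protonation_family` is a hypothetical helper, and the link table is a hand-written toy keyed by name rather than by real ChEBI IDs.

```python
from collections import deque

# Toy conjugate acid/base links, hand-written for illustration.
CONJUGATE_LINKS = [
    ("citric acid", "citrate(1-)"),
    ("citrate(1-)", "citrate(2-)"),
    ("citrate(2-)", "citrate(3-)"),
]


def build_graph(links):
    """Turn (a, b) pairs into an undirected adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def protonation_family(species, graph):
    """Breadth-first search over conjugate links, including the start."""
    seen = {species}
    queue = deque([species])
    while queue:
        for neighbour in graph.get(queue.popleft(), ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


graph = build_graph(CONJUGATE_LINKS)
rhea_participants = {"citrate(3-)"}  # the charged form Rhea uses
exact = {"citric acid"} & rhea_participants
normalized = protonation_family("citric acid", graph) & rhea_participants
```

Here `exact` is empty while `normalized` contains `citrate(3-)`, which is the effect described above: the neutral name misses Rhea on an exact match and connects once charge states are expanded.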

Real study: MetaboLights MTBLS1 (see studies/MTBLS1-RESULTS.md)

Run on the 64 curator-assigned ChEBI metabolites of MTBLS1 (Salek et al.,
type-2-diabetes urine NMR), pulled live from the study's MAF:

Two ChEBI normalization tiers close the gap to Rhea: protonation (charge
states) and structure/skeleton (generic↔stereospecific, via InChIKey
skeleton). On MTBLS1 they take coverage 8 → 49 → 58 / 64; the skeleton tier is
what recovers the diabetes BCAAs.
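
The skeleton tier relies on the layout of a standard InChIKey: the first 14-character block hashes the connectivity, while the second block carries stereochemistry and other layers. A minimal sketch of matching on that first block follows. `skeleton` and `skeleton_matches` are hypothetical names, and the keys are quoted for illustration and should be checked against ChEBI before reuse.

```python
def skeleton(inchikey):
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.split("-", 1)[0]


def skeleton_matches(query_key, candidates):
    """Names in `candidates` whose InChIKey shares the query's first block."""
    target = skeleton(query_key)
    return sorted(
        name for name, key in candidates.items() if skeleton(key) == target
    )


# Illustrative keys: a stereospecific entry and an unrelated compound.
rhea_side = {
    "L-leucine": "ROHFNLRQFUQHCH-YFKPBYRVSA-N",
    "glycine": "DHMQDGOQFOQNFH-UHFFFAOYSA-N",
}
generic_leucine = "ROHFNLRQFUQHCH-UHFFFAOYSA-N"  # no stereo layer
matches = skeleton_matches(generic_leucine, rhea_side)
```

Matching on the first block alone also collapses enantiomers onto one another, so a tier like this trades some specificity for coverage.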

Enrichment + baseline

With the metabolites connected, the bridge supports real enrichment three ways
on the same study and the same hypergeometric test (BH-FDR):

KEGG gives pathway-membership buckets; GO resolves the specific molecular
activities and biological processes — complementary readouts on identical input.
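
For readers who want the statistics spelled out, a hypergeometric over-representation test with Benjamini-Hochberg adjustment fits in a few lines of standard-library Python. This is a sketch under stated assumptions, not the scripts' code: the function names are hypothetical, and a one-sided upper tail is assumed.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K carry the term."""
    upper = min(K, n)
    numerator = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return numerator / comb(N, n)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy counts, for illustration only: (hits in study, annotated in background).
background_size, study_size = 1000, 50
terms = {"term A": (12, 40), "term B": (3, 60)}
pvals = [
    hypergeom_sf(k, background_size, K, study_size) for k, K in terms.values()
]
qvals = dict(zip(terms, bh_adjust(pvals)))
```

Using exact integer binomials via `math.comb` avoids a SciPy dependency, at the cost of speed on very large backgrounds.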

Run it

uv run python coverage_probe.py                 # built-in demo metabolite set
uv run python coverage_probe.py --write-results # also regenerate RESULTS.md

# Real MetaboLights study, end to end:
uv run python fetch_metabolights.py MTBLS1      # -> studies/MTBLS1.chebi.txt
uv run python coverage_probe.py --chebi-file studies/MTBLS1.chebi.txt \
    --out studies/MTBLS1-RESULTS.md --title "MTBLS1 → GO bridge coverage" \
    --source "MetaboLights MTBLS1"
uv run python go_enrichment.py --chebi-file studies/MTBLS1.chebi.txt \
    --out studies/MTBLS1-GO-ENRICHMENT.md --source "MetaboLights MTBLS1"
uv run python go_bp_enrichment.py --chebi-file studies/MTBLS1.chebi.txt \
    --out studies/MTBLS1-GO-BP-ENRICHMENT.md --source "MetaboLights MTBLS1"
uv run python kegg_baseline.py --chebi-file studies/MTBLS1.chebi.txt \
    --out studies/MTBLS1-KEGG-BASELINE.md --source "MetaboLights MTBLS1"

Files

Everything is computed live; nothing is hardcoded. With no network the scripts
fail loudly rather than fabricate numbers.