Metabolomics → GO Bridge: protonation-normalization coverage probe

Generated by coverage_probe.py. All numbers are computed live from
OLS4 (ChEBI), the Rhea REST API, and the GO rhea2go mapping.

Why this probe exists

Reported metabolite ChEBI IDs rarely match Rhea participants directly,
for two reasons, which we test as successive normalization tiers:

  1. Protonation: Rhea writes participants in their major protonation
    state at pH 7.3 (citrate(3-), ATP(4-)), whereas repositories
    typically report neutral forms. We expand the seed over ChEBI
    is_protonated_form_of / is_deprotonated_form_of (the protonation
    family).
  2. Structure / skeleton: a study reports a generic, non-stereospecific
    compound (isoleucine) while Rhea uses the stereospecific zwitterion
    (L-isoleucine zwitterion). We expand over the broader structural
    relations (tautomers, enantiomers, and generic→specific children),
    bounded to the seed's InChIKey skeleton. This tier ignores
    stereochemistry and charge, so it is reported separately as the more
    permissive fallback.
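
The two tiers above can be sketched as a graph expansion followed by a skeleton check. This is a minimal illustration, not the code in coverage_probe.py: the function names (expand, skeleton, same_skeleton) are hypothetical, the relation graph is a hand-written toy, and the intermediate citrate entries use placeholder TOY: ids, not real ChEBI accessions. The skeleton check assumes the standard InChIKey layout, in which the first 14-character block hashes connectivity; the two InChIKeys in the usage note are given for illustration and should be verified against ChEBI.

```python
from collections import deque

# Relation names as used in the text above.
PROTONATION = {"is_protonated_form_of", "is_deprotonated_form_of"}

# Toy relation graph: node -> [(relation, neighbour)]. The TOY: ids are
# placeholders for intermediate protonation states (assumption).
EDGES = {
    "CHEBI:30769": [("is_protonated_form_of", "TOY:citrate(1-)")],
    "TOY:citrate(1-)": [("is_protonated_form_of", "TOY:citrate(2-)")],
    "TOY:citrate(2-)": [("is_protonated_form_of", "CHEBI:16947")],
}

# Ids that appear as Rhea participants (here only citrate(3-)).
RHEA_PARTICIPANTS = {"CHEBI:16947"}


def expand(seed, edges, allowed):
    """Breadth-first closure of `seed` over edges whose relation is allowed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for relation, target in edges.get(node, []):
            if relation in allowed and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def skeleton(inchikey):
    """Return the first InChIKey block (the connectivity hash)."""
    return inchikey.split("-")[0]


def same_skeleton(key_a, key_b):
    """True when two InChIKeys share the same connectivity block."""
    return skeleton(key_a) == skeleton(key_b)


# Tier 1: protonation family of citric acid, intersected with Rhea.
family = expand("CHEBI:30769", EDGES, PROTONATION)
matched = family & RHEA_PARTICIPANTS
```

Here matched contains CHEBI:16947, mirroring the citric acid row of the per-metabolite table. For the second tier, same_skeleton("AGPKZVBTJJNPAG-UHFFFAOYSA-N", "AGPKZVBTJJNPAG-WHFBIAKZSA-N") compares a non-stereospecific isoleucine key with an L-isoleucine key and returns True, which is the bound applied to structural candidates. A real graph would need edges in both directions so that charged seeds can also reach their neutral forms.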

Headline

Per-metabolite

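Tier reached, from strictest to most permissive: exact < proton
(protonation) < struct (skeleton) < (miss). q is the net charge of the
seed; GO MF gives the number of GO molecular functions reached after
each tier.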
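
The ordering above amounts to a first-match lookup: the reported tier is the strictest one at which any candidate id is a Rhea participant. A minimal sketch, assuming the candidate matches per tier have already been collected into a dict (the name tier_reached and the dict layout are hypothetical):

```python
# Tiers in the order they are tried, strictest first.
TIERS = ("exact", "proton", "struct")


def tier_reached(matches_by_tier):
    """Return the first tier with a non-empty match set, else "(miss)"."""
    for tier in TIERS:
        if matches_by_tier.get(tier):
            return tier
    return "(miss)"


# Citric acid: nothing at the exact tier, citrate(3-) at the proton tier.
citric = {"exact": set(), "proton": {"CHEBI:16947"}, "struct": {"CHEBI:16947"}}
citric_tier = tier_reached(citric)
```

With the citric acid sets shown, citric_tier is "proton"; a metabolite with empty sets at all three tiers yields "(miss)".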

Metabolite                 Seed ChEBI   q  Tier    GO MF (exact→proton→struct)  Rhea-matched form
citric acid                CHEBI:30769  0  proton  0→8→8                        citrate(3-) (CHEBI:16947, q=-3)
cis-aconitic acid          CHEBI:32805  0  proton  0→2→2                        cis-aconitate(3-) (CHEBI:16383, q=-3)
isocitric acid             CHEBI:30887  0  struct  0→0→6                        D-erythro-isocitrate(3-) (CHEBI:15563, q=-3)
2-oxoglutaric acid         CHEBI:30915  0  proton  0→152→152                    2-oxoglutarate(2-) (CHEBI:16810, q=-2)
succinic acid              CHEBI:15741  0  proton  0→92→92                      succinate(2-) (CHEBI:30031, q=-2)
fumaric acid               CHEBI:18012  0  proton  0→18→18                      fumarate(2-) (CHEBI:29806, q=-2)
(S)-malic acid             CHEBI:30797  0  proton  0→19→19                      (S)-malate(2-) (CHEBI:15589, q=-2)
oxaloacetic acid           CHEBI:30744  0  proton  0→129→129                    oxaloacetate(2-) (CHEBI:16452, q=-2)
pyruvic acid               CHEBI:32816  0  proton  0→135→135                    pyruvate (CHEBI:15361, q=-1)
L-lactic acid              CHEBI:53408  0  (miss)  0→0→0
D-glucose                  CHEBI:17634  0  struct  0→0→0                        aldehydo-D-glucose (CHEBI:42758, q=0)
D-glucose 6-phosphate      CHEBI:14314  0  struct  0→0→1                        aldehydo-D-glucose 6-phosphate(2-) (CHEBI:57584, q=-2)
D-fructose 6-phosphate     CHEBI:15946  0  proton  0→2→2                        D-fructose 6-phosphate(2-) (CHEBI:57579, q=-2)
phosphoenolpyruvic acid    CHEBI:44897  0  proton  0→23→23                      phosphonatoenolpyruvate (CHEBI:58702, q=-3)
3-phospho-D-glyceric acid  CHEBI:17794  0  proton  0→25→25                      3-phosphonato-D-glycerate(3-) (CHEBI:58272, q=-3)
L-glutamic acid            CHEBI:16015  0  proton  0→132→132                    L-glutamate(1-) (CHEBI:29985, q=-1)
L-aspartic acid            CHEBI:17053  0  proton  0→40→40                      L-aspartate(1-) (CHEBI:29991, q=-1)
L-alanine                  CHEBI:16977  0  proton  0→51→51                      L-alanine zwitterion (CHEBI:57972, q=0)
glycine                    CHEBI:15428  0  proton  0→53→53                      glycine zwitterion (CHEBI:57305, q=0)
L-serine                   CHEBI:17115  0  proton  0→32→32                      L-serine zwitterion (CHEBI:33384, q=0)
ATP                        CHEBI:15422  0  proton  0→492→492                    ATP(4-) (CHEBI:30616, q=-4)
ADP                        CHEBI:16761  0  proton  0→370→370                    ADP(3-) (CHEBI:456216, q=-3)
AMP                        CHEBI:18374  0  proton  0→2→2                        1-(5-phosphonato-beta-D-ribosyl)-5'-AMP(4-) (CHEBI:59457, q=-4)
NAD(+)                     CHEBI:15846  1  proton  0→447→447                    NAD(1-) (CHEBI:57540, q=-1)
acetyl-CoA                 CHEBI:15351  0  proton  0→385→385                    acetyl-CoA(4-) (CHEBI:57288, q=-4)
succinyl-CoA               CHEBI:15380  0  proton  0→304→304                    succinyl-CoA(5-) (CHEBI:57292, q=-5)
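
Rows of this table can be summarized mechanically. The sketch below is an illustration only: it uses a five-row subset copied from the table, and the helper names (parse_counts, summarize) are hypothetical. It tallies the tier column and flags metabolites that matched a Rhea participant but still reached no GO molecular function.

```python
from collections import Counter

# (metabolite, tier, GO MF cell); an empty tier means no tier matched.
ROWS = [
    ("citric acid", "proton", "0→8→8"),
    ("isocitric acid", "struct", "0→0→6"),
    ("L-lactic acid", "", "0→0→0"),
    ("D-glucose", "struct", "0→0→0"),
    ("ATP", "proton", "0→492→492"),
]


def parse_counts(cell):
    """Split an 'exact→proton→struct' cell into a tuple of ints."""
    return tuple(int(part) for part in cell.split("→"))


def summarize(rows):
    """Tally tiers and list Rhea-matched metabolites with zero GO MFs."""
    tiers = Counter(tier or "(miss)" for _, tier, _ in rows)
    matched_without_go = [
        name for name, tier, cell in rows
        if tier and parse_counts(cell)[-1] == 0
    ]
    return tiers, matched_without_go


tiers, matched_without_go = summarize(ROWS)
```

On this subset the tally is two proton, two struct, and one miss, and matched_without_go contains only D-glucose: its skeleton-tier form is a Rhea participant, but no GO molecular function is reached through it.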

Recovered by structure (skeleton) normalization

Generic / stereochemistry mismatches the protonation tier could not fix,
recovered by InChIKey-skeleton expansion:

Example GO molecular functions reached

Residual misses (after both normalization tiers)

Resolved to ChEBI but matched no Rhea reaction even after protonation and
skeleton normalization — typically derivatives Rhea represents only in a
conjugated/acylated form, or compounds genuinely absent from Rhea.

Method / reproducibility