BIOREASON DeepECTF Prediction Evaluation

Proteins: 7
Predictions: 7
UNC: 2
PLI: 1
NPI: 3
REP: 1
Mean CS: 0.3
COR — Correct novel
CNN — Correct, not novel
LSP — Less precise
UNC — Uncertain
PLI — Paralog incorrect
NPI — Nonparalog incorrect
REP — Frequency bias
Protein | Organism | Predicted Term | Type | Assessment | CS | Error | Summary
P26266 fepE | Escherichia coli K-12 | EC:2.7.13.3 histidine kinase | EC | REP | 0 | FREQUENCY_BIAS
FepE is a Wzz-family polysaccharide co-polymerase (Pfam PF02706; COG3765; crystal structure determined) anchored in the inner membrane by two TM helices with a large periplasmic domain and only an ~18-residue cytoplasmic tail. It has none of the conserved modules of a two-component histidine kinase (DHp/HisKA + HATPase_c/GHKL), no Walker/ATP-binding motif, and no reported phosphotransfer activity; its established function is non-catalytic O-antigen chain-length regulation. EC 2.7.13.3 is therefore structurally impossible for this protein. Because histidine kinase is among the most abundant enzyme classes in bacterial genomes and a known default label for uncharacterized membrane proteins, the prediction is best categorized as a frequency-biased repetition rather than a specific paralog/homology error. Assessed independently from the UniProt record, the crystal-structure literature (Kalynych 2012) and the Wzz functional literature (Murray 2003).
P0AFR4 yciO | Escherichia coli K-12 | EC:2.7.7.87 L-threonylcarbamoyladenylate synthase | EC | PLI | 0 | PARALOG_OVERANNOTATION
YciO belongs to the Sua5/TsaC(YrdC) family and the model assigned it the family's signature reaction (TC-AMP synthesis, the first committed step of t6A). That dedicated activity belongs to the TsaC/YrdC subfamily, not to its paralog YciO. Independent evidence: (1) a pairwise alignment shows YciO keeps the YrdC fold (~29% identity) but the conserved catalytic Asn of the Sua5/TsaC S-x-N motif is replaced (TsaC STSANL -> YciO STSLML), i.e. the active site is degenerate; (2) t6A is essential and tsaC/yrdC is essential despite YciO being present, so YciO cannot supply TC-AMP-synthase function in vivo; (3) the in vitro activity reported by Kim et al. 2023 (0.0705 U/mg) is weak, and UniProt records that YciO also hydrolyses ATP to AMP, consistent with promiscuous, non-productive turnover. The prediction is therefore a paralog over-annotation; YciO's true in vivo function remains unresolved. Assessed independently from the UniProt record, the DeepECTF paper's own numbers, the YciO/TsaC structures, and an in-house sequence/active-site comparison; the de Crecy-Lagard evaluation literature was not used.
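The active-site comparison in point (1) can be sketched in a few lines. This is a minimal illustration, not the in-house analysis itself: the two six-residue segments are the ones quoted above, while the regular expression `S.N` (Ser, any residue, Asn) and the helper names are assumptions introduced here, and positions are 0-based within the quoted segment rather than full-length protein coordinates.

```python
import re

# Motif segments quoted in the YciO assessment.
TSAC_SEGMENT = "STSANL"
YCIO_SEGMENT = "STSLML"

# Assumption: the S-x-N motif is modelled as Ser, any one residue, Asn.
SXN_MOTIF = re.compile(r"S.N")


def has_sxn(segment):
    """Return True if the segment contains an S-x-N tripeptide."""
    return SXN_MOTIF.search(segment) is not None


def substitutions(reference, query):
    """List (index, reference_residue, query_residue) for each mismatch.

    Both segments must be the same length; no gaps are handled.
    """
    if len(reference) != len(query):
        raise ValueError("segments must have equal length")
    return [
        (i, ref_res, qry_res)
        for i, (ref_res, qry_res) in enumerate(zip(reference, query))
        if ref_res != qry_res
    ]


if __name__ == "__main__":
    print("TsaC has S-x-N:", has_sxn(TSAC_SEGMENT))
    print("YciO has S-x-N:", has_sxn(YCIO_SEGMENT))
    print("Substitutions:", substitutions(TSAC_SEGMENT, YCIO_SEGMENT))
```

On the quoted segments the motif is found in TsaC but not in YciO, and the mismatch list includes the Asn-to-Met replacement that the assessment describes as the loss of the catalytic Asn.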
P76419 yegV | Escherichia coli K-12 | EC:2.7.1.92 dehydro-2-deoxygluconokinase | EC | UNC | 1 | (none)
The PfkB/ribokinase-like family assignment makes "an ATP-dependent sugar kinase" (EC 2.7.1.-) a sound general call, and the first three EC digits are consistent with the domain. However, the specific substrate (EC 2.7.1.92, 5-dehydro-2-deoxy-D-gluconate) is a fine-grained discrimination among many PfkB paralogs that the allowed evidence cannot support: YegV is entirely uncharacterized (no substrate, products, kinetics, or complex structure), so the prediction can be neither validated nor refuted. Two contextual points make the specific assignment unlikely rather than impossible: (i) the predicted substrate belongs to myo-inositol catabolism, a pathway absent from E. coli K-12 (no iol genes); and (ii) YegV's operon (yegTUV) is repressed by the ADP-glucose-sensing regulator GgaR and is linked to glycogen, implying a different storage-carbon substrate. Because YegV's true substrate is unknown and PfkB kinases can be promiscuous, the most defensible call is uncertain. Assessed independently from the UniProt family annotation, the yegTUV operon literature, and EC/pathway context; the de Crecy-Lagard evaluation literature was not used.
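The distinction drawn here, between agreeing at the first three EC digits and committing to the fourth, can be made explicit with a small helper. This is a sketch under stated assumptions: the function names are hypothetical, a `-` field is treated as "unspecified", and the comparison only counts leading levels on which both numbers are specified and identical.

```python
def ec_fields(ec):
    """Split 'EC:2.7.1.92' or '2.7.1.-' into its four level fields."""
    if ec.startswith("EC:"):
        ec = ec[3:]
    fields = ec.strip().split(".")
    if len(fields) != 4:
        raise ValueError("expected four dot-separated EC fields: %r" % ec)
    return fields


def shared_levels(ec_a, ec_b):
    """Count leading EC levels that are specified in both and agree."""
    count = 0
    for field_a, field_b in zip(ec_fields(ec_a), ec_fields(ec_b)):
        if field_a == "-" or field_b == "-" or field_a != field_b:
            break
        count += 1
    return count


if __name__ == "__main__":
    # Predicted term vs. the family-level call supported by the domain.
    print(shared_levels("EC:2.7.1.92", "2.7.1.-"))
```

A result of 3 for the YegV prediction against `2.7.1.-` corresponds to the situation described above: the class, subclass, and sub-subclass are supported by the family assignment, while the fourth field (the specific substrate) is the part the evidence cannot confirm.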
P52037 ygfF | Escherichia coli K-12 | EC:1.1.1.47 glucose 1-dehydrogenase | EC | UNC | 1 | (none)
Unlike a weak/promiscuous hit, YgfF's predicted activity is robustly demonstrated in vitro (305.55 U/mg, comparable to dedicated bacterial glucose dehydrogenases), and YgfF has an intact SDR catalytic apparatus (Tyr156, NAD-binding motif), so the biochemical prediction is correct at the activity level. However, the allowed evidence cannot establish that glucose 1-dehydrogenase is YgfF's biological function: (i) E. coli K-12 has no cytoplasmic NAD-glucose-dehydrogenase pathway (glucose enters via the PTS); (ii) SDR enzymes are notoriously promiscuous, so a high in vitro rate on glucose does not prove glucose is the physiological substrate; and (iii) Kim et al. note YgfF's nearest training homolog is a 3-oxoacyl-ACP reductase (EC 1.1.1.100, FabG-like), suggesting a ketoreductase rather than sugar-oxidase role. Because the activity is real but its in vivo relevance is unresolved (and the true substrate unknown), the prediction is judged uncertain rather than confidently correct or incorrect. This contrasts with YciO, where the activity is weak and the active site degenerate (a clearer over-annotation). Assessed independently from the UniProt record, the DeepECTF paper's own data, and SDR/EC pathway reasoning; the de Crecy-Lagard evaluation literature was not used.
P0AFJ1 yjdM | Escherichia coli K-12 | EC:3.11.1.2 phosphonoacetate hydrolase | EC | NPI | 0 | IN_VITRO_NOT_IN_VIVO
Three independent lines of evidence refute phosphonoacetate hydrolase as YjdM's function. (1) Fold/homology: YjdM is a small zinc-ribbon protein (YjdM family, Zn_Ribbon_YjdM domain; CxxC knuckles at C6/C9 and C23/C26) with no homology to the alkaline-phosphatase-superfamily phosphonoacetate hydrolases; a zinc ribbon is not a C-P bond hydrolase, and the DeepECTF paper itself notes YjdM's nearest training homolog is a DNA-directed RNA polymerase (EC 2.7.7.6), a zinc-ribbon-containing protein. (2) Genetics: phnA (yjdM) was shown to have no role in phosphonate metabolism, with all required genes in the phnC-phnP operon, and E. coli K-12 uses the C-P lyase rather than a phosphonoacetate hydrolase. (3) Naming: the historical "phnA" name of yjdM collides with the unrelated Pseudomonas phosphonoacetate hydrolase PhnA, a likely source of the mis-assignment. The reported in vitro activity (139.85 U/mg by a colorimetric phosphate-release assay) therefore does not represent an in vivo biological function. Assessed independently from the UniProt record, the DeepECTF paper's own data, and the classic E. coli phosphonate-genetics literature; the de Crecy-Lagard evaluation literature was not used.
P39368 yjhQ | Escherichia coli K-12 | EC:2.3.1.189 mycothiol synthase | EC | NPI | 0 | PATHWAY_CONTEXT_IGNORED
YjhQ is a GNAT-family N-acetyltransferase (Pfam Acetyltransf_1; GNAT domain; EC 2.3.1.-), so an N-acetyltransferase call is defensible, but the specific product (mycothiol synthase, MshD, EC 2.3.1.189) is biologically impossible in this organism: mycothiol is the characteristic low-molecular-weight thiol of Actinobacteria, and its biosynthetic pathway (MshA glycosyltransferase, MshB deacetylase, MshC ligase, MshD acetyltransferase) is entirely absent from E. coli, which uses glutathione instead. With no desacetylmycothiol substrate produced in E. coli, the predicted reaction cannot occur in vivo. The model over-propagated the GNAT acetyltransferase signal to a specific actinobacterial substrate while ignoring pathway context. Moreover, YjhQ's actual characterized function is as the antitoxin of the YjhX(TopAI)-YjhQ toxin-antitoxin system, unrelated to thiol metabolism. Assessed independently from the UniProt/GNAT annotation, the TA-system literature, and comparative-pathway reasoning (mycothiol absent from E. coli); the de Crecy-Lagard evaluation literature was not used.
P46857 yrhB | Escherichia coli K-12 | EC:4.1.2.50 6-carboxytetrahydropterin synthase | EC | NPI | 0 | PATHWAY_CONTEXT_IGNORED
YrhB has no structural or evolutionary basis for this activity. Its only recognizable feature is an Imm35 (bacteriocin-immunity) domain (Pfam PF15567); it is not homologous to 6-carboxytetrahydropterin synthase and lacks the 6-pyruvoyltetrahydropterin-synthase (PTPS) tunnel fold that defines this enzyme. In E. coli the 6-carboxytetrahydropterin synthase step of queuosine biosynthesis is carried out by the dedicated, non-homologous enzyme QueD (ygcM, b2765), so the activity is already accounted for and would not be expected from an unrelated 94-residue protein. The model ignored this metabolic context and assigned a redundant enzymatic function to a protein that has no pterin-synthase fold; YrhB itself remains functionally uncharacterized. Assessed independently from the UniProt record/Imm35 domain, the queuosine pathway gene assignments in E. coli, and fold/family reasoning; the de Crecy-Lagard evaluation literature was not used.
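The summary figures at the top of the report can be cross-checked against the seven rows of the table. The sketch below assumes that "Mean CS" is the plain arithmetic mean of the CS column; the report does not state the rule, but that reading reproduces the displayed value after rounding. The accession, assessment, and CS values are copied from the table; the variable and function names are illustrative only.

```python
from collections import Counter

# (accession, assessment code, CS) as listed in the table above.
ROWS = [
    ("P26266", "REP", 0),
    ("P0AFR4", "PLI", 0),
    ("P76419", "UNC", 1),
    ("P52037", "UNC", 1),
    ("P0AFJ1", "NPI", 0),
    ("P39368", "NPI", 0),
    ("P46857", "NPI", 0),
]


def summarize(rows):
    """Tally assessments and compute the mean CS over all predictions."""
    accessions = {accession for accession, _, _ in rows}
    counts = Counter(assessment for _, assessment, _ in rows)
    # Assumption: Mean CS is the arithmetic mean of the CS column.
    mean_cs = sum(cs for _, _, cs in rows) / len(rows)
    return {
        "proteins": len(accessions),
        "predictions": len(rows),
        "counts": dict(counts),
        "mean_cs": mean_cs,
    }


if __name__ == "__main__":
    summary = summarize(ROWS)
    print(summary["counts"])
    print(round(summary["mean_cs"], 1))
```

Two of the seven predictions carry CS 1, so the unrounded mean is 2/7, which rounds to the 0.3 shown in the summary.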