ygfF

UniProt ID: P52037
Organism: Escherichia coli (strain K12)
Review Status: COMPLETE

Gene Description

YgfF is a glucose 1-dehydrogenase (EC 1.1.1.47) belonging to the short-chain dehydrogenases/reductases (SDR) family (SDR63C subgroup). It catalyzes the NAD(+)-dependent oxidation of D-glucose to D-glucono-1,5-lactone, which spontaneously hydrolyzes to D-gluconate. The enzymatic function was predicted by the DeepECtransformer deep learning tool (prediction score 0.6331) and experimentally validated in vitro by Kim et al. 2023, who measured a specific activity of 305.55 U/mg, comparable to characterized glucose 1-dehydrogenases from other organisms. Full kinetic parameters (Km, kcat) have not yet been determined. YgfF was one of only three genuinely correct novel predictions (out of 464) made by DeepECtransformer for the E. coli y-ome. The protein contains a conserved NAD(P)-binding Rossmann-fold domain and predicted binding sites for both NAD(+) and D-glucose. A physical interaction with LpdA (dihydrolipoyl dehydrogenase) was detected by affinity purification-mass spectrometry, though the biological significance of this interaction is unclear. The physiological role of YgfF in E. coli metabolism remains to be established.
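As a quick sanity check on the reaction stated above (D-glucose + NAD(+) = D-glucono-1,5-lactone + NADH + H(+)), the following minimal sketch tests atom and charge balance. The molecular formulas and charges are assumptions: they are the commonly used major protonation states near neutral pH and are not taken from the cited study. The helper names are illustrative only.

```python
from collections import Counter

# Assumed formulas and net charges (major species near neutral pH); illustrative only.
SPECIES = {
    "D-glucose": ({"C": 6, "H": 12, "O": 6}, 0),
    "NAD+": ({"C": 21, "H": 26, "N": 7, "O": 14, "P": 2}, -1),
    "D-glucono-1,5-lactone": ({"C": 6, "H": 10, "O": 6}, 0),
    "NADH": ({"C": 21, "H": 27, "N": 7, "O": 14, "P": 2}, -2),
    "H+": ({"H": 1}, 1),
}

def totals(names):
    """Sum element counts and net charge over a list of species names."""
    atoms = Counter()
    charge = 0
    for name in names:
        formula, q = SPECIES[name]
        atoms.update(formula)
        charge += q
    return atoms, charge

def is_balanced(substrates, products):
    """True when both sides have identical element counts and net charge."""
    return totals(substrates) == totals(products)

substrates = ["D-glucose", "NAD+"]
products = ["D-glucono-1,5-lactone", "NADH", "H+"]
balanced = is_balanced(substrates, products)
print("balanced:", balanced)
```

The check only confirms that the written equation is stoichiometrically consistent under the assumed formulas; it says nothing about cofactor preference or kinetics.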

Existing Annotations Review

GO Term Evidence Action Reason
GO:0016491 oxidoreductase activity
IBA
GO_REF:0000033
ACCEPT
Summary: YgfF is experimentally confirmed as a glucose 1-dehydrogenase (EC 1.1.1.47), which is a type of oxidoreductase (PMID:37963869). The IBA annotation to oxidoreductase activity is correct but much less specific than what is now known. The more specific term GO:0047934 (glucose 1-dehydrogenase (NAD+) activity) is supported by direct experimental evidence.
Reason: This IBA annotation is consistent with the experimentally validated function of YgfF. While more specific terms exist (and are annotated separately), the IBA at this level is not wrong and reflects phylogenetic inference that is consistent with experimental data. Glucose 1-dehydrogenase activity is a subtype of oxidoreductase activity.
Supporting Evidence:
PMID:37963869
YgfF exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
GO:0016614 oxidoreductase activity, acting on CH-OH group of donors
IEA
GO_REF:0000117
ACCEPT
Summary: This IEA annotation from ARBA is consistent with the experimentally validated glucose 1-dehydrogenase activity. Glucose 1-dehydrogenase acts on the CH-OH group of D-glucose, so this intermediate-level annotation is correct. It is less specific than GO:0047934 but appropriately reflects the ARBA computational prediction.
Reason: The term is a correct parent of the experimentally validated specific function (glucose 1-dehydrogenase (NAD+) activity). The IEA evidence code is appropriate for a computationally derived annotation, and the term is consistent with the SDR family classification and the known catalytic mechanism.
Supporting Evidence:
PMID:37963869
For YgfF, DeepECtransformer predicted its EC number to be EC:1.1.1.47 (glucose 1-dehydrogenase).
GO:0047934 glucose 1-dehydrogenase (NAD+) activity
IEA
GO_REF:0000116
ACCEPT
Summary: This IEA annotation is derived from Rhea reaction mapping (RHEA:14293), which corresponds to the reaction D-glucose + NAD(+) = D-glucono-1,5-lactone + NADH + H(+). This matches the experimentally validated catalytic activity of YgfF exactly as described by Kim et al. 2023 and annotated by UniProt (PMID:37963869).
Reason: The Rhea-derived IEA correctly captures the specific enzymatic activity of YgfF. This is also independently supported by the IDA annotation from PMID:37963869 using the same GO term.
Supporting Evidence:
PMID:37963869
YgfF exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
GO:0047936 glucose 1-dehydrogenase [NAD(P)+] activity
IEA
GO_REF:0000003
MODIFY
Summary: This IEA annotation is derived from the EC number mapping (EC:1.1.1.47). GO:0047936 describes glucose 1-dehydrogenase that can use either NAD+ or NADP+ as cofactor, which matches the formal EC definition: EC:1.1.1.47 is glucose 1-dehydrogenase [NAD(P)+], while the NAD+-specific and NADP+-specific enzymes are EC:1.1.1.118 and EC:1.1.1.119, respectively. However, the experimental validation by Kim et al. used an NAD+-dependent assay (PMID:37963869), and activity with NADP+ has not been reported. UniProt annotates the catalytic activity with the Rhea reaction that specifies NAD+. The more precise term GO:0047934 (NAD+ specific) is therefore the better-supported annotation.
Reason: The EC:1.1.1.47 to GO:0047936 mapping is formally correct, since EC:1.1.1.47 is the dual-cofactor glucose 1-dehydrogenase [NAD(P)+] (the NAD+-specific form is EC:1.1.1.118 and the NADP+-specific form is EC:1.1.1.119). It is nonetheless broader than the evidence supports: Kim et al. 2023 validated activity using an NAD+-dependent assay kit, activity with NADP+ has not been reported, and UniProt annotates the Rhea reaction (RHEA:14293) specifically with NAD+. Given current data, the NAD+-specific GO term GO:0047934 is more accurate.
Supporting Evidence:
PMID:37963869
YgfF exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
GO:0005515 protein binding
IPI
PMID:15690043
Interaction network containing conserved and essential prote...
MARK AS OVER ANNOTATED
Summary: This annotation is based on a high-throughput affinity purification-mass spectrometry study by Butland et al. 2005 that detected a physical interaction between YgfF and LpdA (dihydrolipoyl dehydrogenase, P0A9P0). The interaction is recorded in IntAct with 2 experiments supporting it. However, the GO term GO:0005515 (protein binding) is uninformative per curation guidelines and does not convey any specific functional information about this interaction.
Reason: Per curation guidelines, GO:0005515 (protein binding) is too vague and uninformative. The Butland et al. 2005 study was a large-scale screen that detected many interactions, and the biological significance of the YgfF-LpdA interaction is unknown. LpdA functions in the pyruvate dehydrogenase and 2-oxoglutarate dehydrogenase complexes, and there is no clear functional connection to glucose 1-dehydrogenase activity. Without understanding the functional nature of this interaction, a generic protein binding annotation provides little value.
Supporting Evidence:
PMID:15690043
no large-scale analysis of protein complexes in Escherichia coli has yet been reported. To this end, we have targeted DNA cassettes into the E. coli chromosome to create carboxy-terminal, affinity-tagged alleles of 1,000 open reading frames
GO:0047934 glucose 1-dehydrogenase (NAD+) activity
IDA
PMID:37963869
Functional annotation of enzyme-encoding genes using deep le...
ACCEPT
Summary: This is the key experimentally validated annotation. Kim et al. 2023 expressed and purified recombinant His-tagged YgfF from E. coli BL21(DE3) and measured glucose 1-dehydrogenase activity in vitro using a colorimetric GDH assay kit. The specific activity was 305.55 U/mg, comparable to the previously reported value of 205.70 U/mg for glucose 1-dehydrogenase from Lysinibacillus sphaericus. The activity was predicted by DeepECtransformer with EC number EC:1.1.1.47 and confirmed by the enzyme assay. This represents a validated core function.
Reason: Direct experimental evidence from in vitro enzyme assay demonstrates glucose 1-dehydrogenase (NAD+) activity. The specific activity (305.55 U/mg) is robust and comparable to characterized homologs. This is the most specific and well-supported annotation for YgfF. UniProt has adopted this function based on this study (EC:1.1.1.47).
Supporting Evidence:
PMID:37963869
For YgfF, DeepECtransformer predicted its EC number to be EC:1.1.1.47 (glucose 1-dehydrogenase). The enzyme assay results showed that YgfF exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
PMID:37963869
which was comparable to the previously reported value of 205.70 U mg−1 for the glucose 1-dehydrogenase from Lysinibacillus sphaericus G10
file:ECOLI/ygfF/ygfF-deep-research-falcon.md
Falcon deep research confirms YgfF as EC 1.1.1.47 glucose 1-dehydrogenase validated by in vitro assay with 305.55 U/mg specific activity, and notes SDR63C subgroup classification supports this assignment.
GO:0051287 NAD binding
IDA
PMID:37963869
Functional annotation of enzyme-encoding genes using deep le...
NEW
Summary: YgfF requires NAD+ as a cofactor for its glucose 1-dehydrogenase activity. The catalytic reaction (D-glucose + NAD(+) = D-glucono-1,5-lactone + NADH + H(+)) directly involves NAD+ binding. UniProt annotates extensive NAD+ binding residues (positions 11, 13, 59, 60, 86, 88, 110, 156, 160, 189, 191, 194) based on similarity to characterized SDR family members. The in vitro enzyme assay demonstrating NAD+-dependent glucose oxidation provides experimental evidence for NAD binding.
Reason: The experimentally validated glucose 1-dehydrogenase activity requires NAD+ as a cofactor, and UniProt annotates multiple NAD+ binding residues. NAD binding is an inherent aspect of the catalytic mechanism and should be annotated. This is not currently present in the GO annotations.
Supporting Evidence:
PMID:37963869
For YgfF, DeepECtransformer predicted its EC number to be EC:1.1.1.47 (glucose 1-dehydrogenase). The enzyme assay results showed that YgfF exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
GO:0019521 D-gluconate metabolic process
IDA
PMID:37963869
Functional annotation of enzyme-encoding genes using deep le...
NEW
Summary: YgfF catalyzes the oxidation of D-glucose to D-glucono-1,5-lactone, which spontaneously hydrolyzes to D-gluconate. This places YgfF as a participant in D-gluconate metabolism. No biological process annotations currently exist for YgfF, yet the experimentally validated enzymatic activity directly implicates it in this metabolic pathway. However, the in vivo physiological role has not been established, so this annotation should be considered with caution.
Reason: There are currently no biological process annotations for YgfF, which is a significant gap. The experimentally validated glucose 1-dehydrogenase activity produces D-glucono-1,5-lactone (a precursor to D-gluconate), directly linking YgfF to D-gluconate metabolic process. UniProt states the protein catalyzes the NAD(+)-dependent oxidation of D-glucose to D-gluconate via gluconolactone.
Supporting Evidence:
PMID:37963869
YgfF exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
GO:0005829 cytosol
IDA
PMID:37963869
Functional annotation of enzyme-encoding genes using deep le...
NEW
Summary: YgfF was expressed as a soluble protein and purified from the cytosolic fraction (supernatant after cell lysis and centrifugation) by Kim et al. 2023. The protein was predicted to be soluble by NetSolP and was successfully purified from the soluble fraction. While there are no dedicated localization studies, the solubility data and lack of any signal peptide or transmembrane domain strongly suggest cytosolic localization. No cellular component annotations currently exist for YgfF.
Reason: There are currently no cellular component annotations for YgfF, which is a gap. The protein was purified from the soluble cytoplasmic fraction and has no predicted signal peptide or transmembrane domains, consistent with cytosolic localization. However, the evidence is indirect (protein was soluble when overexpressed) rather than from a dedicated localization study.
Supporting Evidence:
PMID:37963869
179 proteins are predicted to be soluble in E. coli by NetSolP, a deep learning model for protein solubility prediction
PMID:37963869
Cell debris was separated by centrifugation at 15,044 × g for 40 min, and the resulting supernatants were loaded onto Talon metal affinity resin

Core Functions

NAD(+)-dependent glucose 1-dehydrogenase activity. YgfF catalyzes the oxidation of D-glucose to D-glucono-1,5-lactone using NAD+ as the electron acceptor. This is the sole experimentally validated enzymatic function, demonstrated by in vitro assay with a specific activity of 305.55 U/mg (PMID:37963869). YgfF belongs to the SDR family (SDR63C subgroup) and contains a conserved NAD(P)-binding Rossmann-fold domain.

Supporting Evidence:
  • PMID:37963869
    For YgfF, DeepECtransformer predicted its EC number to be EC:1.1.1.47 (glucose 1-dehydrogenase). The enzyme assay results showed that YgfF exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1

References

Gene Ontology annotation based on Enzyme Commission mapping
Annotation inferences using phylogenetic trees
Automatic Gene Ontology annotation based on Rhea mapping
Electronic Gene Ontology annotations created by ARBA machine learning models
Interaction network containing conserved and essential protein complexes in Escherichia coli.
  • High-throughput affinity purification-mass spectrometry detected a physical interaction between YgfF (P52037) and LpdA (P0A9P0, dihydrolipoyl dehydrogenase).
    "no large-scale analysis of protein complexes in Escherichia coli has yet been reported. To this end, we have targeted DNA cassettes into the E. coli chromosome to create carboxy-terminal, affinity-tagged alleles of 1,000 open reading frames"
Functional annotation of enzyme-encoding genes using deep learning with transformer layers.
  • DeepECtransformer predicted YgfF to have EC number EC:1.1.1.47 (glucose 1-dehydrogenase) and this was experimentally validated by in vitro enzyme assay with a specific activity of 305.55 U/mg.
    "For YgfF, DeepECtransformer predicted its EC number to be EC:1.1.1.47 (glucose 1-dehydrogenase). The enzyme assay results showed that YgfF exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1"
  • YgfF was one of three randomly selected y-ome proteins whose predicted enzymatic functions were experimentally validated, demonstrating the utility of DeepECtransformer for functional annotation.
    "we randomly selected three proteins, YgfF, YciO, and YdjM, that are predicted to be oxidoreductase, transferase, and hydrolase, respectively"
  • The neural network predicted YgfF's function despite higher sequence identity to a different enzyme (EC:1.1.1.100), showing the model learned functional motifs rather than relying solely on homology.
    "It should be noted that although YgfF exhibited a higher sequence identity with a different enzyme (A0A069CGU9_ECOLX; EC:1.1.1.100) from the training dataset than glucose 1-dehydrogenase exhibiting the maximum sequence identity within the training dataset, the neural network made an accurate prediction"
DOI:10.3390/life14030364
Back to the future of metabolism - advances in the discovery and characterization of unknown biocatalytic functions and pathways.
  • YgfF is highlighted as an example of ML-assisted functional annotation where DeepECtransformer predicted glucose 1-dehydrogenase activity and in vitro enzyme assays were performed on overexpressed and affinity-purified protein. The review emphasizes that deeper characterization (kinetic parameters, substrate spectrum) is needed to fully establish metabolic roles.
    "The review stresses that function assignment for unusual/unknown enzymes often requires extensive experimental work including expression, purification, substrate synthesis, analytical methods, and kinetic characterization such as kcat and KM."
Limitations of current machine-learning models in predicting enzymatic functions for uncharacterized proteins.
  • YgfF is classified into the SDR63C / glucose 1-dehydrogenase subgroup by HMM-based SDR subfamily analysis, consistent with the DeepECtransformer prediction. However, in vitro activity alone is insufficient to establish physiological (in vivo) function, and best practice requires combining biochemical and genetic evidence.
    "This resource predicts YgfF is part of the SDR63C/Glucose 1-dehydrogenase subgroup, the activity predicted and validated in the Kim et al. (2023) study. This prediction demonstrates the accurate propagation of functional annotation and is a successful prediction."

Suggested Questions for Experts

Q: What is the physiological role of YgfF glucose 1-dehydrogenase activity in E. coli K-12 metabolism? Is it involved in glucose catabolism via the Entner-Doudoroff pathway or another metabolic route?

Suggested experts: Lee SY, Kim GB

Q: What is the biological significance of the YgfF-LpdA physical interaction detected by Butland et al. 2005? Does YgfF participate in a metabolic complex with pyruvate dehydrogenase components?

Suggested experts: Emili A, Butland G

Q: Does YgfF have any activity with NADP+ as cofactor, or is it strictly NAD+-dependent? The current annotations include both NAD+ and NAD(P)+ terms.

Suggested experts: Kim GB, Lee SY

Suggested Experiments

Experiment: Construct a ygfF knockout in E. coli K-12 and test growth phenotypes on minimal media with glucose as the sole carbon source under aerobic and anaerobic conditions. Compare with wild-type to determine if YgfF contributes to glucose utilization in vivo.

Hypothesis: YgfF deletion affects growth on glucose as sole carbon source under specific metabolic conditions.

Type: growth phenotype assay

Experiment: Perform in vitro enzyme assays with purified YgfF using NADP+ instead of NAD+ as the cofactor to determine cofactor specificity. This would resolve whether GO:0047936 (NAD(P)+ form) or GO:0047934 (NAD+ specific) is the correct annotation.

Hypothesis: YgfF is NAD+-specific and does not use NADP+ as an electron acceptor.

Type: enzyme kinetics

Experiment: Perform co-purification experiments with tagged YgfF under physiological expression levels and test whether LpdA affects YgfF enzymatic activity in vitro. Also test if ygfF deletion affects pyruvate dehydrogenase complex activity.

Hypothesis: The YgfF-LpdA interaction has functional significance in glucose metabolism.

Type: protein interaction validation

Computational Predictions

YgfF DeepECTF prediction review. The DeepECTF prediction of glucose 1-dehydrogenase (EC 1.1.1.47) is a successful prediction, validated by SDR nomenclature classification (SDR63C subgroup) and consistent with published biochemical data.

EC:1.1.1.47 glucose 1-dehydrogenase EC
COR CS: 2
Source: DeepECTF v2023 · PMID:37820725
Summary: Correct prediction. YgfF belongs to the SDR63C/Glucose 1-dehydrogenase subgroup of the SDR superfamily (IPR002347), as classified by the Oppermann/Persson HMM-based nomenclature system. The DeepECTF prediction matches this classification and was validated by in vitro assay. However, the validated reaction uses NAD+ as the cofactor (activity with NADP+ has not been reported), and the physiological substrate in vivo has not been established.
Supporting Evidence:
  • PMID:40703034: "YgfF is a member of the large short-chain dehydrogenase/reductase (SDR) superfamily (IPR002347)...This resource predicts YgfF is part of the SDR63C/Glucose 1-dehydrogenase subgroup, the activity predicted and validated in the Kim et al. (2023) study. This prediction demonstrates the accurate propagation of functional annotation and is a successful prediction."

📚 Additional Documentation

Deep Research Falcon

(ygfF-deep-research-falcon.md)

provider: falcon
model: Edison Scientific Literature
cached: false
start_time: '2026-03-22T17:39:29.798295'
end_time: '2026-03-22T17:46:45.635027'
duration_seconds: 435.84
template_file: templates/gene_research_go_focused.md
template_variables:
  organism: ECOLI
  gene_id: ygfF
  gene_symbol: ygfF
  uniprot_accession: P52037
  protein_description: 'RecName: Full=Glucose 1-dehydrogenase YgfF {ECO:0000303|PubMed:37963869};
    EC=1.1.1.47 {ECO:0000269|PubMed:37963869};'
  gene_info: Name=ygfF; Synonyms=yqfD; OrderedLocusNames=b2902, JW2870;
  organism_full: Escherichia coli (strain K12).
  protein_family: Belongs to the short-chain dehydrogenases/reductases (SDR)
  protein_domains: NAD(P)-bd_dom_sf. (IPR036291); Sc_DH/Rdtase_CS. (IPR020904); SDR_fam.
    (IPR002347); adh_short_C2 (PF13561)
provider_config:
  timeout: 600
  max_retries: 3
  parameters:
    allowed_domains: []
    temperature: 0.1
citation_count: 13
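
The run metadata above is internally consistent, which can be confirmed with a short check. This is a minimal sketch that assumes `duration_seconds` is simply the difference between `end_time` and `start_time` rounded to two decimals; that rounding rule is an assumption, not something the metadata states.

```python
from datetime import datetime

# Timestamps copied from the metadata block above.
start = datetime.fromisoformat("2026-03-22T17:39:29.798295")
end = datetime.fromisoformat("2026-03-22T17:46:45.635027")

# Assumed convention: elapsed wall-clock seconds, rounded to two decimals.
elapsed = round((end - start).total_seconds(), 2)
print(elapsed)
```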


Question

Gene Research for Functional Annotation

⚠️ CRITICAL: Gene/Protein Identification Context

BEFORE YOU BEGIN RESEARCH: You MUST verify you are researching the CORRECT gene/protein. Gene symbols can be ambiguous, especially for less well-characterized genes from non-model organisms.

Target Gene/Protein Identity (from UniProt):

  • UniProt Accession: P52037
  • Protein Description: RecName: Full=Glucose 1-dehydrogenase YgfF {ECO:0000303|PubMed:37963869}; EC=1.1.1.47 {ECO:0000269|PubMed:37963869};
  • Gene Information: Name=ygfF; Synonyms=yqfD; OrderedLocusNames=b2902, JW2870;
  • Organism (full): Escherichia coli (strain K12).
  • Protein Family: Belongs to the short-chain dehydrogenases/reductases (SDR)
  • Key Domains: NAD(P)-bd_dom_sf. (IPR036291); Sc_DH/Rdtase_CS. (IPR020904); SDR_fam. (IPR002347); adh_short_C2 (PF13561)

MANDATORY VERIFICATION STEPS:

  1. Check if the gene symbol "ygfF" matches the protein description above
  2. Verify the organism is correct: Escherichia coli (strain K12).
  3. Check if protein family/domains align with what you find in literature
  4. If you find literature for a DIFFERENT gene with the same or similar symbol, STOP

If Gene Symbol is Ambiguous or You Cannot Find Relevant Literature:

DO NOT PROCEED WITH RESEARCH ON A DIFFERENT GENE. Instead:
- State clearly: "The gene symbol 'ygfF' is ambiguous or literature is limited for this specific protein"
- Explain what you found (e.g., "Found extensive literature on a different gene with the same symbol in a different organism")
- Describe the protein based ONLY on the UniProt information provided above
- Suggest that the protein function can be inferred from domain/family information

Research Target:

Please provide a comprehensive research report on the gene ygfF (gene ID: ygfF, UniProt: P52037) in ECOLI.

The research report should be a detailed narrative explaining the function, biological processes, and localization of the gene product. Citations should be given for all claims.

You should prioritize authoritative reviews and primary scientific literature when conducting research. You can supplement
this with annotations you find in gene/protein databases, but these can be outdated or inaccurate.

We are specifically interested in the primary function of the gene - for enzymes, what reaction is catalyzed, and what is the substrate specificity? For transporters, what is the substrate? For structural proteins or adapters, what is the broader structural role? For signaling molecules, what is the role in the pathway.

We are interested in where in or outside the cell the gene product carries out its function.

We are also interested in the signaling or biochemical pathways in which the gene functions. We are less interested in broad pleiotropic effects, except where these elucidate the precise role.

Include evidence where possible. We are interested in both experimental evidence as well as inference from structure, evolution, or bioinformatic analysis. Precise studies should be prioritized over high-throughput, where available.

Output

Question: You are an expert researcher providing comprehensive, well-cited information.

Provide detailed information focusing on:
1. Key concepts and definitions with current understanding
2. Recent developments and latest research (prioritize 2023-2024 sources)
3. Current applications and real-world implementations
4. Expert opinions and analysis from authoritative sources
5. Relevant statistics and data from recent studies

Format as a comprehensive research report with proper citations. Include URLs and publication dates where available.
Always prioritize recent, authoritative sources and provide specific citations for all major claims.

Comprehensive Research Report: Functional Annotation of Escherichia coli K-12 ygfF (UniProt: P52037)

Scope and identity verification

This report concerns the Escherichia coli K-12 (MG1655) gene ygfF (synonym yqfD; locus b2902 / JW2870). Its curated description is glucose 1-dehydrogenase YgfF with enzyme classification EC 1.1.1.47, and it belongs to the short-chain dehydrogenases/reductases (SDR) superfamily (see the SDR-family subgrouping discussed below). The key direct experimental evidence in the retrieved literature comes from a 2023 primary study that explicitly labels YgfF as EC 1.1.1.47 and biochemically validates this activity in vitro (kim2023functionalannotationof pages 6-7).

1) Key concepts and definitions (current understanding)

1.1 The “y-ome” concept and why ygfF was historically difficult to annotate

In E. coli K-12 MG1655, a substantial fraction of genes remains incompletely characterized. Kim et al. (published Nov 2023) report that roughly 30% of E. coli genes are still incompletely characterized, and define the “y-ome” as genes lacking sufficient experimental evidence for functional characterization (kim2023functionalannotationof pages 5-6). They operationalize this by assembling a y-ome protein set and applying machine-learning (ML) function prediction followed by targeted biochemical validation (kim2023functionalannotationof pages 5-6).

1.2 SDR enzymes (short-chain dehydrogenases/reductases)

SDR enzymes are a large and diverse oxidoreductase superfamily. In the context of ygfF, an expert commentary evaluating ML annotation states that YgfF is a member of the SDR superfamily (InterPro family IPR002347) and can be classified into an SDR subgroup consistent with glucose 1-dehydrogenase function (crecylagard2025limitationsofcurrent pages 7-9). This family-level classification provides mechanistic plausibility for nicotinamide-dependent sugar oxidation/reduction, but does not by itself establish the physiological substrate or pathway role (crecylagard2025limitationsofcurrent pages 7-9).

2) Recent developments and latest research (prioritizing 2023–2024)

2.1 Primary functional assignment and experimental validation (Nature Communications, 2023)

A major recent development is the explicit experimental validation of YgfF’s enzymatic activity by Kim et al. (Nature Communications; publication month Nov 2023, DOI: 10.1038/s41467-023-43216-z, URL: https://doi.org/10.1038/s41467-023-43216-z). Their DeepECtransformer model predicted YgfF as EC:1.1.1.47 (glucose 1-dehydrogenase) with a neural-network score 0.6331, and they validated the activity biochemically (kim2023functionalannotationof pages 6-7).

Quantitative result: purified YgfF exhibited specific glucose 1-dehydrogenase activity of 305.55 U mg−1 in vitro (kim2023functionalannotationof pages 6-7). In the same discussion, the authors compare this activity to a previously reported glucose 1-dehydrogenase from Lysinibacillus sphaericus (reported 205.70 U mg−1), as a rough benchmarking of magnitude (kim2023functionalannotationof pages 6-7).
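
To make the benchmarking in the paragraph above concrete, the two reported specific activities can be compared directly. This is a minimal sketch using only the two values quoted from Kim et al.; the variable names are illustrative, and the comparison ignores any differences in assay conditions between the two measurements.

```python
# Specific activities in U/mg, as quoted in the text above.
ygff_activity = 305.55       # E. coli YgfF (Kim et al. 2023)
reference_activity = 205.70  # Lysinibacillus sphaericus glucose 1-dehydrogenase

# Fold difference and relative excess of YgfF over the reference enzyme.
fold = ygff_activity / reference_activity
percent_higher = (fold - 1.0) * 100.0

print(f"fold difference: {fold:.2f}")
print(f"percent higher:  {percent_higher:.1f}%")
```

A fold difference of this size is within the same order of magnitude, which is the sense in which the source calls the values comparable.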

Cofactor indication (from figure evidence): the experimental validation figure includes a reaction scheme labeled NAD → NADH + H+, indicating that NAD+ is the oxidizing cofactor in the assayed glucose dehydrogenase reaction (kim2023functionalannotationof media 81d7b013). The retrieved text excerpts do not compare cofactor preference against NADP+ (e.g., kinetically), but the figure directly supports NAD+ as the cofactor depicted for the validated reaction (kim2023functionalannotationof media 81d7b013).

Assay implementation details: Kim et al. report overexpression of His-tagged YgfF in E. coli BL21(DE3), purification by metal affinity resin, and use of a glucose dehydrogenase colorimetric kit with OD450 readout at 37 °C, with reaction mixtures containing assay buffer, developer, and glucose substrate (kim2023functionalannotationof pages 7-8). These details establish that the validation was performed on purified recombinant protein and confirm glucose was used as substrate under the assay conditions (kim2023functionalannotationof pages 7-8).
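
For readers less familiar with the U mg−1 unit, the sketch below shows how a specific activity is conventionally derived from the amount of product formed. It assumes the common definition 1 U = 1 µmol product per minute; the kit's exact unit definition is not given in the retrieved text, so this is an assumption. The function name and the example inputs are hypothetical and are not data from the study.

```python
def specific_activity(nmol_product: float, minutes: float, mg_protein: float) -> float:
    """Specific activity in U/mg, assuming 1 U = 1 µmol product formed per minute."""
    if minutes <= 0 or mg_protein <= 0:
        raise ValueError("minutes and mg_protein must be positive")
    units = (nmol_product / 1000.0) / minutes  # nmol -> µmol, then per minute
    return units / mg_protein

# Hypothetical inputs, not values from the study:
# 100 nmol NADH formed in 5 min by 0.002 mg of purified enzyme,
# i.e. (0.1 µmol / 5 min) / 0.002 mg.
example = specific_activity(nmol_product=100.0, minutes=5.0, mg_protein=0.002)
print(f"{example:.2f} U/mg")
```

In a colorimetric kit the nmol of NADH would typically be read off a standard curve from the OD450 signal; that conversion step is omitted here because its parameters are not reported in the retrieved excerpts.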

2.2 2024 review context: rediscovery and characterization of unknown biocatalytic functions

A 2024 review (Wohlgemuth, Life, Mar 2024, DOI: 10.3390/life14030364, URL: https://doi.org/10.3390/life14030364) highlights the DeepECtransformer study as an example of assigning enzyme functions to previously unannotated proteins, explicitly noting YgfF as a predicted glucose 1-dehydrogenase and that in vitro enzyme assays were performed on overexpressed and affinity-purified YgfF (wohlgemuth2024backtothe pages 3-6). The review stresses that function assignment for unusual/unknown enzymes often requires extensive experimental work (expression/purification, substrate synthesis, analytical methods, kinetic characterization such as kcat and KM), and emphasizes the need for protein- and time-dependent catalysis demonstrations (wohlgemuth2024backtothe pages 3-6). This is relevant because the current YgfF evidence base (in the retrieved sources) includes a specific activity but not full kinetic constants (wohlgemuth2024backtothe pages 3-6).

2.3 Expert analysis and critique: limits of ML predictions vs physiological function

A later expert analysis of current ML model limitations notes that YgfF’s case is best understood as propagation of a known SDR subgroup function: YgfF is predicted (using SDR subgroup HMMs) to belong to the SDR63C / glucose 1-dehydrogenase subgroup, consistent with Kim et al.’s prediction and in vitro validation (crecylagard2025limitationsofcurrent pages 7-9). Critically, this analysis argues that in vitro activity alone is insufficient to establish physiological (in vivo) function and that best practice is to combine biochemical and genetic evidence (crecylagard2025limitationsofcurrent pages 7-9). This caveat directly limits how confidently ygfF can be placed into an E. coli pathway based on the currently retrieved evidence.

3) Current applications and real-world implementations

3.1 Practical uses in functional annotation pipelines

The immediate “real-world implementation” of the YgfF result is its role as a benchmark case for ML-assisted functional annotation of uncharacterized microbial genes. Kim et al. used E. coli K-12 MG1655 as a model genome and report that their approach predicted EC numbers for 464 y-ome proteins (with 390 receiving full four-digit EC predictions) and then experimentally validated a subset including YgfF (kim2023functionalannotationof pages 5-6). YgfF thus serves as an example of pairing computational prediction with rapid biochemical testing, a workflow increasingly used in genome annotation and metabolic model refinement efforts (kim2023functionalannotationof pages 5-6, wohlgemuth2024backtothe pages 3-6).

3.2 Biocatalysis relevance (inference limited by available evidence)

While glucose 1-dehydrogenases are widely used as redox biocatalysts or in sugar oxidation contexts, the retrieved evidence does not establish YgfF’s performance characteristics beyond specific activity under kit conditions, nor its substrate scope, stability, or engineering history. Therefore, any industrial/biotech deployment claims for E. coli YgfF specifically are not supported by the retrieved sources and are not asserted here.

4) Expert opinions and authoritative analysis

4.1 Caution in interpreting biochemical validation as physiological role

The expert commentary on ML annotation emphasizes a key interpretive point: YgfF’s glucose 1-dehydrogenase activity is biochemically supported, but physiological function in the native organism requires additional evidence, ideally including genetics (knockout/phenotyping, complementation) and pathway context (crecylagard2025limitationsofcurrent pages 7-9). The 2024 review similarly emphasizes the broader methodological standard of deeper characterization (including kinetic parameters and analytical validation) when asserting metabolic roles (wohlgemuth2024backtothe pages 3-6).

4.2 Evidence that annotation gaps remain large (motivating continued research)

Kim et al. quantify the annotation shortfall and the need for bridging EC/GO mappings: they note that among fully specified four-digit EC numbers, as of July 2023 only 5,216 of 8,056 had corresponding GO terms, implying substantial ontology linkage gaps even when enzymatic functions exist (kim2023functionalannotationof pages 6-7). This supports the view that systematic function discovery and careful curation are still necessary.
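As a worked check on the size of that gap, using only the two counts quoted above:

```latex
\frac{5216}{8056} \approx 0.647 \quad (64.7\%\ \text{of four-digit EC numbers have a GO term}),
\qquad
8056 - 5216 = 2840 \quad (\approx 35.3\%\ \text{without one}).
```

So roughly one in three fully specified EC numbers had no corresponding GO term at that date.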

5) Relevant statistics and data from recent studies

5.1 YgfF-specific quantitative data

  • Specific activity: YgfF glucose 1-dehydrogenase activity measured as 305.55 U mg−1 in vitro (kim2023functionalannotationof pages 6-7).
  • ML prediction score: DeepECtransformer score for EC 1.1.1.47 prediction for YgfF: 0.6331 (kim2023functionalannotationof pages 6-7).
  • Cofactor in reaction depiction: figure reaction scheme indicates NAD → NADH + H+ (kim2023functionalannotationof media 81d7b013).

5.2 Broader y-ome / annotation statistics relevant to ygfF context

  • Roughly 30% of E. coli genes remain incompletely characterized; the y-ome is used to denote genes lacking sufficient experimental evidence (kim2023functionalannotationof pages 5-6).
  • Of ~1,600 y-ome genes, 1,569 protein sequences were retrievable from UniProt for Kim et al.’s analysis (kim2023functionalannotationof pages 5-6).
  • DeepECtransformer predicted EC numbers for 464 y-ome proteins; 390 had complete four-digit EC numbers; 295 full four-digit EC predictions were unique to DeepECtransformer (kim2023functionalannotationof pages 5-6).
  • Among those 295 uniquely predicted full EC proteins, 179 were predicted soluble in E. coli by NetSolP (model-wide statistic; not specific to YgfF) (kim2023functionalannotationof pages 5-6).
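The counts above form a funnel, and the proportions at each step are easy to misread from the raw numbers. The short sketch below computes them from the figures quoted in this section; the dictionary keys and the helper name are illustrative choices, not terms from Kim et al.

```python
# Counts as quoted in this section from Kim et al. 2023 (E. coli K-12 y-ome).
counts = {
    "sequences_retrieved": 1569,  # y-ome protein sequences retrieved from UniProt
    "ec_predicted": 464,          # proteins given any EC prediction
    "full_four_digit": 390,       # predictions with a complete four-digit EC number
    "unique_to_model": 295,       # full EC predictions unique to DeepECtransformer
    "predicted_soluble": 179,     # of those, predicted soluble by NetSolP
}


def fraction(part: int, whole: int) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Each step is expressed relative to the step before it.
steps = [
    ("EC predicted / sequences retrieved", "ec_predicted", "sequences_retrieved"),
    ("full four-digit / EC predicted", "full_four_digit", "ec_predicted"),
    ("unique to model / full four-digit", "unique_to_model", "full_four_digit"),
    ("predicted soluble / unique to model", "predicted_soluble", "unique_to_model"),
]

for label, part_key, whole_key in steps:
    part, whole = counts[part_key], counts[whole_key]
    print(f"{label}: {part}/{whole} = {fraction(part, whole):.1%}")
```

These ratios are model-wide; none of them is a YgfF-specific measurement.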

Functional interpretation for ygfF (what is currently supported vs unknown)

Supported: primary enzymatic function and cofactor coupling

The most strongly supported conclusion from recent primary evidence is that E. coli K-12 YgfF has glucose 1-dehydrogenase activity (EC 1.1.1.47) in vitro, with high specific activity under the employed assay conditions (305.55 U mg−1) (kim2023functionalannotationof pages 6-7). The validation figure’s reaction scheme indicates NAD reduction to NADH + H+ coupled to glucose oxidation (kim2023functionalannotationof media 81d7b013). Together, these data support a biochemical role as an NAD-dependent glucose dehydrogenase under the tested conditions (kim2023functionalannotationof media 81d7b013, kim2023functionalannotationof pages 6-7).

Not yet supported (in retrieved sources): substrate spectrum, kinetic constants, and in vivo pathway role

The retrieved evidence does not provide:
- A substrate panel demonstrating specificity beyond glucose (only glucose is directly described in assay conditions) (kim2023functionalannotationof pages 7-8).
- Michaelis–Menten parameters (Km, kcat) or mechanistic kinetic order (wohlgemuth2024backtothe pages 3-6).
- Definitive NAD vs NADP preference by comparative kinetics; only NAD is shown in the figure scheme (kim2023functionalannotationof media 81d7b013).
- Native subcellular localization measurements (e.g., cytosolic vs periplasmic) or genetic/pathway linkage in E. coli K-12 (crecylagard2025limitationsofcurrent pages 7-9, wohlgemuth2024backtothe pages 3-6).

Accordingly, the most evidence-consistent interpretation is that YgfF is a soluble SDR oxidoreductase capable of catalyzing glucose oxidation in vitro with NAD as the depicted cofactor. Its physiological substrate(s), pathway integration, and cellular compartment of action remain unresolved in the retrieved literature and require in vivo validation (crecylagard2025limitationsofcurrent pages 7-9, wohlgemuth2024backtothe pages 3-6).
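To make explicit why a single specific activity does not substitute for the missing kinetic constants, the standard Michaelis-Menten relations are sketched below. These are textbook expressions, not results from the cited studies; the symbols SA (specific activity) and M_r (subunit molecular mass) are introduced here for illustration, and no molecular mass for YgfF is assumed.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]}, \qquad V_{\max} = k_{cat}\,[E]_t

k_{cat}\ (\mathrm{s^{-1}}) \approx
  \frac{SA\ (\mathrm{\mu mol\,min^{-1}\,mg^{-1}}) \times M_r\ (\mathrm{kDa})}{60}
```

The second relation holds only if the specific activity was measured at saturating substrate, all of the purified protein is active, and there is one active site per subunit. None of these conditions is established for the kit-based YgfF assay in the retrieved sources, so the reported 305.55 U mg−1 constrains neither Km nor kcat on its own.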

Evidence summary table

| Property | Finding for E. coli K-12 YgfF (UniProt P52037) | Evidence type | Localization / solubility note | Key reference |
|---|---|---|---|---|
| Gene/protein identity | YgfF from Escherichia coli K-12/MG1655 y-ome; UniProt-linked annotation aligns with an SDR-family oxidoreductase later assigned as glucose 1-dehydrogenase (kim2023functionalannotationof pages 6-7, crecylagard2025limitationsofcurrent pages 7-9) | Database-linked annotation + literature synthesis | No direct subcellular localization experimentally reported in the retrieved sources; treated as a soluble recombinant protein in validation experiments (kim2023functionalannotationof pages 7-8, wohlgemuth2024backtothe pages 3-6) | Kim et al., 2023, Nat Commun, doi:10.1038/s41467-023-43216-z, https://doi.org/10.1038/s41467-023-43216-z |
| Predicted function / EC number | DeepECtransformer predicted YgfF as EC 1.1.1.47, glucose 1-dehydrogenase, with prediction score 0.6331 (kim2023functionalannotationof pages 6-7) | ML prediction | Among uniquely predicted 4-digit EC proteins, many were predicted soluble in E. coli by NetSolP, but no YgfF-specific solubility value was reported (kim2023functionalannotationof pages 5-6) | Kim et al., 2023, Nat Commun, doi:10.1038/s41467-023-43216-z, https://doi.org/10.1038/s41467-023-43216-z |
| Family/subgroup assignment | YgfF is in the Short-Chain Dehydrogenase/Reductase (SDR) superfamily (IPR002347) and predicted to fall in the SDR63C / glucose 1-dehydrogenase subgroup (crecylagard2025limitationsofcurrent pages 7-9) | HMM / family classification | Family assignment supports a soluble cytosolic enzyme-like oxidoreductase interpretation, but no direct localization experiment was cited for YgfF (crecylagard2025limitationsofcurrent pages 7-9) | de Crécy-Lagard et al., 2025 preprint, doi:10.1101/2024.07.01.601547, https://doi.org/10.1101/2024.07.01.601547 |
| Reaction / cofactor | Figure-linked reaction scheme indicates glucose oxidation coupled to NAD reduction to NADH + H+, consistent with glucose 1-dehydrogenase activity and NAD dependence rather than an explicitly demonstrated NADP preference (kim2023functionalannotationof media 81d7b013, kim2023functionalannotationof pages 6-7) | Figure-supported biochemical interpretation | No intracellular compartment or membrane association evidence reported; recombinant purified enzyme assayed in vitro (kim2023functionalannotationof media 81d7b013, kim2023functionalannotationof pages 7-8) | Kim et al., 2023, Nat Commun, doi:10.1038/s41467-023-43216-z, https://doi.org/10.1038/s41467-023-43216-z |
| Experimental validation | Purified His-tagged YgfF showed specific glucose 1-dehydrogenase activity of 305.55 U mg−1 in vitro; authors compared this with 205.70 U mg−1 reported for a characterized Lysinibacillus sphaericus glucose 1-dehydrogenase (kim2023functionalannotationof pages 6-7) | In vitro enzyme assay | Overexpressed in E. coli BL21(DE3), purified by metal-affinity resin; this supports biochemical tractability/solubility but not native localization (kim2023functionalannotationof pages 7-8) | Kim et al., 2023, Nat Commun, doi:10.1038/s41467-023-43216-z, https://doi.org/10.1038/s41467-023-43216-z |
| Assay conditions | Validation used a glucose dehydrogenase colorimetric kit with glucose substrate, GDH assay buffer, developer, and OD450 readout at 37 °C; confirms activity with glucose under assay conditions but does not define broader substrate range or kinetic constants (kim2023functionalannotationof pages 7-8) | In vitro assay protocol | Assay performed on purified protein; no localization inference beyond soluble preparation (kim2023functionalannotationof pages 7-8) | Kim et al., 2023, Nat Commun, doi:10.1038/s41467-023-43216-z, https://doi.org/10.1038/s41467-023-43216-z |
| Physiological role / pathway inference | Current evidence supports biochemical function as a glucose 1-dehydrogenase, but no in vivo pathway assignment, physiological substrate context, or genetic validation in E. coli K-12 was reported in the retrieved literature (kim2023functionalannotationof pages 6-7, crecylagard2025limitationsofcurrent pages 7-9, wohlgemuth2024backtothe pages 3-6) | Inference with explicit caution | Native localization remains unresolved in retrieved sources (crecylagard2025limitationsofcurrent pages 7-9, wohlgemuth2024backtothe pages 3-6) | Wohlgemuth, 2024, Life, doi:10.3390/life14030364, https://doi.org/10.3390/life14030364; de Crécy-Lagard et al., 2025 preprint, doi:10.1101/2024.07.01.601547, https://doi.org/10.1101/2024.07.01.601547 |
| Evidence limitations | Retrieved sources do not provide YgfF-specific Km, kcat, Vmax, structural data, substrate spectrum beyond glucose assay conditions, or definitive native localization; expert commentary cautions that in vitro activity alone is insufficient to establish physiological function (crecylagard2025limitationsofcurrent pages 7-9, wohlgemuth2024backtothe pages 3-6) | Expert analysis / review | Solubility/localization evidence is indirect only (purification, expression, model-wide solubility statistics) (kim2023functionalannotationof pages 7-8, kim2023functionalannotationof pages 5-6) | de Crécy-Lagard et al., 2025 preprint, doi:10.1101/2024.07.01.601547, https://doi.org/10.1101/2024.07.01.601547; Wohlgemuth, 2024, doi:10.3390/life14030364, https://doi.org/10.3390/life14030364 |

Table: This table compiles the main experimentally supported and predicted properties of E. coli K-12 YgfF, including its EC assignment, biochemical evidence, cofactor inference, and current evidence gaps. It is useful as a compact evidence map separating validated findings from family-based or model-based inference.

Key sources (with dates and URLs)

  • Kim GB, Kim JY, Lee JA, Norsigian CJ, Palsson BO, Lee SY. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nature Communications (Nov 2023). DOI: 10.1038/s41467-023-43216-z. URL: https://doi.org/10.1038/s41467-023-43216-z (kim2023functionalannotationof pages 6-7, kim2023functionalannotationof media 81d7b013)
  • Wohlgemuth R. Back to the Future of Metabolism—Advances in the Discovery and Characterization of Unknown Biocatalytic Functions and Pathways. Life (Mar 2024). DOI: 10.3390/life14030364. URL: https://doi.org/10.3390/life14030364 (wohlgemuth2024backtothe pages 3-6)
  • de Crécy-Lagard V, Dias R, Sexson N, Friedberg I, Yuan Y, Swairjo MA. Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins. bioRxiv (posted 2024-07-01 per DOI record; accessed here as 2025 listing). DOI: 10.1101/2024.07.01.601547. URL: https://doi.org/10.1101/2024.07.01.601547 (crecylagard2025limitationsofcurrent pages 7-9)

References

  1. (kim2023functionalannotationof pages 6-7): Gi Bae Kim, Ji Yeon Kim, Jong An Lee, Charles J. Norsigian, Bernhard O. Palsson, and Sang Yup Lee. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nature Communications, Nov 2023. URL: https://doi.org/10.1038/s41467-023-43216-z, doi:10.1038/s41467-023-43216-z. This article has 110 citations and is from a highest quality peer-reviewed journal.

  2. (kim2023functionalannotationof pages 5-6): Gi Bae Kim, Ji Yeon Kim, Jong An Lee, Charles J. Norsigian, Bernhard O. Palsson, and Sang Yup Lee. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nature Communications, Nov 2023. URL: https://doi.org/10.1038/s41467-023-43216-z, doi:10.1038/s41467-023-43216-z. This article has 110 citations and is from a highest quality peer-reviewed journal.

  3. (crecylagard2025limitationsofcurrent pages 7-9): Valérie de Crécy-Lagard, Raquel Dias, Nick Sexson, Iddo Friedberg, Yifeng Yuan, and Manal A. Swairjo. Limitations of current machine-learning models in predicting enzymatic functions for uncharacterized proteins. BioRxiv, Jul 2025. URL: https://doi.org/10.1101/2024.07.01.601547, doi:10.1101/2024.07.01.601547. This article has 8 citations.

  4. (kim2023functionalannotationof media 81d7b013): Gi Bae Kim, Ji Yeon Kim, Jong An Lee, Charles J. Norsigian, Bernhard O. Palsson, and Sang Yup Lee. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nature Communications, Nov 2023. URL: https://doi.org/10.1038/s41467-023-43216-z, doi:10.1038/s41467-023-43216-z. This article has 110 citations and is from a highest quality peer-reviewed journal.

  5. (kim2023functionalannotationof pages 7-8): Gi Bae Kim, Ji Yeon Kim, Jong An Lee, Charles J. Norsigian, Bernhard O. Palsson, and Sang Yup Lee. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nature Communications, Nov 2023. URL: https://doi.org/10.1038/s41467-023-43216-z, doi:10.1038/s41467-023-43216-z. This article has 110 citations and is from a highest quality peer-reviewed journal.

  6. (wohlgemuth2024backtothe pages 3-6): Roland Wohlgemuth. Back to the future of metabolism—advances in the discovery and characterization of unknown biocatalytic functions and pathways. Life, 14:364, Mar 2024. URL: https://doi.org/10.3390/life14030364, doi:10.3390/life14030364. This article has 2 citations.

Citations

  1. kim2023functionalannotationof pages 6-7
  2. kim2023functionalannotationof pages 5-6
  3. crecylagard2025limitationsofcurrent pages 7-9
  4. kim2023functionalannotationof pages 7-8
  5. wohlgemuth2024backtothe pages 3-6
  6. https://doi.org/10.1038/s41467-023-43216-z
  7. https://doi.org/10.3390/life14030364
  8. https://doi.org/10.1101/2024.07.01.601547

📄 View Raw YAML

id: P52037
gene_symbol: ygfF
product_type: PROTEIN
status: COMPLETE
taxon:
  id: NCBITaxon:83333
  label: Escherichia coli (strain K12)
description: YgfF is a glucose 1-dehydrogenase (EC 1.1.1.47) belonging to the short-chain
  dehydrogenases/reductases (SDR) family (SDR63C subgroup). It catalyzes the NAD(+)-dependent
  oxidation of D-glucose to D-glucono-1,5-lactone, which spontaneously hydrolyzes to
  D-gluconate. The enzymatic function was predicted by the DeepECtransformer deep learning
  tool (prediction score 0.6331) and experimentally validated in vitro by Kim et al.
  2023, who measured a specific activity of 305.55 U/mg, comparable to characterized
  glucose 1-dehydrogenases from other organisms. Full kinetic parameters (Km, kcat)
  have not yet been determined. YgfF was one of only three genuinely correct novel
  predictions (out of 464) made by DeepECtransformer for the E. coli y-ome. The protein contains a
  conserved NAD(P)-binding Rossmann-fold domain and predicted binding sites for both
  NAD(+) and D-glucose. A physical interaction with LpdA (dihydrolipoyl dehydrogenase)
  was detected by affinity purification-mass spectrometry, though the biological
  significance of this interaction is unclear. The physiological role of YgfF in E. coli
  metabolism remains to be established.
existing_annotations:
- term:
    id: GO:0016491
    label: oxidoreductase activity
  evidence_type: IBA
  original_reference_id: GO_REF:0000033
  review:
    summary: YgfF is experimentally confirmed as a glucose 1-dehydrogenase (EC 1.1.1.47),
      which is a type of oxidoreductase (PMID:37963869). The IBA annotation to oxidoreductase
      activity is correct but much less specific than what is now known. The more specific
      term GO:0047934 (glucose 1-dehydrogenase (NAD+) activity) is supported by direct
      experimental evidence.
    action: ACCEPT
    reason: This IBA annotation is consistent with the experimentally validated function
      of YgfF. While more specific terms exist (and are annotated separately), the IBA
      at this level is not wrong and reflects phylogenetic inference that is consistent
      with experimental data. Glucose 1-dehydrogenase activity is a subtype of oxidoreductase
      activity.
    supported_by:
    - reference_id: PMID:37963869
      supporting_text: YgfF exhibited a specific glucose 1-dehydrogenase activity of
        305.55 U mg−1
- term:
    id: GO:0016614
    label: oxidoreductase activity, acting on CH-OH group of donors
  evidence_type: IEA
  original_reference_id: GO_REF:0000117
  review:
    summary: This IEA annotation from ARBA is consistent with the experimentally
      validated glucose 1-dehydrogenase activity. Glucose 1-dehydrogenase acts on the
      CH-OH group of D-glucose, so this intermediate-level annotation is correct. It is
      less specific than GO:0047934 but appropriately reflects the ARBA computational
      prediction.
    action: ACCEPT
    reason: The term is a correct parent of the experimentally validated specific function
      (glucose 1-dehydrogenase (NAD+) activity). The IEA evidence code is appropriate
      for a computationally derived annotation, and the term is consistent with the SDR
      family classification and the known catalytic mechanism.
    supported_by:
    - reference_id: PMID:37963869
      supporting_text: For YgfF, DeepECtransformer predicted its EC number to be
        EC:1.1.1.47 (glucose 1-dehydrogenase).
- term:
    id: GO:0047934
    label: glucose 1-dehydrogenase (NAD+) activity
  evidence_type: IEA
  original_reference_id: GO_REF:0000116
  review:
    summary: This IEA annotation is derived from Rhea reaction mapping (RHEA:14293),
      which corresponds to the reaction D-glucose + NAD(+) = D-glucono-1,5-lactone +
      NADH + H(+). This matches the experimentally validated catalytic activity of YgfF
      exactly as described by Kim et al. 2023 and annotated by UniProt (PMID:37963869).
    action: ACCEPT
    reason: The Rhea-derived IEA correctly captures the specific enzymatic activity of
      YgfF. This is also independently supported by the IDA annotation from PMID:37963869
      using the same GO term.
    supported_by:
    - reference_id: PMID:37963869
      supporting_text: YgfF exhibited a specific glucose 1-dehydrogenase activity of
        305.55 U mg−1
- term:
    id: GO:0047936
    label: glucose 1-dehydrogenase [NAD(P)+] activity
  evidence_type: IEA
  original_reference_id: GO_REF:0000003
  review:
    summary: This IEA annotation is derived from the EC number mapping (EC:1.1.1.47).
      GO:0047936 describes glucose 1-dehydrogenase that can use either NAD+ or NADP+ as
      cofactor, which matches the IUBMB definition of EC:1.1.1.47 as the NAD(P)+ form.
      However, the experimental validation by Kim et al. used an NAD+-dependent assay
      (PMID:37963869), and UniProt annotates the catalytic activity with the Rhea
      reaction that specifies NAD+. The more precise term GO:0047934 (NAD+ specific)
      is the better supported annotation.
    action: MODIFY
    reason: The EC:1.1.1.47 mapping to GO:0047936 is formally correct, since EC:1.1.1.47
      is the glucose 1-dehydrogenase that accepts NAD+ or NADP+ (the cofactor-specific
      entries are EC:1.1.1.118 for NAD+ and EC:1.1.1.119 for NADP+), but it is broader
      than the available evidence. The experimental data from Kim et al. 2023 validated
      activity using an NAD+-dependent assay kit, and UniProt annotates the Rhea reaction
      (RHEA:14293) specifically with NAD+. The NAD+-specific GO term GO:0047934 better
      reflects what has been demonstrated.
    proposed_replacement_terms:
    - id: GO:0047934
      label: glucose 1-dehydrogenase (NAD+) activity
    supported_by:
    - reference_id: PMID:37963869
      supporting_text: YgfF exhibited a specific glucose 1-dehydrogenase activity of
        305.55 U mg−1
- term:
    id: GO:0005515
    label: protein binding
  evidence_type: IPI
  original_reference_id: PMID:15690043
  review:
    summary: This annotation is based on a high-throughput affinity purification-mass
      spectrometry study by Butland et al. 2005 that detected a physical interaction
      between YgfF and LpdA (dihydrolipoyl dehydrogenase, P0A9P0). The interaction is
      recorded in IntAct with 2 experiments supporting it. However, the GO term
      GO:0005515 (protein binding) is uninformative per curation guidelines and does not
      convey any specific functional information about this interaction.
    action: MARK_AS_OVER_ANNOTATED
    reason: Per curation guidelines, GO:0005515 (protein binding) is too vague and
      uninformative. The Butland et al. 2005 study was a large-scale screen that detected
      many interactions, and the biological significance of the YgfF-LpdA interaction is
      unknown. LpdA functions in the pyruvate dehydrogenase and 2-oxoglutarate
      dehydrogenase complexes, and there is no clear functional connection to glucose
      1-dehydrogenase activity. Without understanding the functional nature of this
      interaction, a generic protein binding annotation provides little value.
    supported_by:
    - reference_id: PMID:15690043
      supporting_text: no large-scale analysis of protein complexes in Escherichia coli
        has yet been reported. To this end, we have targeted DNA cassettes into the E.
        coli chromosome to create carboxy-terminal, affinity-tagged alleles of 1,000 open
        reading frames
- term:
    id: GO:0047934
    label: glucose 1-dehydrogenase (NAD+) activity
  evidence_type: IDA
  original_reference_id: PMID:37963869
  review:
    summary: This is the key experimentally validated annotation. Kim et al. 2023
      expressed and purified recombinant His-tagged YgfF from E. coli BL21(DE3) and
      measured glucose 1-dehydrogenase activity in vitro using a colorimetric GDH assay
      kit. The specific activity was 305.55 U/mg, comparable to the previously reported
      value of 205.70 U/mg for glucose 1-dehydrogenase from Lysinibacillus sphaericus.
      The activity was predicted by DeepECtransformer with EC number EC:1.1.1.47 and
      confirmed by the enzyme assay. This represents a validated core function.
    action: ACCEPT
    reason: Direct experimental evidence from in vitro enzyme assay demonstrates glucose
      1-dehydrogenase (NAD+) activity. The specific activity (305.55 U/mg) is robust and
      comparable to characterized homologs. This is the most specific and well-supported
      annotation for YgfF. UniProt has adopted this function based on this study
      (EC:1.1.1.47).
    supported_by:
    - reference_id: PMID:37963869
      supporting_text: For YgfF, DeepECtransformer predicted its EC number to be
        EC:1.1.1.47 (glucose 1-dehydrogenase). The enzyme assay results showed that YgfF
        exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
    - reference_id: PMID:37963869
      supporting_text: which was comparable to the previously reported value of 205.70 U
        mg−1 for the glucose 1-dehydrogenase from Lysinibacillus sphaericus G10
    - reference_id: file:ECOLI/ygfF/ygfF-deep-research-falcon.md
      supporting_text: Falcon deep research confirms YgfF as EC 1.1.1.47 glucose
        1-dehydrogenase validated by in vitro assay with 305.55 U/mg specific activity,
        and notes SDR63C subgroup classification supports this assignment.
- term:
    id: GO:0051287
    label: NAD binding
  evidence_type: IDA
  original_reference_id: PMID:37963869
  review:
    summary: YgfF requires NAD+ as a cofactor for its glucose 1-dehydrogenase activity.
      The catalytic reaction (D-glucose + NAD(+) = D-glucono-1,5-lactone + NADH + H(+))
      directly involves NAD+ binding. UniProt annotates extensive NAD+ binding residues
      (positions 11, 13, 59, 60, 86, 88, 110, 156, 160, 189, 191, 194) based on
      similarity to characterized SDR family members. The in vitro enzyme assay
      demonstrating NAD+-dependent glucose oxidation provides experimental evidence for
      NAD binding.
    action: NEW
    reason: The experimentally validated glucose 1-dehydrogenase activity requires NAD+
      as a cofactor, and UniProt annotates multiple NAD+ binding residues. NAD binding is
      an inherent aspect of the catalytic mechanism and should be annotated. This is not
      currently present in the GO annotations.
    supported_by:
    - reference_id: PMID:37963869
      supporting_text: For YgfF, DeepECtransformer predicted its EC number to be
        EC:1.1.1.47 (glucose 1-dehydrogenase). The enzyme assay results showed that YgfF
        exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
- term:
    id: GO:0019521
    label: D-gluconate metabolic process
  evidence_type: IDA
  original_reference_id: PMID:37963869
  review:
    summary: YgfF catalyzes the oxidation of D-glucose to D-glucono-1,5-lactone, which
      spontaneously hydrolyzes to D-gluconate. This places YgfF as a participant in
      D-gluconate metabolism. No biological process annotations currently exist for YgfF,
      yet the experimentally validated enzymatic activity directly implicates it in this
      metabolic pathway. However, the in vivo physiological role has not been established,
      so this annotation should be considered with caution.
    action: NEW
    reason: There are currently no biological process annotations for YgfF, which is a
      significant gap. The experimentally validated glucose 1-dehydrogenase activity
      produces D-glucono-1,5-lactone (a precursor to D-gluconate), directly linking YgfF
      to D-gluconate metabolic process. UniProt states the protein catalyzes the
      NAD(+)-dependent oxidation of D-glucose to D-gluconate via gluconolactone.
    supported_by:
    - reference_id: PMID:37963869
      supporting_text: YgfF exhibited a specific glucose 1-dehydrogenase activity of
        305.55 U mg−1
- term:
    id: GO:0005829
    label: cytosol
  evidence_type: IDA
  original_reference_id: PMID:37963869
  review:
    summary: YgfF was overexpressed and purified from the soluble lysate fraction
      (supernatant after cell lysis and centrifugation) by Kim et al. 2023. The
      protein was predicted to be soluble by NetSolP and was successfully purified from
      the soluble fraction. While there are no dedicated localization studies, the
      solubility data and lack of any signal peptide or transmembrane domain strongly
      suggest cytosolic localization. No cellular component annotations currently exist
      for YgfF.
    action: NEW
    reason: There are currently no cellular component annotations for YgfF, which is a
      gap. The protein was purified from the soluble cytoplasmic fraction and has no
      predicted signal peptide or transmembrane domains, consistent with cytosolic
      localization. However, the evidence is indirect (protein was soluble when
      overexpressed) rather than from a dedicated localization study.
    supported_by:
    - reference_id: PMID:37963869
      supporting_text: 179 proteins are predicted to be soluble in E. coli by NetSolP, a
        deep learning model for protein solubility prediction
    - reference_id: PMID:37963869
      supporting_text: Cell debris was separated by centrifugation at 15,044 × g for
        40 min, and the resulting supernatants were loaded onto Talon metal affinity resin
references:
- id: GO_REF:0000003
  title: Gene Ontology annotation based on Enzyme Commission mapping
  findings: []
- id: GO_REF:0000033
  title: Annotation inferences using phylogenetic trees
  findings: []
- id: GO_REF:0000116
  title: Automatic Gene Ontology annotation based on Rhea mapping
  findings: []
- id: GO_REF:0000117
  title: Electronic Gene Ontology annotations created by ARBA machine learning models
  findings: []
- id: PMID:15690043
  title: Interaction network containing conserved and essential protein complexes in
    Escherichia coli.
  findings:
  - statement: High-throughput affinity purification-mass spectrometry detected a physical
      interaction between YgfF (P52037) and LpdA (P0A9P0, dihydrolipoyl dehydrogenase).
    supporting_text: no large-scale analysis of protein complexes in Escherichia coli
      has yet been reported. To this end, we have targeted DNA cassettes into the E.
      coli chromosome to create carboxy-terminal, affinity-tagged alleles of 1,000 open
      reading frames
- id: PMID:37963869
  title: Functional annotation of enzyme-encoding genes using deep learning with transformer
    layers.
  findings:
  - statement: DeepECtransformer predicted YgfF to have EC number EC:1.1.1.47 (glucose
      1-dehydrogenase) and this was experimentally validated by in vitro enzyme assay
      with a specific activity of 305.55 U/mg.
    supporting_text: For YgfF, DeepECtransformer predicted its EC number to be
      EC:1.1.1.47 (glucose 1-dehydrogenase). The enzyme assay results showed that YgfF
      exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
  - statement: YgfF was one of three randomly selected y-ome proteins whose predicted
      enzymatic functions were experimentally validated, demonstrating the utility of
      DeepECtransformer for functional annotation.
    supporting_text: we randomly selected three proteins, YgfF, YciO, and YdjM, that are
      predicted to be oxidoreductase, transferase, and hydrolase, respectively
  - statement: The neural network predicted YgfF's function despite higher sequence
      identity to a different enzyme (EC:1.1.1.100), showing the model learned functional
      motifs rather than relying solely on homology.
    supporting_text: It should be noted that although YgfF exhibited a higher sequence
      identity with a different enzyme (A0A069CGU9_ECOLX; EC:1.1.1.100) from the
      training dataset than glucose 1-dehydrogenase exhibiting the maximum sequence
      identity within the training dataset, the neural network made an accurate prediction
- id: DOI:10.3390/life14030364
  title: Back to the future of metabolism - advances in the discovery and characterization
    of unknown biocatalytic functions and pathways.
  findings:
  - statement: YgfF is highlighted as an example of ML-assisted functional annotation
      where DeepECtransformer predicted glucose 1-dehydrogenase activity and in vitro
      enzyme assays were performed on overexpressed and affinity-purified protein. The
      review emphasizes that deeper characterization (kinetic parameters, substrate
      spectrum) is needed to fully establish metabolic roles.
    supporting_text: The review stresses that function assignment for unusual/unknown
      enzymes often requires extensive experimental work including expression,
      purification, substrate synthesis, analytical methods, and kinetic
      characterization such as kcat and KM.
- id: PMID:40703034
  title: Limitations of current machine-learning models in predicting enzymatic functions
    for uncharacterized proteins.
  findings:
  - statement: YgfF is classified into the SDR63C / glucose 1-dehydrogenase subgroup
      by HMM-based SDR subfamily analysis, consistent with the DeepECtransformer
      prediction. However, in vitro activity alone is insufficient to establish
      physiological (in vivo) function, and best practice requires combining
      biochemical and genetic evidence.
    supporting_text: This resource predicts YgfF is part of the SDR63C/Glucose
      1-dehydrogenase subgroup, the activity predicted and validated in the Kim et al.
      (2023) study. This prediction demonstrates the accurate propagation of functional
      annotation and is a successful prediction.
core_functions:
- description: NAD(+)-dependent glucose 1-dehydrogenase activity. YgfF catalyzes the
    oxidation of D-glucose to D-glucono-1,5-lactone using NAD(+) as the electron
    acceptor. This is the sole experimentally validated enzymatic function, demonstrated
    by in vitro assay with a specific activity of 305.55 U/mg (PMID:37963869). YgfF
    belongs to the SDR family (SDR63C subgroup) and contains a conserved NAD(P)-binding
    Rossmann-fold domain.
  molecular_function:
    id: GO:0047934
    label: glucose 1-dehydrogenase (NAD+) activity
  directly_involved_in:
  - id: GO:0019521
    label: D-gluconate metabolic process
  supported_by:
  - reference_id: PMID:37963869
    supporting_text: For YgfF, DeepECtransformer predicted its EC number to be
      EC:1.1.1.47 (glucose 1-dehydrogenase). The enzyme assay results showed that YgfF
      exhibited a specific glucose 1-dehydrogenase activity of 305.55 U mg−1
suggested_questions:
- question: What is the physiological role of YgfF glucose 1-dehydrogenase activity in
    E. coli K-12 metabolism? Is it involved in glucose catabolism via the
    Entner-Doudoroff pathway or another metabolic route?
  experts:
  - Lee SY
  - Kim GB
- question: What is the biological significance of the YgfF-LpdA physical interaction
    detected by Butland et al. 2005? Does YgfF associate with an LpdA-containing
    complex such as pyruvate dehydrogenase?
  experts:
  - Emili A
  - Butland G
- question: Does YgfF have any activity with NADP+ as cofactor, or is it strictly
    NAD+-dependent? The current annotations include both NAD+ and NAD(P)+ terms.
  experts:
  - Kim GB
  - Lee SY
suggested_experiments:
- hypothesis: Deletion of ygfF alters growth on glucose as the sole carbon source under
    aerobic or anaerobic conditions.
  description: Construct a ygfF knockout in E. coli K-12 and test growth phenotypes on
    minimal media with glucose as the sole carbon source under aerobic and anaerobic
    conditions. Compare with wild-type to determine if YgfF contributes to glucose
    utilization in vivo.
  experiment_type: growth phenotype assay
- hypothesis: YgfF is NAD+-specific and does not use NADP+ as an electron acceptor.
  description: Perform in vitro enzyme assays with purified YgfF, substituting NADP+ for
    NAD+ under otherwise identical conditions, to determine cofactor specificity. The
    result would resolve whether GO:0047936 (NAD(P)+ form) or GO:0047934 (NAD+-specific
    form) is the correct annotation.
  experiment_type: enzyme kinetics
- hypothesis: The YgfF-LpdA interaction has functional significance in glucose metabolism.
  description: Perform co-purification experiments with tagged YgfF expressed at
    physiological levels and test whether LpdA affects YgfF enzymatic activity in vitro.
    Also test whether ygfF deletion affects pyruvate dehydrogenase complex activity.
  experiment_type: protein interaction validation