yrhB

UniProt ID: P46857
Organism: Escherichia coli (strain K12)
Review Status: COMPLETE

Gene Description

YrhB is a small (94 aa, 10.6 kDa) uncharacterized protein in E. coli K12 encoded by b3446. It belongs to the Imm35 (Immunity protein 35) family (Pfam PF15567 / InterPro IPR029082), which was identified computationally as part of the immunity protein repertoire of polymorphic toxin systems (Zhang et al. 2012, PMID:22731697). Imm35 is associated specifically with the papain-like peptidase toxin domain Tox-PL1, suggesting that it functions as a peptidase inhibitor (PMID:22731697). A study in BL21(DE3) reported chaperone-like activity for YrhB under heat shock conditions (Ahn et al. 2012, PMID:22569261), but this has not been independently confirmed in K12, and the primary evolved function is more likely related to the Imm35 domain. Transcriptomic data show that yrhB is upregulated 4.3-fold under TPEN (zinc chelation) stress, suggesting a possible link to metal homeostasis. The protein remains at UniProt evidence level PE 4 (Predicted). Notably, DeepECTF (a deep learning enzyme function predictor) incorrectly predicted EC 4.1.2.50 (6-carboxytetrahydropterin synthase) for YrhB (de Crécy-Lagard et al. 2025, PMID:40703034). The prediction conflicts with existing knowledge: E. coli already encodes the bona fide 6-carboxytetrahydropterin synthase as QueD (b2765), and a queD mutant lacks this activity entirely, which argues against any functional redundancy with YrhB.

Existing Annotations Review

GO Term Evidence Action Reason
GO:0030153 bacteriocin immunity
ISS
PMID:22731697
Polymorphic toxin systems: Comprehensive characterization of...
NEW
Summary: YrhB contains the Imm35 domain (Pfam PF15567, InterPro IPR029082), which was identified by Zhang et al. (2012) as an immunity protein family in polymorphic toxin systems. Imm35 is specifically associated with the papain-like peptidase Tox-PL1, suggesting it functions as a peptidase inhibitor. While not experimentally validated for YrhB specifically, the domain assignment is robust and based on comprehensive bioinformatic analysis of polymorphic toxin-immunity gene neighborhoods across bacteria.
Reason: The Imm35 domain (PF15567) is the only recognized domain in YrhB. Zhang et al. (2012) systematically characterized immunity protein families in bacterial polymorphic toxin systems using comparative genomics, identifying Imm35 as specifically associated with Tox-PL1 papain-like peptidase toxins. GO:0030153 (bacteriocin immunity) is the closest available GO biological process term for this predicted function. This would be an ISS-level annotation based on sequence similarity to characterized immunity protein families.
Supporting Evidence:
PMID:22731697
Imm35 is specifically associated only with the papain-like peptide Tox-PL1, suggesting that it functions specifically as a peptidase inhibitor
file:ECOLI/yrhB/yrhB-deep-research-falcon.md
Falcon deep research found no primary literature validating YrhB function beyond TPEN stress induction (4.3-fold) and the DeepECTF misprediction critique. The Imm35 domain-based immunity protein annotation remains the most informative functional assignment.
GO:0030414 peptidase inhibitor activity
ISS
PMID:22731697
Polymorphic toxin systems: Comprehensive characterization of...
NEW
Summary: Zhang et al. (2012) identified Imm35 as specifically associated with the papain-like peptidase Tox-PL1, suggesting it functions as a peptidase inhibitor. YrhB contains the Imm35 domain (PF15567), making peptidase inhibitor activity the most likely molecular function.
Reason: Imm35 is specifically associated with Tox-PL1 papain-like peptidase toxins, and Zhang et al. (2012) explicitly suggest it functions as a peptidase inhibitor. GO:0030414 (peptidase inhibitor activity) captures this predicted molecular function.
Supporting Evidence:
PMID:22731697
Imm35 is specifically associated only with the papain-like peptide Tox-PL1, suggesting that it functions specifically as a peptidase inhibitor
GO:0005737 cytoplasm
IDA
PMID:22569261
YrhB is a highly stable small protein with unique chaperone-...
NEW
Summary: Immunity proteins in polymorphic toxin systems are typically cytoplasmic, as they must be present in the cytoplasm to protect the producing cell from auto-intoxication. Ahn et al. (2012) identified YrhB as a soluble intracellular protein in BL21(DE3) through systematic proteome-wide analyses.
Reason: Immunity proteins in polymorphic toxin systems are characteristically cytoplasmic. Ahn et al. (2012, PMID:22569261) showed YrhB is a soluble intracellular protein in BL21(DE3). Cytoplasmic localization is consistent with both the immunity protein function and the experimental data.
Supporting Evidence:
PMID:22569261
Escherichia coli YrhB (10.6 kDa) from strain BL21(DE3) that is commonly used for protein overexpression is a stable chaperone-like protein and indispensable for supporting the growth of BL21(DE3) at 48 °C but not defined as conventional heat shock protein (HSP)

Core Functions

Predicted immunity protein in polymorphic toxin system. YrhB contains the Imm35 domain (PF15567), a computationally identified immunity protein family that is specifically associated with the papain-like peptidase Tox-PL1, suggesting it functions as a peptidase inhibitor. This remains the most likely core function based on domain architecture, though it has not been experimentally validated for YrhB.

Molecular Function:
peptidase inhibitor activity
Directly Involved In:
Cellular Locations:
Supporting Evidence:
  • PMID:22731697
    Imm35 is specifically associated only with the papain-like peptide Tox-PL1, suggesting that it functions specifically as a peptidase inhibitor

References

Polymorphic toxin systems: Comprehensive characterization of trafficking modes, processing, mechanisms of action, immunity and ecology using comparative genomics.
  • Imm35 (PF15567) was identified as an immunity protein family in bacterial polymorphic toxin systems, specifically associated with the papain-like peptidase Tox-PL1 toxin domain.
    "Imm35 is specifically associated only with the papain-like peptide Tox-PL1, suggesting that it functions specifically as a peptidase inhibitor"
  • Over 90 families of immunity proteins were identified in polymorphic toxin systems, neutralizing between one and at least 27 distinct types of toxin domains.
    "Over 90 families of immunity proteins might neutralize anywhere between a single to at least 27 distinct types of toxin domains"
YrhB is a highly stable small protein with unique chaperone-like activity in Escherichia coli BL21(DE3).
  • YrhB from E. coli BL21(DE3) showed chaperone-like activity: it prevented heat-induced aggregation of PurK, promoted in vitro refolding of uridine phosphorylase, and reduced inclusion body formation. YrhB was upregulated only under heat shock. However, this was demonstrated in BL21(DE3), not K12.
    "Escherichia coli YrhB (10.6 kDa) from strain BL21(DE3) that is commonly used for protein overexpression is a stable chaperone-like protein and indispensable for supporting the growth of BL21(DE3) at 48 °C but not defined as conventional heat shock protein (HSP)"
DOI:10.1007/978-0-8176-4747-1
Identification and characterization of Zn(II)-responsive genes and proteins in E. coli.
  • yrhB (b3446) is upregulated 4.3-fold (P=2.75e-02) under TPEN (zinc chelation) stress after 30 minutes, suggesting a possible link to metal homeostasis or stress response.
    "yrhB b3446 up-regulated under TPEN stress with mean fold change 4.3 and P = 2.75e-02"
Limitations of current machine learning models in predicting enzymatic functions for uncharacterized proteins.
  • DeepECTF incorrectly predicted EC 4.1.2.50 (6-carboxytetrahydropterin synthase) for YrhB. This is a logic error because E. coli already encodes this enzyme as QueD (b2765), and a queD mutant lacks the activity entirely.
    "YrhB/b3446 is predicted to be a 6-carboxytetrahydropterin synthase (EC 4.1.2.50), but E. coli already encodes this enzyme (QueD/b2765) and a queD mutant lacks this activity (Zallot et al. 2017)"
  • This exemplifies how ML models can ignore existing gene-function assignments in the organism, leading to logically impossible predictions.
    "current ML methods not only mostly fail to make novel predictions but also make basic logic errors in their predictions that human annotators avoid by leveraging the available knowledge base"
The complete genome sequence of Escherichia coli K-12.
  • yrhB (b3446) was identified in the E. coli K12 genome sequencing.
    "Of 4288 protein-coding genes annotated, 38 percent have no attributed function"

Suggested Questions for Experts

Q: What is the cognate toxin for YrhB/Imm35 in E. coli K12? Is there a Tox-PL1-type toxin gene in the genomic neighborhood of yrhB (b3446)?

Q: Is the chaperone-like activity reported by Ahn et al. (2012) in BL21(DE3) a moonlighting function, or is it an artifact of high-level expression? Does K12 YrhB show the same activity?

Q: Has the DeepECTF misprediction of EC 4.1.2.50 for YrhB been propagated into any databases?

Suggested Experiments

Experiment: Test whether yrhB deletion in K12 affects susceptibility to polymorphic toxins from competing strains, particularly those encoding Tox-PL1-type toxin domains.

Hypothesis: If YrhB functions as an Imm35 immunity protein, a yrhB deletion mutant should be more susceptible to Tox-PL1 papain-like peptidase toxins from competing bacteria.

Experiment: Examine the genomic neighborhood of yrhB (b3446) for adjacent toxin-encoding genes to identify the cognate toxin.

Hypothesis: Polymorphic toxin immunity genes are typically found immediately downstream of their cognate toxin gene.
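
The neighborhood check described in this experiment can be sketched in a few lines. This is an illustration only: the `Gene` record, the `upstream_neighbor` function, and every gene name and coordinate in `toy_locus` are made-up placeholders, not the real b3446 locus, and a real analysis would read coordinates from a genome annotation file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    """Minimal gene record (hypothetical type for this sketch)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def upstream_neighbor(genes: List[Gene], target: str) -> Optional[Gene]:
    """Return the gene immediately upstream of `target`, relative to its strand.

    For a "+" strand target this is the previous gene by coordinate; for a
    "-" strand target it is the next gene by coordinate. Returns None when
    the target sits at the edge of the supplied region.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next((i for i, g in enumerate(ordered) if g.name == target), None)
    if idx is None:
        raise ValueError(f"{target!r} not found in gene list")
    step = -1 if ordered[idx].strand == "+" else 1
    j = idx + step
    return ordered[j] if 0 <= j < len(ordered) else None


# Toy, made-up coordinates and placeholder names; NOT real E. coli data.
toy_locus = [
    Gene("geneA", 100, 900, "+"),
    Gene("candidate_toxin", 1000, 2500, "+"),
    Gene("immunity_like", 2510, 2800, "+"),
    Gene("geneD", 3000, 3600, "-"),
]

hit = upstream_neighbor(toy_locus, "immunity_like")
print(hit.name if hit else "no upstream neighbor")
```

The sketch only reports position. Whether the upstream gene actually encodes a Tox-PL1-type domain would still need a domain search, and the function does not check that the neighbor shares the target's strand.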

Experiment: Replicate the chaperone-like activity assays from Ahn et al. (2012) using purified K12 YrhB to determine if this is strain-specific to BL21(DE3).

Hypothesis: The chaperone-like activity may be a general property of YrhB or may be specific to BL21(DE3) expression conditions.

Tags

uncharacterized polymorphic-toxin-system ML-misannotation-case-study

Computational Predictions

YrhB DeepECTF prediction review. The DeepECTF prediction of 6-carboxytetrahydropterin synthase (EC 4.1.2.50) is incorrect. This activity is already catalyzed by QueD/b2765 in E. coli, and queD mutants lack this activity entirely, demonstrating no redundancy. YrhB contains an Imm35 domain suggesting a role in bacteriocin immunity.

EC:4.1.2.50 6-carboxytetrahydropterin synthase EC
NPI CS: 0
Source: DeepECTF v2023 · PMID:37820725
PATHWAY CONTEXT IGNORED
Summary: Nonparalog incorrect. The 6-carboxytetrahydropterin synthase (EC 4.1.2.50) activity in E. coli is catalyzed by QueD/b2765. A queD mutant completely lacks this activity, demonstrating that YrhB does not serve as a redundant enzyme for this function. YrhB has no sequence similarity to the QueD family. Instead, YrhB contains an Imm35 domain (IPR029082) suggesting a role in bacteriocin immunity, which is an entirely different biological function.
Supporting Evidence:
  • PMID:40703034: "YrhB/b3446 is predicted to be a 6-carboxytetrahydropterin synthase (EC 4.1.2.50), but E. coli already encodes this enzyme (QueD/b2765) and a queD mutant lacks this activity"

📚 Additional Documentation

Deep Research Falcon

(yrhB-deep-research-falcon.md)

provider: falcon
model: Edison Scientific Literature
cached: false
start_time: '2026-03-22T18:22:57.468313'
end_time: '2026-03-22T18:29:52.596917'
duration_seconds: 415.13
template_file: templates/gene_research_go_focused.md
template_variables:
  organism: ECOLI
  gene_id: yrhB
  gene_symbol: yrhB
  uniprot_accession: P46857
  protein_description: 'RecName: Full=Uncharacterized protein YrhB;'
  gene_info: Name=yrhB; OrderedLocusNames=b3446, JW3411;
  organism_full: Escherichia coli (strain K12).
  protein_family: Not specified in UniProt
  protein_domains: Imm35. (IPR029082); Imm35 (PF15567)
provider_config:
  timeout: 600
  max_retries: 3
  parameters:
    allowed_domains: []
    temperature: 0.1
citation_count: 5


Question

Gene Research for Functional Annotation

⚠️ CRITICAL: Gene/Protein Identification Context

BEFORE YOU BEGIN RESEARCH: You MUST verify you are researching the CORRECT gene/protein. Gene symbols can be ambiguous, especially for less well-characterized genes from non-model organisms.

Target Gene/Protein Identity (from UniProt):

  • UniProt Accession: P46857
  • Protein Description: RecName: Full=Uncharacterized protein YrhB;
  • Gene Information: Name=yrhB; OrderedLocusNames=b3446, JW3411;
  • Organism (full): Escherichia coli (strain K12).
  • Protein Family: Not specified in UniProt
  • Key Domains: Imm35. (IPR029082); Imm35 (PF15567)

MANDATORY VERIFICATION STEPS:

  1. Check if the gene symbol "yrhB" matches the protein description above
  2. Verify the organism is correct: Escherichia coli (strain K12).
  3. Check if protein family/domains align with what you find in literature
  4. If you find literature for a DIFFERENT gene with the same or similar symbol, STOP

If Gene Symbol is Ambiguous or You Cannot Find Relevant Literature:

DO NOT PROCEED WITH RESEARCH ON A DIFFERENT GENE. Instead:
- State clearly: "The gene symbol 'yrhB' is ambiguous or literature is limited for this specific protein"
- Explain what you found (e.g., "Found extensive literature on a different gene with the same symbol in a different organism")
- Describe the protein based ONLY on the UniProt information provided above
- Suggest that the protein function can be inferred from domain/family information

Research Target:

Please provide a comprehensive research report on the gene yrhB (gene ID: yrhB, UniProt: P46857) in ECOLI.

The research report should be a detailed narrative explaining the function, biological processes, and localization of the gene product. Citations should be given for all claims.

You should prioritize authoritative reviews and primary scientific literature when conducting research. You can supplement this with annotations you find in gene/protein databases, but these can be outdated or inaccurate.

We are specifically interested in the primary function of the gene - for enzymes, what reaction is catalyzed, and what is the substrate specificity? For transporters, what is the substrate? For structural proteins or adapters, what is the broader structural role? For signaling molecules, what is the role in the pathway.

We are interested in where in or outside the cell the gene product carries out its function.

We are also interested in the signaling or biochemical pathways in which the gene functions. We are less interested in broad pleiotropic effects, except where these elucidate the precise role.

Include evidence where possible. We are interested in both experimental evidence as well as inference from structure, evolution, or bioinformatic analysis. Precise studies should be prioritized over high-throughput, where available.

Output

Question: You are an expert researcher providing comprehensive, well-cited information.

Provide detailed information focusing on:
1. Key concepts and definitions with current understanding
2. Recent developments and latest research (prioritize 2023-2024 sources)
3. Current applications and real-world implementations
4. Expert opinions and analysis from authoritative sources
5. Relevant statistics and data from recent studies

Format as a comprehensive research report with proper citations. Include URLs and publication dates where available.
Always prioritize recent, authoritative sources and provide specific citations for all major claims.

Research report: Escherichia coli K-12 yrhB (UniProt P46857; locus b3446/JW3411)

0. Executive summary

The E. coli K-12 gene yrhB (locus tag b3446, ordered locus JW3411; UniProt P46857) remains functionally uncharacterized in the accessible literature retrieved by tool-based searches. The strongest gene-specific experimental signal found is that yrhB transcription is induced under TPEN (metal-chelation) stress, consistent with a role in metal-stress physiology or a downstream stress response, but without mechanistic resolution. (easton2007identificationandcharacterization pages 81-83)

A notable recent development is that a state-of-the-art machine-learning (ML) functional annotation system (DeepECTransformer/DeepECTF) predicted YrhB to be 6-carboxytetrahydropterin synthase (EC 4.1.2.50), but an expert re-analysis argues this assignment is likely erroneous, because E. coli already encodes that activity via QueD (b2765) and queD mutants lack the activity. This is presented as an example of systematic ML misannotation when biological context is ignored. (crecylagard2025limitationsofcurrent pages 7-9, crecylagard2025limitationsofcurrent media 04ef014f)

1. Target verification (mandatory)

Identity used in this report: Escherichia coli (strain K-12) gene yrhB, locus tag b3446. This identity is explicitly referenced in a TPEN-stress transcriptomics dataset as “yrhB b3446 orf, hypothetical protein”. (easton2007identificationandcharacterization pages 81-83)

Symbol ambiguity check: Within the retrieved corpus, “yrhB” consistently refers to the E. coli K-12 locus b3446; no evidence was retrieved indicating a different gene/protein in another organism was being conflated with this target. (easton2007identificationandcharacterization pages 81-83)

2. Key concepts and definitions (current understanding)

2.1 “Uncharacterized protein” vs. “hypothetical protein”

In bacterial genomics, “hypothetical/uncharacterized protein” generally denotes a predicted coding sequence with limited or no direct experimental validation of molecular function, biological role, localization, or physiological pathway. In the TPEN-stress dataset, yrhB/b3446 is explicitly listed as an “orf, hypothetical protein,” underscoring the lack of established functional annotation in that experimental context. (easton2007identificationandcharacterization pages 81-83)

2.2 Why domain-based annotation can be misleading

A central concept for functional annotation is that sequence similarity/domain calls can be informative but may fail when paralogs diverge or when models infer common labels under uncertainty. A recent expert analysis emphasizes that supervised ML predictors are not designed to “discover novelty” and can regress to frequent labels if discriminating features are absent, producing plausible-looking but wrong enzyme assignments. (crecylagard2025limitationsofcurrent pages 7-9)

3. Molecular function and biochemical activity

3.1 No validated enzymatic/transport activity was retrieved for YrhB

No retrieved primary study provided direct biochemical characterization (substrate, reaction, kinetics) for YrhB/P46857. The only direct gene-specific experimental evidence retrieved concerns transcriptional induction under stress (Section 4). (easton2007identificationandcharacterization pages 81-83)

3.2 Conflicting computational annotation: “6-carboxytetrahydropterin synthase” is likely incorrect

A recent expert-led evaluation of DeepECTF predictions reports that YrhB/b3446 was predicted to be 6-carboxytetrahydropterin synthase (EC 4.1.2.50). The authors argue this prediction is refuted by biological context: E. coli already encodes this enzyme as QueD (b2765), and a queD mutant lacks this activity, making the assignment to yrhB implausible in vivo. (crecylagard2025limitationsofcurrent pages 7-9)

This refutation is also presented visually in a table of “refuted predictions,” which specifically lists YrhB/b3446 and the rationale for rejecting the EC assignment. (crecylagard2025limitationsofcurrent media 04ef014f)

Interpretation: The most defensible conclusion from the retrieved evidence is not that YrhB has no enzymatic activity, but that there is currently no validated evidence supporting the specific enzymatic role EC 4.1.2.50 for yrhB in E. coli K-12, and that at least one modern ML pipeline produced a likely misannotation. (crecylagard2025limitationsofcurrent pages 7-9, crecylagard2025limitationsofcurrent media 04ef014f)

4. Biological role, pathways, and regulation

4.1 Stress-responsive expression under metal chelation (TPEN)

A Zn(II)-responsive gene/protein study reports that, after 30 minutes of TPEN stress, yrhB (b3446) is among the up-regulated genes (listed as an “orf, hypothetical protein”). The supplementary table reports mean fold change = 4.3 with P = 2.75×10⁻². (easton2007identificationandcharacterization pages 81-83)

What TPEN implies: TPEN is a membrane-permeable chelator that perturbs metal availability (commonly Zn(II)), producing a metal-starvation/chelation stress response. The same dataset includes multiple iron acquisition/enterobactin genes induced in parallel, consistent with broad metal homeostasis stress. (easton2007identificationandcharacterization pages 78-81, easton2007identificationandcharacterization pages 81-83)

Inference boundary: Induction under TPEN indicates yrhB is responsive to metal chelation stress, but this does not establish that YrhB directly binds metals, transports metals, or participates in a defined metal homeostasis pathway. (easton2007identificationandcharacterization pages 81-83)

5. Cellular localization

No direct experimental localization (e.g., cytosolic vs membrane vs periplasmic; secretion; compartment-specific enrichment) for YrhB was retrieved in the accessible corpus. Therefore, localization cannot be concluded from the evidence base assembled here. (easton2007identificationandcharacterization pages 81-83)

6. Expert opinions and authoritative analysis (with emphasis on recent work)

6.1 2024/2025 expert analysis on ML annotation pitfalls (includes yrhB)

A bioRxiv preprint (version posted Oct 15, 2024, DOI: 10.1101/2024.07.01.601547, URL: https://doi.org/10.1101/2024.07.01.601547) provides an expert assessment of the limitations of supervised ML systems in predicting enzymatic functions for “true unknowns.” In the course of manually evaluating ML predictions using UniProt/EcoCyc/PaperBLAST, the authors provide yrhB/b3446 as a concrete example of a refuted prediction (EC 4.1.2.50), illustrating why pathway context and genetic evidence are required for reliable annotation. (crecylagard2025limitationsofcurrent pages 7-9, crecylagard2025limitationsofcurrent media 04ef014f)

7. Current applications and real-world implementations

7.1 Functional annotation workflows and quality control

The most immediate “real-world” impact of the retrieved yrhB evidence is in genome annotation pipelines and enzyme function prediction benchmarks. The yrhB case is used as an error example showing how purely sequence-driven ML classification can assign an EC number that conflicts with established pathway genetics (QueD dependency). This has practical implications for:
- Automated metabolic reconstruction (avoiding spurious pathway redundancy)
- Prioritizing targets for experimental characterization (focus on truly unknown proteins)
- Designing validation strategies that include genetic/in vivo tests in addition to in vitro activity screening (crecylagard2025limitationsofcurrent pages 7-9)
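
The quality-control idea above can be sketched as a simple consistency check. This is a minimal illustration, not part of DeepECTF or of any existing curation pipeline: the `KnownAssignment` record, the `flag_conflicts` function, and the rule itself (flag a predicted EC number when a different gene in the same genome already holds it and its knockout abolishes the activity) are assumptions made for this example. The only data used are the yrhB and QueD facts cited in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownAssignment:
    """A curated EC assignment for one gene (hypothetical record type)."""
    gene: str
    ec: str
    knockout_abolishes_activity: bool


def flag_conflicts(predictions, known):
    """Return (gene, ec, reason) for predictions that clash with curated knowledge.

    A prediction is flagged when a different gene already carries the same
    EC number and its knockout removes the activity, which argues against a
    second, redundant enzyme in the same genome.
    """
    flagged = []
    for gene, ec in predictions:
        for k in known:
            if k.ec == ec and k.gene != gene and k.knockout_abolishes_activity:
                reason = f"{k.gene} already provides EC {ec}; its mutant lacks the activity"
                flagged.append((gene, ec, reason))
    return flagged


# Facts taken from the text: QueD (b2765) is the E. coli EC 4.1.2.50 enzyme,
# a queD mutant lacks the activity, and the ML prediction gave YrhB the same EC.
known = [KnownAssignment("queD", "4.1.2.50", True)]
predictions = [("yrhB", "4.1.2.50")]

conflicts = flag_conflicts(predictions, known)
for gene, ec, reason in conflicts:
    print(f"{gene}: predicted EC {ec} flagged ({reason})")
```

A check of this kind only catches clashes with functions that are already curated; it says nothing about what YrhB does instead.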

7.2 Stress-response datasets as a starting point for hypothesis generation

Transcriptomic induction under TPEN stress provides a concrete, testable starting point for functional follow-up: yrhB may participate in (or be co-regulated with) metal-homeostasis or general stress modules, which can guide targeted genetics (knockout/overexpression) and proteomics. (easton2007identificationandcharacterization pages 81-83)

8. Relevant statistics and data (from retrieved studies)

  • TPEN stress (30 min): yrhB/b3446 up-regulated with mean fold change 4.3 and P = 2.75×10⁻². (easton2007identificationandcharacterization pages 81-83)
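
For comparison with datasets that report log2 ratios, the fold change above can be converted directly. This is a small arithmetic sketch: the cutoffs `min_fold=2.0` and `alpha=0.05` are common conventions assumed here, not thresholds stated by the source study, and the P-value is used as reported, with no multiple-testing correction implied.

```python
import math

fold_change = 4.3   # mean fold change for yrhB/b3446 under TPEN (from the text)
p_value = 2.75e-2   # P-value as reported in the text

log2_fc = math.log2(fold_change)


def passes_cutoffs(fc, p, min_fold=2.0, alpha=0.05):
    """Return True if fc >= min_fold and p < alpha (assumed, conventional cutoffs)."""
    return fc >= min_fold and p < alpha


print(f"log2 fold change = {log2_fc:.2f}")
print("passes assumed cutoffs:", passes_cutoffs(fold_change, p_value))
```

Passing such cutoffs indicates only that the induction is notable by convention; as the report states, it does not establish a mechanism.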

9. Evidence summary table

Claim (what is known/predicted) | Evidence type (experimental vs computational critique) | Condition/Context | Key quantitative data | Source (with URL + year) | Notes/uncertainty
The target identity matches E. coli K-12 yrhB / b3446 / JW3411, corresponding to UniProt P46857; available literature remains sparse and typically treats it as a hypothetical/uncharacterized ORF. (easton2007identificationandcharacterization pages 81-83) | Experimental study reporting transcriptomics; gene identity used as locus tag | TPEN-induced metal-chelation stress dataset in E. coli | Up-regulated with mean fold change 4.3 and P = 2.75E-02. (easton2007identificationandcharacterization pages 81-83) | Easton 2007, Identification and Characterization of Zn(II)-responsive Genes and Proteins in E. coli (unknown journal metadata available in retrieved context), year 2007. | Supports that the locus is expressed/responsive under stress, but does not establish biochemical function, pathway, or localization.
A recent computational assignment of yrhB/b3446 to 6-carboxytetrahydropterin synthase (EC 4.1.2.50) should be treated with skepticism and is likely incorrect. (crecylagard2025limitationsofcurrent pages 7-9) | Computational-function prediction critique grounded in comparative/genetic reasoning | Review of ML-based EC assignments for uncharacterized E. coli proteins | No direct assay for YrhB reported; critique notes that E. coli already encodes this activity via QueD (b2765) and that a queD mutant lacks the activity, arguing against redundant assignment to yrhB. (crecylagard2025limitationsofcurrent pages 7-9) | de Crécy-Lagard et al. 2025, bioRxiv preprint, DOI/URL: https://doi.org/10.1101/2024.07.01.601547, posted/preprint year 2025. | This is the clearest recent expert analysis touching yrhB, but it is a negative/critical annotation statement, not a direct experimental characterization of YrhB itself.
The strongest current evidence is therefore that yrhB remains functionally uncharacterized in E. coli K-12 despite detectable stress-responsive transcription. (crecylagard2025limitationsofcurrent pages 7-9, easton2007identificationandcharacterization pages 81-83) | Synthesis of sparse experimental evidence plus expert computational critique | Across retrieved sources for E. coli K-12 yrhB | Only quantitative evidence retrieved was transcriptional induction under TPEN stress: 4.3-fold, P = 2.75E-02. (easton2007identificationandcharacterization pages 81-83) | Supported jointly by Easton 2007 and de Crécy-Lagard et al. 2025; URL available for 2025 source: https://doi.org/10.1101/2024.07.01.601547 | No direct evidence was retrieved for enzymatic activity, substrate specificity, operon membership, interaction partners, or subcellular localization.
Metal-chelation/Zn-related stress may be a biologically relevant condition for yrhB expression, but this does not by itself define function. (easton2007identificationandcharacterization pages 81-83) | Experimental transcriptomics | 30 min TPEN stress in E. coli | Fold change 4.3, P = 2.75E-02. (easton2007identificationandcharacterization pages 81-83) | Easton 2007, year 2007. | Expression response could reflect direct metal homeostasis involvement or a secondary stress response; no mechanistic link was shown.
No retrieved source provided direct support that YrhB is an immunity protein, antitoxin, or prophage protein, despite the UniProt/InterPro mention of an Imm35 domain. (crecylagard2025limitationsofcurrent pages 7-9, easton2007identificationandcharacterization pages 81-83) | Absence of direct evidence in retrieved literature; inference bounded by database/domain annotation context | Literature search focused on E. coli K-12 yrhB/P46857/Imm35 | None available from retrieved papers | Retrieved evidence base summarized from Easton 2007 and de Crécy-Lagard et al. 2025; URL available for 2025 source: https://doi.org/10.1101/2024.07.01.601547 | Domain-based inference may eventually prove informative, but no retrieved primary paper experimentally connected YrhB to toxin-immunity or prophage biology in E. coli K-12.

Table: This table summarizes the limited evidence retrieved for E. coli K-12 yrhB (b3446/JW3411; UniProt P46857). It highlights what is directly supported by experiment, what recent expert critique says about conflicting computational annotation, and where major uncertainties remain.

10. Conclusions and evidence gaps

  1. Primary function remains unknown: No direct biochemical function, substrate specificity, interaction partner, or subcellular localization evidence for YrhB/P46857 was retrieved. (easton2007identificationandcharacterization pages 81-83)
  2. Expression evidence exists: yrhB is induced under TPEN metal-chelation stress, suggesting relevance to metal stress physiology or a correlated stress response program. (easton2007identificationandcharacterization pages 81-83)
  3. Avoid overconfident EC assignment: A recent expert critique indicates that assigning yrhB as 6-carboxytetrahydropterin synthase (EC 4.1.2.50) is likely incorrect in E. coli because that activity is attributable to QueD with supporting mutant evidence; thus, this computational annotation should not be used as a functional claim for YrhB without direct validation. (crecylagard2025limitationsofcurrent pages 7-9, crecylagard2025limitationsofcurrent media 04ef014f)

11. References (retrieved and cited)

  • de Crécy-Lagard V, Dias R, Sexson N, Friedberg I, Yuan Y, Swairjo MA. Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins. bioRxiv; preprint posted 2024-10-15 (doi: 10.1101/2024.07.01.601547). URL: https://doi.org/10.1101/2024.07.01.601547 (crecylagard2025limitationsofcurrent pages 7-9, crecylagard2025limitationsofcurrent media 04ef014f)
  • Easton JA. Identification and Characterization of Zn(II)-responsive Genes and Proteins in E. coli. 2007 (journal metadata not fully available in retrieved context). Evidence: Table S1 shows yrhB/b3446 induction under TPEN stress (fold-change and P-value). (easton2007identificationandcharacterization pages 81-83)

References

  1. (easton2007identificationandcharacterization pages 81-83): JA Easton. Identification and characterization of Zn(II)-responsive genes and proteins in E. coli. Unknown journal, 2007.

  2. (crecylagard2025limitationsofcurrent pages 7-9): Valérie de Crécy-Lagard, Raquel Dias, Nick Sexson, Iddo Friedberg, Yifeng Yuan, and Manal A. Swairjo. Limitations of current machine-learning models in predicting enzymatic functions for uncharacterized proteins. bioRxiv, Jul 2025. URL: https://doi.org/10.1101/2024.07.01.601547, doi:10.1101/2024.07.01.601547.

  3. (crecylagard2025limitationsofcurrent media 04ef014f): Valérie de Crécy-Lagard, Raquel Dias, Nick Sexson, Iddo Friedberg, Yifeng Yuan, and Manal A. Swairjo. Limitations of current machine-learning models in predicting enzymatic functions for uncharacterized proteins. bioRxiv, Jul 2025. URL: https://doi.org/10.1101/2024.07.01.601547, doi:10.1101/2024.07.01.601547.

  4. (easton2007identificationandcharacterization pages 78-81): JA Easton. Identification and characterization of Zn(II)-responsive genes and proteins in E. coli. Unknown journal, 2007.

📄 View Raw YAML

id: P46857
gene_symbol: yrhB
product_type: PROTEIN
status: COMPLETE
taxon:
  id: NCBITaxon:83333
  label: Escherichia coli (strain K12)
description: >-
  YrhB is a small (94 aa, 10.6 kDa) uncharacterized protein in E. coli K12 encoded by b3446.
  It belongs to the Imm35 (Immunity protein 35) family (Pfam PF15567 / InterPro IPR029082),
  which was identified computationally as part of the polymorphic toxin system immunity
  protein repertoire (Zhang et al. 2012, PMID:22731697). Imm35 is specifically associated
  with the papain-like peptidase Tox-PL1, suggesting it functions as a peptidase inhibitor
  (PMID:22731697). A study in BL21(DE3) reported chaperone-like activity for YrhB under
  heat shock conditions (Ahn et al. 2012, PMID:22569261), though this has not been
  independently confirmed in K12 and the primary evolved function is more likely related
  to its Imm35 domain. Transcriptomic data show yrhB is upregulated 4.3-fold under TPEN
  (zinc chelation) stress, suggesting a possible link to metal homeostasis.
  The protein remains at UniProt evidence level PE 4 (Predicted).
  Notably, DeepECTF (a deep learning enzyme function predictor) incorrectly predicted
  EC 4.1.2.50 (6-carboxytetrahydropterin synthase) for YrhB (de Crécy-Lagard et al. 2025,
  PMID:40703034). This is a logic error because E. coli already encodes the bona fide
  6-carboxytetrahydropterin synthase as QueD (b2765), and a queD mutant lacks this
  activity entirely, indicating that YrhB does not supply a redundant activity.
tags:
  - uncharacterized
  - polymorphic-toxin-system
  - ML-misannotation-case-study
existing_annotations:
# NOTE: The GOA file for yrhB (P46857) returned 0 annotations from QuickGO.
# This is consistent with UniProt PE level 4 (Predicted) and RecName "Uncharacterized protein YrhB".
# There are no existing GO annotations to review.
# Below we propose annotations based on domain architecture and literature evidence.

- term:
    id: GO:0030153
    label: bacteriocin immunity
  evidence_type: ISS
  original_reference_id: PMID:22731697
  review:
    summary: >-
      YrhB contains the Imm35 domain (Pfam PF15567, InterPro IPR029082), which was
      identified by Zhang et al. (2012) as an immunity protein family in polymorphic
      toxin systems. Imm35 is specifically associated with the papain-like peptidase
      Tox-PL1, suggesting it functions as a peptidase inhibitor. While not experimentally
      validated for YrhB specifically, the domain assignment is robust and based on
      comprehensive bioinformatic analysis of polymorphic toxin-immunity gene neighborhoods
      across bacteria.
    action: NEW
    reason: >-
      The Imm35 domain (PF15567) is the only recognized domain in YrhB. Zhang et al.
      (2012) systematically characterized immunity protein families in bacterial
      polymorphic toxin systems using comparative genomics, identifying Imm35 as
      specifically associated with Tox-PL1 papain-like peptidase toxins. GO:0030153
      (bacteriocin immunity) is the closest available GO biological process term for
      this predicted function. This would be an ISS-level annotation based on sequence
      similarity to computationally defined immunity protein families.
    additional_reference_ids:
      - PMID:22731697
    supported_by:
      - reference_id: PMID:22731697
        supporting_text: >-
          Imm35 is specifically associated only with the papain-like peptide Tox-PL1,
          suggesting that it functions specifically as a peptidase inhibitor
      - reference_id: file:ECOLI/yrhB/yrhB-deep-research-falcon.md
        supporting_text: >-
          Falcon deep research found no primary literature validating
          YrhB function beyond TPEN stress induction (4.3-fold) and the DeepECTF
          misprediction critique. The Imm35 domain-based immunity protein annotation
          remains the most informative functional assignment.

- term:
    id: GO:0030414
    label: peptidase inhibitor activity
  evidence_type: ISS
  original_reference_id: PMID:22731697
  review:
    summary: >-
      Zhang et al. (2012) identified Imm35 as specifically associated with the
      papain-like peptidase Tox-PL1, suggesting it functions as a peptidase inhibitor.
      YrhB contains the Imm35 domain (PF15567), making peptidase inhibitor activity
      the most likely molecular function.
    action: NEW
    reason: >-
      Imm35 is specifically associated with Tox-PL1 papain-like peptidase toxins,
      and Zhang et al. (2012) explicitly suggest it functions as a peptidase inhibitor.
      GO:0030414 (peptidase inhibitor activity) captures this predicted molecular function.
    supported_by:
      - reference_id: PMID:22731697
        supporting_text: >-
          Imm35 is specifically associated only with the papain-like peptide Tox-PL1,
          suggesting that it functions specifically as a peptidase inhibitor

- term:
    id: GO:0005737
    label: cytoplasm
  evidence_type: IDA
  original_reference_id: PMID:22569261
  review:
    summary: >-
      Immunity proteins in polymorphic toxin systems are typically cytoplasmic, as they
      must be present in the cytoplasm to protect the producing cell from auto-intoxication.
      Ahn et al. (2012) identified YrhB as a soluble intracellular protein in BL21(DE3)
      through systematic proteome-wide analyses.
    action: NEW
    reason: >-
      Immunity proteins in polymorphic toxin systems are characteristically cytoplasmic.
      Ahn et al. (2012, PMID:22569261) showed YrhB is a soluble intracellular protein
      in BL21(DE3). Cytoplasmic localization is consistent with both the immunity protein
      function and the experimental data.
    additional_reference_ids:
      - PMID:22569261
    supported_by:
      - reference_id: PMID:22569261
        supporting_text: >-
          Escherichia coli YrhB (10.6 kDa) from strain BL21(DE3) that is commonly used for
          protein overexpression is a stable chaperone-like protein and indispensable for
          supporting the growth of BL21(DE3) at 48 °C but not defined as conventional heat
          shock protein (HSP)

references:
- id: PMID:22731697
  title: >-
    Polymorphic toxin systems: Comprehensive characterization of trafficking modes,
    processing, mechanisms of action, immunity and ecology using comparative genomics.
  findings:
    - statement: >-
        Imm35 (PF15567) was identified as an immunity protein family in bacterial
        polymorphic toxin systems, specifically associated with the papain-like peptidase
        Tox-PL1 toxin domain.
      supporting_text: >-
        Imm35 is specifically associated only with the papain-like peptide Tox-PL1,
        suggesting that it functions specifically as a peptidase inhibitor
    - statement: >-
        Over 90 families of immunity proteins were identified in polymorphic toxin systems,
        neutralizing between one and at least 27 distinct types of toxin domains.
      supporting_text: >-
        Over 90 families of immunity proteins might neutralize anywhere between a single
        to at least 27 distinct types of toxin domains
- id: PMID:22569261
  title: >-
    YrhB is a highly stable small protein with unique chaperone-like activity in
    Escherichia coli BL21(DE3).
  findings:
    - statement: >-
        YrhB from E. coli BL21(DE3) showed chaperone-like activity: it prevented
        heat-induced aggregation of PurK, promoted in vitro refolding of uridine
        phosphorylase, and reduced inclusion body formation. YrhB was upregulated
        only under heat shock. However, this was demonstrated in BL21(DE3), not K12.
      supporting_text: >-
        Escherichia coli YrhB (10.6 kDa) from strain BL21(DE3) that is commonly used for
        protein overexpression is a stable chaperone-like protein and indispensable for
        supporting the growth of BL21(DE3) at 48 °C but not defined as conventional heat
        shock protein (HSP)
- id: DOI:10.1007/978-0-8176-4747-1
  title: >-
    Identification and characterization of Zn(II)-responsive genes and proteins
    in E. coli.
  findings:
    - statement: >-
        yrhB (b3446) is upregulated 4.3-fold (P=2.75e-02) under TPEN
        (zinc chelation) stress after 30 minutes, suggesting a possible link to
        metal homeostasis or stress response.
      supporting_text: >-
        yrhB b3446 up-regulated under TPEN stress with mean fold
        change 4.3 and P = 2.75e-02
- id: PMID:40703034
  title: >-
    Limitations of current machine learning models in predicting enzymatic functions
    for uncharacterized proteins.
  findings:
    - statement: >-
        DeepECTF incorrectly predicted EC 4.1.2.50 (6-carboxytetrahydropterin synthase)
        for YrhB. This is a logic error because E. coli already encodes this enzyme
        as QueD (b2765), and a queD mutant lacks the activity entirely.
      supporting_text: >-
        YrhB/b3446 is predicted to be a 6-carboxytetrahydropterin synthase (EC 4.1.2.50),
        but E. coli already encodes this enzyme (QueD/b2765) and a queD mutant lacks
        this activity (Zallot et al. 2017)
    - statement: >-
        This exemplifies how ML models can ignore existing gene-function assignments
        in the organism, leading to logically impossible predictions.
      supporting_text: >-
        current ML methods not only mostly fail to make novel predictions but also make
        basic logic errors in their predictions that human annotators avoid by leveraging
        the available knowledge base
- id: PMID:9278503
  title: The complete genome sequence of Escherichia coli K-12.
  findings:
    - statement: yrhB (b3446) was identified during sequencing of the E. coli K12 genome.
      supporting_text: >-
        Of 4288 protein-coding genes annotated, 38 percent have no attributed function

core_functions:
- description: >-
    Predicted immunity protein in a polymorphic toxin system. YrhB contains the Imm35
    domain (PF15567), a computationally identified immunity protein family that is
    specifically associated with the papain-like peptidase Tox-PL1, suggesting it
    functions as a peptidase inhibitor. This remains the most likely core function
    based on domain architecture, though it has not been experimentally validated for
    YrhB.
  molecular_function:
    id: GO:0030414
    label: peptidase inhibitor activity
  directly_involved_in:
    - id: GO:0030153
      label: bacteriocin immunity
  locations:
    - id: GO:0005737
      label: cytoplasm
  supported_by:
    - reference_id: PMID:22731697
      supporting_text: >-
        Imm35 is specifically associated only with the papain-like peptide Tox-PL1,
        suggesting that it functions specifically as a peptidase inhibitor

proposed_new_terms: []

suggested_questions:
  - question: >-
      What is the cognate toxin for YrhB/Imm35 in E. coli K12? Is there a Tox-PL1-type
      toxin gene in the genomic neighborhood of yrhB (b3446)?
  - question: >-
      Is the chaperone-like activity reported by Ahn et al. (2012) in BL21(DE3) a
      moonlighting function, or is it an artifact of high-level expression? Does K12
      YrhB show the same activity?
  - question: >-
      Has the DeepECTF misprediction of EC 4.1.2.50 for YrhB been propagated into any
      databases?

suggested_experiments:
  - description: >-
      Test whether yrhB deletion in K12 affects susceptibility to polymorphic toxins
      from competing strains, particularly those encoding Tox-PL1-type toxin domains.
    hypothesis: >-
      If YrhB functions as an Imm35 immunity protein, a yrhB deletion mutant should
      be more susceptible to Tox-PL1 papain-like peptidase toxins from competing bacteria.
  - description: >-
      Examine the genomic neighborhood of yrhB (b3446) for adjacent toxin-encoding genes
      to identify the cognate toxin.
    hypothesis: >-
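      If YrhB is an Imm35 immunity protein, its cognate toxin gene should lie immediately
      upstream, because polymorphic toxin immunity genes are typically found immediately
      downstream of their cognate toxin gene.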
  - description: >-
      Replicate the chaperone-like activity assays from Ahn et al. (2012) using purified
      K12 YrhB to determine if this is strain-specific to BL21(DE3).
    hypothesis: >-
      The chaperone-like activity may be a general property of YrhB or may be specific
      to BL21(DE3) expression conditions.