A small (87 aa) conserved hypothetical protein of unknown function in Mycobacterium tuberculosis H37Rv. Contains a single DUF2630 domain (Pfam PF10944, InterPro IPR020311) spanning nearly the entire protein. Detected at the protein level by mass spectrometry in whole cell lysates. Non-essential for in vitro growth. mRNA is upregulated after 96 hours of nutrient starvation, suggesting a possible role in stress adaptation, but no molecular function has been established for Rv0898c or any member of the DUF2630 family.
| GO Term | Evidence | Action | Reason |
|---|---|---|---|
| GO:0003674 molecular_function | ND | NEW | Summary: No molecular function has been established for Rv0898c or any member of the DUF2630 family. The InterPro entry IPR020311 explicitly states this entry contains proteins with no known function and has no GO term mappings.<br>Reason: The root molecular function term with ND (No biological Data) evidence is used to explicitly document that no molecular function annotation is supported for this protein. The BioReason SFT trace speculated about protein binding and CoA biosynthesis but these claims are entirely unsupported. A dedicated 3-iteration OpenScientist run (AlphaFold + Foldseek) reached the same conclusion. Although the DUF2630 fold is classifiable as a two-helix antiparallel hairpin (top Foldseek hit uL29 ribosomal protein at only 27% identity, twilight zone; CATH topology 1.10.287, shared by >600 functionally diverse superfamilies), the DUF2630-specific motif (CWDLLRQRR) has no match in any characterized protein, so no molecular function can be inferred and the ND annotation should be retained.<br>Supporting Evidence:<br>file:MYCTU/Rv0898c/Rv0898c-deep-research-bioreason-sft.md<br>file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md: "The current ND (no data) molecular function annotation should be retained."<br>file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md: "the DUF2630-specific conserved motif (CWDLLRQRR) has no matches in any characterized protein" |
| GO:0005737 cytoplasm | IDA<br>PMID:21969609 Proteogenomic analysis of Mycobacterium tuberculosis by high... | NEW | Summary: Rv0898c protein was detected in whole cell lysates but not in membrane or culture filtrate fractions by mass spectrometry, consistent with cytoplasmic localization.<br>Reason: Proteomics data from high-resolution mass spectrometry detected Rv0898c in whole cell lysates but not in membrane or secreted fractions, supporting cytoplasmic localization.<br>Supporting Evidence:<br>PMID:21969609: "we identified 3176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count" |
Q: What is the three-dimensional structure of Rv0898c and does it reveal any structural similarity to proteins of known function?
Q: Does Rv0898c interact with any other M. tuberculosis proteins, particularly those involved in stress response or starvation survival?
Q: What is the phenotype of an Rv0898c knockout under starvation or in vivo infection conditions?
Experiment: Determine the crystal or NMR structure of Rv0898c and perform structural similarity searches (e.g. Dali, Foldseek) to identify distant functional homologs
Experiment: Perform affinity purification coupled with mass spectrometry (AP-MS) to identify interaction partners of Rv0898c in M. tuberculosis
Experiment: Construct an Rv0898c deletion mutant and test survival under nutrient starvation, hypoxia, and in macrophage infection models
Experiment: Perform transcriptomic analysis of Rv0898c knockout versus wild-type under starvation conditions to identify downstream effects
I begin with the domain architecture. The protein is dominated by a single InterPro signature, IPR020311 (Uncharacterised protein family, Rv0898c, residues 8–86). This compact, family-specific module spans essentially the entire polypeptide, implying a small, single-domain protein without appended catalytic or transport domains. The absence of recognizable enzymatic cores (e.g., acyltransferase HxH, ligase Lys/Asp–Pp, or Rossmann-like redox folds) and the lack of predicted transmembrane segments suggest a non-enzymatic role, most plausibly as a binding/adaptor module. Such small bacterial proteins frequently act as peripheral factors that dock onto larger machineries to modulate their assembly or activity. This architecture causally supports a molecular function centered on protein binding (GO:0005515).
From this binding-centric foundation, I infer the biological role by considering pathway context. Coenzyme A biosynthesis (GO:0015937) in bacteria proceeds through a sequence of soluble enzymes: acyl-AMP ligase (ACP/CoA ligase), phosphopantetheine kinase, dephospho-CoA kinase, and auxiliary salvage steps. A small adaptor would not catalyze these reactions but could regulate flux by stabilizing enzyme–enzyme assemblies, positioning the pathway near metabolite sources, or anchoring the soluble enzymes to the cell periphery where CoA and acyl-CoA pools are heavily utilized. The Rv0898c family’s compact, likely surface-exposed fold is well-suited to scaffold interactions rather than catalysis, which provides a mechanistic route to modulate the CoA pathway without directly performing the chemistry.
The cellular location follows from the same logic. A lack of transmembrane domains and the small size argue against an integral membrane role. However, binding to envelope components can tether soluble enzymes at the cell surface. The explicit association with a canonical mycobacterial outer envelope protein (OmpA) points to a peripheral position at the cell boundary. In mycobacteria, the peptidoglycan-based cell wall (GO:0009274) underlies the mycomembrane; a protein that bridges a porin-like envelope component to cytosolic enzymes would function at this interface. Thus, I deduce a peripheral localization at the peptidoglycan layer, where the protein can couple envelope physiology to central metabolism.
This leads to a mechanistic hypothesis. The Rv0898c fold provides a docking platform that binds both envelope proteins and metabolic enzymes. Association with OmpA would anchor the complex to the cell wall, while interaction with a probable oxidoreductase and CoA-pathway enzymes would position redox and thioester-handling reactions near the envelope, where acyl-CoA turnover is intense. By stabilizing transient multi-enzyme assemblies, the protein could enhance local CoA regeneration and channel intermediates, thereby tuning CoA biosynthesis and salvage to envelope demands. In summary, a small, single-domain binding protein localizes to the peptidoglycan interface and modulates CoA biosynthetic throughput by scaffolding partner proteins.
## Functional Summary
A small, single-domain peripheral protein in Mycobacterium tuberculosis that likely operates as a binding adaptor at the cell envelope. By docking to envelope components and soluble enzymes, it probably organizes or stabilizes multi-enzyme assemblies that tune coenzyme A production and salvage at the cell wall interface. This positioning would couple envelope metabolic demands to coenzyme pools, with interactions bridging an outer envelope protein and redox enzymes to modulate flux without direct catalysis.
## UniProt Summary
May be involved in the modulation of CoA biosynthesis.
## InterPro Domains
- IPR020311: Uncharacterised protein family, Rv0898c (family) [8-86]
## GO Term Predictions
### Molecular Function
### Biological Process
### Cellular Component
Verdict: Partially Supported
The seed hypothesis proposes that Rv0898c, an 87-amino-acid uncharacterized Mycobacterium tuberculosis protein whose sole domain is DUF2630 (IPR020311), can be assigned, using AlphaFold structure prediction and Foldseek structural-homology searches, to a known structural superfamily that implies a specific candidate molecular function. This hypothesis is partially supported: the fold classification component is robust, but the functional inference component fails. AlphaFold predicts a high-confidence two-helix antiparallel hairpin fold (mean pLDDT 84.6, core residues >90), and Foldseek identifies structural similarity to the uL29 ribosomal protein (Prob = 0.992). However, the matched CATH topology (1.10.287, Helix Hairpins) encompasses >600 functionally diverse superfamilies, sequence identity is in the twilight zone (27%), and the DUF2630-specific conserved motif (CWDLLRQRR) has no matches in any characterized protein. Published precedent demonstrates that fold-level classification of DUFs does not reliably transfer to specific molecular function when the binding site has diverged. The current ND (no data) molecular function annotation should be retained. Fold classification is a genuine and useful structural insight, but it is insufficient to justify a GO molecular function term without experimental evidence.
Key caveat: The fold classification is genuine and useful for structural biology. The limitation is in extrapolating from fold to function, which requires either higher sequence identity, shared active-site residues with a characterized superfamily, or experimental validation.
Rv0898c (UniProt P9WKP5) is a small, single-domain protein encoded on the minus strand of the M. tuberculosis H37Rv genome, adjacent to but operonically distinct from the well-characterized ompATb operon (Rv0899–Rv0901). Its only recognized domain is DUF2630 (Pfam PF10944 / InterPro IPR020311), a family of ~1,778 exclusively bacterial proteins with zero characterized members, zero GO mappings, and zero experimental structures. The seed hypothesis posits that structural prediction and remote homology searching can assign Rv0898c to a known superfamily with functional implications.
Our investigation confirmed the structural aspect: AlphaFold model AF-P9WKP5-F1 (v6) predicts two long α-helices (α1: residues 9–30; α2: residues 36–65) in an antiparallel hairpin arrangement with high confidence. Foldseek searches against PDB100 return the uL29 ribosomal protein from M. smegmatis (PDB 6dzi, chain Z) as the top hit, placing Rv0898c in CATH topology 1.10.287 (Mainly Alpha / Orthogonal Bundle / Helix Hairpins). However, this topology is one of the most common α-helical folds in nature, populated by hundreds of functionally unrelated superfamilies. The 27% sequence identity to uL29 falls within the "twilight zone", where homology cannot be reliably inferred from sequence similarity alone, and the DUF2630-specific conserved motif (LDQCWDLLRQRRA), which defines a charged surface patch on helix α2, has no counterpart in uL29 or any other characterized protein family. Published methodological studies confirm that fold recognition can correctly identify structure but fail to predict function when binding sites have diverged, as exemplified by the DUF388/OB-fold case (PMID: 15178340).
Genomic context analysis revealed that Rv0898c is co-oriented with Rv0897c (a 535-aa NAD(P)-binding oxidoreductase) on the minus strand, with a STRING interaction score of 0.851 — but this score derives entirely from genome neighborhood evidence with zero experimental, co-expression, or text-mining support. The adjacent ompATb operon functions in ammonia secretion and pH adaptation, providing tantalizing but indirect contextual clues. No direct functional evidence — biochemical, genetic, or interaction-based — exists for Rv0898c in the published literature.
The AlphaFold structural model (AF-P9WKP5-F1, v6) of Rv0898c reveals a simple architecture: two α-helices of 22 and 30 residues, respectively, arranged in an antiparallel hairpin with an inter-helix angle of approximately 17°, connected by a short turn (residues 31–35). The N-terminus (residues 1–8) and C-terminus (residues 66–87) are predicted as disordered. Confidence metrics are strong: mean pLDDT of 84.6 overall and >90 for the structured core (residues 9–65), indicating "very high confidence" in the predicted fold.
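The per-segment confidence figures quoted above can be recomputed from a model file, since AlphaFold writes pLDDT into the B-factor column of its PDB-format output. The following is a minimal sketch under stated assumptions: the segment boundaries are the ones given in the text, while the helper `synthetic_ca_line` and the toy pLDDT values (92 and 60) are hypothetical stand-ins and are not taken from AF-P9WKP5-F1.

```python
from statistics import mean

# Segment boundaries as stated in the text (1-based, inclusive).
SEGMENTS = {
    "N-terminal tail": (1, 8),
    "helix alpha1": (9, 30),
    "turn": (31, 35),
    "helix alpha2": (36, 65),
    "C-terminal tail": (66, 87),
}


def plddt_from_pdb(pdb_text):
    """Return {residue_number: pLDDT} read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def segment_means(scores, segments=SEGMENTS):
    """Mean pLDDT per named segment, skipping residues absent from the model."""
    result = {}
    for name, (start, end) in segments.items():
        values = [scores[i] for i in range(start, end + 1) if i in scores]
        result[name] = mean(values) if values else None
    return result


def synthetic_ca_line(resseq, plddt):
    """Build one fixed-column ATOM record with dummy coordinates (hypothetical data)."""
    return (f"ATOM  {resseq:5d}  CA  ALA A{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Toy model: 87 residues, high confidence only inside the stated core (9-65).
toy_model = "\n".join(
    synthetic_ca_line(i, 92.0 if 9 <= i <= 65 else 60.0) for i in range(1, 88)
)
for segment, value in segment_means(plddt_from_pdb(toy_model)).items():
    print(f"{segment}: {value}")
```

To apply this to the real model, the text of the downloaded PDB file would be passed to `plddt_from_pdb` in place of `toy_model`. The five segments sum to 87 residues, which matches the stated protein length.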
Foldseek structural similarity searches against PDB100 returned the uL29 ribosomal protein from Mycobacterium smegmatis (PDB 6dzi, chain Z) as the top hit with Prob = 0.992, E-value = 0.33, sequence identity = 27.2%, and alignment length = 66 residues. Multiple additional hits in the CATH50 database map to topology 1.10.287 (Helix Hairpins), confirming the fold classification. Rv0898c itself has no entry in CATH v4.3.0, so this represents a new structural classification for the DUF2630 family.
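A short sketch of how a tabular hit like the one above might be screened before any interpretation. The column order in `COLUMNS` is an assumption (a real Foldseek run would need these fields requested explicitly, and may report identity as a fraction rather than a percentage), and the three cutoffs are illustrative defaults rather than values from the report. The single example row uses only the numbers quoted in the text.

```python
import csv
import io

# Hypothetical column layout for a tab-separated hit table.
COLUMNS = ["query", "target", "identity_pct", "alnlen", "evalue", "prob"]


def read_hits(tsv_text):
    """Parse tab-separated hits into dictionaries with numeric fields converted."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        hits.append({
            "query": row["query"],
            "target": row["target"],
            "identity_pct": float(row["identity_pct"]),
            "alnlen": int(row["alnlen"]),
            "evalue": float(row["evalue"]),
            "prob": float(row["prob"]),
        })
    return hits


def flag_hit(hit, twilight_pct=30.0, evalue_cutoff=1e-3, prob_cutoff=0.9):
    """Flag a hit on three independent axes; all cutoffs are assumed, not prescribed."""
    return {
        "fold_level_match": hit["prob"] >= prob_cutoff,
        "twilight_zone": hit["identity_pct"] < twilight_pct,
        "weak_evalue": hit["evalue"] > evalue_cutoff,
    }


# Top hit as reported in the text: uL29, PDB 6dzi chain Z.
example = "AF-P9WKP5-F1\t6dzi_Z\t27.2\t66\t0.33\t0.992\n"
for hit in read_hits(example):
    print(hit["target"], flag_hit(hit))
```

Keeping the three flags separate mirrors the argument in the text: a hit can be a confident fold-level match while still sitting in the twilight zone with a marginal E-value.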
{{figure:rv0898c_analysis.png|caption=Comprehensive visualization of Rv0898c structure, conservation analysis, and key motifs. The protein adopts a two-helix antiparallel hairpin fold with a conserved charged surface patch defined by the CWDLLRQRR motif.}}
Rv0898c is encoded on the minus strand at position complement(1002441..1002704), while the ompATb operon genes (Rv0899/ompATb, Rv0900/arfB, Rv0901/arfC) are on the plus strand. This opposite-strand orientation rules out co-transcription. The operon structure of Rv0899–Rv0901 is experimentally confirmed: "the ompATb gene (Rv0899), encoding a major outer membrane protein, is organized in operon with Rv0900 and Rv0901, encoding two small proteins with a predicted transmembrane domain" (PMID: 21802366). Rv0898c is instead co-oriented with the upstream Rv0897c on the minus strand, suggesting possible co-transcription with this NAD(P)-binding oxidoreductase.
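The coordinate string quoted above can be checked against the stated protein length with a few lines of arithmetic. This is a minimal sketch that handles only simple `start..end` and `complement(start..end)` locations, and it assumes the annotated span includes the stop codon. The function names are illustrative.

```python
import re


def parse_location(location):
    """Parse 'start..end' or 'complement(start..end)' into (start, end, strand)."""
    match = re.fullmatch(r"(complement\()?(\d+)\.\.(\d+)\)?", location.strip())
    if match is None:
        raise ValueError(f"unsupported location string: {location!r}")
    strand = "-" if match.group(1) else "+"
    return int(match.group(2)), int(match.group(3)), strand


def coding_summary(location):
    """Summarise strand, span length and implied protein length for a CDS location."""
    start, end, strand = parse_location(location)
    nucleotides = end - start + 1  # coordinates are inclusive
    codons, remainder = divmod(nucleotides, 3)
    return {
        "strand": strand,
        "nucleotides": nucleotides,
        "in_frame": remainder == 0,
        "residues": codons - 1,  # assumes the span includes the stop codon
    }


def same_strand(strand_a, strand_b):
    """Two genes can only share one transcript if they lie on the same strand."""
    return strand_a == strand_b


print(coding_summary("complement(1002441..1002704)"))
print(same_strand("-", "+"))  # Rv0898c (minus) versus the ompATb operon (plus)
```

The span is 264 nucleotides, or 88 codons, which gives 87 residues once the stop codon is set aside. That agrees with the 87-aa length stated for Rv0898c.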
InterPro entry IPR020311 describes DUF2630 as "proteins with no known function," encompassing 1,778 protein members across 2,119 exclusively bacterial taxa. The family has zero GO terms mapped, zero experimental structures in the PDB, and 1,046 AlphaFold structural models. UniProt entry P9WKP5 for Rv0898c contains zero GO annotations and zero functional comments. The Pfam domain PF10944 spans residues 8–86 with an E-value of 1.7e-29, covering essentially the entire mature protein. This complete absence of functional data across all family members means there are no transfer-by-homology opportunities and no experimental anchor points for function prediction.
Analysis of 19 diverse DUF2630 family members across Actinobacteria reveals a core conserved motif LDQCWDLLRQRRA (Rv0898c residues 51–63) whose absolutely conserved positions (D52, W55, D56, L57, R59, R61; 100%) are accompanied by four near-invariant positions (L51, Q53, L58, A63 at 89–95%). The CWDLLR hexapeptide is unique to DUF2630; zero hits were found in non-DUF2630 SwissProt entries.
Three-dimensional mapping of these conserved residues reveals a bipartite functional architecture:
- Structural core: Buried hydrophobic residues (L51, C54, W55, L57, L58) form the helix-helix interface. W55's indole ring contacts I16 and L13 on helix α1 (distances 3.85–4.63 Å), and R61-NH1 forms a 2.82 Å salt bridge to D9-OD1 on helix α1.
- Surface patch: Exposed charged residues (D52, D56, R59, R61, R62) define a conserved surface that could mediate protein–protein or protein–ligand interactions.
The near-absolute conservation of this motif across the 19 sampled DUF2630 members, combined with its surface exposure and charge complementarity, suggests it mediates a conserved but as-yet-unknown binding interaction. Critically, this motif has no structural or sequence counterpart in uL29 or any other characterized protein, blocking functional inference from the fold match.
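The residue labels used in this section can be cross-checked against the motif string and its stated start position. The sketch below numbers LDQCWDLLRQRRA from residue 51 and confirms that every quoted label names the amino acid actually found at that position. The sequence passed to `find_motif` is a made-up toy string, not a real DUF2630 protein.

```python
MOTIF = "LDQCWDLLRQRRA"
MOTIF_START = 51  # first motif residue in Rv0898c numbering, from the text

# Residue labels exactly as quoted in the text.
BURIED_CORE = ["L51", "C54", "W55", "L57", "L58"]
SURFACE_PATCH = ["D52", "D56", "R59", "R61", "R62"]


def number_motif(motif, start):
    """Map residue numbers to amino acids for a motif beginning at `start`."""
    return {start + offset: aa for offset, aa in enumerate(motif)}


def check_labels(labels, numbering):
    """Return labels whose amino acid disagrees with the numbered motif."""
    return [label for label in labels if numbering.get(int(label[1:])) != label[0]]


def find_motif(sequence, motif):
    """Return 1-based start positions of every exact motif occurrence."""
    hits = []
    position = sequence.find(motif)
    while position != -1:
        hits.append(position + 1)
        position = sequence.find(motif, position + 1)
    return hits


numbering = number_motif(MOTIF, MOTIF_START)
print(numbering)
print(check_labels(BURIED_CORE + SURFACE_PATCH, numbering))  # empty list: all labels consistent
print(find_motif("AAACWDLLRAA", "CWDLLR"))  # toy sequence, hypothetical
```

All ten labels agree with the numbering, and the motif ends at residue 63, inside helix α2 (residues 36–65).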
The adjacent ompATb operon is functionally characterized: "the proteins encoded by the ompATb operon are involved in generating a rapid ammonia burst, which neutralized medium pH and preceded exponential growth of M. tuberculosis" (PMID: 21410778). Furthermore, "Rv0899-like proteins are widespread in bacteria with functions in nitrogen metabolism, adaptation to nutrient poor environments, and/or establishing symbiosis with the host organism" (PMID: 21905117).
STRING functional enrichment groups the Rv0897c–Rv0901 cluster as "Mixed, incl. FAD/NAD(P)-binding domain superfamily" (FDR = 2.63e-11). Rv0898c is non-essential in vitro based on TnSeq data from the MtbTnDB, and DUF2630 taxonomic distribution is restricted primarily to Actinobacteria with some representatives in Nitrospira and Betaproteobacteria — lineages that include nitrogen-cycling organisms.
While this genomic context is suggestive, it provides only guilt-by-association evidence. No direct functional, biochemical, or genetic data connects Rv0898c to ammonia secretion, nitrogen metabolism, or any other biological process.
STRING reports a high combined interaction score of 0.851 for Rv0897c–Rv0898c, but decomposition reveals this derives entirely from genome neighborhood evidence (0.847) with scores of zero for gene fusion, co-occurrence, experimental data, co-expression, and text mining. Rv0897c itself (UniProt P9WKP7) is a 535-aa membrane-associated protein with a validated NAD_binding_8 domain (PF13450), classified in the FAD/NAD(P)-binding domain superfamily (IPR036188). While co-transcription of Rv0897c and Rv0898c is plausible given their co-orientation on the minus strand, no experimental evidence confirms physical interaction, functional coupling, or shared pathway membership.
Three key methodological precedents inform the interpretation of our structural findings:
- ColabFold + Foldseek workflow validation (PMID: 38166563): Svedberg et al. (2024) demonstrated that this workflow "increased the accuracy and quality of the functional genome annotation compared to results using only traditional annotation tools" for microsporidian genomes. This validates the computational approach used here.
- Limits of structural classification for DUFs (PMID: 41288334): Pei et al. (2025) investigated 664 candidate novel fold domains from the TED database, creating "190 new Pfam families, many classified as domains of unknown function (DUFs)", demonstrating that even advanced structural classification frequently fails to resolve function.
- DUF388/OB-fold precedent (PMID: 15178340): Ginalski et al. (2004) assigned DUF388 to the OB-fold by fold recognition but predicted the proteins would "probably lack nucleic acid-binding properties as implied by the analysis of the potential binding site." This case directly parallels Rv0898c/uL29: correct fold identification but divergent binding sites that block functional transfer.
| # | Citation | Evidence Type | Direction | Claim Tested | Key Finding | Context | Confidence |
|---|---|---|---|---|---|---|---|
| 1 | AlphaFold DB (AF-P9WKP5-F1-v6) | Computational (structure prediction) | Supports fold classification | Rv0898c has a defined structural fold | Two-helix antiparallel hairpin, mean pLDDT 84.6, core >90 | In silico; 87 aa protein | High for fold; prediction only |
| 2 | Foldseek vs PDB100 | Computational (structural homology) | Supports topology assignment | Fold matches a known CATH topology | Top hit: uL29 (PDB 6dzi_Z), Prob=0.992, E=0.33, SeqId=27.2% | M. smegmatis ribosome structure | Moderate; topology yes, superfamily uncertain |
| 3 | Foldseek vs CATH50 | Computational (structural classification) | Supports topology, qualifies function | Fold belongs to 1.10.287 (Helix Hairpins) | Multiple hits in CATH 1.10.287.* (various superfamilies) | Structural classification database | High for topology; uninformative for function |
| 4 | UniProt P9WKP5 | Database | Supports ND status | Current annotation is appropriate | Zero GO terms, zero functional comments | Swiss-Prot reviewed entry | High |
| 5 | InterPro IPR020311 / Pfam PF10944 | Database | Supports uncharacterized status | DUF2630 family characterization | 1,778 members across 2,119 taxa; zero GO mappings; zero characterized members | InterPro/Pfam | High |
| 6 | DUF2630 MSA (19 species) | Computational (conservation) | Supports structural fold; neutral for function | Conserved residues define a binding site | LDQCWDLLRQRRA motif (5 residues 100% conserved); bipartite structural/surface architecture | Cross-family; Actinobacteria-dominant | Moderate; motif is real but function unclear |
| 7 | UniProt CWDLLR search | Computational (motif search) | Qualifies | CWDLLR matches a known functional motif | Zero matches in non-DUF2630 Swiss-Prot | All reviewed UniProt entries | High; motif is novel |
| 8 | PMID: 21802366 | Direct experiment | Qualifies genomic context | Rv0898c is part of ompATb operon | Rv0899-0901 form the ompATb operon (plus strand); Rv0898c on minus strand = NOT in operon | M. bovis BCG, operon mapping | High |
| 9 | PMID: 21410778 | Direct experiment | Contextual | Adjacent operon function is known | ompATb operon enables ammonia secretion for acid adaptation | M. tuberculosis H37Rv | High for operon; no data on Rv0898c |
| 10 | PMID: 21905117 | Computational + review | Contextual | Genomic neighborhood functional theme | Rv0899-like proteins widespread in bacteria with functions in nitrogen metabolism | Bioinformatics survey | Moderate |
| 11 | STRING DB v12 | Computational (genomic context) | Supports co-transcription with Rv0897c | Rv0898c functionally linked to neighbors | Rv0897c-Rv0898c: score 0.851 (neighborhood only = 0.847); experimental = 0 | STRING v12 | Low for physical interaction |
| 12 | TnSeq data | Direct experiment (essentiality) | Neutral | Rv0898c is essential for growth | Rv0898c is non-essential in standard in vitro growth | H37Rv, 7H10 medium | High for in vitro non-essentiality |
| 13 | PMID: 25467293 | Computational (remote homology) | Contextual | M. tuberculosis proteome annotation | 95% of Mtb proteins annotated; 183 mycobacteria-unique unknowns remain; Rv0898c in recalcitrant set | Systematic bioinformatics | Moderate |
| 14 | PMID: 38166563 | Computational (method validation) | Supports approach | AlphaFold+Foldseek improves annotation | "Increased accuracy and quality of functional genome annotation" | Microsporidian genomes | Moderate; validates approach |
| 15 | PMID: 41288334 | Computational (structural classification) | Qualifies | Structural data resolves all DUFs | "190 new Pfam families, many classified as DUFs"; structural data alone insufficient | TED database analysis | High |
| 16 | PMID: 15178340 | Computational (fold recognition) | Supports partial verdict | Fold assignment implies function | DUF388 → OB-fold but "probably lack nucleic acid-binding properties" due to diverged binding site | BOF family analysis | High; direct precedent |
| GO Aspect | Recommendation | Rationale |
|---|---|---|
| MF (Molecular Function) | Retain ND | No experimental data; fold match insufficient for function transfer; unique conserved motif blocks homology-based inference |
| BP (Biological Process) | Retain ND | Genomic context suggestive but indirect; no genetic or biochemical evidence |
| CC (Cellular Component) | Retain ND | No localization data; Rv0897c is membrane-associated but no evidence extends to Rv0898c |
GO:0003735 (structural constituent of ribosome): Despite structural similarity to uL29, the 27% sequence identity is in the twilight zone, the E-value (0.33) is marginal, M. tuberculosis has its own bona fide uL29 (rpmC, Rv0709), and no evidence links Rv0898c to ribosomes. Assigning this term would be over-annotation.
GO:0005515 (protein binding): While the conserved surface charges suggest protein–protein interaction capability, no binding partner has been identified. This term is uninformative without partner identification and should not be assigned based on surface charge alone.
GO:0016491 (oxidoreductase activity): Sometimes transferred from genomic neighbors; inappropriate without direct evidence for Rv0898c.
| Scenario | Candidate GO Term | Evidence Type |
|---|---|---|
| Rv0898c shown to interact with Rv0897c | GO:0005515 (protein binding) | IPI |
| Rv0898c has regulatory/structural role for oxidoreductase | Specific MF term based on demonstrated activity | IDA/IMP |
| Expression profiling links to specific process | Relevant BP term | IEP |
| Localization demonstrated | Relevant CC term | IDA |
No direct molecular function has been demonstrated or reliably predicted for Rv0898c. The protein folds into a two-helix hairpin with a conserved charged surface patch (D52, D56, R59, R61, R62), which is consistent with a protein–protein interaction surface, a small-molecule binding site, or a structural/scaffolding role. However, none of these possibilities can be distinguished without experimental data.
The small size (87 aa), simple fold (two helices), and absence of any catalytic residue signatures argue against enzymatic activity. The protein is more likely to function as:
- A protein–protein interaction adapter or modulator
- A structural component of a macromolecular complex
- A small regulatory protein (e.g., antitoxin, transcription cofactor)
The pairing of a small helical protein with a larger enzyme (Rv0897c, 535 aa NAD-binding oxidoreductase) on the same strand is a common genomic pattern in bacteria, where the small protein may serve as a regulatory subunit, chaperone/assembly factor, redox partner mediator, or co-factor delivery protein.
The ammonia secretion / acid adaptation phenotype of the adjacent ompATb operon (Rv0899–0901) should NOT be attributed to Rv0898c. The operon is on the opposite strand and has been functionally characterized independently (PMID: 21410778, PMID: 21802366). Any functional connection to nitrogen metabolism or pH adaptation would need to be established through independent evidence, not assumed from genomic proximity. Similarly, the predicted interaction with Rv0897c (oxidoreductase) is based solely on genome neighborhood and cannot be treated as evidence for shared pathway membership.
The seed hypothesis assumes that structural fold classification implies a candidate molecular function. This is a common and often productive approach, but it has known limitations for simple/common folds. The helix-hairpin topology (CATH 1.10.287) is among the most promiscuous topologies in protein structure space, found in ribosomal proteins, transcription factors, membrane-associated proteins, nucleases, vesicle proteins, and ESCRT components. Fold-to-function transfer requires either: (a) superfamily-level homology (sequence identity >30% and/or shared conserved active site), or (b) shared functional motifs with characterized proteins. Neither condition is met for Rv0898c.
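The two transfer conditions can be written as a small decision rule, which makes explicit why the uL29 hit does not qualify. This is an illustrative sketch of the criteria as worded above, not a curation tool; the `FoldHit` class and its field names are hypothetical. The values for uL29 follow the report: 27.2% identity, rRNA-binding features not conserved, and no shared motif.

```python
from dataclasses import dataclass


@dataclass
class FoldHit:
    target: str
    seq_identity_pct: float
    shared_active_site: bool
    shared_functional_motif: bool


def unmet_conditions(hit, identity_cutoff=30.0):
    """List which of the stated fold-to-function conditions the hit fails."""
    unmet = []
    # Condition (a): superfamily-level homology via identity and/or a shared active site.
    if not (hit.seq_identity_pct > identity_cutoff or hit.shared_active_site):
        unmet.append("(a) superfamily-level homology")
    # Condition (b): a functional motif shared with characterized proteins.
    if not hit.shared_functional_motif:
        unmet.append("(b) shared functional motif")
    return unmet


def transfer_supported(hit, identity_cutoff=30.0):
    """Transfer is supported when at least one of the two conditions is met."""
    return len(unmet_conditions(hit, identity_cutoff)) < 2


ul29 = FoldHit("uL29 (PDB 6dzi_Z)", 27.2, False, False)
print(transfer_supported(ul29), unmet_conditions(ul29))
```

For the uL29 hit both conditions fail, so the rule returns no support for transfer, consistent with retaining ND.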
The strongest structural hit (uL29, Prob = 0.992) could be misinterpreted as evidence for ribosomal function. Several factors argue against this:
- M. tuberculosis already encodes its own bona fide uL29 (rpmC, Rv0709, P9WHA7)
- Sequence identity (27%) is in the twilight zone
- The Foldseek E-value (0.33) is marginal for confident homology
- uL29 proteins have specific rRNA-binding features not conserved in Rv0898c
- The DUF2630-specific CWDLLR motif has no counterpart in uL29 family proteins
- No co-expression or co-occurrence with ribosomal genes
The high STRING score (0.851) for Rv0897c–Rv0898c might suggest robust functional coupling with this oxidoreductase. However, decomposition reveals 100% of the score derives from genome neighborhood (0.847) with zero from experimental interaction, co-expression, text mining, gene fusion, or co-occurrence channels. While genome neighborhood is a moderately reliable predictor in bacteria, the complete absence of orthogonal evidence types means this prediction should be treated with caution.
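The decomposition argument amounts to asking how many independent evidence channels carry any signal. A minimal sketch, using only the channel values stated in the report (neighborhood 0.847, all others zero); the dictionary keys are informal labels rather than STRING's own field names, and the code does not attempt to reproduce how STRING combines channels into the 0.851 score.

```python
# Per-channel scores for Rv0897c-Rv0898c as stated in the text.
CHANNELS = {
    "neighborhood": 0.847,
    "gene_fusion": 0.0,
    "cooccurrence": 0.0,
    "experimental": 0.0,
    "coexpression": 0.0,
    "textmining": 0.0,
}


def supporting_channels(scores):
    """Names of channels that contribute any non-zero evidence."""
    return sorted(name for name, score in scores.items() if score > 0)


def is_single_channel(scores):
    """True when the whole prediction rests on exactly one evidence type."""
    return len(supporting_channels(scores)) == 1


print(supporting_channels(CHANNELS))
print(is_single_channel(CHANNELS))
```

A prediction that rests on a single channel has no orthogonal corroboration, which is the reason given above for treating the high combined score with caution.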
Small proteins (60–120 aa) with helix-hairpin folds in M. tuberculosis are frequently components of toxin-antitoxin (TA) systems. The DUF2630 size and fold are consistent with a Type II antitoxin, and M. tuberculosis harbors an unusually large repertoire of TA systems (PMID: 30476068). However, no TA system has been identified at the Rv0898c locus, and the DUF2630 conservation pattern (restricted to Actinobacteria, not mobile-element associated) does not match typical TA distribution patterns.
The most parsimonious alternative hypothesis is that Rv0898c serves as a protein–protein interaction module for Rv0897c. This is supported by: (a) likely co-transcription (same strand, adjacent), (b) STRING genome neighborhood score, (c) common small-protein/large-enzyme pairing pattern, (d) conserved surface charges. However, Rv0897c itself is uncharacterized (UniProt: "Uncharacterized protein"; only validated domain: NAD_binding_8; predicted membrane-associated with TM helices), so even confirming this interaction would not directly resolve Rv0898c's molecular function.
| Gap | What Was Checked | Why It Matters | Resolution |
|---|---|---|---|
| No experimental structure for any DUF2630 member | PDB, AlphaFold DB, InterPro structures | AlphaFold predictions are high-confidence but unvalidated | X-ray/cryo-EM/NMR structure of Rv0898c or any DUF2630 member |
| No binding partners known | STRING (computational only), BioGRID (no entries), literature | Cannot assign MF without knowing what the protein binds | AP-MS, bacterial two-hybrid, or crosslinking-MS in M. tuberculosis |
| No genetic phenotype under relevant conditions | TnSeq: non-essential in vitro | In vivo role may differ; conditional essentiality unknown | Delete Rv0898c and test under stress, in macrophages, or in animals |
| No transcriptomic context for Rv0898c | Literature search; no expression data found | Expression conditions could reveal function | RNA-seq under diverse conditions; mine MtbTnDB conditional screens |
| Rv0897c co-transcription unverified | Inferred from strand orientation only | Would strengthen or weaken functional linkage | RT-PCR or RNA-seq to map transcript boundaries |
| CWDLLR motif function unknown | Searched SwissProt, PDB, InterPro, PROSITE | Most conserved and distinctive feature of DUF2630 | Alanine scanning mutagenesis of D52, D56, R59, R61, R62 |
| DUF2630 phylogenetic scope incomplete | InterPro: Actinobacteria-dominant, some Nitrospira/Betaproteobacteria | Narrow distribution may indicate specialized function | Comprehensive phylogenomic analysis across all bacterial phyla |
| C54 functional role unclear | 89% conserved in MSA; buried in helix interface | If C54 is redox-active, it constrains function hypothesis | C54A mutagenesis; test with/without Rv0897c |
| Rv0897c function unknown | UniProt P9WKP7: "Uncharacterized"; NAD_binding_8 domain | If substrate is known, Rv0898c's role might be inferred | Biochemical characterization of Rv0897c; substrate identification |
Interaction mapping (AP-MS, bacterial two-hybrid, or crosslinking-MS) would directly test whether the conserved charged surface mediates protein–protein interactions and identify potential partners, enabling GO MF annotation.
Alanine-scanning mutagenesis of D52, D56, R59, R61, and R62 would determine whether the conserved surface patch is functionally important and separate structural from functional roles.
Deleting Rv0898c and testing the mutant under stress, in macrophages, or in animals would reveal whether its non-essentiality in standard culture masks a condition-specific role, particularly related to the neighboring ompATb operon's function in pH adaptation.
| Reference | Relevant Snippet | Use |
|---|---|---|
| PMID: 21802366 | "the ompATb gene (Rv0899)...is organized in operon with Rv0900 and Rv0901" | Confirms Rv0898c is NOT part of ompATb operon |
| PMID: 21410778 | "proteins encoded by the ompATb operon are involved in generating a rapid ammonia burst" | Contextual: adjacent operon function |
| PMID: 21905117 | "Rv0899-like proteins are widespread in bacteria with functions in nitrogen metabolism" | Contextual: genomic neighborhood theme |
| PMID: 38166563 | "increased the accuracy and quality of functional genome annotation" | Methodological: validates Foldseek approach |
| PMID: 41288334 | "190 new Pfam families, many classified as domains of unknown function (DUFs)" | Methodological: structural data ≠ function |
| PMID: 15178340 | "probably lack nucleic acid-binding properties as implied by the analysis of the potential binding site" | Direct precedent: fold ≠ function when binding site diverges |
PMID: 21802366 — Song et al. Confirms ompATb operon structure (Rv0899–0901) and shows dependence on small membrane proteins. Establishes that Rv0898c on the opposite strand is NOT part of this operon.
PMID: 21410778 — Sartain et al. Demonstrates ompATb operon role in ammonia secretion and pH adaptation. Provides functional context for the genomic neighborhood but no direct evidence for Rv0898c.
PMID: 21905117 — Teriete et al. Shows Rv0899-like proteins are widespread in nitrogen-fixing bacteria, broadening the functional context of the genomic region.
PMID: 20199110 — Teriete et al. Structural characterization of Rv0899 (OmpATb) reveals mixed α/β structure with BON domain. Important for understanding the neighboring protein but provides no information about Rv0898c.
PMID: 38166563 — Svedberg et al. (2024). Validates ColabFold + Foldseek workflow for functional annotation improvement, supporting the computational approach used here.
PMID: 41288334 — Pei et al. (2025). Large-scale investigation finding that many structurally classified domains remain DUFs, directly supporting our conclusion that fold assignment alone is insufficient for functional annotation.
PMID: 15178340 — Ginalski et al. (2004). DUF388 assigned to OB-fold by fold recognition but predicted to lack nucleic acid-binding properties. Direct precedent for the Rv0898c/uL29 situation where fold is correctly identified but function cannot be transferred.
PMID: 25467293 — Tyagi & Bhatt (2014). Remote homology detection enriched Mtb proteome annotation to 95% coverage, with 183 mycobacteria-unique unknowns remaining. Rv0898c/DUF2630 is among this recalcitrant set.
PMID: 22301074 — Burge et al. (2012). Describes InterPro's protocol for GO annotation of protein signatures, including the challenges of mapping GO terms to DUF families with no characterized members.
PMID: 40527579 — MtbTnDB: comprehensive transposon sequencing database for M. tuberculosis. Rv0898c is non-essential under standard in vitro conditions but conditional essentiality across ~150 conditions has not been systematically examined in this investigation.
No experimental validation: All structural analyses are based on AlphaFold predictions. While confidence metrics are high (pLDDT >90 for core), predicted structures can miss ligand-induced conformational changes, oligomerization states, and post-translational modifications.
Foldseek statistical significance: The top hit (uL29, E-value = 0.33) is marginal at best: an E-value of 0.33 means that roughly one match of this quality is expected by chance in every three searches. While the probability score (0.992) is high, the functional diversity of the matched topology fundamentally limits interpretive power.
Conservation analysis scope: Our multiple sequence alignment included 19 diverse DUF2630 members from Actinobacteria. A broader analysis including all 1,778 family members might reveal additional conserved features, subfamily structure, or outlier members with functional clues.
Genome neighborhood interpretation: Co-localization does not equal co-function, especially for small intergenic proteins. The STRING score decomposition (100% from neighborhood) underscores this limitation.
Literature absence: No primary literature directly studying Rv0898c was found despite comprehensive PubMed searching. All contextual information derives from studies of neighboring genes or computational surveys. This absence is itself informative: the protein has attracted no experimental attention.
Negative data bias: The non-essential designation from TnSeq reflects only standard in vitro growth conditions. Many virulence-related and stress-response genes appear non-essential in standard screens but are critical during infection or under specific stresses.
Single-organism focus: Our analysis focused primarily on the M. tuberculosis H37Rv context. DUF2630 members in other organisms (Nitrospira, Betaproteobacteria) might have different genomic contexts that provide additional functional clues.
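The Foldseek caveat in the list above can be made concrete with a small triage sketch in Python. The values for the uL29 hit (E-value 0.33, probability 0.992, 27% identity) are taken from this report. The `StructuralHit` type, the `triage` function, the E-value cutoff of 1e-3, and the 20-35% twilight-zone bounds are illustrative assumptions, not Foldseek defaults and not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralHit:
    """One structural-search hit (hypothetical container for illustration)."""
    target: str
    evalue: float
    probability: float
    seq_identity: float  # fraction between 0 and 1


def triage(hit, max_evalue=1e-3, twilight=(0.20, 0.35)):
    """Return a verdict plus the reasons that block function transfer.

    max_evalue and twilight are assumed thresholds chosen for this sketch.
    """
    reasons = []
    if hit.evalue > max_evalue:
        reasons.append("E-value above cutoff")
    if twilight[0] <= hit.seq_identity <= twilight[1]:
        reasons.append("sequence identity in twilight zone")
    elif hit.seq_identity < twilight[0]:
        reasons.append("sequence identity below twilight zone")
    verdict = "fold-level only" if reasons else "candidate for function transfer"
    return verdict, reasons


# Values for the top hit as reported in this document.
ul29 = StructuralHit(target="uL29", evalue=0.33, probability=0.992, seq_identity=0.27)
verdict, reasons = triage(ul29)
print(verdict, reasons)
```

Note that the high probability score does not enter the verdict in this sketch; that mirrors the argument above, where a confident fold match still does not license a functional inference.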
Rv0898c encodes a small (87 aa, 9.9 kDa) protein in Mycobacterium tuberculosis H37Rv that MycoBrowser classifies as a "conserved hypothetical protein". The protein contains a single domain of unknown function, DUF2630 (Pfam PF10944), corresponding to InterPro family IPR020311. The InterPro entry states: "This entry contains proteins with no known function" and has no GO term mappings (interpro2go is empty for IPR020311). GOA contains no curated GO annotations for this protein.
Rv0898c is located at position 1002441 bp on the minus strand. Its immediate downstream neighbor on the chromosome is Rv0899 (ompA/arfA), which is on the plus strand starting at 1002812 bp. Rv0899 encodes a membrane protein that is part of the arf (ammonia release facilitator) operon (Rv0899-Rv0900-Rv0901). Importantly, Rv0898c is transcribed in the OPPOSITE direction from Rv0899 and is NOT part of this operon.
The Rv0899 operon is well characterized: "the rv0899 gene is part of an operon (rv0899-rv0901) that is required for fast ammonia secretion, pH neutralization and growth of M. tuberculosis in acidic environments" [PMID:21905117, DOI:10.1002/prot.23151]. However, Rv0898c, being on the complementary strand, is not part of this operon and its proximity to Rv0899 does not imply functional linkage.
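The strand argument above can be written out as a short Python sketch. The coordinates and strands are the ones quoted in this section; the `Gene` type and both helper functions are hypothetical names introduced for illustration. Sharing a strand is treated as a necessary but not sufficient condition for co-transcription.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    position: int  # chromosomal coordinate as quoted in the text
    strand: str    # "+" or "-"


def can_share_operon(a, b):
    """Genes on opposite strands cannot be co-transcribed (necessary condition only)."""
    return a.strand == b.strand


def arrangement(a, b):
    """Classify two neighboring genes as tandem, divergent, or convergent."""
    left, right = sorted((a, b), key=lambda g: g.position)
    if left.strand == right.strand:
        return "tandem"
    # Left gene on the minus strand and right gene on the plus strand
    # are transcribed away from each other.
    return "divergent" if left.strand == "-" else "convergent"


rv0898c = Gene("Rv0898c", 1002441, "-")
rv0899 = Gene("Rv0899", 1002812, "+")

print(can_share_operon(rv0898c, rv0899))  # False
print(arrangement(rv0898c, rv0899))       # divergent
```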
MycoBrowser notes that Rv0898c is "highly similar to CAC01589.1|AL391041 hypothetical protein from Streptomyces coelicolor" and shows some homology to Rv0709, another M. tuberculosis protein. The DUF2630 family (IPR020311) contains 1,938 proteins across 2,221 taxa, all of unknown function. No member of this family has been functionally characterized.
Source: Rv0898c-deep-research-bioreason-sft.md
The BioReason functional summary states:
A small, single-domain peripheral protein in Mycobacterium tuberculosis that likely operates as a binding adaptor at the cell envelope. By docking to envelope components and soluble enzymes, it probably organizes or stabilizes multi-enzyme assemblies that tune coenzyme A production and salvage at the cell wall interface. This positioning would couple envelope metabolic demands to coenzyme pools, with interactions bridging an outer envelope protein and redox enzymes to modulate flux without direct catalysis.
This summary is almost entirely unsupported by any evidence:
"binding adaptor at the cell envelope": There is no evidence that Rv0898c is a binding adaptor or localizes to the cell envelope. Proteomics data from MycoBrowser indicate the protein was detected in whole cell lysates but NOT in membrane fractions, arguing against an envelope localization. The protein has no transmembrane domains, no signal peptide, and no predicted membrane-association motifs. BioReason correctly notes the absence of transmembrane domains but then contradicts itself by proposing envelope localization.
"coenzyme A production and salvage": There is no evidence linking Rv0898c to CoA biosynthesis. The BioReason summary appears to have confabulated this entirely. The "UniProt Summary" section of the BioReason output states "May be involved in the modulation of CoA biosynthesis" but this text does not appear in the actual UniProt entry for P9WKP5, which simply says "Uncharacterized protein Rv0898c" with no functional annotation whatsoever. This appears to be a hallucinated UniProt summary that the model then used as an anchor for its reasoning.
"association with OmpA": The thinking trace states "The explicit association with a canonical mycobacterial outer envelope protein (OmpA) points to a peripheral position at the cell boundary." There is no documented association between Rv0898c and OmpA (Rv0899). While Rv0898c is genomically adjacent to Rv0899, Rv0898c is on the OPPOSITE strand (hence the "c" suffix) and is NOT part of the Rv0899-Rv0900-Rv0901 operon. The Rv0899 operon is well characterized and does not include Rv0898c. This is a critical error where genomic proximity was incorrectly interpreted as functional association.
"interaction with a probable oxidoreductase": No oxidoreductase interaction has been documented for Rv0898c. This claim appears to be fabricated.
What BioReason gets partially right: The summary correctly identifies that Rv0898c is a small, single-domain protein without enzymatic domains, and that it lacks transmembrane segments. These are factual observations from the InterPro annotation.
Root cause of errors: BioReason appears to have constructed an elaborate narrative from three inputs: (a) the DUF2630 domain (which tells us nothing functional), (b) genomic proximity to Rv0899/OmpA (which is on the opposite strand and functionally unrelated), and (c) what appears to be a hallucinated UniProt summary about CoA biosynthesis. The model produced a confident-sounding but entirely speculative functional prediction for a genuinely uncharacterized protein. This represents a failure mode where the model cannot express "unknown" and instead fabricates plausible-sounding biology.
Comparison with interpro2go:
IPR020311 (the sole InterPro entry for Rv0898c) has NO GO term mappings in interpro2go. The InterPro entry explicitly states: "This entry contains proteins with no known function." Therefore there are no interpro2go annotations to compare against. BioReason is not recapitulating interpro2go here -- it is going far beyond the available evidence by fabricating a detailed functional narrative where interpro2go correctly assigns nothing. In this case, the conservative interpro2go approach (assigning no terms) is more accurate than BioReason's speculative narrative.
The thinking trace reveals the reasoning chain that led to errors:
The trace starts reasonably by noting the single-domain architecture and absence of enzymatic cores, correctly concluding a non-enzymatic role is likely. However, the leap to "protein binding (GO:0005515)" is generic and unsupported.
The trace then fabricates a connection to CoA biosynthesis with no stated evidence: "Coenzyme A biosynthesis (GO:0015937) in bacteria proceeds through a sequence of soluble enzymes" -- while true as a general statement, nothing connects this pathway to Rv0898c.
The statement "The explicit association with a canonical mycobacterial outer envelope protein (OmpA)" misrepresents genomic adjacency as functional association. Rv0898c is on the complement strand, divergently transcribed from Rv0899, and not part of the arf operon.
The BioReason model produced no GO term predictions in the "GO Term Predictions" section (all three subsections -- MF, BP, CC -- are empty), which paradoxically is more appropriate than the narrative functional summary.
The model demonstrates a systematic inability to express uncertainty for genuinely uncharacterized proteins, instead generating a plausible-sounding but fabricated functional story.
id: P9WKP5
gene_symbol: Rv0898c
product_type: PROTEIN
status: DRAFT
taxon:
  id: NCBITaxon:83332
  label: Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
description: >-
  A small (87 aa) conserved hypothetical protein of unknown function in
  Mycobacterium tuberculosis H37Rv. Contains a single DUF2630 domain
  (Pfam PF10944, InterPro IPR020311) spanning nearly the entire protein.
  Detected at the protein level by mass spectrometry in whole cell lysates.
  Non-essential for in vitro growth. mRNA is upregulated after 96 hours
  of nutrient starvation, suggesting a possible role in stress adaptation,
  but no molecular function has been established for Rv0898c or any member
  of the DUF2630 family.
references:
- id: PMID:9634230
  title: Deciphering the biology of Mycobacterium tuberculosis from the complete
    genome sequence
  findings:
  - statement: Rv0898c was identified in the complete genome sequence of M.
      tuberculosis H37Rv
    supporting_text: The genome comprises 4,411,529 base pairs, contains around
      4,000 genes, and has a very high guanine + cytosine content
- id: PMID:21969609
  title: Proteogenomic analysis of Mycobacterium tuberculosis by high resolution
    mass spectrometry
  findings:
  - statement: Rv0898c protein was detected by high-resolution mass
      spectrometry, confirming it is expressed at the protein level
    supporting_text: we identified 3176 proteins from Mycobacterium tuberculosis
      representing ~80% of its total predicted gene count
- id: PMID:28096490
  title: Comprehensive Essentiality Analysis of the Mycobacterium tuberculosis
    Genome via Saturating Transposon Mutagenesis
  findings:
  - statement: Rv0898c is non-essential for in vitro growth of M. tuberculosis
      based on saturating transposon mutagenesis
    supporting_text: This work provides an authoritative catalog of essential
      regions of the M. tuberculosis genome
- id: PMID:11929527
  title: Evaluation of a nutrient starvation model of Mycobacterium tuberculosis
    persistence by gene and protein expression profiling
  findings:
  - statement: Starvation induces a persistence-like state in M. tuberculosis
      with upregulation of specific gene sets
    supporting_text: we have generated a model with which we can search for
      agents active against persistent M. tuberculosis and revealed a number of
      potential targets expressed under these conditions
- id: PMID:21905117
  title: Mycobacterium tuberculosis Rv0899 defines a family of membrane proteins
    widespread in nitrogen-fixing bacteria
  findings:
  - statement: The neighboring gene Rv0899 is part of the arf operon
      (Rv0899-Rv0901) required for ammonia secretion, but Rv0898c is on the
      opposite strand and is NOT part of this operon
    supporting_text: the rv0899 gene is part of an operon (rv0899-rv0901) that
      is required for fast ammonia secretion, pH neutralization, and growth of M
- id: file:MYCTU/Rv0898c/Rv0898c-notes.md
  title: Research notes for Rv0898c
  findings:
  - statement: Rv0898c mRNA is upregulated after 96h of starvation per
      MycoBrowser annotation
  - statement: Rv0898c is detected in whole cell lysates but not in culture
      filtrate or membrane fractions
  - statement: No publications in PubMed specifically about Rv0898c function
      exist
- id: file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md
  title: 'OpenScientist hypothesis run: Rv0898c DUF2630 fold discovery'
  findings:
  - statement: Compute-driven AlphaFold + Foldseek analysis classified the DUF2630
      fold as a two-helix antiparallel hairpin (weak uL29 ribosomal-protein similarity
      at 27% identity, twilight zone) but could not infer a molecular function, because
      the CATH 1.10.287 topology spans >600 unrelated superfamilies and the DUF2630
      motif (CWDLLRQRR) has no characterized match. Supports retaining the ND annotation.
    supporting_text: The current ND (no data) molecular function annotation should
      be retained.
existing_annotations:
- term:
    id: GO:0003674
    label: molecular_function
  evidence_type: ND
  review:
    summary: No molecular function has been established for Rv0898c or any
      member of the DUF2630 family. The InterPro entry IPR020311 explicitly
      states this entry contains proteins with no known function and has no GO
      term mappings.
    action: NEW
    reason: The root molecular function term with ND (No biological Data)
      evidence is used to explicitly document that no molecular function
      annotation is supported for this protein. The BioReason SFT trace
      speculated about protein binding and CoA biosynthesis but these claims are
      entirely unsupported. A dedicated 3-iteration OpenScientist run (AlphaFold +
      Foldseek) reached the same conclusion. Although the DUF2630 fold is classifiable
      as a two-helix antiparallel hairpin (top Foldseek hit uL29 ribosomal protein at
      only 27% identity, twilight zone; CATH topology 1.10.287, shared by >600
      functionally diverse superfamilies), the DUF2630-specific motif (CWDLLRQRR) has
      no match in any characterized protein, so no molecular function can be inferred
      and the ND annotation should be retained.
    supported_by:
    - reference_id: file:MYCTU/Rv0898c/Rv0898c-deep-research-bioreason-sft.md
      supporting_text: ''
    - reference_id: file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md
      supporting_text: The current ND (no data) molecular function annotation should
        be retained.
    - reference_id: file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md
      supporting_text: the DUF2630-specific conserved motif (CWDLLRQRR) has no matches
        in any characterized protein
- term:
    id: GO:0005737
    label: cytoplasm
  evidence_type: IDA
  original_reference_id: PMID:21969609
  review:
    summary: Rv0898c protein was detected in whole cell lysates but not in
      membrane or culture filtrate fractions by mass spectrometry, consistent
      with cytoplasmic localization.
    action: NEW
    reason: Proteomics data from high-resolution mass spectrometry detected
      Rv0898c in whole cell lysates but not in membrane or secreted fractions,
      supporting cytoplasmic localization.
    supported_by:
    - reference_id: PMID:21969609
      supporting_text: we identified 3176 proteins from Mycobacterium
        tuberculosis representing ~80% of its total predicted gene count
core_functions:
- description: >-
    Rv0898c is an uncharacterized protein with no experimentally determined
    molecular function. The DUF2630 domain family (IPR020311) has no GO
    mappings and no characterized members across any organism. The protein
    is small (87 aa), soluble (no transmembrane domains, detected in
    cytoplasmic fraction), and upregulated under starvation conditions,
    which may suggest a role in nutrient stress adaptation. However, no
    specific molecular function, binding partners, or biological process
    involvement has been demonstrated.
  locations:
  - id: GO:0005737
    label: cytoplasm
  supported_by:
  - reference_id: PMID:21969609
    supporting_text: we identified 3176 proteins from Mycobacterium tuberculosis
      representing ~80% of its total predicted gene count
suggested_questions:
- question: What is the three-dimensional structure of Rv0898c and does it
    reveal any structural similarity to proteins of known function?
- question: Does Rv0898c interact with any other M. tuberculosis proteins,
    particularly those involved in stress response or starvation survival?
- question: What is the phenotype of an Rv0898c knockout under starvation or in
    vivo infection conditions?
suggested_experiments:
- description: Determine the crystal or NMR structure of Rv0898c and perform
    structural similarity searches (e.g. Dali, Foldseek) to identify distant
    functional homologs
- description: Perform affinity purification coupled with mass spectrometry
    (AP-MS) to identify interaction partners of Rv0898c in M. tuberculosis
- description: Construct an Rv0898c deletion mutant and test survival under
    nutrient starvation, hypoxia, and in macrophage infection models
- description: Perform transcriptomic analysis of Rv0898c knockout versus
    wild-type under starvation conditions to identify downstream effects
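A record like the one above lends itself to a simple consistency check on evidence codes, sketched here in Python. The dictionaries mirror only the `term` and `evidence_type` fields of the two entries under `existing_annotations`. The helper `check_nd_usage` is a hypothetical name, not part of any published tooling, and it assumes the GO convention that the ND evidence code is paired only with the three ontology root terms.

```python
# Root terms of the three GO aspects.
ROOT_TERMS = {
    "GO:0003674": "molecular_function",
    "GO:0008150": "biological_process",
    "GO:0005575": "cellular_component",
}


def check_nd_usage(annotations):
    """Return a list of problems where ND evidence is attached to a non-root term."""
    problems = []
    for ann in annotations:
        term_id = ann["term"]["id"]
        if ann["evidence_type"] == "ND" and term_id not in ROOT_TERMS:
            problems.append(f"{term_id}: ND evidence on a non-root term")
    return problems


# The two annotations from the record, reduced to the fields the check needs.
annotations = [
    {"term": {"id": "GO:0003674", "label": "molecular_function"}, "evidence_type": "ND"},
    {"term": {"id": "GO:0005737", "label": "cytoplasm"}, "evidence_type": "IDA"},
]

print(check_nd_usage(annotations))
```

Both annotations pass under this assumption: the ND code sits on the molecular_function root, and the cytoplasm term carries experimental (IDA) evidence instead.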