Rv0898c

UniProt ID: P9WKP5
Organism: Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
Review Status: DRAFT

Gene Description

A small (87 aa) conserved hypothetical protein of unknown function in Mycobacterium tuberculosis H37Rv. Contains a single DUF2630 domain (Pfam PF10944, InterPro IPR020311) spanning nearly the entire protein. Detected at the protein level by mass spectrometry in whole cell lysates. Non-essential for in vitro growth. mRNA is upregulated after 96 hours of nutrient starvation, suggesting a possible role in stress adaptation, but no molecular function has been established for Rv0898c or any member of the DUF2630 family.

Existing Annotations Review

GO Term Evidence Action Reason
GO:0003674 molecular_function
Evidence: ND
Action: NEW
Summary: No molecular function has been established for Rv0898c or any member of the DUF2630 family. The InterPro entry IPR020311 explicitly states this entry contains proteins with no known function and has no GO term mappings.
Reason: The root molecular function term with ND (No biological Data) evidence is used to explicitly document that no molecular function annotation is supported for this protein. The BioReason SFT trace speculated about protein binding and CoA biosynthesis but these claims are entirely unsupported. A dedicated 3-iteration OpenScientist run (AlphaFold + Foldseek) reached the same conclusion. Although the DUF2630 fold is classifiable as a two-helix antiparallel hairpin (top Foldseek hit uL29 ribosomal protein at only 27% identity, twilight zone; CATH topology 1.10.287, shared by >600 functionally diverse superfamilies), the DUF2630-specific motif (CWDLLRQRR) has no match in any characterized protein, so no molecular function can be inferred and the ND annotation should be retained.
Supporting Evidence:
file:MYCTU/Rv0898c/Rv0898c-deep-research-bioreason-sft.md
file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md
The current ND (no data) molecular function annotation should be retained.
file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md
the DUF2630-specific conserved motif (CWDLLRQRR) has no matches in any characterized protein
GO:0005737 cytoplasm
Evidence: IDA
Reference: PMID:21969609 (Proteogenomic analysis of Mycobacterium tuberculosis by high...)
Action: NEW
Summary: Rv0898c protein was detected in whole cell lysates but not in membrane or culture filtrate fractions by mass spectrometry, consistent with cytoplasmic localization.
Reason: Proteomics data from high-resolution mass spectrometry detected Rv0898c in whole cell lysates but not in membrane or secreted fractions, supporting cytoplasmic localization.
Supporting Evidence:
PMID:21969609
we identified 3176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count

Core Functions

Rv0898c is an uncharacterized protein with no experimentally determined molecular function. The DUF2630 domain family (IPR020311) has no GO mappings and no characterized members across any organism. The protein is small (87 aa), soluble (no transmembrane domains, detected in cytoplasmic fraction), and upregulated under starvation conditions, which may suggest a role in nutrient stress adaptation. However, no specific molecular function, binding partners, or biological process involvement has been demonstrated.

Cellular Locations:
Supporting Evidence:
  • PMID:21969609
    we identified 3176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count

References

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence
  • Rv0898c was identified in the complete genome sequence of M. tuberculosis H37Rv
    "The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content"
Proteogenomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry
  • Rv0898c protein was detected by high-resolution mass spectrometry, confirming it is expressed at the protein level
    "we identified 3176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count"
Comprehensive Essentiality Analysis of the Mycobacterium tuberculosis Genome via Saturating Transposon Mutagenesis
  • Rv0898c is non-essential for in vitro growth of M. tuberculosis based on saturating transposon mutagenesis
    "This work provides an authoritative catalog of essential regions of the M. tuberculosis genome"
Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling
  • Starvation induces a persistence-like state in M. tuberculosis with upregulation of specific gene sets
    "we have generated a model with which we can search for agents active against persistent M. tuberculosis and revealed a number of potential targets expressed under these conditions"
Mycobacterium tuberculosis Rv0899 defines a family of membrane proteins widespread in nitrogen-fixing bacteria
  • The neighboring gene Rv0899 is part of the arf operon (Rv0899-Rv0901) required for ammonia secretion, but Rv0898c is on the opposite strand and is NOT part of this operon
    "the rv0899 gene is part of an operon (rv0899-rv0901) that is required for fast ammonia secretion, pH neutralization, and growth of M"
file:MYCTU/Rv0898c/Rv0898c-notes.md
Research notes for Rv0898c
  • Rv0898c mRNA is upregulated after 96h of starvation per MycoBrowser annotation
  • Rv0898c is detected in whole cell lysates but not in culture filtrate or membrane fractions
  • No publications in PubMed specifically about Rv0898c function exist
file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md
OpenScientist hypothesis run: Rv0898c DUF2630 fold discovery
  • Compute-driven AlphaFold + Foldseek analysis classified the DUF2630 fold as a two-helix antiparallel hairpin (weak uL29 ribosomal-protein similarity at 27% identity, twilight zone) but could not infer a molecular function, because the CATH 1.10.287 topology spans >600 unrelated superfamilies and the DUF2630 motif (CWDLLRQRR) has no characterized match. Supports retaining the ND annotation.
    "The current ND (no data) molecular function annotation should be retained."

Suggested Questions for Experts

Q: What is the three-dimensional structure of Rv0898c and does it reveal any structural similarity to proteins of known function?

Q: Does Rv0898c interact with any other M. tuberculosis proteins, particularly those involved in stress response or starvation survival?

Q: What is the phenotype of an Rv0898c knockout under starvation or in vivo infection conditions?

Suggested Experiments

Experiment: Determine the crystal or NMR structure of Rv0898c and perform structural similarity searches (e.g. Dali, Foldseek) to identify distant functional homologs

Experiment: Perform affinity purification coupled with mass spectrometry (AP-MS) to identify interaction partners of Rv0898c in M. tuberculosis

Experiment: Construct an Rv0898c deletion mutant and test survival under nutrient starvation, hypoxia, and in macrophage infection models

Experiment: Perform transcriptomic analysis of Rv0898c knockout versus wild-type under starvation conditions to identify downstream effects

Deep Research

Bioreason Pro

(Rv0898c-deep-research-bioreason-sft.md)

Thinking Trace

I begin with the domain architecture. The protein is dominated by a single InterPro signature, IPR020311 (Uncharacterised protein family, Rv0898c, residues 8–86). This compact, family-specific module spans essentially the entire polypeptide, implying a small, single-domain protein without appended catalytic or transport domains. The absence of recognizable enzymatic cores (e.g., acyltransferase HxH, ligase Lys/AspPp, or Rossmann-like redox folds) and the lack of predicted transmembrane segments suggest a non-enzymatic role, most plausibly as a binding/adaptor module. Such small bacterial proteins frequently act as peripheral factors that dock onto larger machineries to modulate their assembly or activity. This architecture causally supports a molecular function centered on protein binding (GO:0005515).

From this binding-centric foundation, I infer the biological role by considering pathway context. Coenzyme A biosynthesis (GO:0015937) in bacteria proceeds through a sequence of soluble enzymes: acyl-AMP ligase (ACP/CoA ligase), phosphopantetheine kinase, dephospho-CoA kinase, and auxiliary salvage steps. A small adaptor would not catalyze these reactions but could regulate flux by stabilizing enzyme–enzyme assemblies, positioning the pathway near metabolite sources, or anchoring the soluble enzymes to the cell periphery where CoA and acyl-CoA pools are heavily utilized. The Rv0898c family’s compact, likely surface-exposed fold is well-suited to scaffold interactions rather than catalysis, which provides a mechanistic route to modulate the CoA pathway without directly performing the chemistry.

The cellular location follows from the same logic. A lack of transmembrane domains and the small size argue against an integral membrane role. However, binding to envelope components can tether soluble enzymes at the cell surface. The explicit association with a canonical mycobacterial outer envelope protein (OmpA) points to a peripheral position at the cell boundary. In mycobacteria, the peptidoglycan-based cell wall (GO:0009274) underlies the mycomembrane; a protein that bridges a porin-like envelope component to cytosolic enzymes would function at this interface. Thus, I deduce a peripheral localization at the peptidoglycan layer, where the protein can couple envelope physiology to central metabolism.

This leads to a mechanistic hypothesis. The Rv0898c fold provides a docking platform that binds both envelope proteins and metabolic enzymes. Association with OmpA would anchor the complex to the cell wall, while interaction with a probable oxidoreductase and CoA-pathway enzymes would position redox and thioester-handling reactions near the envelope, where acyl-CoA turnover is intense. By stabilizing transient multi-enzyme assemblies, the protein could enhance local CoA regeneration and channel intermediates, thereby tuning CoA biosynthesis and salvage to envelope demands. In summary, a small, single-domain binding protein localizes to the peptidoglycan interface and modulates CoA biosynthetic throughput by scaffolding partner proteins.

## Functional Summary

A small, single-domain peripheral protein in Mycobacterium tuberculosis that likely operates as a binding adaptor at the cell envelope. By docking to envelope components and soluble enzymes, it probably organizes or stabilizes multi-enzyme assemblies that tune coenzyme A production and salvage at the cell wall interface. This positioning would couple envelope metabolic demands to coenzyme pools, with interactions bridging an outer envelope protein and redox enzymes to modulate flux without direct catalysis.

## UniProt Summary

May be involved in the modulation of CoA biosynthesis.

## InterPro Domains

    - IPR020311: Uncharacterised protein family, Rv0898c (family) [8-86]

## GO Term Predictions

### Molecular Function


### Biological Process


### Cellular Component

OpenScientist

(Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md)
Final Report: Rv0898c / DUF2630 Fold-Discovery Hypothesis
OpenScientist (openscientist-autonomous); 11 citations; 4 artifacts; 2026-06-21T21:09:44.149744

Executive Judgment

Verdict: Partially Supported

The seed hypothesis proposes that Rv0898c, an 87-amino-acid uncharacterized Mycobacterium tuberculosis protein whose sole domain is DUF2630 (IPR020311), can be assigned to a known structural superfamily that implies a specific candidate molecular function using AlphaFold structure prediction and Foldseek structural-homology searches. This hypothesis is partially supported: the fold classification component is robust, but the functional inference component fails. AlphaFold predicts a high-confidence two-helix antiparallel hairpin fold (mean pLDDT 84.6, core residues >90), and Foldseek identifies structural similarity to the uL29 ribosomal protein (Prob = 0.992). However, the matched CATH topology (1.10.287, Helix Hairpins) encompasses >600 functionally diverse superfamilies, sequence identity is in the twilight zone (27%), and the DUF2630-specific conserved motif (CWDLLRQRR) has no matches in any characterized protein. Published precedent demonstrates that fold-level classification of DUFs does not reliably transfer to specific molecular function when the binding site has diverged. The current ND (no data) molecular function annotation should be retained. Fold classification is a genuine and useful structural insight, but it is insufficient to justify a GO molecular function term without experimental evidence.

Key caveat: The fold classification is genuine and useful for structural biology. The limitation is in extrapolating from fold to function, which requires either higher sequence identity, shared active-site residues with a characterized superfamily, or experimental validation.


Summary

Rv0898c (UniProt P9WKP5) is a small, single-domain protein encoded on the minus strand of the M. tuberculosis H37Rv genome, adjacent to but operonically distinct from the well-characterized ompATb operon (Rv0899–Rv0901). Its only recognized domain is DUF2630 (Pfam PF10944 / InterPro IPR020311), a family of ~1,778 exclusively bacterial proteins with zero characterized members, zero GO mappings, and zero experimental structures. The seed hypothesis posits that structural prediction and remote homology searching can assign Rv0898c to a known superfamily with functional implications.

Our investigation confirmed the structural aspect: AlphaFold model AF-P9WKP5-F1 (v6) predicts two long α-helices (α1: residues 9–30; α2: residues 36–65) in an antiparallel hairpin arrangement with high confidence. Foldseek searches against PDB100 return the uL29 ribosomal protein from M. smegmatis (PDB 6dzi, chain Z) as the top hit, placing Rv0898c in CATH topology 1.10.287 (Mainly Alpha / Orthogonal Bundle / Helix Hairpins). However, this topology is one of the most common α-helical folds in nature, populated by hundreds of functionally unrelated superfamilies. The 27% sequence identity to uL29 falls within the "twilight zone", where homology cannot be reliably inferred from sequence identity alone, and the DUF2630-specific conserved motif (LDQCWDLLRQRRA), which defines a charged surface patch on helix α2, has no counterpart in uL29 or any other characterized protein family. Published methodological studies confirm that fold recognition can correctly identify structure but fail to predict function when binding sites have diverged, as exemplified by the DUF388/OB-fold case (PMID: 15178340).

Genomic context analysis revealed that Rv0898c is co-oriented with Rv0897c (a 535-aa NAD(P)-binding oxidoreductase) on the minus strand, with a STRING interaction score of 0.851 — but this score derives entirely from genome neighborhood evidence with zero experimental, co-expression, or text-mining support. The adjacent ompATb operon functions in ammonia secretion and pH adaptation, providing tantalizing but indirect contextual clues. No direct functional evidence — biochemical, genetic, or interaction-based — exists for Rv0898c in the published literature.


Key Findings

Finding 1: Rv0898c Adopts a High-Confidence Two-Helix Antiparallel Hairpin Fold

The AlphaFold structural model (AF-P9WKP5-F1, v6) of Rv0898c reveals a simple architecture: two α-helices of 22 and 30 residues, respectively, arranged in an antiparallel hairpin with an inter-helix angle of approximately 17°, connected by a short turn (residues 31–35). The N-terminus (residues 1–8) and C-terminus (residues 66–87) are predicted as disordered. Confidence metrics are strong: mean pLDDT of 84.6 overall and >90 for the structured core (residues 9–65), indicating "very high confidence" in the predicted fold.
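
The confidence figures above are averages of per-residue pLDDT, which AlphaFold PDB files store in the B-factor column of each ATOM record. The following is a minimal sketch of that averaging, not the pipeline used in this report: `ca_line`, `mean_plddt`, and the four-residue `toy_pdb` are hypothetical names and data made up for illustration, and the toy values are not taken from AF-P9WKP5-F1.

```python
def ca_line(resnum, plddt):
    """Build one fixed-width PDB ATOM record for a CA atom (toy coordinates)."""
    return (
        "ATOM  " + f"{resnum:5d}" + " " + " CA " + " " + "ALA" + " " + "A"
        + f"{resnum:4d}" + " " * 4 + f"{0.0:8.3f}" * 3
        + f"{1.0:6.2f}" + f"{plddt:6.2f}"
    )


def mean_plddt(pdb_text, first=None, last=None):
    """Mean pLDDT over CA atoms, optionally restricted to residues first..last."""
    values = []
    for line in pdb_text.splitlines():
        # One CA atom per residue, so each residue is counted exactly once.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])
            in_range = (first is None or resnum >= first) and (
                last is None or resnum <= last
            )
            if in_range:
                # Columns 61-66 hold the B-factor, which carries pLDDT here.
                values.append(float(line[60:66]))
    if not values:
        raise ValueError("no CA atoms in the requested residue range")
    return sum(values) / len(values)


# Hypothetical four-residue model: a low-confidence tail and a confident core.
toy_pdb = "\n".join(
    ca_line(i, p) for i, p in enumerate([50.0, 90.0, 95.0, 85.0], start=1)
)

print(mean_plddt(toy_pdb))        # whole toy model
print(mean_plddt(toy_pdb, 2, 3))  # toy "core" only
```

Applied to a real model file, calling `mean_plddt(text)` and `mean_plddt(text, 9, 65)` would give the overall and structured-core means, respectively.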

Foldseek structural similarity searches against PDB100 returned the uL29 ribosomal protein from Mycobacterium smegmatis (PDB 6dzi, chain Z) as the top hit with Prob = 0.992, E-value = 0.33, sequence identity = 27.2%, and alignment length = 66 residues. Multiple additional hits in the CATH50 database map to topology 1.10.287 (Helix Hairpins), confirming the fold classification. Rv0898c itself has no entry in CATH v4.3.0, so this represents a new structural classification for the DUF2630 family.

{{figure:rv0898c_analysis.png|caption=Comprehensive visualization of Rv0898c structure, conservation analysis, and key motifs. The protein adopts a two-helix antiparallel hairpin fold with a conserved charged surface patch defined by the CWDLLRQRR motif.}}

Finding 2: Rv0898c Is Genomically Adjacent to but Operonically Distinct from the ompATb Operon

Rv0898c is encoded on the minus strand at position complement(1002441..1002704), while the ompATb operon genes (Rv0899/ompATb, Rv0900/arfB, Rv0901/arfC) are on the plus strand. This opposite-strand orientation rules out co-transcription. The operon structure of Rv0899–Rv0901 is experimentally confirmed: "the ompATb gene (Rv0899), encoding a major outer membrane protein, is organized in operon with Rv0900 and Rv0901, encoding two small proteins with a predicted transmembrane domain" (PMID: 21802366). Rv0898c is instead co-oriented with the upstream Rv0897c on the minus strand, suggesting possible co-transcription with this NAD(P)-binding oxidoreductase.
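
As a consistency check, the coordinates above can be converted to a protein length. This sketch assumes the interval is 1-based and inclusive and that it includes the stop codon, as is conventional for annotated coding sequences; `cds_summary` is a hypothetical helper name.

```python
def cds_summary(start, end):
    """Return (nucleotides, codons, amino acids) for an inclusive CDS interval.

    Assumes the interval includes the stop codon, so the protein is one
    residue shorter than the codon count.
    """
    length_nt = end - start + 1  # both endpoints are part of the feature
    if length_nt % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = length_nt // 3
    return length_nt, codons, codons - 1


print(cds_summary(1002441, 1002704))  # (264, 88, 87)
```

Under these assumptions the interval spans 264 nucleotides, or 88 codons including the stop, which agrees with the 87-aa length stated for Rv0898c.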

Finding 3: DUF2630 Remains Entirely Uncharacterized Across All Member Proteins

InterPro entry IPR020311 describes DUF2630 as "proteins with no known function," encompassing 1,778 protein members across 2,119 exclusively bacterial taxa. The family has zero GO terms mapped, zero experimental structures in the PDB, and 1,046 AlphaFold structural models. UniProt entry P9WKP5 for Rv0898c contains zero GO annotations and zero functional comments. The Pfam domain PF10944 spans residues 8–86 with an E-value of 1.7e-29, covering essentially the entire mature protein. This complete absence of functional data across all family members means there are no transfer-by-homology opportunities and no experimental anchor points for function prediction.

Finding 4: A Highly Conserved CWDLLR Motif Defines a Potential Interaction Surface

Analysis of 19 diverse DUF2630 family members across Actinobacteria reveals a core conserved motif LDQCWDLLRQRRA (Rv0898c residues 51–63) with absolutely conserved positions (D52, W55, D56, L57, R59, R61 at 100%) and four near-invariant positions (L51, Q53, L58, A63 at 89–95%). The CWDLLR hexapeptide is unique to DUF2630: zero hits were found in non-DUF2630 SwissProt entries.
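
Per-position conservation of this kind is the fraction of aligned sequences that share the most common residue in each column. The sketch below shows the calculation on a toy alignment; `column_conservation` is a hypothetical helper, and the four toy sequences (including the `SWDLLR` variant) are invented for illustration rather than drawn from the 19-member alignment.

```python
from collections import Counter


def column_conservation(aligned):
    """Return (most common residue, fraction of sequences) for each column."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*aligned):  # iterate over alignment columns
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(aligned)))
    return result


# Toy alignment: three identical hexapeptides and one hypothetical variant.
toy_alignment = ["CWDLLR", "CWDLLR", "CWDLLR", "SWDLLR"]

for residue, fraction in column_conservation(toy_alignment):
    print(residue, f"{fraction:.0%}")

# With 19 sequences, 17 or 18 matching residues in a column correspond to
# roughly 89% and 95%, in line with the near-invariant range quoted above.
print(round(17 / 19, 3), round(18 / 19, 3))
```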

Three-dimensional mapping of these conserved residues reveals a bipartite functional architecture:
- Structural core: Buried hydrophobic residues (L51, C54, W55, L57, L58) form the helix-helix interface. W55's indole ring contacts I16 and L13 on helix α1 (distances 3.85–4.63 Å), and R61-NH1 forms a 2.82 Å salt bridge to D9-OD1 on helix α1.
- Surface patch: Exposed charged residues (D52, D56, R59, R61, R62) define a conserved surface that could mediate protein–protein or protein–ligand interactions.

The near-complete conservation of this motif across the 19 DUF2630 members analyzed, combined with its surface exposure and charged character, suggests that it mediates a conserved but as-yet-unknown binding interaction. Critically, this motif has no structural or sequence counterpart in uL29 or any other characterized protein, blocking functional inference from the fold match.

Finding 5: Genomic Context Is Suggestive but Provides Only Guilt-by-Association Evidence

The adjacent ompATb operon is functionally characterized: "the proteins encoded by the ompATb operon are involved in generating a rapid ammonia burst, which neutralized medium pH and preceded exponential growth of M. tuberculosis" (PMID: 21410778). Furthermore, "Rv0899-like proteins are widespread in bacteria with functions in nitrogen metabolism, adaptation to nutrient poor environments, and/or establishing symbiosis with the host organism" (PMID: 21905117).

STRING functional enrichment groups the Rv0897c–Rv0901 cluster as "Mixed, incl. FAD/NAD(P)-binding domain superfamily" (FDR = 2.63e-11). Rv0898c is non-essential in vitro based on TnSeq data from the MtbTnDB, and DUF2630 taxonomic distribution is restricted primarily to Actinobacteria with some representatives in Nitrospira and Betaproteobacteria — lineages that include nitrogen-cycling organisms.

While this genomic context is suggestive, it provides only guilt-by-association evidence. No direct functional, biochemical, or genetic data connects Rv0898c to ammonia secretion, nitrogen metabolism, or any other biological process.

Finding 6: The Rv0897c Interaction Prediction Rests Entirely on Genome Neighborhood

STRING reports a high combined interaction score of 0.851 for Rv0897c–Rv0898c, but decomposition reveals this derives entirely from genome neighborhood evidence (0.847) with scores of zero for gene fusion, co-occurrence, experimental data, co-expression, and text mining. Rv0897c itself (UniProt P9WKP7) is a 535-aa membrane-associated protein with a validated NAD_binding_8 domain (PF13450), classified in the FAD/NAD(P)-binding domain superfamily (IPR036188). While co-transcription of Rv0897c and Rv0898c is plausible given their co-orientation on the minus strand, no experimental evidence confirms physical interaction, functional coupling, or shared pathway membership.

Finding 7: Methodological Precedent Supports Fold Classification but Cautions Against Functional Transfer

Three key methodological precedents inform the interpretation of our structural findings:

  1. ColabFold + Foldseek workflow validation (PMID: 38166563): Svedberg et al. (2024) demonstrated that this workflow "increased the accuracy and quality of the functional genome annotation compared to results using only traditional annotation tools" for microsporidian genomes. This validates the computational approach used here.

  2. Limits of structural classification for DUFs (PMID: 41288334): Pei et al. (2025) investigated 664 candidate novel fold domains from the TED database, creating "190 new Pfam families, many classified as domains of unknown function (DUFs)" — demonstrating that even advanced structural classification frequently fails to resolve function.

  3. DUF388/OB-fold precedent (PMID: 15178340): Ginalski et al. (2004) assigned DUF388 to the OB-fold by fold recognition but predicted the proteins would "probably lack nucleic acid-binding properties as implied by the analysis of the potential binding site." This case directly parallels Rv0898c/uL29: correct fold identification but divergent binding sites that block functional transfer.


Evidence Matrix

| # | Citation | Evidence Type | Direction | Claim Tested | Key Finding | Context | Confidence |
|---|---|---|---|---|---|---|---|
| 1 | AlphaFold DB (AF-P9WKP5-F1-v6) | Computational (structure prediction) | Supports fold classification | Rv0898c has a defined structural fold | Two-helix antiparallel hairpin, mean pLDDT 84.6, core >90 | In silico; 87 aa protein | High for fold; prediction only |
| 2 | Foldseek vs PDB100 | Computational (structural homology) | Supports topology assignment | Fold matches a known CATH topology | Top hit: uL29 (PDB 6dzi_Z), Prob=0.992, E=0.33, SeqId=27.2% | M. smegmatis ribosome structure | Moderate; topology yes, superfamily uncertain |
| 3 | Foldseek vs CATH50 | Computational (structural classification) | Supports topology, qualifies function | Fold belongs to 1.10.287 (Helix Hairpins) | Multiple hits in CATH 1.10.287.* (various superfamilies) | Structural classification database | High for topology; uninformative for function |
| 4 | UniProt P9WKP5 | Database | Supports ND status | Current annotation is appropriate | Zero GO terms, zero functional comments | Swiss-Prot reviewed entry | High |
| 5 | InterPro IPR020311 / Pfam PF10944 | Database | Supports uncharacterized status | DUF2630 family characterization | 1,778 members across 2,119 taxa; zero GO mappings; zero characterized members | InterPro/Pfam | High |
| 6 | DUF2630 MSA (19 species) | Computational (conservation) | Supports structural fold; neutral for function | Conserved residues define a binding site | LDQCWDLLRQRRA motif (5 residues 100% conserved); bipartite structural/surface architecture | Cross-family; Actinobacteria-dominant | Moderate; motif is real but function unclear |
| 7 | UniProt CWDLLR search | Computational (motif search) | Qualifies | CWDLLR matches a known functional motif | Zero matches in non-DUF2630 Swiss-Prot | All reviewed UniProt entries | High; motif is novel |
| 8 | PMID: 21802366 | Direct experiment | Qualifies genomic context | Rv0898c is part of ompATb operon | Rv0899-0901 form the ompATb operon (plus strand); Rv0898c on minus strand = NOT in operon | M. bovis BCG, operon mapping | High |
| 9 | PMID: 21410778 | Direct experiment | Contextual | Adjacent operon function is known | ompATb operon enables ammonia secretion for acid adaptation | M. tuberculosis H37Rv | High for operon; no data on Rv0898c |
| 10 | PMID: 21905117 | Computational + review | Contextual | Genomic neighborhood functional theme | Rv0899-like proteins widespread in nitrogen-fixing bacteria | Bioinformatics survey | Moderate |
| 11 | STRING DB v12 | Computational (genomic context) | Supports co-transcription with Rv0897c | Rv0898c functionally linked to neighbors | Rv0897c-Rv0898c: score 0.851 (neighborhood only = 0.847); experimental = 0 | STRING v12 | Low for physical interaction |
| 12 | TnSeq data | Direct experiment (essentiality) | Neutral | Rv0898c is essential for growth | Rv0898c is non-essential in standard in vitro growth | H37Rv, 7H10 medium | High for in vitro non-essentiality |
| 13 | PMID: 25467293 | Computational (remote homology) | Contextual | M. tuberculosis proteome annotation | 95% of Mtb proteins annotated; 183 mycobacteria-unique unknowns remain; Rv0898c in recalcitrant set | Systematic bioinformatics | Moderate |
| 14 | PMID: 38166563 | Computational (method validation) | Supports approach | AlphaFold+Foldseek improves annotation | "Increased accuracy and quality of functional genome annotation" | Microsporidian genomes | Moderate; validates approach |
| 15 | PMID: 41288334 | Computational (structural classification) | Qualifies | Structural data resolves all DUFs | "190 new Pfam families, many classified as DUFs"; structural data alone insufficient | TED database analysis | High |
| 16 | PMID: 15178340 | Computational (fold recognition) | Supports partial verdict | Fold assignment implies function | DUF388 → OB-fold but "probably lack nucleic acid-binding properties" due to diverged binding site | BOF family analysis | High; direct precedent |

GO Curation Implications

| GO Aspect | Recommendation | Rationale |
|---|---|---|
| MF (Molecular Function) | Retain ND | No experimental data; fold match insufficient for function transfer; unique conserved motif blocks homology-based inference |
| BP (Biological Process) | Retain ND | Genomic context suggestive but indirect; no genetic or biochemical evidence |
| CC (Cellular Component) | Retain ND | No localization data; Rv0897c is membrane-associated but no evidence extends to Rv0898c |

  • GO:0003735 (structural constituent of ribosome): Despite structural similarity to uL29, the 27% sequence identity is in the twilight zone, the E-value (0.33) is marginal, M. tuberculosis has its own bona fide uL29 (rpmC, Rv0709), and no evidence links Rv0898c to ribosomes. Assigning this term would be over-annotation.

  • GO:0005515 (protein binding): While the conserved surface charges suggest protein–protein interaction capability, no binding partner has been identified. This term is uninformative without partner identification and should not be assigned based on surface charge alone.

  • GO:0016491 (oxidoreductase activity): Sometimes transferred from genomic neighbors; inappropriate without direct evidence for Rv0898c.

Potential Future GO Terms (pending experimental validation):

| Scenario | Candidate GO Term | Evidence Type |
|---|---|---|
| Rv0898c shown to interact with Rv0897c | GO:0005515 (protein binding) | IPI |
| Rv0898c has regulatory/structural role for oxidoreductase | Specific MF term based on demonstrated activity | IDA/IMP |
| Expression profiling links to specific process | Relevant BP term | IEP |
| Localization demonstrated | Relevant CC term | IDA |

Mechanistic Scope

Direct Gene-Product Activity (Unknown)

No direct molecular function has been demonstrated or reliably predicted for Rv0898c. The protein folds into a two-helix hairpin with a conserved charged surface patch (D52, D56, R59, R61, R62), which is consistent with a protein–protein interaction surface, a small-molecule binding site, or a structural/scaffolding role. However, none of these possibilities can be distinguished without experimental data.

Structural Role vs. Enzymatic Activity

The small size (87 aa), simple fold (two helices), and absence of any catalytic residue signatures argue against enzymatic activity. The protein is more likely to function as:
- A protein–protein interaction adapter or modulator
- A structural component of a macromolecular complex
- A small regulatory protein (e.g., anti-toxin, transcription co-factor)

The pairing of a small helical protein with a larger enzyme (Rv0897c, 535 aa NAD-binding oxidoreductase) on the same strand is a common genomic pattern in bacteria, where the small protein may serve as a regulatory subunit, chaperone/assembly factor, redox partner mediator, or co-factor delivery protein.

Separation from Downstream Phenotypes

The ammonia secretion / acid adaptation phenotype of the adjacent ompATb operon (Rv0899–0901) should NOT be attributed to Rv0898c. The operon is on the opposite strand and has been functionally characterized independently (PMID: 21410778, PMID: 21802366). Any functional connection to nitrogen metabolism or pH adaptation would need to be established through independent evidence, not assumed from genomic proximity. Similarly, the predicted interaction with Rv0897c (oxidoreductase) is based solely on genome neighborhood and cannot be treated as evidence for shared pathway membership.


Conflicts and Alternatives

Conflict 1: Fold vs. Function Extrapolation

The seed hypothesis assumes that structural fold classification implies a candidate molecular function. This is a common and often productive approach, but it has known limitations for simple/common folds. The helix-hairpin topology (CATH 1.10.287) is among the most promiscuous topologies in protein structure space, found in ribosomal proteins, transcription factors, membrane-associated proteins, nucleases, vesicle proteins, and ESCRT components. Fold-to-function transfer requires either: (a) superfamily-level homology (sequence identity >30% and/or shared conserved active site), or (b) shared functional motifs with characterized proteins. Neither condition is met for Rv0898c.
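
The two conditions for fold-to-function transfer stated above can be written as a simple decision rule. This is a sketch of the reasoning only, not a curation tool: the `FoldHit` record, its field names, and `supports_function_transfer` are hypothetical, and the 30% cutoff is the threshold quoted in the text rather than a universal constant.

```python
from dataclasses import dataclass


@dataclass
class FoldHit:
    """A structural hit summarized by the features the criteria depend on."""
    name: str
    seq_identity: float          # fraction between 0 and 1
    shares_active_site: bool     # conserved active site shared with the hit
    shares_functional_motif: bool


def supports_function_transfer(hit, identity_cutoff=0.30):
    """Apply criteria (a) and (b) from the text to a single hit."""
    # (a) superfamily-level homology: identity above cutoff and/or shared site
    superfamily_level = (
        hit.seq_identity > identity_cutoff or hit.shares_active_site
    )
    # (b) shared functional motif with a characterized protein
    return superfamily_level or hit.shares_functional_motif


# Values for the top hit as reported in this document: 27.2% identity and
# no shared active site or motif with uL29.
ul29 = FoldHit("uL29 (PDB 6dzi, chain Z)", 0.272, False, False)
print(supports_function_transfer(ul29))  # False
```

With the reported values for the uL29 hit, neither condition holds, which matches the conclusion that the fold match does not justify a molecular function term.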

Conflict 2: uL29 Structural Similarity

The strongest structural hit (uL29, Prob = 0.992) could be misinterpreted as evidence for ribosomal function. Several factors argue against this:
- M. tuberculosis already encodes its own bona fide uL29 (rpmC, Rv0709, P9WHA7)
- Sequence identity (27%) is in the twilight zone
- The Foldseek E-value (0.33) is marginal for confident homology
- uL29 proteins have specific rRNA-binding features not conserved in Rv0898c
- The DUF2630-specific CWDLLR motif has no counterpart in uL29 family proteins
- No co-expression or co-occurrence with ribosomal genes

Conflict 3: STRING Score Overinterpretation

The high STRING score (0.851) for Rv0897c–Rv0898c might suggest robust functional coupling with this oxidoreductase. However, decomposition reveals 100% of the score derives from genome neighborhood (0.847) with zero from experimental interaction, co-expression, text mining, gene fusion, or co-occurrence channels. While genome neighborhood is a moderately reliable predictor in bacteria, the complete absence of orthogonal evidence types means this prediction should be treated with caution.

Alternative Hypothesis: Toxin-Antitoxin Component

Small proteins (60–120 aa) with helix-hairpin folds in M. tuberculosis are frequently components of toxin-antitoxin (TA) systems. The DUF2630 size and fold are consistent with a Type II antitoxin, and M. tuberculosis harbors an unusually large repertoire of TA systems (PMID: 30476068). However, no TA system has been identified at the Rv0898c locus, and the DUF2630 conservation pattern (largely restricted to Actinobacteria, not mobile-element associated) does not match typical TA distribution patterns.

Alternative Hypothesis: Accessory Subunit of Rv0897c

The most parsimonious alternative hypothesis is that Rv0898c serves as a protein–protein interaction module for Rv0897c. This is supported by: (a) likely co-transcription (same strand, adjacent), (b) STRING genome neighborhood score, (c) common small-protein/large-enzyme pairing pattern, (d) conserved surface charges. However, Rv0897c itself is uncharacterized (UniProt: "Uncharacterized protein"; only validated domain: NAD_binding_8; predicted membrane-associated with TM helices), so even confirming this interaction would not directly resolve Rv0898c's molecular function.


Knowledge Gaps

| Gap | What Was Checked | Why It Matters | Resolution |
| --- | --- | --- | --- |
| No experimental structure for any DUF2630 member | PDB, AlphaFold DB, InterPro structures | AlphaFold predictions are high-confidence but unvalidated | X-ray/cryo-EM/NMR structure of Rv0898c or any DUF2630 member |
| No binding partners known | STRING (computational only), BioGRID (no entries), literature | Cannot assign MF without knowing what the protein binds | AP-MS, bacterial two-hybrid, or crosslinking-MS in M. tuberculosis |
| No genetic phenotype under relevant conditions | TnSeq: non-essential in vitro | In vivo role may differ; conditional essentiality unknown | Delete Rv0898c and test under stress, in macrophages, or in animals |
| No transcriptomic context for Rv0898c | Literature search; no expression data found | Expression conditions could reveal function | RNA-seq under diverse conditions; mine MtbTnDB conditional screens |
| Rv0897c co-transcription unverified | Inferred from strand orientation only | Would strengthen or weaken functional linkage | RT-PCR or RNA-seq to map transcript boundaries |
| CWDLLR motif function unknown | Searched SwissProt, PDB, InterPro, PROSITE | Most conserved and distinctive feature of DUF2630 | Alanine scanning mutagenesis of D52, D56, R59, R61, R62 |
| DUF2630 phylogenetic scope incomplete | InterPro: Actinobacteria-dominant, some Nitrospira/Betaproteobacteria | Narrow distribution may indicate specialized function | Comprehensive phylogenomic analysis across all bacterial phyla |
| C54 functional role unclear | 89% conserved in MSA; buried in helix interface | If C54 is redox-active, it constrains function hypothesis | C54A mutagenesis; test with/without Rv0897c |
| Rv0897c function unknown | UniProt P9WKP7: "Uncharacterized"; NAD_binding_8 domain | If substrate is known, Rv0898c's role might be inferred | Biochemical characterization of Rv0897c; substrate identification |

Discriminating Tests

Priority 1: Identify Binding Partners (High Impact, Moderate Feasibility)

  • Affinity purification coupled to mass spectrometry (AP-MS) of FLAG-tagged Rv0898c expressed in M. tuberculosis or M. smegmatis
  • Bacterial two-hybrid screen against an M. tuberculosis library
  • Crosslinking mass spectrometry (XL-MS) to capture transient interactions in vivo

These experiments would directly test whether the conserved charged surface mediates protein–protein interactions and identify potential partners, enabling GO MF annotation.

Priority 2: Mutagenesis of Conserved Surface Residues (High Impact, High Feasibility)

  • Alanine substitutions D52A, D56A, R59A, R61A, and R62A in Rv0898c
  • Test mutants for: growth phenotypes under stress, interaction with Rv0897c (if confirmed), complementation of knockout
  • W55A mutation to disrupt the helix-helix interface; compare structural stability (CD spectroscopy) to wild-type

This would determine whether the conserved surface patch is functionally important and separate structural from functional roles.
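
Before ordering constructs, it is worth checking that each substitution code names the residue actually present at that position. The sketch below numbers the CWDLLRQRR motif under the assumption that it is contiguous and begins at C54 (inferred from the residue numbers cited in this report, not stated explicitly) and compares each code against it. The helper names are hypothetical.

```python
import re

MOTIF = "CWDLLRQRR"
MOTIF_START = 54  # assumption: the motif's first residue is C54


def number_motif(motif: str, start: int) -> dict[int, str]:
    """Map residue number -> one-letter code for a contiguous motif."""
    return {start + offset: aa for offset, aa in enumerate(motif)}


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a code such as 'D56A' into (wild-type, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


numbered = number_motif(MOTIF, MOTIF_START)
for code in ["D52A", "D56A", "R59A", "R61A", "R62A", "W55A"]:
    wild_type, position, _ = parse_substitution(code)
    if position not in numbered:
        status = "outside motif (cannot be checked here)"
    elif numbered[position] == wild_type:
        status = "matches motif"
    else:
        status = "MISMATCH"
    print(code, status)
```

Under that numbering, D56, R59, R61, R62, and W55 all agree with the motif, while D52 lies upstream of it and would need to be checked against the full sequence.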

Priority 3: Conditional Essentiality Screening (Moderate Impact, High Feasibility)

  • Query MtbTnDB (PMID: 40527579) for Rv0898c fitness across all ~150 conditions
  • Construct a clean deletion mutant and test under: acidic pH, nitrogen limitation, macrophage infection, oxidative stress

This would reveal whether Rv0898c's non-essentiality in standard culture masks a condition-specific role, particularly related to the neighboring ompATb operon's function in pH adaptation.

Priority 4: Co-transcription and Expression Analysis (Moderate Impact, High Feasibility)

  • RT-PCR spanning the Rv0897c-Rv0898c intergenic region to confirm/refute co-transcription
  • Mine existing M. tuberculosis RNA-seq datasets for Rv0898c expression patterns
  • Test if Rv0898c is induced under conditions that activate the ompATb operon

Priority 5: Rv0897c Enzymatic Assay ± Rv0898c (High Impact, Lower Feasibility)

  • Purify Rv0897c with and without Rv0898c
  • Measure NAD-dependent oxidoreductase activity with candidate substrates
  • Would test the accessory subunit hypothesis biochemically

Priority 6: Experimental Structure (High Impact, Lower Feasibility)

  • X-ray crystallography or NMR of Rv0898c, ideally in complex with identified binding partners
  • Updated DALI/Foldseek search against future PDB releases as more small bacterial protein structures are solved

Curation Leads (Requiring Curator Verification)

Lead 1: Retain ND for Molecular Function

  • Recommendation: The current ND annotation for GO:MF is appropriate given the evidence.
  • Confidence: High
  • Rationale: No specific molecular function can be assigned from structural fold alone when the fold is a common topology (helix-hairpin, CATH 1.10.287, >600 superfamilies). The 27% sequence identity to uL29 is in the twilight zone, and the DUF2630-specific CWDLLR motif has no characterized counterpart.

Lead 2: Consider Structural Classification Note

  • Recommendation: If the curation system supports it, note that AlphaFold predicts a confident helix-hairpin fold (CATH topology 1.10.287) with a DUF2630-specific conserved motif (CWDLLRQRR).
  • Confidence: High for structure; low for function
  • Rationale: This is useful structural information even without functional implication.

Lead 3: Flag for Re-review When Experimental Data Emerges

  • Recommendation: Add Rv0898c to a watch list for re-review if:
      - Any DUF2630 family member receives experimental characterization
      - Binding partners are identified through high-throughput interaction studies
      - Conditional essentiality data from MtbTnDB reveal a stress-specific phenotype
  • Confidence: Moderate
  • Rationale: The conserved CWDLLR surface patch strongly suggests functional importance, but the function cannot be predicted computationally.

Lead 4: Genome Neighborhood Annotation Caveat

  • Recommendation: Block automated function transfer from the ompATb operon (Rv0899–0901) to Rv0898c.
  • Confidence: High
  • Rationale: Rv0898c is on the opposite strand and not co-transcribed with the operon. The STRING score for Rv0897c–Rv0898c (0.851) rests entirely on genome neighborhood (0.847) with zero experimental support.

Candidate References to Verify:

| Reference | Relevant Snippet | Use |
| --- | --- | --- |
| PMID: 21802366 | "the ompATb gene (Rv0899)...is organized in operon with Rv0900 and Rv0901" | Confirms Rv0898c is NOT part of ompATb operon |
| PMID: 21410778 | "proteins encoded by the ompATb operon are involved in generating a rapid ammonia burst" | Contextual: adjacent operon function |
| PMID: 21905117 | "Rv0899-like proteins are widespread in bacteria with functions in nitrogen metabolism" | Contextual: genomic neighborhood theme |
| PMID: 38166563 | "increased the accuracy and quality of functional genome annotation" | Methodological: validates Foldseek approach |
| PMID: 41288334 | "190 new Pfam families, many classified as domains of unknown function (DUFs)" | Methodological: structural data ≠ function |
| PMID: 15178340 | "probably lack nucleic acid-binding properties as implied by the analysis of the potential binding site" | Direct precedent: fold ≠ function when binding site diverges |

Evidence Base: Key Literature

Directly Relevant to Rv0898c Genomic Context

  • PMID: 21802366 — Song et al. Confirms ompATb operon structure (Rv0899–0901) and shows dependence on small membrane proteins. Establishes that Rv0898c on the opposite strand is NOT part of this operon.

  • PMID: 21410778 — Sartain et al. Demonstrates ompATb operon role in ammonia secretion and pH adaptation. Provides functional context for the genomic neighborhood but no direct evidence for Rv0898c.

  • PMID: 21905117 (Marassi, 2011). Shows that Rv0899-like proteins are widespread in nitrogen-fixing bacteria, broadening the functional context of the genomic region.

  • PMID: 20199110 — Teriete et al. Structural characterization of Rv0899 (OmpATb) reveals mixed α/β structure with BON domain. Important for understanding the neighboring protein but provides no information about Rv0898c.

Methodological Precedents for Fold-Based Annotation

  • PMID: 38166563 — Svedberg et al. (2024). Validates ColabFold + Foldseek workflow for functional annotation improvement, supporting the computational approach used here.

  • PMID: 41288334 — Pei et al. (2025). Large-scale investigation finding that many structurally classified domains remain DUFs, directly supporting our conclusion that fold assignment alone is insufficient for functional annotation.

  • PMID: 15178340 — Ginalski et al. (2004). DUF388 assigned to OB-fold by fold recognition but predicted to lack nucleic acid-binding properties. Direct precedent for the Rv0898c/uL29 situation where fold is correctly identified but function cannot be transferred.

M. tuberculosis Proteome Annotation

  • PMID: 25467293 — Tyagi & Bhatt (2014). Remote homology detection enriched Mtb proteome annotation to 95% coverage, with 183 mycobacteria-unique unknowns remaining. Rv0898c/DUF2630 is among this recalcitrant set.

  • PMID: 22301074 — Burge et al. (2012). Describes InterPro's protocol for GO annotation of protein signatures, including the challenges of mapping GO terms to DUF families with no characterized members.

  • PMID: 40527579 — MtbTnDB: comprehensive transposon sequencing database for M. tuberculosis. Rv0898c is non-essential under standard in vitro conditions but conditional essentiality across ~150 conditions has not been systematically examined in this investigation.


Limitations

  1. No experimental validation: All structural analyses are based on AlphaFold predictions. While confidence metrics are high (pLDDT >90 for core), predicted structures can miss ligand-induced conformational changes, oligomerization states, and post-translational modifications.

  2. Foldseek statistical significance: The top hit (uL29, E-value = 0.33) is at the margin of statistical significance. While the probability score (0.992) is high, the functional diversity of the matched topology fundamentally limits interpretive power.

  3. Conservation analysis scope: Our multiple sequence alignment included 19 diverse DUF2630 members from Actinobacteria. A broader analysis including all 1,778 family members might reveal additional conserved features, subfamily structure, or outlier members with functional clues.

  4. Genome neighborhood interpretation: Co-localization does not equal co-function, especially for small intergenic proteins. The STRING score decomposition (100% from neighborhood) underscores this limitation.

  5. Literature absence: No primary literature directly studying Rv0898c was found despite comprehensive PubMed searching. All contextual information derives from studies of neighboring genes or computational surveys. This absence itself is informative — the protein has attracted no experimental attention.

  6. Negative data bias: The non-essential designation from TnSeq reflects only standard in vitro growth conditions. Many virulence-related and stress-response genes appear non-essential in standard screens but are critical during infection or under specific stresses.

  7. Single-organism focus: Our analysis focused primarily on the M. tuberculosis H37Rv context. DUF2630 members in other organisms (Nitrospira, Betaproteobacteria) might have different genomic contexts that provide additional functional clues.

Artifacts

📚 Additional Documentation

Notes

(Rv0898c-notes.md)

Rv0898c (MYCTU) Research Notes

Gene overview

Rv0898c encodes a small (87 aa, 9.9 kDa) conserved hypothetical protein in Mycobacterium tuberculosis H37Rv. It is classified as a "conserved hypothetical protein" by MycoBrowser. The protein contains a single domain of unknown function, DUF2630 (Pfam PF10944), corresponding to InterPro family IPR020311. The InterPro entry states: "This entry contains proteins with no known function" and has no GO term mappings (interpro2go is empty for IPR020311). There are ZERO curated GO annotations in GOA for this protein.
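
The "interpro2go is empty" check can be reproduced with a few lines of parsing. This is a minimal sketch that assumes the usual external2go line layout (`InterPro:<accession> <name> > GO:<term name> ; GO:<id>`, with `!` comment lines); the sample lines are illustrative stand-ins rather than an excerpt from a specific release, and the function name is hypothetical.

```python
def go_terms_for(accession: str, lines: list[str]) -> list[str]:
    """Collect GO IDs mapped to one InterPro accession in interpro2go-style lines."""
    prefix = f"InterPro:{accession} "
    terms = []
    for line in lines:
        if line.startswith("!") or not line.startswith(prefix):
            continue  # skip comments and entries for other accessions
        if ";" not in line:
            continue  # malformed mapping line, ignore
        # The GO ID is the text after the last ';'
        terms.append(line.rsplit(";", 1)[1].strip())
    return terms


# Illustrative lines in the interpro2go layout (assumed, not from a release).
sample = [
    "!comment line",
    "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677",
]
print(go_terms_for("IPR000003", sample))  # ['GO:0003677']
print(go_terms_for("IPR020311", sample))  # []
```

An empty list for IPR020311 against the real mapping file is what "no GO term mappings" means in practice.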

Protein characteristics

  • 87 amino acids, single DUF2630 domain spanning nearly the entire protein (residues 8-86)
  • C-terminal disordered region (residues 67-87)
  • No predicted transmembrane domains
  • No predicted signal peptide
  • No recognizable enzymatic domains
  • Detected at the protein level by mass spectrometry (PMID:21969609)

Genomic context

Rv0898c is located at position 1002441 bp on the minus strand. Its immediate downstream neighbor on the chromosome is Rv0899 (ompA/arfA), which is on the plus strand starting at 1002812 bp. Rv0899 encodes a membrane protein that is part of the arf (ammonia release facilitator) operon (Rv0899-Rv0900-Rv0901). Importantly, Rv0898c is transcribed in the OPPOSITE direction from Rv0899 and is NOT part of this operon.

The Rv0899 operon is well characterized: "the rv0899 gene is part of an operon (rv0899-rv0901) that is required for fast ammonia secretion, pH neutralization and growth of M. tuberculosis in acidic environments" [PMID:21905117, DOI:10.1002/prot.23151]. However, Rv0898c, being on the complementary strand, is not part of this operon and its proximity to Rv0899 does not imply functional linkage.
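
The strand argument can be made explicit with a small helper. This sketch uses only the single coordinate and strand quoted above for each gene; treating that one coordinate as the gene's position is a simplification, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    position: int  # single reference coordinate in bp, as quoted in the notes
    strand: str    # "+" or "-"


def relative_orientation(a: Gene, b: Gene) -> str:
    """Classify a neighbouring gene pair by strand and chromosomal order."""
    if a.strand == b.strand:
        return "same strand"
    left = a if a.position < b.position else b
    # A minus-strand gene on the left is read away from its plus-strand neighbour.
    return "divergent" if left.strand == "-" else "convergent"


def may_share_operon(a: Gene, b: Gene) -> bool:
    """Same strand is necessary (but not sufficient) for co-transcription."""
    return a.strand == b.strand


rv0898c = Gene("Rv0898c", 1002441, "-")
rv0899 = Gene("Rv0899", 1002812, "+")
print(relative_orientation(rv0898c, rv0899))  # divergent
print(may_share_operon(rv0898c, rv0899))      # False
```

The same test applied to a same-strand pair only says that co-transcription is possible; transcript mapping is still needed to confirm it.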

Sequence homologs

MycoBrowser notes that Rv0898c is "highly similar to CAC01589.1|AL391041 hypothetical protein from Streptomyces coelicolor" and shows some homology to Rv0709, another M. tuberculosis protein. The DUF2630 family (IPR020311) contains 1,938 proteins across 2,221 taxa, all of unknown function. No member of this family has been functionally characterized.

Essentiality and expression

  • Non-essential for in vitro growth per transposon mutagenesis studies (MycoBrowser annotation)
  • Detected by mass spectrometry in whole cell lysates but NOT in culture filtrate or membrane fractions (MycoBrowser)
  • mRNA "up-regulated after 96h of starvation" (MycoBrowser annotation, based on transcriptomics). This is consistent with a role in nutrient stress adaptation, which is a common theme in M. tuberculosis persistence biology (PMID:11929527).

What is NOT known

  • No molecular function has been established experimentally
  • No interaction partners have been identified
  • No experimental structural data available (an AlphaFold model exists)
  • No knockout phenotype beyond non-essentiality in vitro
  • The function of the DUF2630 domain family remains entirely uncharacterized across all organisms
  • There are no publications in PubMed specifically about Rv0898c (PubMed search for "Rv0898c" in title/abstract returns zero results)

Key references

  1. PMID:9634230 - Cole et al. 1998. "Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence." Nature 393:537-544. Original genome sequence.
  2. PMID:21969609 - Kelkar et al. 2011. "Proteogenomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry." Mol Cell Proteomics. Protein detection by MS.
  3. PMID:28096490 - DeJesus et al. 2017. "Comprehensive Essentiality Analysis of the Mycobacterium tuberculosis Genome via Saturating Transposon Mutagenesis." mBio. Essentiality analysis.
  4. PMID:21905117 - Marassi 2011. "Mycobacterium tuberculosis Rv0899 defines a family of membrane proteins widespread in nitrogen-fixing bacteria." Proteins. Describes the Rv0899 operon context.
  5. PMID:11929527 - Betts et al. 2002. "Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling." Mol Microbiol. Starvation response transcriptomics.

BioReason SFT Review

(Rv0898c-bioreason-sft-review.md)

BioReason-Pro SFT Review: Rv0898c (MYCTU)

Source: Rv0898c-deep-research-bioreason-sft.md

  • Correctness: 1/5
  • Completeness: 2/5

Functional Summary Review

The BioReason functional summary states:

A small, single-domain peripheral protein in Mycobacterium tuberculosis that likely operates as a binding adaptor at the cell envelope. By docking to envelope components and soluble enzymes, it probably organizes or stabilizes multi-enzyme assemblies that tune coenzyme A production and salvage at the cell wall interface. This positioning would couple envelope metabolic demands to coenzyme pools, with interactions bridging an outer envelope protein and redox enzymes to modulate flux without direct catalysis.

This summary is almost entirely unsupported by any evidence:

  1. "binding adaptor at the cell envelope": There is no evidence that Rv0898c is a binding adaptor or localizes to the cell envelope. Proteomics data from MycoBrowser indicates the protein was detected in whole cell lysates but NOT in membrane fractions, arguing against an envelope localization. The protein has no transmembrane domains, no signal peptide, and no predicted membrane-association motifs. BioReason correctly notes the absence of transmembrane domains but then contradicts itself by proposing envelope localization.

  2. "coenzyme A production and salvage": There is no evidence linking Rv0898c to CoA biosynthesis. The BioReason summary appears to have confabulated this entirely. The "UniProt Summary" section of the BioReason output states "May be involved in the modulation of CoA biosynthesis" but this text does not appear in the actual UniProt entry for P9WKP5, which simply says "Uncharacterized protein Rv0898c" with no functional annotation whatsoever. This appears to be a hallucinated UniProt summary that the model then used as an anchor for its reasoning.

  3. "association with OmpA": The thinking trace states "The explicit association with a canonical mycobacterial outer envelope protein (OmpA) points to a peripheral position at the cell boundary." There is no documented association between Rv0898c and OmpA (Rv0899). While Rv0898c is genomically adjacent to Rv0899, Rv0898c is on the OPPOSITE strand (hence the "c" suffix) and is NOT part of the Rv0899-Rv0900-Rv0901 operon. The Rv0899 operon is well characterized and does not include Rv0898c. This is a critical error where genomic proximity was incorrectly interpreted as functional association.

  4. "interaction with a probable oxidoreductase": No oxidoreductase interaction has been documented for Rv0898c. This claim appears to be fabricated.

  5. What BioReason gets partially right: The summary correctly identifies that Rv0898c is a small, single-domain protein without enzymatic domains, and that it lacks transmembrane segments. These are factual observations from the InterPro annotation.

Root cause of errors: BioReason appears to have constructed an elaborate narrative from three inputs: (a) the DUF2630 domain (which tells us nothing functional), (b) genomic proximity to Rv0899/OmpA (which is on the opposite strand and functionally unrelated), and (c) what appears to be a hallucinated UniProt summary about CoA biosynthesis. The model produced a confident-sounding but entirely speculative functional prediction for a genuinely uncharacterized protein. This represents a failure mode where the model cannot express "unknown" and instead fabricates plausible-sounding biology.

Comparison with interpro2go:

IPR020311 (the sole InterPro entry for Rv0898c) has NO GO term mappings in interpro2go. The InterPro entry explicitly states: "This entry contains proteins with no known function." Therefore there are no interpro2go annotations to compare against. BioReason is not recapitulating interpro2go here -- it is going far beyond the available evidence by fabricating a detailed functional narrative where interpro2go correctly assigns nothing. In this case, the conservative interpro2go approach (assigning no terms) is more accurate than BioReason's speculative narrative.

Notes on thinking trace

The thinking trace reveals the reasoning chain that led to errors:

  1. The trace starts reasonably by noting the single-domain architecture and absence of enzymatic cores, correctly concluding a non-enzymatic role is likely. However, the leap to "protein binding (GO:0005515)" is generic and unsupported.

  2. The trace then fabricates a connection to CoA biosynthesis with no stated evidence: "Coenzyme A biosynthesis (GO:0015937) in bacteria proceeds through a sequence of soluble enzymes" -- while true as a general statement, nothing connects this pathway to Rv0898c.

  3. The statement "The explicit association with a canonical mycobacterial outer envelope protein (OmpA)" misrepresents genomic adjacency as functional association. Rv0898c is on the complement strand, divergently transcribed from Rv0899, and not part of the arf operon.

  4. The BioReason model produced no GO term predictions in the "GO Term Predictions" section (all three subsections -- MF, BP, CC -- are empty), which paradoxically is more appropriate than the narrative functional summary.

  5. The model demonstrates a systematic inability to express uncertainty for genuinely uncharacterized proteins, instead generating a plausible-sounding but fabricated functional story.

Raw YAML

id: P9WKP5
gene_symbol: Rv0898c
product_type: PROTEIN
status: DRAFT
taxon:
  id: NCBITaxon:83332
  label: Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
description: >-
  A small (87 aa) conserved hypothetical protein of unknown function in
  Mycobacterium tuberculosis H37Rv. Contains a single DUF2630 domain
  (Pfam PF10944, InterPro IPR020311) spanning nearly the entire protein.
  Detected at the protein level by mass spectrometry in whole cell lysates.
  Non-essential for in vitro growth. mRNA is upregulated after 96 hours
  of nutrient starvation, suggesting a possible role in stress adaptation,
  but no molecular function has been established for Rv0898c or any member
  of the DUF2630 family.
references:
- id: PMID:9634230
  title: Deciphering the biology of Mycobacterium tuberculosis from the complete
    genome sequence
  findings:
  - statement: Rv0898c was identified in the complete genome sequence of M. 
      tuberculosis H37Rv
    supporting_text: The genome comprises 4,411,529 base pairs, contains around 
      4,000 genes, and has a very high guanine + cytosine content
- id: PMID:21969609
  title: Proteogenomic analysis of Mycobacterium tuberculosis by high resolution
    mass spectrometry
  findings:
  - statement: Rv0898c protein was detected by high-resolution mass 
      spectrometry, confirming it is expressed at the protein level
    supporting_text: we identified 3176 proteins from Mycobacterium tuberculosis
      representing ~80% of its total predicted gene count
- id: PMID:28096490
  title: Comprehensive Essentiality Analysis of the Mycobacterium tuberculosis 
    Genome via Saturating Transposon Mutagenesis
  findings:
  - statement: Rv0898c is non-essential for in vitro growth of M. tuberculosis 
      based on saturating transposon mutagenesis
    supporting_text: This work provides an authoritative catalog of essential 
      regions of the M. tuberculosis genome
- id: PMID:11929527
  title: Evaluation of a nutrient starvation model of Mycobacterium tuberculosis
    persistence by gene and protein expression profiling
  findings:
  - statement: Starvation induces a persistence-like state in M. tuberculosis 
      with upregulation of specific gene sets
    supporting_text: we have generated a model with which we can search for 
      agents active against persistent M. tuberculosis and revealed a number of 
      potential targets expressed under these conditions
- id: PMID:21905117
  title: Mycobacterium tuberculosis Rv0899 defines a family of membrane proteins
    widespread in nitrogen-fixing bacteria
  findings:
  - statement: The neighboring gene Rv0899 is part of the arf operon 
      (Rv0899-Rv0901) required for ammonia secretion, but Rv0898c is on the 
      opposite strand and is NOT part of this operon
    supporting_text: the rv0899 gene is part of an operon (rv0899-rv0901) that
      is required for fast ammonia secretion, pH neutralization, and growth of M.
      tuberculosis in acidic environments
- id: file:MYCTU/Rv0898c/Rv0898c-notes.md
  title: Research notes for Rv0898c
  findings:
  - statement: Rv0898c mRNA is upregulated after 96h of starvation per 
      MycoBrowser annotation
  - statement: Rv0898c is detected in whole cell lysates but not in culture 
      filtrate or membrane fractions
  - statement: No publications in PubMed specifically about Rv0898c function
      exist
- id: file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md
  title: 'OpenScientist hypothesis run: Rv0898c DUF2630 fold discovery'
  findings:
  - statement: Compute-driven AlphaFold + Foldseek analysis classified the DUF2630
      fold as a two-helix antiparallel hairpin (weak uL29 ribosomal-protein similarity
      at 27% identity, twilight zone) but could not infer a molecular function, because
      the CATH 1.10.287 topology spans >600 unrelated superfamilies and the DUF2630
      motif (CWDLLRQRR) has no characterized match. Supports retaining the ND annotation.
    supporting_text: The current ND (no data) molecular function annotation should
      be retained.
existing_annotations:
- term:
    id: GO:0003674
    label: molecular_function
  evidence_type: ND
  review:
    summary: No molecular function has been established for Rv0898c or any 
      member of the DUF2630 family. The InterPro entry IPR020311 explicitly 
      states this entry contains proteins with no known function and has no GO 
      term mappings.
    action: NEW
    reason: The root molecular function term with ND (No biological Data) 
      evidence is used to explicitly document that no molecular function 
      annotation is supported for this protein. The BioReason SFT trace 
      speculated about protein binding and CoA biosynthesis but these claims are
      entirely unsupported. A dedicated 3-iteration OpenScientist run (AlphaFold +
      Foldseek) reached the same conclusion. Although the DUF2630 fold is classifiable
      as a two-helix antiparallel hairpin (top Foldseek hit uL29 ribosomal protein at
      only 27% identity, twilight zone; CATH topology 1.10.287, shared by >600
      functionally diverse superfamilies), the DUF2630-specific motif (CWDLLRQRR) has
      no match in any characterized protein, so no molecular function can be inferred
      and the ND annotation should be retained.
    supported_by:
    - reference_id: file:MYCTU/Rv0898c/Rv0898c-deep-research-bioreason-sft.md
      supporting_text: ''
    - reference_id: file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md
      supporting_text: The current ND (no data) molecular function annotation should
        be retained.
    - reference_id: file:MYCTU/Rv0898c/Rv0898c-hypotheses/fold-discovery-duf2630/openscientist.md
      supporting_text: the DUF2630-specific conserved motif (CWDLLRQRR) has no matches
        in any characterized protein
- term:
    id: GO:0005737
    label: cytoplasm
  evidence_type: IDA
  original_reference_id: PMID:21969609
  review:
    summary: Rv0898c protein was detected in whole cell lysates but not in 
      membrane or culture filtrate fractions by mass spectrometry, consistent 
      with cytoplasmic localization.
    action: NEW
    reason: Proteomics data from high-resolution mass spectrometry detected 
      Rv0898c in whole cell lysates but not in membrane or secreted fractions, 
      supporting cytoplasmic localization.
    supported_by:
    - reference_id: PMID:21969609
      supporting_text: we identified 3176 proteins from Mycobacterium 
        tuberculosis representing ~80% of its total predicted gene count
core_functions:
- description: >-
    Rv0898c is an uncharacterized protein with no experimentally determined
    molecular function. The DUF2630 domain family (IPR020311) has no GO
    mappings and no characterized members across any organism. The protein
    is small (87 aa), soluble (no transmembrane domains, detected in
    cytoplasmic fraction), and upregulated under starvation conditions,
    which may suggest a role in nutrient stress adaptation. However, no
    specific molecular function, binding partners, or biological process
    involvement has been demonstrated.
  locations:
  - id: GO:0005737
    label: cytoplasm
  supported_by:
  - reference_id: PMID:21969609
    supporting_text: we identified 3176 proteins from Mycobacterium tuberculosis
      representing ~80% of its total predicted gene count
suggested_questions:
- question: What is the three-dimensional structure of Rv0898c and does it 
    reveal any structural similarity to proteins of known function?
- question: Does Rv0898c interact with any other M. tuberculosis proteins, 
    particularly those involved in stress response or starvation survival?
- question: What is the phenotype of an Rv0898c knockout under starvation or in 
    vivo infection conditions?
suggested_experiments:
- description: Determine the crystal or NMR structure of Rv0898c and perform 
    structural similarity searches (e.g. Dali, Foldseek) to identify distant 
    functional homologs
- description: Perform affinity purification coupled with mass spectrometry 
    (AP-MS) to identify interaction partners of Rv0898c in M. tuberculosis
- description: Construct an Rv0898c deletion mutant and test survival under 
    nutrient starvation, hypoxia, and in macrophage infection models
- description: Perform transcriptomic analysis of Rv0898c knockout versus 
    wild-type under starvation conditions to identify downstream effects
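
One sanity check that can be run on a review file of this shape is that ND evidence appears only on the three GO root terms. The sketch below hard-codes a minimal copy of the two `existing_annotations` entries above instead of loading the YAML, so it runs with the standard library alone; the function name is hypothetical.

```python
# Root terms for molecular function, biological process, cellular component.
ROOT_TERMS = {"GO:0003674", "GO:0008150", "GO:0005575"}


def nd_misuse(annotations: list[dict]) -> list[str]:
    """Return GO IDs annotated with ND evidence that are not root terms."""
    return [
        entry["term"]["id"]
        for entry in annotations
        if entry["evidence_type"] == "ND" and entry["term"]["id"] not in ROOT_TERMS
    ]


# Minimal copy of the two existing_annotations entries in the YAML above.
existing_annotations = [
    {"term": {"id": "GO:0003674", "label": "molecular_function"},
     "evidence_type": "ND"},
    {"term": {"id": "GO:0005737", "label": "cytoplasm"},
     "evidence_type": "IDA"},
]
print(nd_misuse(existing_annotations))  # []
```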