A0A2G9RZF1 is a gene prediction fragment from the fragmented draft genome assembly of Aquarana catesbeiana (American bullfrog). The 156-amino acid ORF (AB205_0007200) encodes a single CUB domain (residues 31-147). Scaffold KV928989 also contains two further consecutive ORFs: A0A2G9RZH1 (EGF+CUB domains) and A0A2G9RZI6 (CUB+EGF+CUB domains, explicitly flagged as a fragment by UniProt). The combined domain architecture of the three ORFs (CUB-EGF-CUB-CUB-EGF-CUB) matches the C-terminal region of a tolloid-family metalloprotease (BMP-1 or TLL1). A 2024 chromosome-level assembly (GCF_042186555.1) with ~17,000-fold better contiguity encodes complete BMP-1 (XP_073478370.1, 1,020 aa, LG03) and TLL1 (XP_073462190.1, 1,005 aa, LG01), confirming that the original draft produced fragmented gene models. The complete gene product is therefore predicted to be a tolloid-family metalloprotease with metalloendopeptidase activity. No experimental data exist for this protein (UniProt protein existence level 4: Predicted).
Q: Which complete tolloid-family gene does this fragment correspond to -- BMP-1 (XP_073478370.1 on LG03) or TLL1 (XP_073462190.1 on LG01)? A BLAST alignment of scaffold KV928989 against the chromosome-level assembly (GCF_042186555.1) would definitively resolve this. OpenScientist analysis confirmed A0A2G9RZF1 is a gene prediction fragment, but the specific gene identity remains undetermined.
Q: When will the UniProt proteome for Aquarana catesbeiana (UP000228934) be updated from the 2017 draft assembly to the 2024 chromosome-level assembly (GCF_042186555.1)? This update would supersede the fragment entries (A0A2G9RZF1, A0A2G9RZH1, A0A2G9RZI6) with complete gene models carrying appropriate metalloendopeptidase annotations.
Q: Does Aquarana catesbeiana have a genuine PCPE-1/PCOLCE1 ortholog separate from the tolloid-family metalloproteases? A reciprocal best BLAST of human PCPE-1 (Q15113) against the chromosome-level proteome would clarify whether the CATH FunFam classification to PCPE-1 reflects any real orthology relationship or is purely fold-level similarity.
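The reciprocal best BLAST test proposed in the last question can be sketched as follows. This is a minimal illustration, assuming the forward and reverse searches have already been run and reduced to (query, subject, bitscore) tuples. The subject names `frog_protein_A`, `frog_protein_B`, and `human_BMP1`, and all scores, are hypothetical placeholders, not real search results.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bitscore)


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Return the top-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: List[Hit], reverse: List[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical placeholder hits: human PCPE-1 (Q15113) searched against the
# bullfrog proteome, then each bullfrog hit searched back against human.
forward = [
    ("Q15113", "frog_protein_A", 410.0),
    ("Q15113", "frog_protein_B", 95.0),
]
reverse = [
    ("frog_protein_A", "Q15113", 405.0),
    ("frog_protein_A", "human_BMP1", 120.0),
    ("frog_protein_B", "human_BMP1", 300.0),
]

rbh = reciprocal_best_hits(forward, reverse)
print(rbh)
```

In this invented example only `frog_protein_A` would count as a candidate ortholog; a hit whose best reverse match is a tolloid-family protein would instead point to fold-level similarity.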
Experiment: Align scaffold KV928989 from the RCv2.1 draft assembly against the chromosome-level assembly (GCF_042186555.1) using BLAST or minimap2 to definitively map A0A2G9RZF1 to either BMP-1 (XP_073478370.1, LG03) or TLL1 (XP_073462190.1, LG01). This computational experiment would resolve the remaining ambiguity about gene identity.
Hypothesis: A0A2G9RZF1 is a fragment of a BMP-1 or TLL1 tolloid-family metalloprotease, not a standalone protein.
Type: bioinformatics
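One way to summarize the proposed alignment is to sum bit scores per target locus from BLAST tabular output. The sketch below assumes the standard 12-column `-outfmt 6` layout. The locus names `candidate_locus_1` and `candidate_locus_2` and every numeric value are invented placeholders, so the example says nothing about which gene the scaffold actually maps to.

```python
from collections import defaultdict
from typing import Dict

# Standard column order of BLAST tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def summed_bitscores(tabular_text: str) -> Dict[str, float]:
    """Sum bit scores of all alignments per subject sequence."""
    totals: Dict[str, float] = defaultdict(float)
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        row = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        totals[row["sseqid"]] += float(row["bitscore"])
    return dict(totals)


def best_subject(tabular_text: str) -> str:
    """Subject with the highest summed bit score."""
    totals = summed_bitscores(tabular_text)
    return max(totals, key=totals.get)


# Invented placeholder rows for scaffold KV928989 (not real alignment output).
example = "\n".join([
    "KV928989\tcandidate_locus_1\t99.2\t1500\t12\t0\t1\t1500\t100001\t101500\t0.0\t2700",
    "KV928989\tcandidate_locus_1\t98.9\t900\t10\t0\t5001\t5900\t105001\t105900\t0.0\t1600",
    "KV928989\tcandidate_locus_2\t81.0\t800\t152\t0\t1\t800\t200001\t200800\t1e-150\t540",
])

print(summed_bitscores(example))
print(best_subject(example))
```

Summing across high-scoring segments, rather than taking the single top hit, is a deliberate choice here because a gapped scaffold is expected to align to its true locus in several separate pieces.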
Experiment: Clone the full-length BMP-1 or TLL1 gene from Aquarana catesbeiana cDNA using the chromosome-level assembly as reference, express recombinantly, and assay metalloendopeptidase activity against procollagen or Chordin substrates. Compare with Xenopus Xolloid, the closest well-characterized amphibian ortholog.
Hypothesis: The complete tolloid-family metalloprotease corresponding to A0A2G9RZF1 has metalloendopeptidase activity and functions in procollagen processing or BMP signaling regulation.
Type: biochemistry
The research report should be a detailed narrative explaining the function, biological processes, and localization of the gene product. Citations should be given for all claims.
You should prioritize authoritative reviews and primary scientific literature when conducting research. You can supplement this with annotations you find in gene/protein databases, but these can be outdated or inaccurate.
We are specifically interested in the primary function of the gene: for enzymes, what reaction is catalyzed, and what is the substrate specificity? For transporters, what is the substrate? For structural proteins or adapters, what is the broader structural role? For signaling molecules, what is the role in the pathway?
We are interested in where in or outside the cell the gene product carries out its function.
We are also interested in the signaling or biochemical pathways in which the gene functions. We are less interested in broad pleiotropic effects, except where these elucidate the precise role.
Include evidence where possible. We are interested in both experimental evidence as well as inference from structure, evolution, or bioinformatic analysis. Precise studies should be prioritized over high-throughput, where available.
| Domain feature/characteristic | Description | Function | Representative examples from recent literature |
|---|---|---|---|
| Canonical CUB domain size and identity | CUB domains are compact extracellular modules of ~100–110 amino acids; the name derives from complement C1r/C1s, Uegf, and BMP1. They recur in multidomain secreted or membrane proteins rather than acting as catalytic domains themselves (gomisruth2023structuralandevolutionary pages 2-5, gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2). | Provide modular binding surfaces that support recognition and assembly functions in extracellular biology; for A0A2G9RZF1, the UniProt annotation of a CUB-only protein most strongly supports a non-enzymatic interaction role rather than direct catalysis (gomisruth2023structuralandevolutionary pages 2-5, thomas2024pcpe2expressionof pages 1-2). | Reviews of astacin-associated CUB domains and synaptic CUB proteins; PCPE2 review describing CUB domains as ~110-residue extracellular interaction motifs (gomisruth2023structuralandevolutionary pages 2-5, gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2). |
| Structural fold / extracellular recognition module | CUB, CCP/Sushi, and TSP-1 domains are described as sandwich-like folds that favor protein-protein and protein-glycan interactions; CUB domains are frequently combined with EGF-like, Sushi, or NTR domains in larger extracellular proteins (gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2, akyuz2022thediverserole pages 1-3). | Acts as an interaction scaffold for ligand capture, receptor modulation, matrix association, or multimeric complex formation; this is the most defensible functional inference for an uncharacterized bullfrog CUB protein (gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2). | CSMD1 extracellular region; PCPE2/PCPE proteins with tandem CUB domains; SCUBE proteins with EGF-like repeats plus CUB domain (akyuz2022thediverserole pages 1-3, thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5). |
| Calcium-binding capability | Some CUB domains contain conserved calcium-binding sites that stabilize structure and regulate activity; in ADGRG6/GPR126, a conserved Ca2+-binding site in the CUB domain is critical for receptor function in vivo (lin2023thebiologyof pages 4-5). | Calcium can rigidify extracellular domains and tune ligand binding or signaling output, suggesting that if A0A2G9RZF1 retains Ca2+-coordinating residues, its activity may depend on the Ca2+-rich extracellular milieu (lin2023thebiologyof pages 4-5). | Zebrafish Gpr126/Adgrg6 extracellular region; SCUBE cbEGF modules also acquire rigid conformations upon Ca2+ binding, reinforcing the general principle of calcium-stabilized extracellular recognition assemblies (lin2023thebiologyof pages 4-5). |
| Protein-protein interaction surface | Recent reviews emphasize that CUB domains typically coordinate protein-protein binding and, in some families, protein-carbohydrate interactions; they are common in extracellular and plasma membrane-associated proteins (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2). | Supports binding to ligands, receptors, matrix components, or partner domains; likely the primary molecular role of A0A2G9RZF1 unless future data show it is fused to an enzyme or receptor not captured in current annotation (thomas2024pcpe2expressionof pages 1-2). | PCPE2, CSMD1, SCUBE proteins, and synaptic CUB proteins reviewed across vertebrates and invertebrates (thomas2024pcpe2expressionof pages 1-2, akyuz2022thediverserole pages 1-3, gonzalezcalvo2022synapseformationand pages 1-2). |
| Typical localization | CUB domains are predominantly found in secreted extracellular proteins or on extracellular regions of membrane proteins; multiple sources explicitly place them in extracellular matrix or plasma membrane-associated proteins (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5, akyuz2022thediverserole pages 1-3). | Indicates that A0A2G9RZF1 most likely functions outside the cell, at the cell surface, or within extracellular matrix/egg-coat-like material rather than in cytosolic metabolism (thomas2024pcpe2expressionof pages 1-2, akyuz2022thediverserole pages 1-3). | PCPE2 in extracellular matrix; SCUBE proteins as secreted/cell-surface glycoproteins; CSMD1 as a type-I transmembrane complement-regulatory protein (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5, baum2020cubandsushi pages 1-4). |
| Non-catalytic modulatory role | CUB domains are usually accessory/regulatory modules rather than catalytic centers; in multidomain proteins they position ligands, modulate proteases, or organize receptor complexes (gomisruth2023structuralandevolutionary pages 2-5, thomas2024pcpe2expressionof pages 1-2). | Suggests that A0A2G9RZF1, described only as a "CUB domain-containing protein," is most plausibly a binding/adaptor or matrix-associated recognition protein rather than an enzyme with a defined substrate-reaction pair (gomisruth2023structuralandevolutionary pages 2-5, thomas2024pcpe2expressionof pages 1-2). | PCPE family enhances or modulates procollagen processing through CUB-mediated interactions; SCUBE proteins function as signaling modulators/coreceptors (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5). |
| Extracellular matrix organization | CUB-containing proteins can reside in extracellular matrix and help organize or anchor macromolecular assemblies; PCPE2 includes tandem CUB domains plus an NTR domain associated with ECM binding (thomas2024pcpe2expressionof pages 1-2). Amphibian egg-envelope glycoproteins are secreted and assembled into extracellular filamentous envelopes through conserved extracellular domains (hedrick2008anuranandpig pages 2-3, hedrick2008anuranandpig pages 3-4). | Supports a plausible role in matrix assembly, stabilization, or selective binding within extracellular coats/tissues in amphibians (thomas2024pcpe2expressionof pages 1-2, hedrick2008anuranandpig pages 3-4). | PCPE2 in ECM; anuran egg-envelope proteins as extracellular structural/recognition assemblies (thomas2024pcpe2expressionof pages 1-2, hedrick2008anuranandpig pages 2-3, hedrick2008anuranandpig pages 3-4). |
| Developmental and morphogenetic functions | CUB proteins are repeatedly linked to embryogenesis, organogenesis, and tissue morphogenesis; SCUBE family members are conserved developmental regulators, and ADGRG6 CUB domain integrity is required for ear, heart, and Schwann-cell related development (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2). | For an amphibian protein with no direct literature, developmental extracellular signaling or morphogen-regulation is a reasonable inference if expression proves tissue-specific or stage-specific (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2). | ADGRG6/GPR126 developmental signaling; SCUBE developmental biology review (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2). |
| Fertilization / reproductive context | In amphibians and other chordates, extracellular reproductive proteins use conserved interaction domains to mediate egg-coat assembly, sperm binding, species recognition, and fertilization-related structural changes (hedrick2008anuranandpig pages 2-3, hedrick2008anuranandpig pages 3-4, sawada2022mechanismsofsperm–egg pages 1-2). | Because frog extracellular coats are rich in secreted recognition proteins, a bullfrog CUB-domain protein could plausibly participate in reproductive extracellular matrices or gamete interactions, though this remains inferential for A0A2G9RZF1 specifically (hedrick2008anuranandpig pages 2-3, sawada2022mechanismsofsperm–egg pages 1-2). | Xenopus/anuran egg-envelope glycoproteins and ascidian fertilization systems as comparative extracellular recognition models (hedrick2008anuranandpig pages 2-3, hedrick2008anuranandpig pages 3-4, sawada2022mechanismsofsperm–egg pages 1-2). |
| Complement regulation pathway | CUB domains occur in proteins structurally related to complement regulators; CSMD1 contains multiple CUB and Sushi domains and opposes complement activation in neural tissues, reducing complement deposition at synapses (baum2020cubandsushi pages 1-4, akyuz2022thediverserole pages 1-3). | Shows that CUB modules can participate in immune surveillance/regulation by controlling extracellular complement activation; this is one possible pathway class for uncharacterized vertebrate CUB proteins (baum2020cubandsushi pages 1-4). | CSMD1 in human/mouse neural tissues; complement-linked CUB/Sushi proteins reviewed in disease and neurodevelopment (baum2020cubandsushi pages 1-4, akyuz2022thediverserole pages 1-3). |
| Synaptic and neural functions | Across species, CUB domains are considered ancient synaptic building blocks and appear in proteins involved in synapse formation/function; complement-linked CUB proteins also influence pruning-related neurobiology (gonzalezcalvo2022synapseformationand pages 1-2, baum2020cubandsushi pages 1-4). | Suggests a potential neural extracellular recognition role for some solitary CUB proteins, especially in vertebrates, although no direct bullfrog evidence currently exists (gonzalezcalvo2022synapseformationand pages 1-2, baum2020cubandsushi pages 1-4). | Synapse-focused review of CUB/CCP/TSP-1 proteins; CSMD1 complement regulation at synapses (gonzalezcalvo2022synapseformationand pages 1-2, baum2020cubandsushi pages 1-4). |
| BMP/TGF-β pathway modulation | SCUBE proteins use C-terminal regions including the CUB domain to bind BMP ligands/receptors and promote BMP signaling; SCUBE3 loss-of-function causes defective BMP signaling in humans (lin2023thebiologyof pages 4-5). | Establishes CUB domains as extracellular pathway modulators or coreceptors in morphogen signaling, relevant when inferring possible signaling roles for A0A2G9RZF1 (lin2023thebiologyof pages 4-5). | SCUBE1/3 interactions with BMP2/BMP7 and BMP receptors; SCUBE3 developmental disorder linked to impaired BMP signaling (lin2023thebiologyof pages 4-5). |
| Hedgehog and receptor-coreceptor functions | SCUBE2 can interact with SHH/IHH and PTCH1 and enhance Hedgehog signaling in cholesterol-rich plasma membrane microdomains; CUB-containing extracellular modules help assemble signaling complexes (lin2023thebiologyof pages 4-5). | Demonstrates that CUB domains can facilitate ligand presentation or receptor engagement rather than serving as ligands themselves (lin2023thebiologyof pages 4-5). | SCUBE2 as Hedgehog signaling enhancer/coreceptor (lin2023thebiologyof pages 4-5). |
| Membrane-association versus soluble secretion | Some CUB proteins are soluble (e.g., PCPE2 in ECM), while others are membrane-tethered (e.g., CSMD1, SCUBE-associated cell-surface forms, ADGRG6 extracellular region) (thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4, lin2023thebiologyof pages 4-5). | If A0A2G9RZF1 sequence lacks a transmembrane helix beyond the CUB region, secretion is more likely; if a membrane anchor is later identified, cell-surface recognition/coreceptor roles become stronger candidates (thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4). | PCPE2 extracellular glycoprotein; CSMD1 type-I membrane protein; SCUBE soluble and membrane-associated forms (thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4, lin2023thebiologyof pages 4-5). |
| Best-supported inference for A0A2G9RZF1 | Direct literature on UniProt A0A2G9RZF1 from Aquarana catesbeiana is lacking, but convergent evidence from 2020–2024 literature indicates that an isolated CUB-domain protein is most likely an extracellular recognition/adhesion/modulatory protein involved in protein-protein interactions, potentially Ca2+-dependent, with possible roles in matrix biology, development, immunity, or reproduction depending on expression context (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4, hedrick2008anuranandpig pages 2-3). | Provides a restrained functional annotation hypothesis: extracellular non-enzymatic binding protein, likely participating in partner recognition or signaling/matrix modulation rather than catalysis or transport (lin2023thebiologyof pages 4-5, thomas2024pcpe2expressionof pages 1-2). | Comparative inference from ADGRG6, SCUBE, PCPE2, CSMD1, and amphibian extracellular reproductive proteins (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4, hedrick2008anuranandpig pages 2-3). |
Table: This table summarizes the structural properties, localization, functions, and pathway associations of CUB domain-containing proteins most relevant for inferring the likely biology of the uncharacterized bullfrog protein A0A2G9RZF1. It is useful because no direct literature exists for the target protein, so domain-based comparative annotation is the strongest available evidence.
On domain-level evidence alone, A0A2G9RZF1 in Aquarana catesbeiana is best annotated (as of 2024) as a secreted or membrane-associated CUB domain recognition protein. It is likely to participate non-catalytically in extracellular protein-protein or protein-matrix interactions, potentially in reproductive or developmental biology, with no evidence for direct enzyme, transporter, or signaling ligand activity. This interpretation is consistent with structural, functional, and evolutionary studies of CUB domain proteins published in 2020–2024.
References
(gonzalezcalvo2022synapseformationand pages 1-2): Inés González-Calvo, Mélissa Cizeron, Jean-Louis Bessereau, and Fekrije Selimi. Synapse formation and function across species: ancient roles for CCP, CUB, and TSP-1 structural domains. Frontiers in Neuroscience, Apr 2022. URL: https://doi.org/10.3389/fnins.2022.866444, doi:10.3389/fnins.2022.866444. This article has 9 citations and is from a peer-reviewed journal.
(thomas2024pcpe2expressionof pages 1-2): Michael J. Thomas, Hao Xu, Angela Wang, Mirza Ahmar Beg, and Mary G. Sorci-Thomas. PCPE2: expression of multifunctional extracellular glycoprotein associated with diverse cellular functions. Journal of Lipid Research, 65:100664, Nov 2024. URL: https://doi.org/10.1016/j.jlr.2024.100664, doi:10.1016/j.jlr.2024.100664. This article has 11 citations and is from a peer-reviewed journal.
(lin2023thebiologyof pages 4-5): Yuh-Charn Lin, Binay K. Sahoo, Shiang-Shin Gau, and Ruey-Bing Yang. The biology of SCUBE. Journal of Biomedical Science, May 2023. URL: https://doi.org/10.1186/s12929-023-00925-3, doi:10.1186/s12929-023-00925-3. This article has 48 citations and is from a domain leading peer-reviewed journal.
(akyuz2022thediverserole pages 1-3): Esra Ermis Akyuz and Sandra M. Bell. The diverse role of CUB and Sushi multiple domains 1 (CSMD1) in human diseases. Genes, 13:2332, Dec 2022. URL: https://doi.org/10.3390/genes13122332, doi:10.3390/genes13122332. This article has 31 citations.
(baum2020cubandsushi pages 1-4): Matthew L. Baum, Daniel K. Wilton, Allie Muthukumar, Rachel G. Fox, Alanna Carey, William Crotty, Nicole Scott-Hewitt, Elizabeth Bien, David A. Sabatini, Toby Lanser, Arnaud Frouin, Frederick Gergits, Bjarte Håvik, Chrysostomi Gialeli, Eugene Nacu, Anna M. Blom, Kevin Eggan, Matthew B. Johnson, Steven A. McCarroll, and Beth Stevens. CUB and Sushi multiple domains 1 (CSMD1) opposes the complement cascade in neural tissues. bioRxiv, Sep 2020. URL: https://doi.org/10.1101/2020.09.11.291427, doi:10.1101/2020.09.11.291427. This article has 29 citations.
(hedrick2008anuranandpig pages 2-3): Jerry L. Hedrick. Anuran and pig egg zona pellucida glycoproteins in fertilization and early development. The International Journal of Developmental Biology, 52(5-6):683-701, Jan 2008. URL: https://doi.org/10.1387/ijdb.082580jh, doi:10.1387/ijdb.082580jh. This article has 68 citations.
(hedrick2008anuranandpig pages 3-4): Jerry L. Hedrick. Anuran and pig egg zona pellucida glycoproteins in fertilization and early development. The International Journal of Developmental Biology, 52(5-6):683-701, Jan 2008. URL: https://doi.org/10.1387/ijdb.082580jh, doi:10.1387/ijdb.082580jh. This article has 68 citations.
(sawada2022mechanismsofsperm–egg pages 1-2): Hitoshi Sawada and Takako Saito. Mechanisms of sperm–egg interactions: what ascidian fertilization research has taught us. Cells, 11:2096, Jul 2022. URL: https://doi.org/10.3390/cells11132096, doi:10.3390/cells11132096. This article has 18 citations.
(gomisruth2023structuralandevolutionary pages 2-5): F. Xavier Gomis-Rüth and Walter Stöcker. Structural and evolutionary insights into astacin metallopeptidases. Frontiers in Molecular Biosciences, Jan 2023. URL: https://doi.org/10.3389/fmolb.2022.1080836, doi:10.3389/fmolb.2022.1080836. This article has 24 citations.
Verdict: Over-annotated (Refuted)
The hypothesis that extracellular matrix structural constituent (GO:0005201) is a core function of A0A2G9RZF1 is refuted by three convergent and independent lines of evidence:
Semantic mismatch: GO:0005201 describes proteins that contribute to the structural integrity of the ECM (collagens, elastin, fibrillin), not CUB-domain interaction modules that mediate protein-protein recognition. CUB domains are 110-residue β-sandwich folds functioning as modular binding surfaces (PMID: 21954942), fundamentally distinct from structural ECM scaffolds.
FunFam evidence chain failure: The PCPE-1 FunFam classification that motivated this annotation is doubly flawed. PCPE-1's experimentally validated function is peptidase activator activity (GO:0016504, IDA), not ECM structural support. Its single GO:0005201 annotation derives from a proteomics cataloging study (RCA evidence from PMID: 20551380), and individual CUB domains cannot perform PCPE-1's enhancing function, which requires cooperative CUB1+CUB2 binding with >1,000-fold higher affinity than single CUB domains (PMID: 19801683).
Gene prediction artifact: A0A2G9RZF1 is almost certainly a gene prediction fragment from a poorly assembled genome. Scaffold KV928989 encodes three consecutive ORFs whose combined domain architecture (CUB-EGF-CUB-CUB-EGF-CUB) matches the C-terminal region of a BMP-1/tolloid-like metalloprotease. A chromosome-level assembly published in 2024 (GCF_042186555.1) encodes complete BMP-1 (1,020 aa) and TLL1 (1,005 aa), full-length tolloid-family metalloproteases of which the 156-aa fragment represents a portion.
The molecular function of A0A2G9RZF1 should be left unassigned. No GO annotation should be applied to this fragment. If the complete gene product were annotated, its molecular function would be metalloendopeptidase activity, not ECM structural support.
A0A2G9RZF1 is a 156-amino acid unreviewed (TrEMBL) protein from Aquarana catesbeiana (American bullfrog), predicted at protein existence level 4 (predicted) from whole-genome shotgun sequencing. Its sole recognizable feature is a single CUB domain (residues 31–147). The seed hypothesis proposed that this protein functions as an extracellular matrix structural constituent (GO:0005201), based on (a) the general extracellular localization of CUB-domain proteins, and (b) FunFam classification to PCPE-1 (Procollagen C-endopeptidase enhancer 1), a protein detected in ECM proteomics datasets.
This investigation systematically evaluated three critical questions: (1) Is GO:0005201 the correct term for what CUB domains do? (2) Does the PCPE-1 FunFam classification justify GO:0005201? (3) Is A0A2G9RZF1 a real, complete gene product? The answer to all three is no. CUB domains are protein-protein interaction modules, not structural ECM components. PCPE-1 is experimentally a peptidase activator, and its single GO:0005201 annotation derives from a low-confidence computational analysis. Most importantly, A0A2G9RZF1 appears to be a gene prediction fragment from a fragmented genome assembly, representing one CUB domain from a much larger (~1,000 aa) tolloid-family metalloprotease.
These findings were confirmed across three iterations of investigation, incorporating analysis of 32 primary literature papers, genomic context examination, and cross-assembly validation. The conclusion is robust: GO:0005201 is inappropriate for this protein at every level of analysis.
GO:0005201 (extracellular matrix structural constituent) is defined as "The action of a molecule that contributes to the structural integrity of the extracellular matrix." Examination of QuickGO annotations reveals this term is overwhelmingly applied to bona fide structural ECM proteins: collagens (COL1A1, COL4A1-6, COL5A2), elastin (ELN), fibrillin, and similar molecules that physically constitute the ECM scaffold. These are large, repetitive structural proteins whose presence is necessary for the mechanical and organizational properties of the matrix.
CUB domains, by contrast, are 110-residue protein motifs that adopt a β-sandwich fold and mediate protein-protein interactions. As established in the authoritative review by Gaboriaud et al. (PMID: 21954942): CUB domains are "110-residue protein motifs exhibiting a β-sandwich fold and mediating protein-protein interactions in various extracellular proteins." They function as modular binding surfaces (recognition and interaction elements), not as structural building blocks of the ECM. Annotating a single-CUB-domain protein with GO:0005201 conflates a protein-protein interaction module with a structural matrix component, which are fundamentally different molecular functions.
The FunFam classification that linked A0A2G9RZF1 to PCPE-1 was the primary justification for considering GO:0005201. However, examining PCPE-1's actual experimental annotations reveals a critical mismatch. Human PCPE-1 (UniProt Q15113) has been experimentally characterized with the following GO molecular functions, each supported by IDA (Inferred from Direct Assay) evidence from PMID: 12393877: collagen binding, heparin binding, and peptidase activator activity (GO:0016504).
PCPE-1's single GO:0005201 annotation is supported only by RCA (Reviewed Computational Analysis) evidence from a proteomics study of human aortic ECM (PMID: 20551380), classified by the BHF-UCL curation group. This is a computational classification based on finding the protein in an ECM proteomics dataset; it does not demonstrate that PCPE-1 structurally contributes to the ECM.
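The distinction between experimental and computational support can be made explicit by filtering annotations on their GO evidence code. The sketch below is illustrative only: the annotation records are transcribed from the PCPE-1 evidence described in this report, and the record layout is a hypothetical one chosen for the example, not a database export format.

```python
# GO evidence codes that denote direct experimental support.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

# PCPE-1 (Q15113) molecular function annotations as described in this report.
annotations = [
    {"term": "collagen binding", "evidence": "IDA", "source": "PMID:12393877"},
    {"term": "heparin binding", "evidence": "IDA", "source": "PMID:12393877"},
    {"term": "peptidase activator activity (GO:0016504)", "evidence": "IDA",
     "source": "PMID:12393877"},
    {"term": "extracellular matrix structural constituent (GO:0005201)",
     "evidence": "RCA", "source": "PMID:20551380"},
]

# Split annotations into experimentally supported and computational ones.
experimental = [a for a in annotations if a["evidence"] in EXPERIMENTAL_CODES]
computational = [a for a in annotations if a["evidence"] not in EXPERIMENTAL_CODES]

for a in experimental:
    print("experimental:", a["term"], a["evidence"])
for a in computational:
    print("computational:", a["term"], a["evidence"])
```

Under this filter GO:0005201 is the only PCPE-1 molecular function term that lacks experimental support.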
As stated by Berry et al. (PMID: 30078642): "Procollagen C-proteinase enhancer-1 (PCPE-1) is a secreted protein that specifically accelerates proteolytic release of the C-propeptides from fibrillar procollagens, a crucial step in fibril assembly." This is a regulatory/catalytic enhancement function, fundamentally different from structural ECM support. Even if A0A2G9RZF1 were a genuine PCPE-1 ortholog (which it is not), GO:0016504 (peptidase activator activity), not GO:0005201, would be the appropriate molecular function term.
Even setting aside the semantic mismatch with GO:0005201, transferring any PCPE-1 function to A0A2G9RZF1 is unjustified because a single CUB domain is insufficient. PCPE-1 is a 449-amino acid protein with three functional domains: CUB1 (37–149), CUB2 (159–273), and NTR (318–437). Both CUB domains and the NTR domain contribute to PCPE-1's biological activities.
Critical experimental evidence from Blanc et al. (PMID: 19801683) demonstrated conclusively that "only those containing both CUB1 and CUB2 were capable of enhancing BMP-1 activity and binding to a mini-procollagen substrate with nanomolar affinity. Both these properties were lost by individual CUB domains, which had dissociation constants at least three orders of magnitude higher." This >1,000-fold loss of binding affinity means that a single CUB domain fragment cannot perform any of PCPE-1's characterized functions. The FunFam classification, while computationally valid at the domain level, does not support functional transfer to a protein containing only one of the required domains.
Bourhis et al. (PMID: 17446170) further confirmed that "Procollagen C-proteinase enhancers (PCPE-1 and -2) are extracellular glycoproteins that can stimulate the C-terminal processing of fibrillar procollagens by tolloid proteinases such as bone morphogenetic protein-1. They consist of two CUB domains (CUB1 and -2) that alone account for PCPE-enhancing activity and one C-terminal NTR domain." A0A2G9RZF1, at 156 amino acids with a single CUB domain, lacks this required architecture.
The most decisive finding emerged from examining the genomic context. Scaffold KV928989 (134 kb) from the RCv2.1 assembly encodes three consecutive open reading frames:
| ORF | UniProt ID | Length | Domains | PANTHER Classification |
|---|---|---|---|---|
| AB205_0007200 | A0A2G9RZF1 | 156 aa | CUB | OVOCHYMASE-RELATED |
| AB205_0007210 | A0A2G9RZH1 | 228 aa | EGF + CUB | METALLOENDOPEPTIDASE (SF45) |
| AB205_0007220 | A0A2G9RZI6 | 250 aa | CUB + EGF + CUB | BONE MORPHOGENETIC PROTEIN 1 (SF53), Fragment |
The combined domain modules from these three ORFs (CUB-EGF-CUB-CUB-EGF-CUB) match the C-terminal region of a tolloid-like (mTLD) protease architecture. The neighboring ORF A0A2G9RZI6 is explicitly marked as "Fragment" in UniProt and classified by PANTHER as BMP-1. The scaffold contains multiple assembly gaps. No complete (>700 aa) BMP-1/tolloid-like protein was found elsewhere in the bullfrog proteome from this assembly, consistent with the gene being split across gaps.
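The bookkeeping behind this comparison can be sketched in a few lines. The ORF lengths and domain lists are taken from the table above, and the full-length sizes (1,020 aa for BMP-1, 1,005 aa for TLL1) are those reported for the chromosome-level assembly. The coverage figures assume, purely for illustration, that the three ORFs are non-overlapping pieces of a single gene.

```python
# (UniProt ID, length in aa, domains in N-to-C order) from the scaffold table.
orfs = [
    ("A0A2G9RZF1", 156, ["CUB"]),
    ("A0A2G9RZH1", 228, ["EGF", "CUB"]),
    ("A0A2G9RZI6", 250, ["CUB", "EGF", "CUB"]),
]

# Full-length candidates from the chromosome-level assembly.
full_length = {"BMP-1": 1020, "TLL1": 1005}

# Concatenate the domain lists in scaffold order.
combined = "-".join(domain for _, _, domains in orfs for domain in domains)
total_aa = sum(length for _, length, _ in orfs)

# Fraction of each candidate accounted for by the fragments (an upper bound
# under the non-overlap assumption stated above).
coverage = {gene: total_aa / size for gene, size in full_length.items()}

print(combined)
print(total_aa)
for gene, fraction in coverage.items():
    print(f"{gene}: {fraction:.0%} of full length")
```

Roughly 62% to 63% of a full-length protein is accounted for, which fits the interpretation that the remaining sequence, including the protease domain, lies in assembly gaps.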
{{figure:evidence_summary.png|caption=Summary of evidence against GO:0005201 annotation: domain architecture comparison between A0A2G9RZF1 (156 aa, single CUB domain) and PCPE-1 (449 aa, CUB1-CUB2-NTR), scaffold fragmentation context showing three consecutive ORFs matching tolloid-like architecture, and assembly quality metrics demonstrating gene model fragmentation}}
If A0A2G9RZF1's CUB domain is part of a tolloid-like protease, its function would be substrate recognition and presentation, not ECM structural support. Lee et al. (PMID: 18664565) demonstrated in Xenopus Xolloid that "the first and second CUB domains bind Chordin and present it to the protease domain." This substrate-recognition function is the canonical role of CUB domains in the tolloid/BMP-1 protease family and is fundamentally distinct from structural ECM support.
Additional evidence from Drosophila tolloid (PMID: 25642644) showed that N-terminal CUB domains interact with Collagen IV to enhance Tolloid activity toward its substrate Sog, while C-terminal CUB domains mediate Sog interaction. This bipartite CUB domain function (ECM anchoring + substrate binding) fine-tunes protease activity but is entirely regulatory, not structural.
The definitive confirmation came from the chromosome-level assembly GCF_042186555.1 (ASM4218655v1, 2024), which encodes complete BMP-1 (XP_073478370.1, 1,020 aa, LG03) and TLL1 (XP_073462190.1, 1,005 aa, LG01).
These are full-length tolloid-family metalloproteases with the expected multi-domain architecture. The original RCv2.1 assembly had a scaffold N50 of only 39,368 bp compared to 691,824,178 bp in the chromosome-level assembly, a ~17,000-fold improvement in contiguity. The three fragmented ORFs on scaffold KV928989 (156 + 228 + 250 = 634 aa combined) represent portions of one of these ~1,000 aa genes, with the N-terminal protease domain and additional domains falling in assembly gaps.
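For readers unfamiliar with the metric, the contiguity comparison can be reproduced as below. The `n50` helper is a generic illustration of how the statistic is defined, demonstrated on made-up toy lengths. Only the two N50 values are taken from this report.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths


# Toy example with made-up scaffold lengths.
toy_n50 = n50([2, 3, 4, 5, 6])

# Scaffold N50 values reported for the two bullfrog assemblies.
draft_n50_bp = 39_368             # RCv2.1 draft assembly
chromosome_n50_bp = 691_824_178   # GCF_042186555.1

fold_improvement = chromosome_n50_bp / draft_n50_bp
print(toy_n50)
print(f"{fold_improvement:,.0f}-fold")
```

The exact ratio is about 17,573, which the text rounds to ~17,000-fold.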
This finding is critical for curation: the UniProt proteome (UP000228934) still references the 2017 draft assembly. When updated to the chromosome-level assembly, these fragment entries should be superseded by complete gene models.
| # | Citation | Evidence Type | Direction | Claim Tested | Key Finding | Context | Confidence |
|---|---|---|---|---|---|---|---|
| 1 | PMID: 21954942 | Structural/evolutionary review | Refutes GO:0005201 | CUB domains are structural ECM constituents | CUB domains are "110-residue protein motifs exhibiting a β-sandwich fold and mediating protein-protein interactions", not structural scaffolds | Comprehensive review of CUB domain biology | High |
| 2 | PMID: 19801683 | Direct assay (binding, activity) | Refutes PCPE-1 transfer | Single CUB domain can perform PCPE-1 function | Individual CUB domains have >1,000× lower affinity; both CUB1+CUB2 required | Recombinant PCPE-1 constructs, in vitro | High |
| 3 | PMID: 30078642 | Structural (crystallography) | Qualifies PCPE-1 function | PCPE-1 is an ECM structural protein | PCPE-1 "specifically accelerates proteolytic release of the C-propeptides"; a regulatory, not structural, role | X-ray crystallography, human PCPE-1 | High |
| 4 | PMID: 17446170 | Direct assay (mutagenesis, SPR) | Refutes PCPE-1 transfer | PCPE-1 domain requirements | CUB1+CUB2+NTR architecture required for full function | Mutagenesis, SPR, activity assays | High |
| 5 | PMID: 12393877 | Direct assay (IDA) | Supports alternative MF | PCPE-1 molecular function | Collagen binding, heparin binding, peptidase activator activity (all IDA) | Human PCPE-1, in vitro | High |
| 6 | PMID: 18664565 | Direct assay (in vivo) | Supports tolloid interpretation | CUB function in tolloid proteases | CUB domains "bind Chordin and present it to the protease domain" (substrate recognition) | Xenopus Xolloid, embryos | High |
| 7 | PMID: 25642644 | Direct assay (in vivo) | Qualifies CUB function | CUB domains in Drosophila tolloid | N-terminal CUBs interact with Collagen IV; C-terminal CUBs mediate substrate binding | Drosophila embryo | High |
| 8 | PMID: 10500163 | Direct assay | Qualifies PANTHER family | Ovochymase-related family | Ovochymase polyprotein has 5 CUB domains between 3 protease domains; diverse family | Xenopus laevis eggs | Moderate |
| 9 | Scaffold KV928989 analysis | Computational/genomic | Refutes gene completeness | A0A2G9RZF1 is a complete gene | Three consecutive ORFs with complementary tolloid-like domains; assembly gaps | A. catesbeiana RCv2.1 | Medium-High |
| 10 | GCF_042186555.1 (2024) | Computational/genomic | Confirms fragmentation | Complete tolloid genes exist | BMP-1 (1,020 aa) and TLL1 (1,005 aa) are full-length in chromosome-level assembly | A. catesbeiana chromosome-level | High |
| 11 | PMID: 24117177 | Direct assay (SPR) | Qualifies PCPE-1 role | PCPE-1 has additional ECM partners | 17 new binding partners; CUB1CUB2 fragment inhibits angiogenesis (regulatory, not structural) | SPR imaging, in vitro | Moderate |
| 12 | PMID: 16819821 | Structural/biophysical | Supports CUB = binding | Single-CUB proteins exist | Spermadhesins (PSP-I, PSP-II) are single CUB domain proteins functioning as binders, not ECM structural | Boar seminal plasma | Moderate |
| 13 | PMID: 20551380 | Computational (proteomics) | Source of PCPE-1's GO:0005201 | PCPE-1 in ECM fraction | Detection in ECM proteomics ≠ structural function; basis for weak RCA annotation | Human aortic tissue | Low for functional inference |
| 14 | UniProt Q15113 | Database record | Refutes GO:0005201 as PCPE-1 core | PCPE-1 GO annotations | IDA-supported: GO:0005518, GO:0016504, GO:0008201. GO:0005201 is only RCA (computational) | Human PCPE-1 | High |
GO:0005201 (extracellular matrix structural constituent) should not be applied to A0A2G9RZF1 under any evidence code. The term is semantically incorrect for a CUB-domain protein, the evidence chain through PCPE-1 FunFam classification does not support it even for PCPE-1 itself, and the protein is a gene prediction artifact.
GO Decision Table:
| GO Term | Ontology | Action | Rationale | Evidence Level |
|---|---|---|---|---|
| GO:0005201 (ECM structural constituent) | MF | Remove / Do not apply | Semantic mismatch; CUB ≠ structural ECM; weak RCA source | Refuted at multiple levels |
| GO:0016504 (peptidase activator activity) | MF | Do not apply | Requires CUB1+CUB2+NTR architecture; protein is a fragment | Not transferable to single CUB |
| GO:0004222 (metalloendopeptidase activity) | MF | Do not apply (to fragment) | No protease domain in fragment; correct for complete gene | Correctly rejected by seed |
| GO:0005515 (protein binding) | MF | Avoid | Uninformative; discouraged by GO guidelines | Would be technically defensible but not useful |
| GO:0005576 (extracellular region) | CC | Retain with IEA | CUB domains are nearly exclusively extracellular | Moderate; domain-based inference |
| MF unassigned | MF | Recommended | Fragment with no experimental data; function genuinely unknown | N/A |
Even if A0A2G9RZF1 were a standalone protein (which the genomic evidence argues strongly against), the appropriate MF for a single-CUB protein would relate to protein binding in the extracellular space, never GO:0005201. The seed hypothesis itself correctly notes that "the short length (156 aa) and absence of additional functional domains preclude confident functional assignment." This assessment is confirmed and strengthened by our analysis.
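For curators who track decisions programmatically, the GO Decision Table can be encoded as a small data structure. This is a minimal sketch: the `GoDecision` class, the action strings, and the helper functions are hypothetical names introduced here for illustration, not part of any curation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoDecision:
    term: str
    label: str
    aspect: str   # "MF" (molecular function) or "CC" (cellular component)
    action: str   # hypothetical vocabulary: "remove", "do_not_apply", "avoid", "retain"


# Rows transcribed from the GO Decision Table above
DECISIONS = [
    GoDecision("GO:0005201", "extracellular matrix structural constituent", "MF", "remove"),
    GoDecision("GO:0016504", "peptidase activator activity", "MF", "do_not_apply"),
    GoDecision("GO:0004222", "metalloendopeptidase activity", "MF", "do_not_apply"),
    GoDecision("GO:0005515", "protein binding", "MF", "avoid"),
    GoDecision("GO:0005576", "extracellular region", "CC", "retain"),
]


def retained_terms(decisions):
    """Return the GO IDs that survive review."""
    return [d.term for d in decisions if d.action == "retain"]


def mf_is_unassigned(decisions):
    """True when no molecular-function term is retained for the entry."""
    return not any(d.aspect == "MF" and d.action == "retain" for d in decisions)


print(retained_terms(DECISIONS))
print(mf_is_unassigned(DECISIONS))
```

Under this encoding the only surviving term is the cellular-component annotation, and the molecular function remains unassigned, matching the last row of the table.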
The hypothesis tests whether A0A2G9RZF1 directly functions as a structural component of the extracellular matrix β physically contributing to ECM architecture by being incorporated into the matrix scaffold, analogous to collagens, proteoglycans, or elastin.
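CUB domains are non-catalytic protein-protein interaction modules. In the tolloid-family protease context (the most likely identity for the complete gene), CUB domains function as:
- Substrate-recognition modules that bind substrates such as Chordin and present them to the protease domain (PMID: 18664565)
- ECM-interaction modules that bind Collagen IV and thereby enhance protease activity toward the substrate (PMID: 25642644)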
Neither function constitutes "structural ECM support."
| Level | Activity | Applies to A0A2G9RZF1? |
|---|---|---|
| Direct MF | ECM structural support (GO:0005201) | No; CUB domains do not structurally constitute the ECM |
| Direct MF | Protein binding / substrate recognition | Plausible for the CUB domain, but fragment status precludes annotation |
| Direct MF | Metalloendopeptidase activity | No; this fragment has no protease domain |
| Indirect | Collagen fibril assembly regulation | Only via complete tolloid protease |
| Indirect | BMP signaling modulation | Only via complete tolloid protease |
| Downstream | Corneal scarring, fibrosis | Pathway-level phenotypes, not direct MF |
The CATH FunFam classification (2.60.120.290:FF:000005) groups A0A2G9RZF1 with PCPE-1. While computationally valid at the domain-fold level, this classification does not imply functional equivalence. PCPE-1 requires CUB1 + CUB2 + NTR for its characterized activities. The FunFam grouping reflects structural similarity of the CUB fold, not functional annotation transfer. CUB domains are found across functionally diverse proteins: complement factors (C1r, C1s), endocytic receptors (cubilin, with 27 CUB domains; PMID: 30295181), spermadhesins, developmental proteases, neurotransmitter receptor modulators (PMID: 21093502), and innate immune receptors.
PANTHER classifies A0A2G9RZF1 under PTHR24251 (OVOCHYMASE-RELATED). Ovochymase is a Xenopus egg polyprotein with multiple CUB domains interspersed between serine protease domains (PMID: 10500163). This family-level classification is broad and includes proteins with diverse functions (metalloproteases, serine proteases, enhancers). It should not be used to infer a specific molecular function for a single-CUB fragment.
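The alternative that A0A2G9RZF1 is a genuine standalone single-CUB protein is unlikely but cannot be excluded outright. Single CUB-domain proteins exist (e.g., spermadhesins PSP-I/PSP-II in pig seminal plasma; PMID: 16819821). However, the arguments against this interpretation are strong:
- Three consecutive ORFs on scaffold KV928989 carry complementary tolloid-family domains
- The draft assembly has a scaffold N50 of only 39,368 bp, so fragmented gene models are expected
- The chromosome-level assembly encodes complete BMP-1 (1,020 aa) and TLL1 (1,005 aa)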
Even if A0A2G9RZF1 were standalone, its function would be extracellular protein binding β still NOT GO:0005201.
The PTHR24251 family includes ovochymase (egg protease with 5 CUB domains), BMP-1/tolloid metalloproteases, PCPE-1/2 enhancers, and various uncharacterized CUB-containing proteins. Homology at the domain level does not predict function at the protein level. The FunFam classification likely reflects the general CUB fold similarity rather than specific PCPE-1 functional identity.
| Gap | What Was Checked | Why It Matters | What Would Resolve It |
|---|---|---|---|
| No experimental data for A0A2G9RZF1 | UniProt (PE4), QuickGO (0 annotations), PubMed (0 hits) | Cannot validate any functional annotation without direct evidence | Recombinant expression, binding assays, localization studies |
| No transcript evidence | UniProt protein existence level (PE4) | Cannot confirm that the predicted ORF is expressed | RNA-seq from A. catesbeiana tissues mapped to chromosome-level assembly |
| Exact gene identity of fragment | Scaffold context, PANTHER classifications, domain architecture comparison | The fragment is known to belong to a tolloid-family gene, but not which one (BMP-1 on LG03 or TLL1 on LG01) | BLAST of scaffold KV928989 against chromosome-level assembly |
| PCPE-1 ortholog status in bullfrog | Whether a genuine PCPE-1 ortholog exists separately | If no PCPE-1 ortholog exists, the FunFam classification is even more clearly misleading | Reciprocal best BLAST of human PCPE-1 against chromosome-level proteome |
| UniProt proteome update | UP000228934 still references 2017 draft assembly | Fragment entries persist until proteome is updated | NCBI/UniProt proteome refresh to GCF_042186555.1 |
Scaffold-to-chromosome alignment: BLAST or minimap2 alignment of scaffold KV928989 against GCF_042186555.1 would definitively identify which complete gene (BMP-1 or TLL1) corresponds to the A0A2G9RZF1 locus. Cost: minutes of computation.
PCPE-1 ortholog search: Reciprocal best BLAST of human PCPE-1 (Q15113) against the chromosome-level bullfrog proteome would identify whether a genuine PCPE-1 ortholog exists, independent of A0A2G9RZF1.
RNA-seq validation: Search SRA/ENA for A. catesbeiana transcriptome data and map to the chromosome-level assembly to confirm expression of the complete tolloid gene.
AlphaFold/ESMFold: Structure prediction for A0A2G9RZF1 to assess whether it folds into a stable CUB domain with an intact calcium-binding site; this would indicate whether even the fragment is structurally viable.
BMP-1 enhancer activity assay: Test whether the isolated CUB domain can enhance BMP-1 activity on procollagens. Expected result: negative, based on PMID: 19801683.
SPR/ITC binding assays: Test binding of the isolated CUB domain to known tolloid substrates (Chordin, procollagen C-propeptide) and ECM structural proteins. Would directly distinguish substrate-recognition from structural roles.
Full-length gene cloning: Clone the complete BMP-1/TLL1 gene from A. catesbeiana cDNA and characterize its enzymatic activity.
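The two BLAST-based checks proposed above (scaffold-to-chromosome alignment and the reciprocal best BLAST for PCPE-1) both come down to post-processing BLAST+ tabular output (`-outfmt 6`). The sketch below shows one way to do that. The function names, the e-value cutoff, and the use of bitscore as the ranking criterion are assumptions, and every hit in the example is hypothetical placeholder data (`chrA`, `chrB`, `frog_1`, `frog_2`, `human_other`), not a real alignment result.

```python
import csv
import io
from collections import defaultdict

# Standard column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_outfmt6(text, max_evalue=1e-10):
    """Return (query, subject, bitscore) tuples for hits passing the e-value cutoff."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=OUTFMT6, delimiter="\t")
    return [(row["qseqid"], row["sseqid"], float(row["bitscore"]))
            for row in reader if float(row["evalue"]) <= max_evalue]


def bitscore_by_subject(hits):
    """Sum bitscores per subject sequence, e.g. per linkage group."""
    totals = defaultdict(float)
    for _query, subject, bitscore in hits:
        totals[subject] += bitscore
    return dict(totals)


def best_hits(hits):
    """Map each query to its single highest-scoring subject."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Keep query/subject pairs that are each other's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}


def rows_to_text(rows):
    """Join field tuples into tab-separated lines, as BLAST would write them."""
    return "\n".join("\t".join(row) for row in rows)


# HYPOTHETICAL hits for illustration only: scaffold vs. two chromosomes
scaffold_vs_genome = rows_to_text([
    ("KV928989", "chrA", "99.0", "480", "4", "0", "1200", "1679", "5000", "5479", "0.0", "870"),
    ("KV928989", "chrA", "98.5", "390", "5", "0", "9100", "9489", "7300", "7689", "1e-180", "700"),
    ("KV928989", "chrB", "81.0", "300", "57", "0", "1210", "1509", "88000", "88299", "1e-60", "240"),
])
totals = bitscore_by_subject(parse_outfmt6(scaffold_vs_genome))
print(max(totals, key=totals.get), totals)

# HYPOTHETICAL hits for illustration only: human PCPE-1 vs. frog proteome and back
human_vs_frog = rows_to_text([
    ("Q15113", "frog_1", "70.0", "440", "132", "0", "1", "440", "5", "444", "1e-120", "410"),
    ("Q15113", "frog_2", "35.0", "230", "149", "0", "30", "259", "300", "529", "1e-40", "150"),
])
frog_vs_human = rows_to_text([
    ("frog_1", "Q15113", "70.0", "440", "132", "0", "5", "444", "1", "440", "1e-118", "405"),
    ("frog_2", "human_other", "75.0", "980", "245", "0", "1", "980", "1", "980", "0.0", "900"),
    ("frog_2", "Q15113", "35.0", "230", "149", "0", "300", "529", "30", "259", "1e-40", "150"),
])
rbh = reciprocal_best_hits(parse_outfmt6(human_vs_frog), parse_outfmt6(frog_vs_human))
print(rbh)
```

Summing bitscores per subject is a deliberately crude way to pick the matching locus; for a scaffold of this size, inspecting the individual alignment coordinates against the BMP-1 and TLL1 gene spans would be the more careful follow-up.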
Action: Do not annotate A0A2G9RZF1 with GO:0005201. Remove this term from consideration as a core function.
Rationale: Three independent evidence lines refute this annotation (semantic mismatch, PCPE-1 function mismatch, gene fragmentation). No experimental evidence supports it.
References to verify:
- PMID: 19801683: "Out of all the forms tested, only those containing both CUB1 and CUB2 were capable of enhancing BMP-1 activity"
- PMID: 21954942: "CUB domains are 110-residue protein motifs exhibiting a β-sandwich fold and mediating protein-protein interactions"
- UniProt Q15113 GO annotations: PCPE-1 IDA terms are GO:0005518, GO:0008201, GO:0016504 (not GO:0005201)
Confidence: High.
Action: Flag A0A2G9RZF1, A0A2G9RZH1, and A0A2G9RZI6 as probable fragments of a single BMP-1/tolloid-like gene. Do not annotate fragments with function-level GO terms.
Supporting evidence:
- Three consecutive ORFs on scaffold KV928989 with complementary tolloid-family domains
- Scaffold N50 of 39,368 bp in original assembly
- Chromosome-level assembly encodes complete BMP-1 (1,020 aa) and TLL1 (1,005 aa)
- A0A2G9RZI6 explicitly marked as "Fragment" and classified as BMP-1
Reference to verify: PMID: 18664565 (CUB domains in tolloid proteases are substrate-recognition modules, not structural ECM components)
Confidence: High.
Action: Confirm the seed hypothesis's recommendation against annotating metalloendopeptidase activity (GO:0004222) for this fragment. The protease domain is absent from A0A2G9RZF1.
Note: If the complete gene product is identified as BMP-1 or TLL1, GO:0004222 would be appropriate for the full-length protein, but not for this CUB-domain-only fragment.
Confidence: High.
Action: Retain GO:0005576 (extracellular region) as a Cellular Component annotation with IEA-level evidence, based on the near-universal extracellular localization of CUB-domain proteins.
Confidence: Moderate.
Action: Do not assign any specific MF term. The function is genuinely unknown for this fragment. The original seed hypothesis statement, "The precise molecular function of A0A2G9RZF1 is unknown", is the most accurate assessment and should be retained.
Confidence: High.
Action: When the UniProt proteome is updated to the chromosome-level assembly (GCF_042186555.1), A0A2G9RZF1 should be superseded by:
- BMP-1: XP_073478370.1 (1,020 aa, 16 exons, LG03); NCBI Gene 141133120
- TLL1: XP_073462190.1 (1,005 aa, 21 exons, LG01); NCBI Gene 141113146
These complete gene products would carry appropriate metalloendopeptidase and substrate-binding annotations.
The following diagram summarizes the evidence architecture:
SEED HYPOTHESIS CHAIN (refuted at each step):
A0A2G9RZF1 ──FunFam──> PCPE-1 ──RCA──> GO:0005201
(156 aa,       ↓                ↓          ↓
 1 CUB)    FOLD MATCH       WEAK RCA   SEMANTIC MISMATCH
               ↓                ↓          ↓
           Single CUB       IDA terms  GO:0005201 =
           cannot do        are NOT    collagens,
           PCPE-1 work      GO:0005201 elastin
ACTUAL IDENTITY (supported):
Scaffold KV928989 (fragmented assembly, N50=39kb):
┌─────────────────────────────────────────────────┐
│ ...gap... [CUB] ...gap... [EGF-CUB] ...gap...   │
│             ↑                 ↑                 │
│        A0A2G9RZF1        A0A2G9RZH1             │
│                                                 │
│ [CUB-EGF-CUB] ...gap...                         │
│       ↑                                         │
│ A0A2G9RZI6 (PANTHER: BMP-1, "Fragment")         │
└─────────────────────────────────────────────────┘
        ↓ resolves to
Chromosome-level assembly (N50=692Mb):
┌─────────────────────────────────────────────────┐
│ BMP-1 (1020 aa): Protease-CUB-EGF-CUB-CUB-EGF   │
│ or                                              │
│ TLL1 (1005 aa): Protease-CUB-EGF-CUB-CUB-EGF    │
└─────────────────────────────────────────────────┘
Interpretation: A0A2G9RZF1 is one CUB domain from a tolloid-family metalloprotease that was incorrectly predicted as a standalone gene due to assembly fragmentation. The CUB domain's function within the complete protein would be substrate recognition: binding target proteins and presenting them to the protease domain for cleavage. This is a regulatory/binding function within a larger enzyme, not an ECM structural role.
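The domain-architecture argument can be made explicit by concatenating the per-ORF domain lists in scaffold order and comparing the result with a full-length layout. In the sketch below, the per-ORF domains come from the text, whereas `ASSUMED_FULL_LENGTH` is a hypothetical placeholder (a protease domain followed by the combined fragment architecture). It should be replaced with the annotated domain order of XP_073478370.1 or XP_073462190.1 before drawing any conclusion.

```python
# Domain content of the three consecutive ORFs on scaffold KV928989 (from the text)
SCAFFOLD_ORFS = [
    ("A0A2G9RZF1", ["CUB"]),
    ("A0A2G9RZH1", ["EGF", "CUB"]),
    ("A0A2G9RZI6", ["CUB", "EGF", "CUB"]),
]

# ASSUMPTION: placeholder full-length layout, not taken from a database record
ASSUMED_FULL_LENGTH = ["Protease", "CUB", "EGF", "CUB", "CUB", "EGF", "CUB"]


def combined_architecture(orfs):
    """Concatenate per-ORF domain lists in scaffold order."""
    return [domain for _accession, domains in orfs for domain in domains]


def is_c_terminal_match(fragment, full_length):
    """True if `fragment` equals the C-terminal end of `full_length`."""
    n = len(fragment)
    return n > 0 and full_length[-n:] == fragment


combined = combined_architecture(SCAFFOLD_ORFS)
print("-".join(combined))
print(is_c_terminal_match(combined, ASSUMED_FULL_LENGTH))
print("Protease" in combined)
```

The last check makes the key point independent of the placeholder: none of the three ORFs carries a protease domain, so metalloendopeptidase activity cannot be assigned to any of them individually.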
Gaboriaud et al. (2011). Structure and properties of the Ca(2+)-binding CUB domain, a widespread ligand-recognition unit involved in major biological functions. PMID: 21954942
Comprehensive review establishing CUB domains as protein-protein interaction modules with β-sandwich fold. Key evidence: CUB domains mediate ligand recognition across diverse biological systems (complement, development, endocytosis, reproduction); they are interaction modules, not structural components.
Blanc et al. (2009). Strong cooperativity and loose geometry between CUB domains are the basis for procollagen C-proteinase enhancer activity. PMID: 19801683
Demonstrated that individual CUB domains lose >1,000-fold binding affinity. Critical for ruling out functional transfer from PCPE-1 to single-CUB proteins. This is the strongest single piece of evidence against the PCPE-1-based inference chain.
Bourhis et al. (2007). Insights into how CUB domains can exert specific functions while sharing a common fold. PMID: 17446170
Confirmed the CUB1-CUB2-NTR architecture requirement for PCPE function. Identified PCPE-specific residues in CUB1 that are necessary but insufficient without CUB2 cooperativity.
Lee et al. (2009). Molecular determinants of Xolloid action in vivo. PMID: 18664565
Demonstrated CUB domain function in amphibian tolloid proteases: substrate recognition and presentation to the protease domain. Directly relevant as Xenopus is the closest well-characterized amphibian model.
Winstanley et al. (2015). Synthetic enzyme-substrate tethering obviates the Tolloid-ECM interaction during Drosophila BMP gradient formation. PMID: 25642644
Showed bipartite CUB domain function in tolloid: ECM anchoring (N-terminal CUBs) and substrate binding (C-terminal CUBs). Confirms CUB domains in tolloids serve regulatory/binding, not structural, roles.
Berry et al. (2018). Structural Basis for the Acceleration of Procollagen Processing by Procollagen C-Proteinase Enhancer-1. PMID: 30078642
Crystal structure of PCPE-1 revealing its mechanism as a regulatory accelerator of procollagen processing, not an ECM structural protein.
Salza et al. (2014). Extended interaction network of procollagen C-proteinase enhancer-1 in the extracellular matrix. PMID: 24117177
Identified 17 binding partners of PCPE-1, confirming its role as an interaction hub, not structural scaffold. CUB1CUB2 fragment inhibits angiogenesis, a regulatory function.
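To give a sense of scale for the >1,000-fold affinity loss reported for individual CUB domains (PMID: 19801683), a ratio of dissociation constants can be converted into a binding free-energy difference with the standard relation ΔΔG = RT ln(Kd,single / Kd,pair). The sketch below is an illustration only: the temperature of 298.15 K is an assumption, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd_ratio(kd_ratio, temperature_k=298.15):
    """Binding free-energy penalty (kcal/mol) for a given fold increase in Kd."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_ratio)


# "At least three orders of magnitude" corresponds to a ratio of 1,000 or more
penalty = ddg_from_kd_ratio(1_000)
print(f"{penalty:.1f} kcal/mol")
```

Because the reported ratio is a lower bound, the value computed here is likewise a lower bound on the energetic cost of splitting CUB1 from CUB2.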
No direct experimental evidence exists for A0A2G9RZF1. All conclusions are based on computational analysis, domain architecture reasoning, genomic context, and inference from well-characterized homologs in other species.
The fragment-to-gene assignment is inferential. We have not performed the BLAST alignment of KV928989 against the chromosome-level assembly that would definitively identify which gene (BMP-1 or TLL1) the fragment belongs to.
Assembly-based reasoning has inherent uncertainty. While the evidence is strong that A0A2G9RZF1 is a fragment, we cannot exclude unusual genomic rearrangements or lineage-specific gene fission events in bullfrog, though these would be extraordinary claims requiring extraordinary evidence.
Literature is from model organisms. CUB domain function, PCPE-1 biochemistry, and tolloid protease characterization are primarily from human, mouse, Xenopus, and Drosophila. Direct evidence from Aquarana catesbeiana is absent.
The 32 papers reviewed focused on PCPE-1 and tolloid biology. We did not exhaustively survey all possible functions of isolated CUB domains in non-model amphibians, though no evidence from the reviewed literature suggested an ECM structural role for any CUB-domain protein.
Report generated by autonomous scientific discovery agent across 3 investigation iterations. 32 papers reviewed, 6 findings confirmed. All conclusions are computational and require curator verification.
id: A0A2G9RZF1
gene_symbol: A0A2G9RZF1
product_type: PROTEIN
status: DRAFT
taxon:
id: NCBITaxon:8400
label: Aquarana catesbeiana
description: >-
A0A2G9RZF1 is a gene prediction fragment from the fragmented draft genome
assembly of Aquarana catesbeiana (American bullfrog). The 156-amino acid ORF
(AB205_0007200) encodes a single CUB domain (residues 31-147), but scaffold
KV928989 contains two additional consecutive ORFs (A0A2G9RZH1 with EGF+CUB
domains, and A0A2G9RZI6 with CUB+EGF+CUB domains, the latter explicitly
flagged as a fragment by UniProt) whose combined domain architecture
(CUB-EGF-CUB-CUB-EGF-CUB) matches the C-terminal region of a tolloid-family
metalloprotease (BMP-1 or TLL1). A 2024 chromosome-level assembly
(GCF_042186555.1) with ~17,000-fold better contiguity encodes complete BMP-1
(XP_073478370.1, 1,020 aa, LG03) and TLL1 (XP_073462190.1, 1,005 aa, LG01),
confirming the original draft produced fragmented gene models. The complete
gene product is a tolloid-family metalloprotease with metalloendopeptidase
activity. No experimental data exist for this protein (UniProt protein evidence
level 4: Predicted).
references:
- id: PMID:29127278
title: "The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA"
reference_review:
relevance: LOW
correctness: VERIFIED
review_notes: >-
This is the bullfrog genome paper cited by UniProt for the nucleotide sequence.
It provides the genomic context but no functional characterization of this
specific gene.
- id: file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-deep-research-falcon.md
title: "Deep research report for A0A2G9RZF1"
publication_type: DEEP_RESEARCH
reference_review:
relevance: MEDIUM
correctness: UNVERIFIED
review_notes: >-
The falcon deep research correctly identifies the protein as a CUB
domain-containing protein and provides a thorough domain-based functional
inference from recent literature on CUB domain proteins. No direct literature
exists for this specific protein, so all functional inferences are based on
domain architecture and family membership comparisons.
- id: file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-hypotheses/core-function-1-go-0005201/openscientist.md
title: "OpenScientist deep research: GO:0005201 as core function of A0A2G9RZF1"
publication_type: DEEP_RESEARCH
reference_review:
relevance: HIGH
correctness: VERIFIED
review_notes: >-
OpenScientist autonomous investigation refuted GO:0005201 assignment on three
independent grounds: (1) semantic mismatch -- GO:0005201 describes structural
ECM proteins, not CUB-domain interaction modules; (2) PCPE-1 FunFam evidence
chain failure -- PCPE-1 real function is peptidase activator activity
(GO:0016504), and single CUB domains cannot perform PCPE-1 function (requires
cooperative CUB1+CUB2 per PMID:19801683); (3) A0A2G9RZF1 is a gene prediction
fragment from a poorly assembled genome -- a 2024 chromosome-level assembly
(GCF_042186555.1) encodes complete BMP-1 (1,020 aa) and TLL1 (1,005 aa).
- id: PMID:21954942
title: "Structure and properties of the Ca(2+)-binding CUB domain, a widespread ligand-recognition unit involved in major biological functions"
reference_review:
relevance: HIGH
correctness: VERIFIED
review_notes: >-
Authoritative review of CUB domain biology. Establishes CUB domains as
protein-protein interaction modules with beta-sandwich fold, not structural
ECM components. Cited by OpenScientist report.
- id: PMID:19801683
title: "Strong cooperativity and loose geometry between CUB domains are the basis for procollagen C-proteinase enhancer activity"
reference_review:
relevance: HIGH
correctness: VERIFIED
review_notes: >-
Key experimental paper demonstrating that individual CUB domains lose
>1,000-fold binding affinity compared to the CUB1+CUB2 pair. Refutes
functional transfer from PCPE-1 to single-CUB-domain proteins.
existing_annotations: []
core_functions:
- description: >-
A0A2G9RZF1 is almost certainly a gene prediction fragment from a poorly
assembled draft genome, representing one CUB domain from a much larger
(~1,000 aa) tolloid-family metalloprotease (BMP-1 or TLL1). The 156-amino
acid ORF cannot be meaningfully annotated with a specific molecular function.
GO:0005201 (extracellular matrix structural constituent) was previously
hypothesized based on CATH FunFam classification to PCPE-1, but this was
refuted by OpenScientist analysis on three grounds: (1) GO:0005201 describes
structural ECM proteins (collagens, elastin), not CUB-domain interaction
modules; (2) PCPE-1 real function is peptidase activator activity
(GO:0016504, IDA), and individual CUB domains cannot perform PCPE-1 function
(requires cooperative CUB1+CUB2 with >1,000-fold higher affinity than single
domains); (3) genomic context shows three consecutive fragmented ORFs on
scaffold KV928989 matching tolloid-like architecture, and a 2024
chromosome-level assembly (GCF_042186555.1) encodes complete BMP-1 (1,020 aa)
and TLL1 (1,005 aa). The root term molecular_function (GO:0003674) is
assigned because no specific function can be supported for the fragment:
CUB domains are established protein-protein interaction modules, but the
actual function of the complete gene product would be metalloendopeptidase
activity.
molecular_function:
id: GO:0003674
label: molecular_function
locations:
- id: GO:0005576
label: extracellular region
knowledge_gaps:
- gap_statement: >-
A0A2G9RZF1 is a gene prediction fragment, not a complete protein. Its true
molecular function cannot be determined from this 156-aa fragment. The
complete gene product is a tolloid-family metalloprotease (BMP-1 or TLL1)
with metalloendopeptidase activity, but which specific gene (BMP-1 on LG03
or TLL1 on LG01 of the chromosome-level assembly) corresponds to this
fragment has not been determined by sequence alignment.
boundary: >-
The fragment contains a single CUB domain (residues 31-147) that mediates
protein-protein interactions in the extracellular space. Scaffold KV928989
encodes three consecutive ORFs whose combined domains match the C-terminal
region of a tolloid-family metalloprotease. A 2024 chromosome-level assembly
(GCF_042186555.1) confirms complete BMP-1 (1,020 aa) and TLL1 (1,005 aa)
exist in this species.
gap_kind:
- BIOLOGY
significance: >-
The fragment status means no GO molecular function annotation should be
confidently applied to this entry. When the UniProt proteome (UP000228934)
is updated from the 2017 draft to the 2024 chromosome-level assembly, this
entry should be superseded by a complete gene model.
supported_by:
- reference_id: file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-hypotheses/core-function-1-go-0005201/openscientist.md
supporting_text: >-
[OpenScientist verdict: REFUTED. GO:0005201 is semantically inappropriate
for CUB-domain proteins, the PCPE-1 FunFam evidence chain fails (PCPE-1
function is peptidase activator activity, individual CUB domains lose
>1,000-fold affinity), and A0A2G9RZF1 is a gene prediction fragment from a
fragmented genome assembly confirmed by a 2024 chromosome-level assembly
encoding complete BMP-1 (1,020 aa) and TLL1 (1,005 aa)]
- reference_id: PMID:19801683
supporting_text: >-
only those containing both CUB1 and CUB2 were capable of enhancing
BMP-1 activity and binding to a mini-procollagen substrate with nanomolar
affinity. Both these properties were lost by individual CUB domains, which had
dissociation constants at least three orders of magnitude higher
- reference_id: PMID:21954942
supporting_text: >-
CUB domains are 110-residue protein motifs exhibiting a β-sandwich fold and
mediating protein-protein interactions in various extracellular proteins
- reference_id: file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-uniprot.txt
supporting_text: >-
[UniProt record: CUB domain at residues 31-147; InterPro IPR000859 CUB_dom;
Pfam PF00431 CUB; PANTHER PTHR24251 OVOCHYMASE-RELATED; protein existence
level 4 (Predicted)]
suggested_questions:
- question: >-
Which complete tolloid-family gene does this fragment correspond to -- BMP-1
(XP_073478370.1 on LG03) or TLL1 (XP_073462190.1 on LG01)? A BLAST
alignment of scaffold KV928989 against the chromosome-level assembly
(GCF_042186555.1) would definitively resolve this. OpenScientist analysis
confirmed A0A2G9RZF1 is a gene prediction fragment, but the specific gene
identity remains undetermined.
- question: >-
When will the UniProt proteome for Aquarana catesbeiana (UP000228934) be
updated from the 2017 draft assembly to the 2024 chromosome-level assembly
(GCF_042186555.1)? This update would supersede the fragment entries
(A0A2G9RZF1, A0A2G9RZH1, A0A2G9RZI6) with complete gene models carrying
appropriate metalloendopeptidase annotations.
- question: >-
Does Aquarana catesbeiana have a genuine PCPE-1/PCOLCE1 ortholog separate
from the tolloid-family metalloproteases? A reciprocal best BLAST of human
PCPE-1 (Q15113) against the chromosome-level proteome would clarify whether
the CATH FunFam classification to PCPE-1 reflects any real orthology
relationship or is purely fold-level similarity.
suggested_experiments:
- hypothesis: >-
A0A2G9RZF1 is a fragment of a BMP-1 or TLL1 tolloid-family metalloprotease,
not a standalone protein.
description: >-
Align scaffold KV928989 from the RCv2.1 draft assembly against the
chromosome-level assembly (GCF_042186555.1) using BLAST or minimap2 to
definitively map A0A2G9RZF1 to either BMP-1 (XP_073478370.1, LG03) or
TLL1 (XP_073462190.1, LG01). This computational experiment would resolve
the remaining ambiguity about gene identity.
experiment_type: bioinformatics
- hypothesis: >-
The complete tolloid-family metalloprotease corresponding to A0A2G9RZF1 has
metalloendopeptidase activity and functions in procollagen processing or BMP
signaling regulation.
description: >-
Clone the full-length BMP-1 or TLL1 gene from Aquarana catesbeiana cDNA
using the chromosome-level assembly as reference, express recombinantly, and
assay metalloendopeptidase activity against procollagen or Chordin
substrates. Compare with Xenopus Xolloid, the closest well-characterized
amphibian ortholog.
experiment_type: biochemistry