A0A2G9RZF1

UniProt ID: A0A2G9RZF1
Organism: Aquarana catesbeiana
Review Status: DRAFT
πŸ“ Provide Detailed Feedback

Gene Description

A0A2G9RZF1 is a gene prediction fragment from the fragmented draft genome assembly of Aquarana catesbeiana (American bullfrog). The 156-amino acid ORF (AB205_0007200) encodes a single CUB domain (residues 31-147), but scaffold KV928989 contains two additional consecutive ORFs (A0A2G9RZH1 with EGF+CUB domains, and A0A2G9RZI6 with CUB+EGF+CUB domains, the latter explicitly flagged as a fragment by UniProt) whose combined domain architecture (CUB-EGF-CUB-CUB-EGF-CUB) matches the C-terminal region of a tolloid-family metalloprotease (BMP-1 or TLL1). A 2024 chromosome-level assembly (GCF_042186555.1) with ~17,000-fold better contiguity encodes complete BMP-1 (XP_073478370.1, 1,020 aa, LG03) and TLL1 (XP_073462190.1, 1,005 aa, LG01), confirming the original draft produced fragmented gene models. The complete gene product is a tolloid-family metalloprotease with metalloendopeptidase activity. No experimental data exist for this protein (UniProt protein evidence level 4: Predicted).
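The fragment argument rests on the domains of the three ORFs adding up to a contiguous run of the tolloid C-terminal module array. A minimal Python sketch of that check is below. It assumes the long-form tolloid layout CUB1-CUB2-EGF1-CUB3-EGF2-CUB4-CUB5 (as in mammalian tolloid and TLL1) and keeps each ORF's own domains in N-to-C order. The genomic order of the three ORFs on KV928989 is not stated here, so the sketch tries every ordering. All function and variable names are hypothetical.

```python
from itertools import permutations

# Domain content of each predicted ORF, N- to C-terminus, as listed in the text.
ORF_DOMAINS = {
    "A0A2G9RZF1": ["CUB"],
    "A0A2G9RZH1": ["EGF", "CUB"],
    "A0A2G9RZI6": ["CUB", "EGF", "CUB"],
}

# Assumption: non-catalytic C-terminal region of a long-form tolloid protease
# (mTLD-like BMP-1 or TLL1): CUB1-CUB2-EGF1-CUB3-EGF2-CUB4-CUB5.
TOLLOID_C_TERMINAL = ["CUB", "CUB", "EGF", "CUB", "EGF", "CUB", "CUB"]


def find_offset(query, reference):
    """Return the 0-based start of query as a contiguous run in reference, or -1."""
    n = len(query)
    for start in range(len(reference) - n + 1):
        if reference[start:start + n] == query:
            return start
    return -1


def consistent_orders(orfs, reference):
    """List (ORF order, offset) pairs whose concatenated domains fit the reference."""
    hits = []
    for order in permutations(orfs):
        combined = [domain for name in order for domain in orfs[name]]
        offset = find_offset(combined, reference)
        if offset >= 0:
            hits.append((order, offset))
    return hits


for order, offset in consistent_orders(ORF_DOMAINS, TOLLOID_C_TERMINAL):
    print(" + ".join(order), "starts at tolloid CUB/EGF module", offset + 1)
```

Under these assumptions more than one ORF ordering fits the reference layout, so domain content alone does not fix the arrangement; the scaffold-to-chromosome alignment proposed under Suggested Experiments addresses that.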

Core Functions

A0A2G9RZF1 is almost certainly a gene prediction fragment from a poorly assembled draft genome, representing one CUB domain from a much larger (~1,000 aa) tolloid-family metalloprotease (BMP-1 or TLL1). On its own, the 156-amino acid ORF cannot be meaningfully annotated with a specific molecular function. GO:0005201 (extracellular matrix structural constituent) was previously hypothesized on the basis of a CATH FunFam classification to PCPE-1, but OpenScientist analysis refuted it on three grounds: (1) GO:0005201 describes structural ECM proteins (collagens, elastin), not CUB-domain interaction modules; (2) PCPE-1's experimentally supported function is peptidase activator activity (GO:0016504, IDA), and individual CUB domains cannot reproduce it, because enhancement requires cooperative CUB1+CUB2 binding, which has >1,000-fold higher affinity than single domains; (3) genomic context shows three consecutive fragmented ORFs on scaffold KV928989 matching a tolloid-like architecture, and a 2024 chromosome-level assembly (GCF_042186555.1) encodes complete BMP-1 (1,020 aa) and TLL1 (1,005 aa). The conservative MF assignment of protein binding reflects only that CUB domains are established protein-protein interaction modules; the complete gene product is expected to have metalloendopeptidase activity.
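Ground (2) depends on which GO evidence codes count as direct evidence. A minimal sketch of that distinction follows. It assumes the GO Consortium's experimental codes (EXP, IDA, IPI, IMP, IGI, IEP) and their high-throughput counterparts (HTP, HDA, HMP, HGI, HEP) are the ones treated as direct. RCA (reviewed computational analysis) falls outside that set. The two PCPE-1 annotations are the ones named in this record, and the helper name is hypothetical.

```python
# Assumption: GO experimental and high-throughput evidence codes count as direct
# evidence. RCA (reviewed computational analysis) and IEA are deliberately excluded.
DIRECT_EVIDENCE_CODES = {
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
    "HTP", "HDA", "HMP", "HGI", "HEP",
}


def split_by_evidence(annotations):
    """Split (GO ID, evidence code) pairs into direct and indirect lists."""
    direct, indirect = [], []
    for go_id, code in annotations:
        target = direct if code in DIRECT_EVIDENCE_CODES else indirect
        target.append((go_id, code))
    return direct, indirect


# PCPE-1 annotations as described in this record.
pcpe1 = [("GO:0016504", "IDA"), ("GO:0005201", "RCA")]
direct, indirect = split_by_evidence(pcpe1)
print("direct:", direct)
print("indirect:", indirect)
```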

Molecular Function:
molecular_function
Cellular Locations:
Supporting Evidence:
  • file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-hypotheses/core-function-1-go-0005201/openscientist.md
    [OpenScientist verdict: REFUTED. GO:0005201 is semantically inappropriate for CUB-domain proteins, the PCPE-1 FunFam evidence chain fails (PCPE-1 function is peptidase activator activity, individual CUB domains lose >1,000-fold affinity), and A0A2G9RZF1 is a gene prediction fragment from a fragmented genome assembly confirmed by a 2024 chromosome-level assembly encoding complete BMP-1 (1,020 aa) and TLL1 (1,005 aa)]
  • PMID:19801683
    only those containing both CUB1 and CUB2 were capable of enhancing BMP-1 activity and binding to a mini-procollagen substrate with nanomolar affinity. Both these properties were lost by individual CUB domains, which had dissociation constants at least three orders of magnitude higher
  • PMID:21954942
    CUB domains are 110-residue protein motifs exhibiting a β-sandwich fold and mediating protein-protein interactions in various extracellular proteins
  • file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-uniprot.txt
    [UniProt record: CUB domain at residues 31-147; InterPro IPR000859 CUB_dom; Pfam PF00431 CUB; PANTHER PTHR24251 OVOCHYMASE-RELATED; protein existence level 4 (Predicted)]
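The sketch below makes the PMID:19801683 numbers concrete. It applies the standard single-site binding isotherm, fraction bound = [L] / (Kd + [L]), and assumes simple 1:1 binding with free ligand approximately equal to total ligand. The 10 nM substrate concentration and 1 nM tandem-domain Kd are illustrative placeholders, not values from the cited study. Only the ≥1,000-fold Kd ratio comes from the quoted text.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fractional occupancy for 1:1 binding, assuming free ligand ~ total ligand."""
    return ligand_nM / (kd_nM + ligand_nM)


substrate_nM = 10.0                   # hypothetical substrate concentration
kd_tandem_nM = 1.0                    # hypothetical nanomolar Kd for CUB1+CUB2
kd_single_nM = kd_tandem_nM * 1000.0  # ">= three orders of magnitude" weaker

print(f"CUB1+CUB2 occupancy: {fraction_bound(substrate_nM, kd_tandem_nM):.2f}")
print(f"single CUB occupancy: {fraction_bound(substrate_nM, kd_single_nM):.3f}")
```

At a substrate concentration near the tandem Kd, the 1,000-fold weaker single domain is almost unoccupied. This is consistent with the quoted finding that individual CUB domains lose both substrate binding and enhancing activity.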

References

The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA
file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-deep-research-falcon.md
Deep research report for A0A2G9RZF1
file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-hypotheses/core-function-1-go-0005201/openscientist.md
OpenScientist deep research: GO:0005201 as core function of A0A2G9RZF1
Structure and properties of the Ca(2+)-binding CUB domain, a widespread ligand-recognition unit involved in major biological functions
Strong cooperativity and loose geometry between CUB domains are the basis for procollagen C-proteinase enhancer activity

Suggested Questions for Experts

Q: Which complete tolloid-family gene does this fragment correspond to: BMP-1 (XP_073478370.1 on LG03) or TLL1 (XP_073462190.1 on LG01)? A BLAST alignment of scaffold KV928989 against the chromosome-level assembly (GCF_042186555.1) would definitively resolve this. OpenScientist analysis confirmed A0A2G9RZF1 is a gene prediction fragment, but the specific gene identity remains undetermined.

Q: When will the UniProt proteome for Aquarana catesbeiana (UP000228934) be updated from the 2017 draft assembly to the 2024 chromosome-level assembly (GCF_042186555.1)? This update would supersede the fragment entries (A0A2G9RZF1, A0A2G9RZH1, A0A2G9RZI6) with complete gene models carrying appropriate metalloendopeptidase annotations.

Q: Does Aquarana catesbeiana have a genuine PCPE-1/PCOLCE1 ortholog separate from the tolloid-family metalloproteases? A reciprocal best BLAST of human PCPE-1 (Q15113) against the chromosome-level proteome would clarify whether the CATH FunFam classification to PCPE-1 reflects any real orthology relationship or is purely fold-level similarity.
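A minimal sketch of the reciprocal-best-hit test follows. It assumes both searches were run with `blastp -outfmt 6`, which produces 12 tab-separated columns with bitscore last. It also assumes the reverse search queries the top bullfrog hit against the full human proteome, not only against PCPE-1. Function names are hypothetical. Ties are resolved by keeping the first-seen hit, and multiple HSPs for a pair are reduced to the highest bitscore.

```python
import csv
import io


def best_hits(blast_tab: str) -> dict:
    """Map each query to its highest-bitscore subject in BLAST -outfmt 6 text."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_tab: str, reverse_tab: str) -> list:
    """Pairs (q, s) where s is q's best forward hit and q is s's best reverse hit."""
    forward = best_hits(forward_tab)
    reverse = best_hits(reverse_tab)
    return [(q, s) for q, s in forward.items() if reverse.get(s) == q]
```

If Q15113 and its top bullfrog hit come back as a pair, that supports orthology. If the bullfrog protein's best human hit is instead a tolloid-family protease, the FunFam link to PCPE-1 more likely reflects shared CUB folds.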

Suggested Experiments

Experiment: Align scaffold KV928989 from the RCv2.1 draft assembly against the chromosome-level assembly (GCF_042186555.1) using BLAST or minimap2 to definitively map A0A2G9RZF1 to either BMP-1 (XP_073478370.1, LG03) or TLL1 (XP_073462190.1, LG01). This computational experiment would resolve the remaining ambiguity about gene identity.

Hypothesis: A0A2G9RZF1 is a fragment of a BMP-1 or TLL1 tolloid-family metalloprotease, not a standalone protein.

Type: bioinformatics
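Below is a minimal sketch of the mapping step, assuming minimap2 was run with an assembly preset and emitted PAF, for example `minimap2 -x asm5 reference.fna KV928989.fa > KV928989.paf`. Both file names are hypothetical. The function sums matching bases per target chromosome for the scaffold. The chromosome names in the test data ("LG01", "LG03") are placeholders; the RefSeq assembly may use accession-style sequence IDs. The whole scaffold may carry other genes, so the optional `region` argument restricts counting to the query interval spanning the three ORFs. Those coordinates are not given here.

```python
from collections import defaultdict


def best_target(paf_text, query, min_mapq=0, region=None):
    """Pick the target sequence with the most matching bases for one query.

    Uses PAF columns 1 (query name), 3-4 (query start/end), 6 (target name),
    10 (number of matching bases) and 12 (mapping quality). If region is a
    (start, end) tuple, only alignments overlapping that query interval count.
    """
    totals = defaultdict(int)
    for line in paf_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 12 or fields[0] != query:
            continue
        if int(fields[11]) < min_mapq:
            continue
        qstart, qend = int(fields[2]), int(fields[3])
        if region is not None and not (qstart < region[1] and qend > region[0]):
            continue
        totals[fields[5]] += int(fields[9])
    if not totals:
        return None
    return max(totals, key=totals.get), dict(totals)
```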

Experiment: Clone the full-length BMP-1 or TLL1 gene from Aquarana catesbeiana cDNA using the chromosome-level assembly as reference, express recombinantly, and assay metalloendopeptidase activity against procollagen or Chordin substrates. Compare with Xenopus Xolloid, the closest well-characterized amphibian ortholog.

Hypothesis: The complete tolloid-family metalloprotease corresponding to A0A2G9RZF1 has metalloendopeptidase activity and functions in procollagen processing or BMP signaling regulation.

Type: biochemistry

Deep Research

Falcon

(A0A2G9RZF1-deep-research-falcon.md)
Comprehensive Research Report: Functional Annotation of UniProt A0A2G9RZF1 (CUB Domain-Containing Protein) in Aquarana catesbeiana (American bullfrog)
Falcon (Edison Scientific Literature); 29 citations; 1 artifact; 2026-06-18T19:53:14.392287

The research report should be a detailed narrative explaining the function, biological processes, and localization of the gene product. Citations should be given for all claims.

You should prioritize authoritative reviews and primary scientific literature when conducting research. You can supplement this with annotations you find in gene/protein databases, but these can be outdated or inaccurate.

We are specifically interested in the primary function of the gene - for enzymes, what reaction is catalyzed, and what is the substrate specificity? For transporters, what is the substrate? For structural proteins or adapters, what is the broader structural role? For signaling molecules, what is the role in the pathway.

We are interested in where in or outside the cell the gene product carries out its function.

We are also interested in the signaling or biochemical pathways in which the gene functions. We are less interested in broad pleiotropic effects, except where these elucidate the precise role.

Include evidence where possible. We are interested in both experimental evidence as well as inference from structure, evolution, or bioinformatic analysis. Precise studies should be prioritized over high-throughput, where available.

Comprehensive Research Report: Functional Annotation of UniProt A0A2G9RZF1 (CUB Domain-Containing Protein) in Aquarana catesbeiana (American bullfrog)

1. Gene/Protein Identification and Verification

  • UniProt Accession: A0A2G9RZF1
  • Description: CUB domain-containing protein; ORFName=AB205_0007200; organism verified as Aquarana catesbeiana (American bullfrog).
  • Domains/Families: Annotated to contain a CUB domain (IPR000859), Sperma_CUB_dom_sf (IPR035914), and CUB (PF00431). No evidence was found for ambiguity or gene/protein misidentification; there is no characterized paralog with this identifier in other species. No direct literature discusses this specific protein in bullfrog, so this report relies on domain-based inference from recent, authoritative literature as required.

2. CUB Domain: Key Concepts and Definitions

  • Structure: CUB domains are ~100–110 residue extracellular modules named after complement subcomponents C1r/C1s, Uegf, and BMP1. They adopt a sandwich-like fold ideal for protein-protein or protein-glycan interactions (gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2).
  • Localization: CUB domains are almost exclusively found in secreted, extracellular, or plasma membrane-associated proteins, never in cytosolic enzymes or transporters (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5).
  • Function: They serve as modular binding surfaces, supporting the recognition and assembly of macromolecular complexes for extracellular biology. Roles include ligand or matrix interaction, signaling modulation, and structural organization (thomas2024pcpe2expressionof pages 1-2).
  • Calcium Binding: Several CUB domains include a conserved calcium ion binding site. Calcium stabilizes their structure and can be critical for receptor or co-receptor function (lin2023thebiologyof pages 4-5).
  • Non-catalytic Nature: CUB domains are not catalytic domains; where found alone, they most plausibly act as recognition or adaptation modules (thomas2024pcpe2expressionof pages 1-2).

3. Recent Developments & Latest Research (2023–2024)

  • Comparative studies show broad functional conservation of CUB domains across vertebrates, including amphibians. Tandem CUB domains are reviewed in PCPE2, SCUBE, and CSMD1 families as mediators of matrix organization, morphogen signaling, complement regulation, and reproductive extracellular interactions (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5, akyuz2022thediverserole pages 1-3).
  • A conserved calcium-binding CUB domain in the aGPCR Gpr126/ADGRG6 is required for neurodevelopmental and morphogenetic functions (lin2023thebiologyof pages 4-5).
  • SCUBE proteins, with terminal CUB domains, function as co-receptors for BMP signaling and modulate developmental and disease pathways (lin2023thebiologyof pages 4-5).
  • Complement-regulatory CUB domain proteins such as CSMD1 in neural tissues limit complement deposition at synapses, showing roles beyond immunity (baum2020cubandsushi pages 1-4).

4. Current Applications and Real-World Implementations

  • Matrix Assembly & Fertilization: In Xenopus laevis and other amphibians, CUB (and other similar) domains are present in egg envelope glycoproteins, involved in sperm-egg recognition and the structural organization of the fertilization coat (hedrick2008anuranandpig pages 2-3, hedrick2008anuranandpig pages 3-4, sawada2022mechanismsofsperm–egg pages 1-2).
  • Signaling Modulators: SCUBE family proteins use their CUB domains in BMP and Hedgehog pathway modulation, functioning in bone morphogenesis, signaling, and paracrine ligand presentation (lin2023thebiologyof pages 4-5).
  • Complement Regulation: CSMD1 and related proteins utilize CUB domains to regulate complement activation, contributing to neurodevelopment and protection against excessive immune activation (baum2020cubandsushi pages 1-4, akyuz2022thediverserole pages 1-3).

5. Biological Processes and Localization

  • Localization: CUB domain proteins are either secreted into the extracellular space or presented on the outer face of the plasma membrane (thomas2024pcpe2expressionof pages 1-2).
  • Processes:
    • Extracellular matrix organization and egg envelope assembly.
    • Sperm-egg recognition and fertilization.
    • Neural circuit development and synaptic pruning (through complement modulation).
    • Enhancement of ligand-receptor interactions in cell signaling, especially in morphogen gradients during development (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2).

6. Pathways and Partner Interactions

  • Signaling Pathways:
    • BMP/TGF-β pathway: SCUBE proteins with CUB domains act as BMP co-receptors.
    • Hedgehog pathway: CUB-containing SCUBE2 enhances ligand presentation.
    • Complement cascade: CSMD1 and related proteins use CUB domains to regulate C3 activation and deposition, dampening immune signaling at sensitive sites (baum2020cubandsushi pages 1-4).

7. Comparative & Expert Analysis

  • For A0A2G9RZF1: As a CUB-only or CUB-dominated protein without additional defined catalytic or transmembrane regions, A0A2G9RZF1 is most likely a secreted or cell-surface-anchored recognition molecule. Its function is likely to involve recognition or scaffolding roles in the extracellular matrix, fertilization envelope, tissue morphogenesis, matrix remodeling, or extracellular immune modulation. Without evidence of an enzyme domain, a canonical transport or catalytic role is highly unlikely (gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2).
Domain features of CUB domain-containing proteins (each entry gives a description, the inferred function, and representative examples from recent literature):

  • Canonical CUB domain size and identity
      Description: CUB domains are compact extracellular modules of ~100–110 amino acids; the name derives from complement C1r/C1s, Uegf, and BMP1. They recur in multidomain secreted or membrane proteins rather than acting as catalytic domains themselves (gomisruth2023structuralandevolutionary pages 2-5, gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2).
      Function: Provide modular binding surfaces that support recognition and assembly functions in extracellular biology; for A0A2G9RZF1, the UniProt annotation of a CUB-only protein most strongly supports a non-enzymatic interaction role rather than direct catalysis (gomisruth2023structuralandevolutionary pages 2-5, thomas2024pcpe2expressionof pages 1-2).
      Representative examples: Reviews of astacin-associated CUB domains and synaptic CUB proteins; PCPE2 review describing CUB domains as ~110-residue extracellular interaction motifs (gomisruth2023structuralandevolutionary pages 2-5, gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2).
  • Structural fold / extracellular recognition module
      Description: CUB, CCP/Sushi, and TSP-1 domains are described as sandwich-like folds that favor protein-protein and protein-glycan interactions; CUB domains are frequently combined with EGF-like, Sushi, or NTR domains in larger extracellular proteins (gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2, akyuz2022thediverserole pages 1-3).
      Function: Acts as an interaction scaffold for ligand capture, receptor modulation, matrix association, or multimeric complex formation; this is the most defensible functional inference for an uncharacterized bullfrog CUB protein (gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2).
      Representative examples: CSMD1 extracellular region; PCPE2/PCPE proteins with tandem CUB domains; SCUBE proteins with EGF-like repeats plus CUB domain (akyuz2022thediverserole pages 1-3, thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5).
  • Calcium-binding capability
      Description: Some CUB domains contain conserved calcium-binding sites that stabilize structure and regulate activity; in ADGRG6/GPR126, a conserved Ca2+-binding site in the CUB domain is critical for receptor function in vivo (lin2023thebiologyof pages 4-5).
      Function: Calcium can rigidify extracellular domains and tune ligand binding or signaling output, suggesting that if A0A2G9RZF1 retains Ca2+-coordinating residues, its activity may depend on the Ca2+-rich extracellular milieu (lin2023thebiologyof pages 4-5).
      Representative examples: Zebrafish Gpr126/Adgrg6 extracellular region; SCUBE cbEGF modules also acquire rigid conformations upon Ca2+ binding, reinforcing the general principle of calcium-stabilized extracellular recognition assemblies (lin2023thebiologyof pages 4-5).
  • Protein-protein interaction surface
      Description: Recent reviews emphasize that CUB domains typically coordinate protein-protein binding and, in some families, protein-carbohydrate interactions; they are common in extracellular and plasma membrane-associated proteins (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2).
      Function: Supports binding to ligands, receptors, matrix components, or partner domains; likely the primary molecular role of A0A2G9RZF1 unless future data show it is fused to an enzyme or receptor not captured in current annotation (thomas2024pcpe2expressionof pages 1-2).
      Representative examples: PCPE2, CSMD1, SCUBE proteins, and synaptic CUB proteins reviewed across vertebrates and invertebrates (thomas2024pcpe2expressionof pages 1-2, akyuz2022thediverserole pages 1-3, gonzalezcalvo2022synapseformationand pages 1-2).
  • Typical localization
      Description: CUB domains are predominantly found in secreted extracellular proteins or on extracellular regions of membrane proteins; multiple sources explicitly place them in extracellular matrix or plasma membrane-associated proteins (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5, akyuz2022thediverserole pages 1-3).
      Function: Indicates that A0A2G9RZF1 most likely functions outside the cell, at the cell surface, or within extracellular matrix/egg-coat-like material rather than in cytosolic metabolism (thomas2024pcpe2expressionof pages 1-2, akyuz2022thediverserole pages 1-3).
      Representative examples: PCPE2 in extracellular matrix; SCUBE proteins as secreted/cell-surface glycoproteins; CSMD1 as a type-I transmembrane complement-regulatory protein (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5, baum2020cubandsushi pages 1-4).
  • Non-catalytic modulatory role
      Description: CUB domains are usually accessory/regulatory modules rather than catalytic centers; in multidomain proteins they position ligands, modulate proteases, or organize receptor complexes (gomisruth2023structuralandevolutionary pages 2-5, thomas2024pcpe2expressionof pages 1-2).
      Function: Suggests that A0A2G9RZF1, described only as a "CUB domain-containing protein," is most plausibly a binding/adaptor or matrix-associated recognition protein rather than an enzyme with a defined substrate-reaction pair (gomisruth2023structuralandevolutionary pages 2-5, thomas2024pcpe2expressionof pages 1-2).
      Representative examples: PCPE family enhances or modulates procollagen processing through CUB-mediated interactions; SCUBE proteins function as signaling modulators/coreceptors (thomas2024pcpe2expressionof pages 1-2, lin2023thebiologyof pages 4-5).
  • Extracellular matrix organization
      Description: CUB-containing proteins can reside in extracellular matrix and help organize or anchor macromolecular assemblies; PCPE2 includes tandem CUB domains plus an NTR domain associated with ECM binding (thomas2024pcpe2expressionof pages 1-2). Amphibian egg-envelope glycoproteins are secreted and assembled into extracellular filamentous envelopes through conserved extracellular domains (hedrick2008anuranandpig pages 2-3, hedrick2008anuranandpig pages 3-4).
      Function: Supports a plausible role in matrix assembly, stabilization, or selective binding within extracellular coats/tissues in amphibians (thomas2024pcpe2expressionof pages 1-2, hedrick2008anuranandpig pages 3-4).
      Representative examples: PCPE2 in ECM; anuran egg-envelope proteins as extracellular structural/recognition assemblies (thomas2024pcpe2expressionof pages 1-2, hedrick2008anuranandpig pages 2-3, hedrick2008anuranandpig pages 3-4).
  • Developmental and morphogenetic functions
      Description: CUB proteins are repeatedly linked to embryogenesis, organogenesis, and tissue morphogenesis; SCUBE family members are conserved developmental regulators, and ADGRG6 CUB domain integrity is required for ear, heart, and Schwann-cell related development (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2).
      Function: For an amphibian protein with no direct literature, developmental extracellular signaling or morphogen regulation is a reasonable inference if expression proves tissue-specific or stage-specific (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2).
      Representative examples: ADGRG6/GPR126 developmental signaling; SCUBE developmental biology review (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2).
  • Fertilization / reproductive context
      Description: In amphibians and other chordates, extracellular reproductive proteins use conserved interaction domains to mediate egg-coat assembly, sperm binding, species recognition, and fertilization-related structural changes (hedrick2008anuranandpig pages 2-3, hedrick2008anuranandpig pages 3-4, sawada2022mechanismsofsperm–egg pages 1-2).
      Function: Because frog extracellular coats are rich in secreted recognition proteins, a bullfrog CUB-domain protein could plausibly participate in reproductive extracellular matrices or gamete interactions, though this remains inferential for A0A2G9RZF1 specifically (hedrick2008anuranandpig pages 2-3, sawada2022mechanismsofsperm–egg pages 1-2).
      Representative examples: Xenopus/anuran egg-envelope glycoproteins and ascidian fertilization systems as comparative extracellular recognition models (hedrick2008anuranandpig pages 2-3, hedrick2008anuranandpig pages 3-4, sawada2022mechanismsofsperm–egg pages 1-2).
  • Complement regulation pathway
      Description: CUB domains occur in proteins structurally related to complement regulators; CSMD1 contains multiple CUB and Sushi domains and opposes complement activation in neural tissues, reducing complement deposition at synapses (baum2020cubandsushi pages 1-4, akyuz2022thediverserole pages 1-3).
      Function: Shows that CUB modules can participate in immune surveillance/regulation by controlling extracellular complement activation; this is one possible pathway class for uncharacterized vertebrate CUB proteins (baum2020cubandsushi pages 1-4).
      Representative examples: CSMD1 in human/mouse neural tissues; complement-linked CUB/Sushi proteins reviewed in disease and neurodevelopment (baum2020cubandsushi pages 1-4, akyuz2022thediverserole pages 1-3).
  • Synaptic and neural functions
      Description: Across species, CUB domains are considered ancient synaptic building blocks and appear in proteins involved in synapse formation/function; complement-linked CUB proteins also influence pruning-related neurobiology (gonzalezcalvo2022synapseformationand pages 1-2, baum2020cubandsushi pages 1-4).
      Function: Suggests a potential neural extracellular recognition role for some solitary CUB proteins, especially in vertebrates, although no direct bullfrog evidence currently exists (gonzalezcalvo2022synapseformationand pages 1-2, baum2020cubandsushi pages 1-4).
      Representative examples: Synapse-focused review of CUB/CCP/TSP-1 proteins; CSMD1 complement regulation at synapses (gonzalezcalvo2022synapseformationand pages 1-2, baum2020cubandsushi pages 1-4).
  • BMP/TGF-β pathway modulation
      Description: SCUBE proteins use C-terminal regions including the CUB domain to bind BMP ligands/receptors and promote BMP signaling; SCUBE3 loss-of-function causes defective BMP signaling in humans (lin2023thebiologyof pages 4-5).
      Function: Establishes CUB domains as extracellular pathway modulators or coreceptors in morphogen signaling, relevant when inferring possible signaling roles for A0A2G9RZF1 (lin2023thebiologyof pages 4-5).
      Representative examples: SCUBE1/3 interactions with BMP2/BMP7 and BMP receptors; SCUBE3 developmental disorder linked to impaired BMP signaling (lin2023thebiologyof pages 4-5).
  • Hedgehog and receptor-coreceptor functions
      Description: SCUBE2 can interact with SHH/IHH and PTCH1 and enhance Hedgehog signaling in cholesterol-rich plasma membrane microdomains; CUB-containing extracellular modules help assemble signaling complexes (lin2023thebiologyof pages 4-5).
      Function: Demonstrates that CUB domains can facilitate ligand presentation or receptor engagement rather than serving as ligands themselves (lin2023thebiologyof pages 4-5).
      Representative examples: SCUBE2 as Hedgehog signaling enhancer/coreceptor (lin2023thebiologyof pages 4-5).
  • Membrane association versus soluble secretion
      Description: Some CUB proteins are soluble (e.g., PCPE2 in ECM), while others are membrane-tethered (e.g., CSMD1, SCUBE-associated cell-surface forms, ADGRG6 extracellular region) (thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4, lin2023thebiologyof pages 4-5).
      Function: If the A0A2G9RZF1 sequence lacks a transmembrane helix beyond the CUB region, secretion is more likely; if a membrane anchor is later identified, cell-surface recognition/coreceptor roles become stronger candidates (thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4).
      Representative examples: PCPE2 extracellular glycoprotein; CSMD1 type-I membrane protein; SCUBE soluble and membrane-associated forms (thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4, lin2023thebiologyof pages 4-5).
  • Best-supported inference for A0A2G9RZF1
      Description: Direct literature on UniProt A0A2G9RZF1 from Aquarana catesbeiana is lacking, but convergent evidence from 2020–2024 literature indicates that an isolated CUB-domain protein is most likely an extracellular recognition/adhesion/modulatory protein involved in protein-protein interactions, potentially Ca2+-dependent, with possible roles in matrix biology, development, immunity, or reproduction depending on expression context (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4, hedrick2008anuranandpig pages 2-3).
      Function: Provides a restrained functional annotation hypothesis: extracellular non-enzymatic binding protein, likely participating in partner recognition or signaling/matrix modulation rather than catalysis or transport (lin2023thebiologyof pages 4-5, thomas2024pcpe2expressionof pages 1-2).
      Representative examples: Comparative inference from ADGRG6, SCUBE, PCPE2, CSMD1, and amphibian extracellular reproductive proteins (lin2023thebiologyof pages 4-5, gonzalezcalvo2022synapseformationand pages 1-2, thomas2024pcpe2expressionof pages 1-2, baum2020cubandsushi pages 1-4, hedrick2008anuranandpig pages 2-3).

Table: This table summarizes the structural properties, localization, functions, and pathway associations of CUB domain-containing proteins most relevant for inferring the likely biology of the uncharacterized bullfrog protein A0A2G9RZF1. It is useful because no direct literature exists for the target protein, so domain-based comparative annotation is the strongest available evidence.
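The membrane-association entry turns on whether the sequence carries a transmembrane helix. A crude first check is a Kyte-Doolittle hydropathy scan. The sketch below assumes the classic 19-residue window with a mean-hydropathy cutoff of 1.6. It is only a screen: N-terminal signal peptides also score as hydrophobic, and dedicated predictors such as DeepTMHMM or SignalP are preferable. The A0A2G9RZF1 sequence is not reproduced here, so the function takes any sequence as input. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6):
    """Return 1-based start positions of windows whose mean hydropathy exceeds threshold.

    Unknown residues (e.g. X) contribute 0.0. Sequences shorter than the window
    yield an empty list.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD_HYDROPATHY.get(aa, 0.0) for aa in segment) / window
        if mean > threshold:
            hits.append(start + 1)
    return hits
```

Hits near the N-terminus more likely reflect a signal peptide, consistent with secretion. A strongly hydrophobic stretch after the CUB region would instead point toward a membrane anchor.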

8. Data, Statistics, and Source Annotation

  • See referenced artifact for integrated summary table of domain characteristics, biological processes, and pathway associations, with representative examples from 2020–2024 literature.
  • Most claims above are directly supported by data in reviews and experimental studies of CUB domain-containing proteins in vertebrates and comparative domain annotation of amphibian extracellular proteins (thomas2024pcpe2expressionof pages 1-2, gonzalezcalvo2022synapseformationand pages 1-2, hedrick2008anuranandpig pages 2-3).

9. Authoritative Sources and URLs

10. Summary Conclusion

A0A2G9RZF1 in Aquarana catesbeiana is best annotated (as of 2024) as a secreted or membrane-associated CUB domain recognition protein. It is likely to participate non-catalytically in extracellular protein-protein or protein-matrix interactions, potentially in reproductive or developmental biology, but without evidence for a direct enzyme, transporter, or signaling ligand activity. This interpretation is fully consistent with up-to-date structural, functional, and evolutionary studies of CUB domain proteins (2020–2024).

References

  1. (gonzalezcalvo2022synapseformationand pages 1-2): Inés González-Calvo, Mélissa Cizeron, Jean-Louis Bessereau, and Fekrije Selimi. Synapse formation and function across species: ancient roles for ccp, cub, and tsp-1 structural domains. Frontiers in Neuroscience, Apr 2022. URL: https://doi.org/10.3389/fnins.2022.866444, doi:10.3389/fnins.2022.866444. This article has 9 citations and is from a peer-reviewed journal.

  2. (thomas2024pcpe2expressionof pages 1-2): Michael J. Thomas, Hao Xu, Angela Wang, Mirza Ahmar Beg, and Mary G. Sorci-Thomas. Pcpe2: expression of multifunctional extracellular glycoprotein associated with diverse cellular functions. Journal of Lipid Research, 65:100664, Nov 2024. URL: https://doi.org/10.1016/j.jlr.2024.100664, doi:10.1016/j.jlr.2024.100664. This article has 11 citations and is from a peer-reviewed journal.

  3. (lin2023thebiologyof pages 4-5): Yuh-Charn Lin, Binay K. Sahoo, Shiang-Shin Gau, and Ruey-Bing Yang. The biology of scube. Journal of Biomedical Science, May 2023. URL: https://doi.org/10.1186/s12929-023-00925-3, doi:10.1186/s12929-023-00925-3. This article has 48 citations and is from a domain leading peer-reviewed journal.

  4. (akyuz2022thediverserole pages 1-3): Esra Ermis Akyuz and Sandra M. Bell. The diverse role of cub and sushi multiple domains 1 (csmd1) in human diseases. Genes, 13:2332, Dec 2022. URL: https://doi.org/10.3390/genes13122332, doi:10.3390/genes13122332. This article has 31 citations.

  5. (baum2020cubandsushi pages 1-4): Matthew L. Baum, Daniel K. Wilton, Allie Muthukumar, Rachel G. Fox, Alanna Carey, William Crotty, Nicole Scott-Hewitt, Elizabeth Bien, David A. Sabatini, Toby Lanser, Arnaud Frouin, Frederick Gergits, Bjarte Håvik, Chrysostomi Gialeli, Eugene Nacu, Anna M. Blom, Kevin Eggan, Matthew B. Johnson, Steven A. McCarroll, and Beth Stevens. Cub and sushi multiple domains 1 (csmd1) opposes the complement cascade in neural tissues. bioRxiv, Sep 2020. URL: https://doi.org/10.1101/2020.09.11.291427, doi:10.1101/2020.09.11.291427. This article has 29 citations.

  6. (hedrick2008anuranandpig pages 2-3): Jerry L. Hedrick. Anuran and pig egg zona pellucida glycoproteins in fertilization and early development. The International journal of developmental biology, 52 5-6:683-701, Jan 2008. URL: https://doi.org/10.1387/ijdb.082580jh, doi:10.1387/ijdb.082580jh. This article has 68 citations.

  7. (hedrick2008anuranandpig pages 3-4): Jerry L. Hedrick. Anuran and pig egg zona pellucida glycoproteins in fertilization and early development. The International journal of developmental biology, 52 5-6:683-701, Jan 2008. URL: https://doi.org/10.1387/ijdb.082580jh, doi:10.1387/ijdb.082580jh. This article has 68 citations.

  8. (sawada2022mechanismsofsperm–egg pages 1-2): Hitoshi Sawada and Takako Saito. Mechanisms of sperm–egg interactions: what ascidian fertilization research has taught us. Cells, 11:2096, Jul 2022. URL: https://doi.org/10.3390/cells11132096, doi:10.3390/cells11132096. This article has 18 citations.

  9. (gomisruth2023structuralandevolutionary pages 2-5): F. Xavier Gomis-Rüth and Walter Stöcker. Structural and evolutionary insights into astacin metallopeptidases. Frontiers in Molecular Biosciences, Jan 2023. URL: https://doi.org/10.3389/fmolb.2022.1080836, doi:10.3389/fmolb.2022.1080836. This article has 24 citations.

Artifacts

Citations

  1. lin2023thebiologyof pages 4-5
  2. baum2020cubandsushi pages 1-4
  3. gonzalezcalvo2022synapseformationand pages 1-2
  4. akyuz2022thediverserole pages 1-3
  5. hedrick2008anuranandpig pages 2-3
  6. hedrick2008anuranandpig pages 3-4
  7. gomisruth2023structuralandevolutionary pages 2-5
  8. https://doi.org/10.3389/fmolb.2022.1080836
  9. https://doi.org/10.1016/j.jlr.2024.100664
  10. https://doi.org/10.3389/fnins.2022.866444
  11. https://doi.org/10.1186/s12929-023-00925-3
  12. https://doi.org/10.1101/2020.09.11.291427
  13. https://doi.org/10.1387/ijdb.082580jh
  14. https://doi.org/10.3390/cells11132096
  15. https://doi.org/10.3390/genes13122332

OpenScientist

(A0A2G9RZF1-hypotheses/core-function-1-go-0005201/openscientist.md)
Deep Research Report: GO:0005201 as Core Function of A0A2G9RZF1 (OpenScientist, openscientist-autonomous; 13 citations; 6 artifacts; 2026-06-22T00:46:43.168118)


Executive Judgment

Verdict: Over-annotated (Refuted)

The hypothesis that extracellular matrix structural constituent (GO:0005201) is a core function of A0A2G9RZF1 is refuted by three convergent and independent lines of evidence:

  1. Semantic mismatch: GO:0005201 describes proteins that contribute to the structural integrity of the ECM (collagens, elastin, fibrillin), not CUB-domain interaction modules that mediate protein-protein recognition. CUB domains are 110-residue β-sandwich folds functioning as modular binding surfaces (PMID: 21954942), fundamentally distinct from structural ECM scaffolds.

  2. FunFam evidence chain failure: The PCPE-1 FunFam classification that motivated this annotation is doubly flawed. PCPE-1's experimentally validated function is peptidase activator activity (GO:0016504, IDA), not ECM structural support. Its single GO:0005201 annotation derives from a proteomics cataloging study (RCA evidence from PMID: 20551380), and individual CUB domains cannot perform PCPE-1's enhancing function, which requires cooperative CUB1+CUB2 binding with >1,000-fold higher affinity than single CUB domains (PMID: 19801683).

  3. Gene prediction artifact: A0A2G9RZF1 is almost certainly a gene prediction fragment from a poorly assembled genome. Scaffold KV928989 encodes three consecutive ORFs whose combined domain architecture (CUB–EGF–CUB–CUB–EGF–CUB) matches the C-terminal region of a BMP-1/tolloid-like metalloprotease. A chromosome-level assembly published in 2024 (GCF_042186555.1) encodes complete BMP-1 (1,020 aa) and TLL1 (1,005 aa), full-length tolloid-family metalloproteases of which the 156-aa fragment represents a portion.

The molecular function of A0A2G9RZF1 should be left unassigned, and no molecular-function GO term should be applied to this fragment. If the complete gene product were annotated, its molecular function would be metalloendopeptidase activity, not ECM structural support.


Summary

A0A2G9RZF1 is a 156-amino acid unreviewed (TrEMBL) protein from Aquarana catesbeiana (American bullfrog), predicted at protein existence level 4 (predicted) from whole-genome shotgun sequencing. Its sole recognizable feature is a single CUB domain (residues 31–147). The seed hypothesis proposed that this protein functions as an extracellular matrix structural constituent (GO:0005201), based on (a) the general extracellular localization of CUB-domain proteins, and (b) FunFam classification to PCPE-1 (Procollagen C-endopeptidase enhancer 1), a protein detected in ECM proteomics datasets.

This investigation systematically evaluated three critical questions: (1) Is GO:0005201 the correct term for what CUB domains do? (2) Does the PCPE-1 FunFam classification justify GO:0005201? (3) Is A0A2G9RZF1 a real, complete gene product? The answer to all three is no. CUB domains are protein-protein interaction modules, not structural ECM components. PCPE-1 is experimentally a peptidase activator, and its single GO:0005201 annotation derives from a low-confidence computational analysis. Most importantly, A0A2G9RZF1 appears to be a gene prediction fragment from a fragmented genome assembly, representing one CUB domain from a much larger (~1,000 aa) tolloid-family metalloprotease.

These findings were confirmed across three iterations of investigation, incorporating analysis of 32 primary literature papers, genomic context examination, and cross-assembly validation. The conclusion is robust: GO:0005201 is inappropriate for this protein at every level of analysis.


Key Findings

Finding 1: GO:0005201 Is Semantically Inappropriate for CUB-Domain Proteins

GO:0005201 (extracellular matrix structural constituent) is defined as "The action of a molecule that contributes to the structural integrity of the extracellular matrix." Examination of QuickGO annotations reveals this term is overwhelmingly applied to bona fide structural ECM proteins: collagens (COL1A1, COL4A1-6, COL5A2), elastin (ELN), fibrillin, and similar molecules that physically constitute the ECM scaffold. These are large, repetitive structural proteins whose presence is necessary for the mechanical and organizational properties of the matrix.

CUB domains, by contrast, are 110-residue protein motifs that adopt a β-sandwich fold and mediate protein-protein interactions. As established in the authoritative review by Gaboriaud et al. (PMID: 21954942), CUB domains are "110-residue protein motifs exhibiting a β-sandwich fold and mediating protein-protein interactions in various extracellular proteins." They function as modular binding surfaces (recognition and interaction elements), not as structural building blocks of the ECM. Annotating a single-CUB-domain protein with GO:0005201 conflates a protein-protein interaction module with a structural matrix component, which are fundamentally different molecular functions.

Finding 2: PCPE-1's Validated Function Is Peptidase Activator Activity, Not ECM Structural Support

The FunFam classification that linked A0A2G9RZF1 to PCPE-1 was the primary justification for considering GO:0005201. However, examining PCPE-1's actual experimental annotations reveals a critical mismatch. Human PCPE-1 (UniProt Q15113) has been experimentally characterized with the following GO molecular functions supported by IDA (Inferred from Direct Assay) evidence from PMID: 12393877:

  • Collagen binding (GO:0005518)
  • Heparin binding (GO:0008201)
  • Peptidase activator activity (GO:0016504)

PCPE-1's single GO:0005201 annotation is supported only by RCA (Reviewed Computational Analysis) evidence from a proteomics study of human aortic ECM (PMID: 20551380), classified by the BHF-UCL curation group. This is a computational classification based on finding the protein in an ECM proteomics dataset β€” it does not demonstrate that PCPE-1 structurally contributes to the ECM.
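The distinction between experimental and computational support drawn above can be expressed as a small filter. The sketch below hard-codes the four Q15113 molecular-function annotations described in this finding rather than querying QuickGO or UniProt, and the grouping of evidence codes into experimental and computational sets is our assumption based on the GO evidence-code categories; it should be checked against current GO documentation before reuse.

```python
# Sketch: split GO annotations by evidence-code class.
# The code groupings below are assumptions to verify against current GO documentation.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
                "HTP", "HDA", "HMP", "HGI", "HEP"}
COMPUTATIONAL = {"ISS", "ISO", "ISA", "ISM", "IGC",
                 "IBA", "IBD", "IKR", "IRD", "RCA"}

# Human PCPE-1 (Q15113) MF annotations as summarized in Finding 2 (hard-coded, not fetched).
Q15113_MF = [
    ("GO:0005518", "collagen binding", "IDA"),
    ("GO:0008201", "heparin binding", "IDA"),
    ("GO:0016504", "peptidase activator activity", "IDA"),
    ("GO:0005201", "extracellular matrix structural constituent", "RCA"),
]


def split_by_evidence(annotations):
    """Return (experimental, computational, other) lists of GO IDs."""
    experimental, computational, other = [], [], []
    for go_id, _label, code in annotations:
        if code in EXPERIMENTAL:
            experimental.append(go_id)
        elif code in COMPUTATIONAL:
            computational.append(go_id)
        else:
            other.append(go_id)  # e.g. IEA, TAS, NAS, IC, ND
    return experimental, computational, other


exp_ids, comp_ids, other_ids = split_by_evidence(Q15113_MF)
print("experimental:", exp_ids)
print("computational:", comp_ids)
```

Run on these four rows, the three IDA terms land in the experimental list and GO:0005201 is the only entry in the computational list, which is the asymmetry the finding relies on.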

As stated by Berry et al. (PMID: 30078642): "Procollagen C-proteinase enhancer-1 (PCPE-1) is a secreted protein that specifically accelerates proteolytic release of the C-propeptides from fibrillar procollagens, a crucial step in fibril assembly." This is a regulatory/catalytic enhancement function, fundamentally different from structural ECM support. Even if A0A2G9RZF1 were a genuine PCPE-1 ortholog (which it is not), GO:0016504 (peptidase activator activity), not GO:0005201, would be the appropriate molecular function term.

Finding 3: A Single CUB Domain Cannot Perform PCPE-1 Function

Even setting aside the semantic mismatch with GO:0005201, transferring any PCPE-1 function to A0A2G9RZF1 is unjustified because a single CUB domain is insufficient. PCPE-1 is a 449-amino acid protein with three functional domains: CUB1 (37–149), CUB2 (159–273), and NTR (318–437). Both CUB domains and the NTR domain contribute to PCPE-1's biological activities.

Critical experimental evidence from Blanc et al. (PMID: 19801683) demonstrated conclusively that "only those containing both CUB1 and CUB2 were capable of enhancing BMP-1 activity and binding to a mini-procollagen substrate with nanomolar affinity. Both these properties were lost by individual CUB domains, which had dissociation constants at least three orders of magnitude higher." This >1,000-fold loss of binding affinity means that a single CUB domain fragment cannot perform any of PCPE-1's characterized functions. The FunFam classification, while computationally valid at the domain level, does not support functional transfer to a protein containing only one of the required domains.
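To put "three orders of magnitude" on an energy scale, the standard relation between a ratio of dissociation constants and a difference in binding free energy can be applied. This conversion is not reported by Blanc et al.; it assumes simple 1:1 binding at 298 K, where RT ≈ 0.593 kcal/mol.

```latex
\Delta\Delta G
  = RT \ln\frac{K_{d,\mathrm{single\ CUB}}}{K_{d,\mathrm{CUB1{+}CUB2}}}
  \geq RT \ln 10^{3}
  \approx 0.593\ \mathrm{kcal\,mol^{-1}} \times 6.91
  \approx 4.1\ \mathrm{kcal\,mol^{-1}}
```

Under these assumptions, an isolated CUB domain gives up at least about 4 kcal/mol of binding free energy relative to the tandem CUB1+CUB2 pair.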

Bourhis et al. (PMID: 17446170) further confirmed that "Procollagen C-proteinase enhancers (PCPE-1 and -2) are extracellular glycoproteins that can stimulate the C-terminal processing of fibrillar procollagens by tolloid proteinases such as bone morphogenetic protein-1. They consist of two CUB domains (CUB1 and -2) that alone account for PCPE-enhancing activity and one C-terminal NTR domain." A0A2G9RZF1 at 156 amino acids possesses none of this required architecture.
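The length argument can be checked directly against the domain boundaries quoted in Finding 3 (PCPE-1 CUB1 37–149 and CUB2 159–273; A0A2G9RZF1 CUB 31–147). The sketch compares residue spans only; it assumes 1-based inclusive boundaries and does not model folding or alignment.

```python
# Sketch: compare residue spans using the boundaries quoted in Finding 3.
# Boundaries are treated as 1-based and inclusive.
FRAGMENT_LENGTH = 156      # A0A2G9RZF1 length, aa
FRAGMENT_CUB = (31, 147)   # single CUB domain in A0A2G9RZF1
PCPE1_CUB1 = (37, 149)     # human PCPE-1 (Q15113)
PCPE1_CUB2 = (159, 273)


def span(start, end):
    """Number of residues in an inclusive 1-based interval."""
    return end - start + 1


tandem_span = span(PCPE1_CUB1[0], PCPE1_CUB2[1])  # CUB1 start to CUB2 end
fragment_cub = span(*FRAGMENT_CUB)
tail = FRAGMENT_LENGTH - FRAGMENT_CUB[1]          # residues after the fragment's CUB

print("PCPE-1 CUB1..CUB2 span:", tandem_span, "aa")
print("Fragment CUB span:", fragment_cub, "aa")
print("Residues after fragment CUB:", tail)
print("Tandem CUB pair fits in fragment:", tandem_span <= FRAGMENT_LENGTH)
```

The CUB1-to-CUB2 region of PCPE-1 spans 237 residues, which cannot fit in a 156-residue chain, and the fragment's single 117-residue CUB domain leaves only 9 residues after it.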

Finding 4: A0A2G9RZF1 Is a Gene Prediction Fragment of a BMP-1/Tolloid-Like Metalloprotease

The most decisive finding emerged from examining the genomic context. Scaffold KV928989 (134 kb) from the RCv2.1 assembly encodes three consecutive open reading frames:

| ORF | UniProt ID | Length | Domains | PANTHER Classification |
| --- | --- | --- | --- | --- |
| AB205_0007200 | A0A2G9RZF1 | 156 aa | CUB | OVOCHYMASE-RELATED |
| AB205_0007210 | A0A2G9RZH1 | 228 aa | EGF + CUB | METALLOENDOPEPTIDASE (SF45) |
| AB205_0007220 | A0A2G9RZI6 | 250 aa | CUB + EGF + CUB | BONE MORPHOGENETIC PROTEIN 1 (SF53), Fragment |

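Together, these three ORFs contribute four CUB and two EGF modules (CUB – EGF – CUB – CUB – EGF – CUB when concatenated in locus-tag order), the same module content as the C-terminal region of a tolloid-like (mTLD) protease (CUB2 – EGF1 – CUB3 – EGF2 – CUB4 – CUB5). The exact mTLD order is recovered if the ORFs are read in reverse locus order (A0A2G9RZI6, A0A2G9RZH1, A0A2G9RZF1), as would be expected for a gene on the reverse strand; the strand has not been checked here. The neighboring ORF A0A2G9RZI6 is explicitly marked as "Fragment" in UniProt and classified by PANTHER as BMP-1. The scaffold contains multiple assembly gaps. No complete (>700 aa) BMP-1/tolloid-like protein was found elsewhere in the bullfrog proteome from this assembly, consistent with the gene being split across gaps.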
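Whether the fragments fit the tolloid layout in order, and not just in module content, can be checked with a contiguous-subsequence test. This sketch assumes the canonical vertebrate tolloid (mTLD/TLL1) module order after the protease domain (CUB1, CUB2, EGF1, CUB3, EGF2, CUB4, CUB5) and takes the per-ORF domain lists from the table above. Because the scaffold strand is not reported in this report, it tries the ORFs in both locus-tag order and reverse order.

```python
# Sketch: do the concatenated ORF domain lists form a contiguous run of the tolloid layout?
# Assumed canonical mTLD / TLL1 module order C-terminal to the protease domain.
TOLLOID_AFTER_PROTEASE = ["CUB", "CUB", "EGF", "CUB", "EGF", "CUB", "CUB"]

# Per-ORF domains (N- to C-terminal within each ORF), listed in locus-tag order.
ORFS = [
    ("A0A2G9RZF1", ["CUB"]),
    ("A0A2G9RZH1", ["EGF", "CUB"]),
    ("A0A2G9RZI6", ["CUB", "EGF", "CUB"]),
]


def concat(orf_list):
    """Concatenate the domain lists of the ORFs in the given order."""
    return [domain for _acc, domains in orf_list for domain in domains]


def is_contiguous_run(query, reference):
    """True if query appears as a contiguous sublist of reference."""
    n = len(query)
    return any(reference[i:i + n] == query for i in range(len(reference) - n + 1))


forward = concat(ORFS)          # ORFs read in locus-tag order
reverse = concat(ORFS[::-1])    # ORFs read in reverse order (as for a minus-strand gene)
print("locus order:  ", "-".join(forward), is_contiguous_run(forward, TOLLOID_AFTER_PROTEASE))
print("reverse order:", "-".join(reverse), is_contiguous_run(reverse, TOLLOID_AFTER_PROTEASE))
```

With these inputs the locus-tag order is not a contiguous run of the assumed layout, while the reverse order reproduces CUB2 through CUB5 exactly. Only an alignment showing the scaffold's orientation relative to the gene would confirm which reading is correct.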

{{figure:evidence_summary.png|caption=Summary of evidence against GO:0005201 annotation: domain architecture comparison between A0A2G9RZF1 (156 aa, single CUB domain) and PCPE-1 (449 aa, CUB1-CUB2-NTR), scaffold fragmentation context showing three consecutive ORFs matching tolloid-like architecture, and assembly quality metrics demonstrating gene model fragmentation}}

Finding 5: CUB Domains in Tolloid Proteases Function as Substrate-Recognition Modules

If A0A2G9RZF1's CUB domain is part of a tolloid-like protease, its function would be substrate recognition and presentation, not ECM structural support. Lee et al. (PMID: 18664565) demonstrated in Xenopus Xolloid that "the first and second CUB domains bind Chordin and present it to the protease domain." This substrate-recognition function is the canonical role of CUB domains in the tolloid/BMP-1 protease family and is fundamentally distinct from structural ECM support.

Additional evidence from Drosophila tolloid (PMID: 25642644) showed that N-terminal CUB domains interact with Collagen IV to enhance Tolloid activity toward its substrate Sog, while C-terminal CUB domains mediate Sog interaction. This bipartite CUB domain function (ECM anchoring + substrate binding) fine-tunes protease activity but is entirely regulatory, not structural.

Finding 6: Chromosome-Level Assembly Confirms Gene Fragmentation

The definitive confirmation came from the chromosome-level assembly GCF_042186555.1 (ASM4218655v1, 2024), which encodes:

  • Complete BMP-1: XP_073478370.1, 1,020 aa, 16 exons on LG03
  • Complete TLL1: XP_073462190.1, 1,005 aa, 21 exons on LG01

These are full-length tolloid-family metalloproteases with the expected multi-domain architecture. The original RCv2.1 assembly had a scaffold N50 of only 39,368 bp compared to 691,824,178 bp in the chromosome-level assembly, a ~17,000-fold improvement in contiguity. The three fragmented ORFs on scaffold KV928989 (156 + 228 + 250 = 634 aa combined) represent portions of one of these ~1,000 aa genes, with the N-terminal protease domain and additional domains falling in assembly gaps.
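The contiguity and length figures in this finding can be reproduced with a few lines of arithmetic. The sketch uses only numbers stated above; the coverage fractions assume, as the text argues, that the three ORFs are non-overlapping pieces of the same gene.

```python
# Sketch: reproduce the contiguity and length arithmetic from Finding 6.
N50_DRAFT = 39_368           # scaffold N50 (bp), RCv2.1 draft assembly
N50_CHROM = 691_824_178      # scaffold N50 (bp), GCF_042186555.1
ORF_LENGTHS = {"A0A2G9RZF1": 156, "A0A2G9RZH1": 228, "A0A2G9RZI6": 250}
FULL_LENGTHS = {"BMP-1 (XP_073478370.1)": 1020, "TLL1 (XP_073462190.1)": 1005}

fold = N50_CHROM / N50_DRAFT
combined = sum(ORF_LENGTHS.values())
print(f"N50 improvement: {fold:,.0f}-fold")
print(f"Combined ORF length: {combined} aa")
for name, length in FULL_LENGTHS.items():
    # Assumes the three ORFs are non-overlapping pieces of the same gene.
    print(f"{name}: {combined / length:.1%} of {length} aa")
```

The N50 ratio works out to about 17,573, consistent with the rounded figure quoted above, and the three ORFs together would account for roughly 62 to 63% of either full-length protein.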

This finding is critical for curation: the UniProt proteome (UP000228934) still references the 2017 draft assembly. When updated to the chromosome-level assembly, these fragment entries should be superseded by complete gene models.


Evidence Matrix

| # | Citation | Evidence Type | Direction | Claim Tested | Key Finding | Context | Confidence |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | PMID: 21954942 | Structural/evolutionary review | Refutes GO:0005201 | CUB domains are structural ECM constituents | CUB domains are "110-residue protein motifs exhibiting a β-sandwich fold and mediating protein-protein interactions"; not structural scaffolds | Comprehensive review of CUB domain biology | High |
| 2 | PMID: 19801683 | Direct assay (binding, activity) | Refutes PCPE-1 transfer | Single CUB domain can perform PCPE-1 function | Individual CUB domains have >1,000× lower affinity; both CUB1+CUB2 required | Recombinant PCPE-1 constructs, in vitro | High |
| 3 | PMID: 30078642 | Structural (crystallography) | Qualifies PCPE-1 function | PCPE-1 is an ECM structural protein | PCPE-1 "specifically accelerates proteolytic release of the C-propeptides"; regulatory, not structural | X-ray crystallography, human PCPE-1 | High |
| 4 | PMID: 17446170 | Direct assay (mutagenesis, SPR) | Refutes PCPE-1 transfer | PCPE-1 domain requirements | CUB1 and CUB2 alone account for enhancing activity; full-length PCPE-1 is CUB1–CUB2–NTR | Mutagenesis, SPR, activity assays | High |
| 5 | PMID: 12393877 | Direct assay (IDA) | Supports alternative MF | PCPE-1 molecular function | Collagen binding, heparin binding, peptidase activator activity (all IDA) | Human PCPE-1, in vitro | High |
| 6 | PMID: 18664565 | Direct assay (in vivo) | Supports tolloid interpretation | CUB function in tolloid proteases | CUB domains "bind Chordin and present it to the protease domain"; substrate recognition | Xenopus Xolloid, embryos | High |
| 7 | PMID: 25642644 | Direct assay (in vivo) | Qualifies CUB function | CUB domains in Drosophila tolloid | N-terminal CUBs interact with Collagen IV; C-terminal CUBs mediate substrate binding | Drosophila embryo | High |
| 8 | PMID: 10500163 | Direct assay | Qualifies PANTHER family | Ovochymase-related family | Ovochymase polyprotein has 5 CUB domains between 3 protease domains; diverse family | Xenopus laevis eggs | Moderate |
| 9 | Scaffold KV928989 analysis | Computational/genomic | Refutes gene completeness | A0A2G9RZF1 is a complete gene | Three consecutive ORFs with complementary tolloid-like domains; assembly gaps | A. catesbeiana RCv2.1 | Medium-High |
| 10 | GCF_042186555.1 (2024) | Computational/genomic | Confirms fragmentation | Complete tolloid genes exist | BMP-1 (1,020 aa) and TLL1 (1,005 aa) are full-length in chromosome-level assembly | A. catesbeiana chromosome-level | High |
| 11 | PMID: 24117177 | Direct assay (SPR) | Qualifies PCPE-1 role | PCPE-1 has additional ECM partners | 17 new binding partners; CUB1CUB2 fragment inhibits angiogenesis; regulatory, not structural | SPR imaging, in vitro | Moderate |
| 12 | PMID: 16819821 | Structural/biophysical | Supports CUB = binding | Single-CUB proteins exist | Spermadhesins (PSP-I, PSP-II) are single CUB domain proteins functioning as binders, not ECM structural components | Boar seminal plasma | Moderate |
| 13 | PMID: 20551380 | Computational (proteomics) | Source of PCPE-1's GO:0005201 | PCPE-1 in ECM fraction | Detection in ECM proteomics ≠ structural function; basis for weak RCA annotation | Human aortic tissue | Low for functional inference |
| 14 | UniProt Q15113 | Database record | Refutes GO:0005201 as PCPE-1 core | PCPE-1 GO annotations | IDA-supported: GO:0005518, GO:0016504, GO:0008201; GO:0005201 is only RCA (computational) | Human PCPE-1 | High |

GO Curation Implications

GO:0005201 (extracellular matrix structural constituent) should not be applied to A0A2G9RZF1 under any evidence code. The term is semantically incorrect for a CUB-domain protein, the evidence chain through PCPE-1 FunFam classification does not support it even for PCPE-1 itself, and the protein is a gene prediction artifact.

GO Decision Table:

| GO Term | Ontology | Action | Rationale | Evidence Level |
| --- | --- | --- | --- | --- |
| GO:0005201 (ECM structural constituent) | MF | Remove / Do not apply | Semantic mismatch; CUB ≠ structural ECM; weak RCA source | Refuted at multiple levels |
| GO:0016504 (peptidase activator activity) | MF | Do not apply | Requires cooperative CUB1+CUB2 (PMID: 19801683); protein is a fragment | Not transferable to single CUB |
| GO:0004222 (metalloendopeptidase activity) | MF | Do not apply (to fragment) | No protease domain in fragment; correct for complete gene | Correctly rejected by seed |
| GO:0005515 (protein binding) | MF | Avoid | Uninformative; discouraged by GO guidelines | Would be technically defensible but not useful |
| GO:0005576 (extracellular region) | CC | Retain with IEA | CUB domains are nearly exclusively extracellular | Moderate; domain-based inference |
| MF unassigned | MF | Recommended | Fragment with no experimental data; function genuinely unknown | n/a |

Key Distinction

Even if A0A2G9RZF1 were a standalone protein (which genomic evidence refutes), the appropriate MF for a single-CUB protein would relate to protein binding in the extracellular space, never GO:0005201. The seed hypothesis itself correctly notes that "the short length (156 aa) and absence of additional functional domains preclude confident functional assignment." This assessment is confirmed and strengthened by our analysis.


Mechanistic Scope

Direct Gene-Product Activity Under Test

The hypothesis tests whether A0A2G9RZF1 directly functions as a structural component of the extracellular matrix, physically contributing to ECM architecture by being incorporated into the matrix scaffold (analogous to collagens, proteoglycans, or elastin).

What the CUB Domain Actually Does

CUB domains are non-catalytic protein-protein interaction modules. In the tolloid-family protease context (the most likely identity for the complete gene), CUB domains function as:

  1. Substrate-recognition modules: binding target proteins (e.g., Chordin, procollagen C-propeptides) and presenting them to the catalytic protease domain (PMID: 18664565)
  2. ECM-anchoring modules: interacting with Collagen IV to localize the protease (PMID: 25642644)

Neither function constitutes "structural ECM support."

Separation of Activities

| Level | Activity | Applies to A0A2G9RZF1? |
| --- | --- | --- |
| Direct MF | ECM structural support (GO:0005201) | No; CUB domains do not structurally constitute the ECM |
| Direct MF | Protein binding / substrate recognition | Plausible for the CUB domain, but fragment status precludes annotation |
| Direct MF | Metalloendopeptidase activity | No; there is no protease domain in this fragment |
| Indirect | Collagen fibril assembly regulation | Only via complete tolloid protease |
| Indirect | BMP signaling modulation | Only via complete tolloid protease |
| Downstream | Corneal scarring, fibrosis | Pathway-level phenotypes, not direct MF |

Conflicts and Alternatives

1. FunFam Classification to PCPE-1

The CATH FunFam classification (2.60.120.290:FF:000005) groups A0A2G9RZF1 with PCPE-1. While computationally valid at the domain-fold level, this classification does not imply functional equivalence. PCPE-1 requires CUB1 + CUB2 + NTR for its characterized activities. The FunFam grouping reflects structural similarity of the CUB fold, not functional annotation transfer. CUB domains are found across functionally diverse proteins: complement factors (C1r, C1s), endocytic receptors (cubilin, with 27 CUB domains; PMID: 30295181), spermadhesins, developmental proteases, neurotransmitter receptor modulators (PMID: 21093502), and innate immune receptors.

2. PANTHER Family Classification

PANTHER classifies A0A2G9RZF1 under PTHR24251 (OVOCHYMASE-RELATED). Ovochymase is a Xenopus egg polyprotein with multiple CUB domains interspersed between serine protease domains (PMID: 10500163). This family-level classification is broad and includes proteins with diverse functions (metalloproteases, serine proteases, enhancers). It should not be used to infer a specific molecular function for a single-CUB fragment.

3. Could A0A2G9RZF1 Be a Genuine Single-CUB Protein?

This alternative is unlikely but not absolutely excludable. Single CUB-domain proteins exist (e.g., spermadhesins PSP-I/PSP-II in pig seminal plasma; PMID: 16819821). However, arguments against this interpretation are strong:

  • Protein existence level 4 (predicted, no transcript/protein evidence)
  • TrEMBL (unreviewed) status
  • Genomic context: three consecutive ORFs with complementary domains on a gap-containing scaffold
  • No complete tolloid gene found elsewhere in the same assembly
  • Chromosome-level assembly encodes full-length versions (1,005–1,020 aa)
  • ~17,000-fold improvement in assembly contiguity resolved the fragmentation

Even if A0A2G9RZF1 were standalone, its function would be extracellular protein binding, which is still NOT GO:0005201.

4. Paralog/Family Confusion Risk

The PTHR24251 family includes ovochymase (egg protease with 5 CUB domains), BMP-1/tolloid metalloproteases, PCPE-1/2 enhancers, and various uncharacterized CUB-containing proteins. Homology at the domain level does not predict function at the protein level. The FunFam classification likely reflects the general CUB fold similarity rather than specific PCPE-1 functional identity.


Knowledge Gaps

| Gap | What Was Checked | Why It Matters | What Would Resolve It |
| --- | --- | --- | --- |
| No experimental data for A0A2G9RZF1 | UniProt (PE4), QuickGO (0 annotations), PubMed (0 hits) | Cannot validate any functional annotation without direct evidence | Recombinant expression, binding assays, localization studies |
| No transcript evidence | UniProt protein existence level (PE4) | Cannot confirm that the predicted ORF is expressed | RNA-seq from A. catesbeiana tissues mapped to chromosome-level assembly |
| Exact gene identity of fragment | Scaffold context, PANTHER classifications, domain architecture comparison | The fragment is known to belong to a tolloid gene, but not which one (BMP-1 on LG03 or TLL1 on LG01) | BLAST of scaffold KV928989 against chromosome-level assembly |
| PCPE-1 ortholog status in bullfrog | Whether a genuine PCPE-1 ortholog exists separately | If no PCPE-1 ortholog exists, the FunFam classification is even more clearly misleading | Reciprocal best BLAST of human PCPE-1 against chromosome-level proteome |
| UniProt proteome update | UP000228934 still references 2017 draft assembly | Fragment entries persist until proteome is updated | NCBI/UniProt proteome refresh to GCF_042186555.1 |

Discriminating Tests

Computational (Immediately Feasible)

  1. Scaffold-to-chromosome alignment: BLAST or minimap2 alignment of scaffold KV928989 against GCF_042186555.1 would definitively identify which complete gene (BMP-1 or TLL1) corresponds to the A0A2G9RZF1 locus. Cost: minutes of computation.

  2. PCPE-1 ortholog search: Reciprocal best BLAST of human PCPE-1 (Q15113) against the chromosome-level bullfrog proteome would identify whether a genuine PCPE-1 ortholog exists, independent of A0A2G9RZF1.

  3. RNA-seq validation: Search SRA/ENA for A. catesbeiana transcriptome data and map to the chromosome-level assembly to confirm expression of the complete tolloid gene.

  4. AlphaFold/ESMFold: Structure prediction for A0A2G9RZF1 to assess whether it folds into a stable CUB domain with an intact calcium-binding site, which would indicate whether even the fragment is structurally viable.
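The reciprocal-best-BLAST test in item 2 can be sketched as follows. It assumes two BLAST+ tabular result sets in the default `-outfmt 6` layout (query ID in column 1, subject ID in column 2, bit score in column 12): one for human PCPE-1 against the bullfrog proteome and one for the reverse search. Every ID in the demo rows except Q15113 is an invented placeholder, and the numbers are not real results.

```python
# Sketch: reciprocal best hits from BLAST+ tabular output (-outfmt 6).
# Column 1 = query ID, column 2 = subject ID, column 12 = bit score.
import io


def best_hits(handle):
    """Map each query ID to the subject ID with the highest bit score."""
    best = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)  # ties keep the first hit seen
    return {q: s for q, (s, _score) in best.items()}


def reciprocal_best_hits(forward_handle, reverse_handle):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward_handle)
    rev = best_hits(reverse_handle)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical demo rows; every ID except Q15113 is an invented placeholder.
forward_tsv = (
    "Q15113\tFROG_HYPO_1\t45.0\t420\t200\t10\t30\t440\t25\t440\t1e-90\t320.0\n"
    "Q15113\tFROG_HYPO_2\t38.0\t300\t170\t12\t40\t330\t50\t340\t1e-40\t150.0\n"
)
reverse_tsv = (
    "FROG_HYPO_1\tQ15113\t45.0\t420\t200\t10\t25\t440\t30\t440\t1e-90\t320.0\n"
    "FROG_HYPO_2\tHUMAN_HYPO_X\t60.0\t300\t110\t5\t50\t340\t40\t330\t1e-80\t280.0\n"
)

print(reciprocal_best_hits(io.StringIO(forward_tsv), io.StringIO(reverse_tsv)))
```

Ties in bit score keep the first hit encountered. A real analysis would also apply e-value and query-coverage thresholds, which this sketch omits.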

Experimental (Definitive)

  1. BMP-1 enhancer activity assay: Test whether the isolated CUB domain can enhance BMP-1 activity on procollagens. Expected result: negative, based on PMID: 19801683.

  2. SPR/ITC binding assays: Test binding of the isolated CUB domain to known tolloid substrates (Chordin, procollagen C-propeptide) and ECM structural proteins. Would directly distinguish substrate-recognition from structural roles.

  3. Full-length gene cloning: Clone the complete BMP-1/TLL1 gene from A. catesbeiana cDNA and characterize its enzymatic activity.


Curation Leads

Lead 1: Remove GO:0005201 from Core Functions (HIGH PRIORITY)

Action: Do not annotate A0A2G9RZF1 with GO:0005201. Remove this term from consideration as a core function.

Rationale: Three independent evidence lines refute this annotation (semantic mismatch, PCPE-1 function mismatch, gene fragmentation). No experimental evidence supports it.

References to verify:
- PMID: 19801683 ("Out of all the forms tested, only those containing both CUB1 and CUB2 were capable of enhancing BMP-1 activity")
- PMID: 21954942 ("CUB domains are 110-residue protein motifs exhibiting a β-sandwich fold and mediating protein-protein interactions")
- UniProt Q15113 GO annotations: PCPE-1 IDA terms are GO:0005518, GO:0008201, GO:0016504 (not GO:0005201)

Confidence: High.

Lead 2: Flag A0A2G9RZF1 as Gene Prediction Fragment (HIGH PRIORITY)

Action: Flag A0A2G9RZF1, A0A2G9RZH1, and A0A2G9RZI6 as probable fragments of a single BMP-1/tolloid-like gene. Do not annotate fragments with function-level GO terms.

Supporting evidence:
- Three consecutive ORFs on scaffold KV928989 with complementary tolloid-family domains
- Scaffold N50 of 39,368 bp in original assembly
- Chromosome-level assembly encodes complete BMP-1 (1,020 aa) and TLL1 (1,005 aa)
- A0A2G9RZI6 explicitly marked as "Fragment" and classified as BMP-1

Reference to verify: PMID: 18664565 (CUB domains in tolloid act as substrate-recognition modules, not structural ECM components)

Confidence: High.

Lead 3: Confirm Rejection of Metalloendopeptidase Activity (MODERATE)

Action: Confirm the seed hypothesis's recommendation against annotating metalloendopeptidase activity (GO:0004222) for this fragment. The protease domain is absent from A0A2G9RZF1.

Note: If the complete gene product is identified as BMP-1 or TLL1, GO:0004222 would be appropriate for the full-length protein, but not for this CUB-domain-only fragment.

Confidence: High.

Lead 4: Retain GO:0005576 as CC Annotation (MODERATE)

Action: Retain GO:0005576 (extracellular region) as a Cellular Component annotation with IEA-level evidence, based on the near-universal extracellular localization of CUB-domain proteins.

Confidence: Moderate.

Lead 5: Leave Molecular Function Unassigned (HIGH PRIORITY)

Action: Do not assign any specific MF term. The function is genuinely unknown for this fragment. The original seed hypothesis statement, "The precise molecular function of A0A2G9RZF1 is unknown", is the most accurate assessment and should be retained.

Confidence: High.

Lead 6: Reference Complete Gene Models for Future Curation (INFORMATIONAL)

Action: When the UniProt proteome is updated to the chromosome-level assembly (GCF_042186555.1), A0A2G9RZF1 should be superseded by:
- BMP-1: XP_073478370.1 (1,020 aa, 16 exons, LG03); NCBI Gene 141133120
- TLL1: XP_073462190.1 (1,005 aa, 21 exons, LG01); NCBI Gene 141113146

These complete gene products would carry appropriate metalloendopeptidase and substrate-binding annotations.


Mechanistic Model

The following diagram summarizes the evidence architecture:

SEED HYPOTHESIS CHAIN (refuted at each step):

  A0A2G9RZF1 ──FunFam──> PCPE-1 ──RCA──> GO:0005201     
  (156 aa,        ↓         ↓              ↓              
   1 CUB)     FOLD MATCH  WEAK RCA    SEMANTIC MISMATCH   
                  ↓         ↓              ↓
              Single CUB  IDA terms    GO:0005201 =
              cannot do    are NOT      collagens,
              PCPE-1 work  GO:0005201   elastin

ACTUAL IDENTITY (supported):

  Scaffold KV928989 (fragmented assembly, N50=39kb):
  ┌─────────────────────────────────────────────────┐
  │ ...gap... [CUB] ...gap... [EGF-CUB] ...gap...   │
  │           ↑                ↑                    │
  │     A0A2G9RZF1      A0A2G9RZH1                  │
  │                                                 │
  │ [CUB-EGF-CUB] ...gap...                         │
  │       ↑                                         │
  │  A0A2G9RZI6 (PANTHER: BMP-1, "Fragment")        │
  └─────────────────────────────────────────────────┘
                ↓ resolves to
  Chromosome-level assembly (N50=692Mb):
  ┌───────────────────────────────────────────────────────┐
  │ BMP-1 (1020 aa): Protease-CUB-CUB-EGF-CUB-EGF-CUB-CUB │
  │ or                                                    │
  │ TLL1 (1005 aa): Protease-CUB-CUB-EGF-CUB-EGF-CUB-CUB  │
  └───────────────────────────────────────────────────────┘

Interpretation: A0A2G9RZF1 is one CUB domain from a tolloid-family metalloprotease that was incorrectly predicted as a standalone gene due to assembly fragmentation. The CUB domain's function within the complete protein would be substrate recognition: binding target proteins and presenting them to the protease domain for cleavage. This is a regulatory/binding function within a larger enzyme, not an ECM structural role.


Evidence Base: Key Literature

Core Evidence Against GO:0005201

Gaboriaud et al. (2011): Structure and properties of the Ca(2+)-binding CUB domain, a widespread ligand-recognition unit involved in major biological functions. PMID: 21954942
Comprehensive review establishing CUB domains as protein-protein interaction modules with a β-sandwich fold. Key evidence: CUB domains mediate ligand recognition across diverse biological systems (complement, development, endocytosis, reproduction); they are interaction modules, not structural components.

Blanc et al. (2009): Strong cooperativity and loose geometry between CUB domains are the basis for procollagen C-proteinase enhancer activity. PMID: 19801683
Demonstrated that individual CUB domains lose >1,000-fold binding affinity. Critical for ruling out functional transfer from PCPE-1 to single-CUB proteins. This is the strongest single piece of evidence against the PCPE-1-based inference chain.

Bourhis et al. (2007): Insights into how CUB domains can exert specific functions while sharing a common fold. PMID: 17446170
Described the CUB1-CUB2-NTR architecture of PCPE and showed that the two CUB domains alone account for enhancing activity. Identified PCPE-specific residues in CUB1 that are necessary but insufficient without CUB2 cooperativity.

Evidence for Tolloid Fragment Identity

Lee et al. (2009): Molecular determinants of Xolloid action in vivo. PMID: 18664565
Demonstrated CUB domain function in amphibian tolloid proteases: substrate recognition and presentation to the protease domain. Directly relevant as Xenopus is the closest well-characterized amphibian model.

Winstanley et al. (2015): Synthetic enzyme-substrate tethering obviates the Tolloid-ECM interaction during Drosophila BMP gradient formation. PMID: 25642644
Showed bipartite CUB domain function in tolloid: ECM anchoring (N-terminal CUBs) and substrate binding (C-terminal CUBs). Confirms CUB domains in tolloids serve regulatory/binding, not structural, roles.

PCPE-1 Function Characterization

Berry et al. (2018): Structural Basis for the Acceleration of Procollagen Processing by Procollagen C-Proteinase Enhancer-1. PMID: 30078642
Crystal structure of PCPE-1 revealing its mechanism as a regulatory accelerator of procollagen processing, not an ECM structural protein.

Salza et al. (2014): Extended interaction network of procollagen C-proteinase enhancer-1 in the extracellular matrix. PMID: 24117177
Identified 17 binding partners of PCPE-1, confirming its role as an interaction hub, not a structural scaffold. The CUB1CUB2 fragment inhibits angiogenesis, a regulatory function.


Limitations

  1. No direct experimental evidence exists for A0A2G9RZF1. All conclusions are based on computational analysis, domain architecture reasoning, genomic context, and inference from well-characterized homologs in other species.

  2. The fragment-to-gene assignment is inferential. We have not performed the BLAST alignment of KV928989 against the chromosome-level assembly that would definitively identify which gene (BMP-1 or TLL1) the fragment belongs to.

  3. Assembly-based reasoning has inherent uncertainty. While the evidence is strong that A0A2G9RZF1 is a fragment, we cannot exclude unusual genomic rearrangements or lineage-specific gene fission events in bullfrog, though these would be extraordinary claims requiring extraordinary evidence.

  4. Literature is from model organisms. CUB domain function, PCPE-1 biochemistry, and tolloid protease characterization are primarily from human, mouse, Xenopus, and Drosophila. Direct evidence from Aquarana catesbeiana is absent.

  5. The 32 papers reviewed focused on PCPE-1 and tolloid biology. We did not exhaustively survey all possible functions of isolated CUB domains in non-model amphibians, though no evidence from the reviewed literature suggested an ECM structural role for any CUB-domain protein.


Proposed Follow-up Actions

Immediate (Computational)

  1. Scaffold alignment: BLAST scaffold KV928989 against the chromosome-level assembly to identify the exact gene correspondence (BMP-1 or TLL1).
  2. Proteome refresh: Request or monitor UniProt proteome update for A. catesbeiana to the chromosome-level assembly.
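The scaffold-to-gene assignment in item 1 could be scored along the lines of this minimal Python sketch. It assumes the 156-aa ORF (or the translated KV928989 ORFs) has already been searched against the two candidate RefSeq proteins with BLAST+ tabular output (`-outfmt 6`, default 12 columns, bitscore last). The helper names and the 1.1 bitscore-ratio threshold for calling a clear winner are hypothetical choices, not an established cutoff.

```python
from collections import defaultdict

# Candidate complete gene models from GCF_042186555.1 (accessions as given in this report).
CANDIDATES = {
    "XP_073478370.1": "BMP-1 (LG03)",
    "XP_073462190.1": "TLL1 (LG01)",
}

def _normalize(seqid: str) -> str:
    """Strip an NCBI-style 'ref|ACC|' wrapper if the BLAST database kept one."""
    parts = seqid.split("|")
    return parts[1] if len(parts) > 2 and parts[0] == "ref" else seqid

def best_bitscores(lines):
    """Highest bitscore per subject from BLAST -outfmt 6 lines (12 default columns)."""
    best = defaultdict(float)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        subject, bitscore = _normalize(fields[1]), float(fields[11])
        best[subject] = max(best[subject], bitscore)
    return dict(best)

def assign_gene(lines, min_ratio=1.1):  # min_ratio: hypothetical decision threshold
    """Return the label of the clearly better-scoring candidate, or None if ambiguous."""
    best = best_bitscores(lines)
    ranked = sorted(((best.get(acc, 0.0), acc) for acc in CANDIDATES), reverse=True)
    (top_score, top_acc), (second_score, _) = ranked[0], ranked[1]
    if top_score == 0.0:
        return None  # no hit to either candidate
    if second_score > 0.0 and top_score / second_score < min_ratio:
        return None  # scores too close to call
    return CANDIDATES[top_acc]
```

Returning `None` instead of forcing a call keeps an ambiguous result (for example, near-equal bitscores from the similar CUB domains of the paralogous BMP-1 and TLL1) visible to the curator.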

Short-term (Curation)

  1. Remove GO:0005201: Remove this term from consideration for A0A2G9RZF1 and any derived annotation pipelines.
  2. Flag as fragment: Mark A0A2G9RZF1 as a gene prediction fragment in the curation system.
  3. No MF annotation: Leave molecular function unassigned on this entry.
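The three short-term actions can be expressed as a filter over annotation records, scoped to entries flagged as fragments. This Python sketch is hypothetical: the `Annotation` record, its `aspect` codes and the `curate` function are illustrative stand-ins for whatever the curation system actually uses. It drops GO:0005201 and every other molecular-function term from fragment entries while keeping location terms such as GO:0005576 (extracellular region).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record: entry accession, GO term, GO aspect."""
    entry: str
    go_id: str
    aspect: str  # "MF", "BP" or "CC"

FRAGMENTS = {"A0A2G9RZF1"}   # entries flagged as gene prediction fragments (action 2)
REFUTED = {"GO:0005201"}     # refuted term (action 1); the MF rule below also covers it

def curate(annotations):
    """Drop refuted and MF terms from fragment entries; return kept records and flags."""
    kept = []
    for ann in annotations:
        if ann.entry in FRAGMENTS and (ann.go_id in REFUTED or ann.aspect == "MF"):
            continue  # actions 1 and 3: no GO:0005201 and no MF on fragments
        kept.append(ann)
    flags = {entry: "gene prediction fragment" for entry in sorted(FRAGMENTS)}
    return kept, flags
```

Annotations on other entries, such as a complete BMP-1 model, pass through unchanged.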

Medium-term (Analysis)

  1. Annotate complete genes: Apply appropriate annotations (GO:0004222 metalloendopeptidase activity, GO:0006508 proteolysis, GO:0005576 extracellular region) to the complete BMP-1 (XP_073478370.1) and TLL1 (XP_073462190.1) from the chromosome-level assembly.
  2. PCPE-1 ortholog identification: Search the chromosome-level proteome for genuine PCPE-1/PCOLCE1 orthologs and annotate appropriately with GO:0016504.
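For item 2, a reciprocal best-hit check could look like the following Python sketch. It assumes two BLAST+ `-outfmt 6` tables: one from searching human PCPE-1 (Q15113) against the chromosome-level bullfrog proteome, and one from searching the bullfrog hits back against the human proteome. Sequence IDs must already be normalized to the same form in both tables (for example, a `sp|...|...` FASTA header versus a bare accession); the sketch does not do this. A reciprocal best hit supports, but does not by itself prove, orthology.

```python
def best_hits(lines):
    """Map each query to its highest-bitscore subject in BLAST -outfmt 6 lines."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward_lines, reverse_lines):
    """Return (human, bullfrog) pairs that are each other's best hit."""
    forward = best_hits(forward_lines)  # human query -> best bullfrog subject
    reverse = best_hits(reverse_lines)  # bullfrog query -> best human subject
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)
```

If the best bullfrog hit for Q15113 maps back to a human tolloid-family protein instead, the FunFam link to PCPE-1 would point to fold-level similarity rather than orthology.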

Report generated by an autonomous scientific discovery agent across 3 investigation iterations. 32 papers reviewed, 6 findings confirmed. All conclusions are computational and require curator verification.

Artifacts


id: A0A2G9RZF1
gene_symbol: A0A2G9RZF1
product_type: PROTEIN
status: DRAFT
taxon:
  id: NCBITaxon:8400
  label: Aquarana catesbeiana
description: >-
  A0A2G9RZF1 is a gene prediction fragment from the fragmented draft genome
  assembly of Aquarana catesbeiana (American bullfrog). The 156-amino acid ORF
  (AB205_0007200) encodes a single CUB domain (residues 31-147), but scaffold
  KV928989 contains two additional consecutive ORFs (A0A2G9RZH1 with EGF+CUB
  domains, and A0A2G9RZI6 with CUB+EGF+CUB domains, the latter explicitly
  flagged as a fragment by UniProt) whose combined domain architecture
  (CUB-EGF-CUB-CUB-EGF-CUB) matches the C-terminal region of a tolloid-family
  metalloprotease (BMP-1 or TLL1). A 2024 chromosome-level assembly
  (GCF_042186555.1) with ~17,000-fold better contiguity encodes complete BMP-1
  (XP_073478370.1, 1,020 aa, LG03) and TLL1 (XP_073462190.1, 1,005 aa, LG01),
  confirming the original draft produced fragmented gene models. The complete
  gene product is a tolloid-family metalloprotease with metalloendopeptidase
  activity. No experimental data exist for this protein (UniProt protein evidence
  level 4: Predicted).
references:
  - id: PMID:29127278
    title: "The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA"
    reference_review:
      relevance: LOW
      correctness: VERIFIED
      review_notes: >-
        This is the bullfrog genome paper cited by UniProt for the nucleotide sequence.
        It provides the genomic context but no functional characterization of this
        specific gene.
  - id: file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-deep-research-falcon.md
    title: "Deep research report for A0A2G9RZF1"
    publication_type: DEEP_RESEARCH
    reference_review:
      relevance: MEDIUM
      correctness: UNVERIFIED
      review_notes: >-
        The falcon deep research correctly identifies the protein as a CUB
        domain-containing protein and provides a thorough domain-based functional
        inference from recent literature on CUB domain proteins. No direct literature
        exists for this specific protein, so all functional inferences are based on
        domain architecture and family membership comparisons.
  - id: file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-hypotheses/core-function-1-go-0005201/openscientist.md
    title: "OpenScientist deep research: GO:0005201 as core function of A0A2G9RZF1"
    publication_type: DEEP_RESEARCH
    reference_review:
      relevance: HIGH
      correctness: VERIFIED
      review_notes: >-
        OpenScientist autonomous investigation refuted GO:0005201 assignment on three
        independent grounds: (1) semantic mismatch -- GO:0005201 describes structural
        ECM proteins, not CUB-domain interaction modules; (2) PCPE-1 FunFam evidence
        chain failure -- the real function of PCPE-1 is peptidase activator activity
        (GO:0016504), and single CUB domains cannot perform PCPE-1 function (requires
        cooperative CUB1+CUB2 per PMID:19801683); (3) A0A2G9RZF1 is a gene prediction
        fragment from a poorly assembled genome -- a 2024 chromosome-level assembly
        (GCF_042186555.1) encodes complete BMP-1 (1,020 aa) and TLL1 (1,005 aa).
  - id: PMID:21954942
    title: "Structure and properties of the Ca(2+)-binding CUB domain, a widespread ligand-recognition unit involved in major biological functions"
    reference_review:
      relevance: HIGH
      correctness: VERIFIED
      review_notes: >-
        Authoritative review of CUB domain biology. Establishes CUB domains as
        protein-protein interaction modules with beta-sandwich fold, not structural
        ECM components. Cited by the OpenScientist report.
  - id: PMID:19801683
    title: "Strong cooperativity and loose geometry between CUB domains are the basis for procollagen C-proteinase enhancer activity"
    reference_review:
      relevance: HIGH
      correctness: VERIFIED
      review_notes: >-
        Key experimental paper demonstrating that individual CUB domains lose
        >1,000-fold binding affinity compared to the CUB1+CUB2 pair. Refutes
        functional transfer from PCPE-1 to single-CUB-domain proteins.
existing_annotations: []
core_functions:
  - description: >-
      A0A2G9RZF1 is almost certainly a gene prediction fragment from a poorly
      assembled draft genome, representing one CUB domain from a much larger
      (~1,000 aa) tolloid-family metalloprotease (BMP-1 or TLL1). The 156-amino
      acid ORF cannot be meaningfully annotated with a specific molecular function.
      GO:0005201 (extracellular matrix structural constituent) was previously
      hypothesized based on CATH FunFam classification to PCPE-1, but this was
      refuted by OpenScientist analysis on three grounds: (1) GO:0005201 describes
      structural ECM proteins (collagens, elastin), not CUB-domain interaction
      modules; (2) the real function of PCPE-1 is peptidase activator activity
      (GO:0016504, IDA), and individual CUB domains cannot perform PCPE-1 function
      (requires cooperative CUB1+CUB2 with >1,000-fold higher affinity than single
      domains); (3) genomic context shows three consecutive fragmented ORFs on
      scaffold KV928989 matching tolloid-like architecture, and a 2024
      chromosome-level assembly (GCF_042186555.1) encodes complete BMP-1 (1,020 aa)
      and TLL1 (1,005 aa). The MF is therefore left at the root term
      (molecular_function, GO:0003674): CUB domains are established
      protein-protein interaction modules, but the actual function of the
      complete gene product would be metalloendopeptidase activity.
    molecular_function:
      id: GO:0003674
      label: molecular_function
    locations:
      - id: GO:0005576
        label: extracellular region
    knowledge_gaps:
      - gap_statement: >-
          A0A2G9RZF1 is a gene prediction fragment, not a complete protein. Its true
          molecular function cannot be determined from this 156-aa fragment. The
          complete gene product is a tolloid-family metalloprotease (BMP-1 or TLL1)
          with metalloendopeptidase activity, but which specific gene (BMP-1 on LG03
          or TLL1 on LG01 of the chromosome-level assembly) corresponds to this
          fragment has not been determined by sequence alignment.
        boundary: >-
          The fragment contains a single CUB domain (residues 31-147) that mediates
          protein-protein interactions in the extracellular space. Scaffold KV928989
          encodes three consecutive ORFs whose combined domains match the C-terminal
          region of a tolloid-family metalloprotease. A 2024 chromosome-level assembly
          (GCF_042186555.1) confirms complete BMP-1 (1,020 aa) and TLL1 (1,005 aa)
          exist in this species.
        gap_kind:
          - BIOLOGY
        significance: >-
          The fragment status means no GO molecular function annotation should be
          confidently applied to this entry. When the UniProt proteome (UP000228934)
          is updated from the 2017 draft to the 2024 chromosome-level assembly, this
          entry should be superseded by a complete gene model.
    supported_by:
      - reference_id: file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-hypotheses/core-function-1-go-0005201/openscientist.md
        supporting_text: >-
          [OpenScientist verdict: REFUTED. GO:0005201 is semantically inappropriate
          for CUB-domain proteins, the PCPE-1 FunFam evidence chain fails (PCPE-1
          function is peptidase activator activity, individual CUB domains lose
          >1,000-fold affinity), and A0A2G9RZF1 is a gene prediction fragment from a
          fragmented genome assembly confirmed by a 2024 chromosome-level assembly
          encoding complete BMP-1 (1,020 aa) and TLL1 (1,005 aa)]
      - reference_id: PMID:19801683
        supporting_text: >-
          only those containing both CUB1 and CUB2 were capable of enhancing
          BMP-1 activity and binding to a mini-procollagen substrate with nanomolar
          affinity. Both these properties were lost by individual CUB domains, which had
          dissociation constants at least three orders of magnitude higher
      - reference_id: PMID:21954942
        supporting_text: >-
          CUB domains are 110-residue protein motifs exhibiting a β-sandwich fold and
          mediating protein-protein interactions in various extracellular proteins
      - reference_id: file:AQUCT/A0A2G9RZF1/A0A2G9RZF1-uniprot.txt
        supporting_text: >-
          [UniProt record: CUB domain at residues 31-147; InterPro IPR000859 CUB_dom;
          Pfam PF00431 CUB; PANTHER PTHR24251 OVOCHYMASE-RELATED; protein existence
          level 4 (Predicted)]
suggested_questions:
  - question: >-
      Which complete tolloid-family gene does this fragment correspond to -- BMP-1
      (XP_073478370.1 on LG03) or TLL1 (XP_073462190.1 on LG01)? A BLAST
      alignment of scaffold KV928989 against the chromosome-level assembly
      (GCF_042186555.1) would definitively resolve this. OpenScientist analysis
      confirmed A0A2G9RZF1 is a gene prediction fragment, but the specific gene
      identity remains undetermined.
  - question: >-
      When will the UniProt proteome for Aquarana catesbeiana (UP000228934) be
      updated from the 2017 draft assembly to the 2024 chromosome-level assembly
      (GCF_042186555.1)? This update would supersede the fragment entries
      (A0A2G9RZF1, A0A2G9RZH1, A0A2G9RZI6) with complete gene models carrying
      appropriate metalloendopeptidase annotations.
  - question: >-
      Does Aquarana catesbeiana have a genuine PCPE-1/PCOLCE1 ortholog separate
      from the tolloid-family metalloproteases? A reciprocal best-hit BLAST search
      with human PCPE-1 (Q15113) against the chromosome-level proteome would clarify whether
      the CATH FunFam classification to PCPE-1 reflects any real orthology
      relationship or is purely fold-level similarity.
suggested_experiments:
  - hypothesis: >-
      A0A2G9RZF1 is a fragment of a BMP-1 or TLL1 tolloid-family metalloprotease,
      not a standalone protein.
    description: >-
      Align scaffold KV928989 from the RCv2.1 draft assembly against the
      chromosome-level assembly (GCF_042186555.1) using BLAST or minimap2 to
      definitively map A0A2G9RZF1 to either BMP-1 (XP_073478370.1, LG03) or
      TLL1 (XP_073462190.1, LG01). This computational experiment would resolve
      the remaining ambiguity about gene identity.
    experiment_type: bioinformatics
  - hypothesis: >-
      The complete tolloid-family metalloprotease corresponding to A0A2G9RZF1 has
      metalloendopeptidase activity and functions in procollagen processing or BMP
      signaling regulation.
    description: >-
      Clone the full-length BMP-1 or TLL1 gene from Aquarana catesbeiana cDNA
      using the chromosome-level assembly as reference, express recombinantly, and
      assay metalloendopeptidase activity against procollagen or Chordin
      substrates. Compare with Xenopus Xolloid, the closest well-characterized
      amphibian ortholog.
    experiment_type: biochemistry