| Protein | Organism | Predicted Term | Type | Assessment | CS | Error | Summary |
|---|---|---|---|---|---|---|---|
| A0A8C9H4D2 OLFML2A | Piliocolobus tephrosceles | GO:0031012 extracellular matrix | GO_CC | COR | 2 | | ProtNLM2 predicted GO:0031012 (extracellular matrix), a cellular component term that is not present in the existing GOA annotations for this protein. The existing CC annotation is GO:0005615 (extracellular space, IEA:TreeGrafter), which is a broader localization. This prediction is assessed as correct and novel based on multiple lines of evidence. OLFML2A is a secreted glycoprotein (UniProt keyword "Secreted", ARBA evidence) containing a C-terminal olfactomedin-like domain (IPR003112, Pfam PF02191). Olfactomedin-family proteins are characteristically matricellular glycoproteins that function within the extracellular matrix rather than simply being soluble in the extracellular space. The AI review of this protein explicitly identifies it as a "matricellular regulatory protein involved in cell-extracellular matrix communication, cell adhesion, and modulation of cell migration," and the core function annotation assigns GO:0005201 (extracellular matrix structural constituent) as the molecular function. ECM localization is the natural compartment for a protein with this domain architecture and functional profile. GO:0031012 (extracellular matrix) is a child of GO:0005576 (extracellular region, the existing GOA annotation from GO_REF:0000044) and provides a more precise and informative localization, making this a genuinely novel and useful prediction not captured by existing annotations. |
| A0A8C9H4D2 OLFML2A | Piliocolobus tephrosceles | GO:0030198 extracellular matrix organization | GO_BP | COR | 2 | | ProtNLM2 predicted GO:0030198 (extracellular matrix organization), a biological process term absent from the existing GOA annotations for this protein. The only existing BP annotation is GO:0007165 (signal transduction, IEA:TreeGrafter), which the AI review flagged as an over-annotation because OLFML2A's influence on signaling is indirect, mediated through ECM interactions rather than conventional signal transduction. This prediction is assessed as correct and novel. OLFML2A is a secreted olfactomedin-family glycoprotein whose core function is described as a "matricellular regulator" that "contributes to extracellular matrix organization, cell adhesion, and modulation of cell migration." The protein contains the olfactomedin-like domain (IPR003112), which adopts a five-bladed beta-propeller fold mediating protein-protein interactions in the extracellular space -- a structural basis consistent with a role in organizing ECM components. ECM organization is a well-established biological process for secreted matricellular proteins that modulate the structural and signaling properties of the matrix without being classical structural components like collagens. This prediction captures a biologically informative process that is more specific and accurate than the existing signal transduction annotation, and fills a gap in the current functional annotation of OLFML2A. |
| A0A8B8L1Z3 | Abrus precatorius | GO:0005783 endoplasmic reticulum | GO_CC | UNC | 1 | FREQUENCY_BIAS | GO:0005783 (endoplasmic reticulum) is predicted for a J domain-containing protein that has no existing GO annotations. While a subset of J-domain (DnaJ/Hsp40) proteins are ER-resident co-chaperones -- notably ERdj1 through ERdj8 in mammals and their plant homologs, which partner with BiP (ER-luminal Hsp70) in nascent chain translocation, protein folding, and ER-associated degradation -- ER-targeted J-domain proteins characteristically possess an N-terminal signal peptide or transmembrane anchor that directs them to the ER membrane or lumen. This protein lacks both: UniProt reports no signal peptide, no transmembrane domain, and the C-terminal sequence (ending ...VGDDKVKGH) contains no KDEL/HDEL ER-retention signal. The domain architecture consists of a single J domain at positions 70-135 followed by large intrinsically disordered regions (residues 138-196, 312-408) with proline-rich, basic/acidic, and polar compositional biases. This disordered-rich architecture is more typical of cytoplasmic or nuclear J-domain proteins involved in transcriptional regulation or chromatin remodeling than of ER-luminal chaperones. Additionally, the InterPro classification IPR053052 (Imprinting Balance Regulator) suggests homology to nuclear/cytoplasmic regulatory proteins rather than ER-resident chaperones. In plants, cytoplasmic and chloroplastic J-domain proteins substantially outnumber ER-targeted ones. Without a signal peptide, transmembrane anchor, or ER-retention motif, there is no sequence-based evidence supporting ER localization. The prediction likely reflects frequency bias in ProtNLM2 training data, where endoplasmic reticulum is over-represented among chaperone-domain-containing proteins due to the well-studied ER protein quality control machinery. |
| A0A6I8TLE4 | Aedes aegypti | GO:0005096 GTPase activator activity | GO_MF | CNN | 2 | TRAINING_DATA_CONTAMINATION | GO:0005096 (GTPase activator activity) is an exact match to the existing GOA annotation for this protein (IEA:UniProtKB-KW). The prediction is biologically correct: A0A6I8TLE4 contains a canonical RasGAP domain (IPR001936, Pfam PF00616) with the conserved catalytic arginine finger motif detected by PROSITE PS00509, which is the hallmark of proteins that stimulate the intrinsic GTPase activity of Ras-family small GTPases. The CDD match to cd05136 (RasGAP_DAB2IP) further specifies this as a DAB2IP/SynGAP-subfamily RasGAP, and PANTHER classifies it as PTHR10194:SF60 (raskol), indicating orthology to a well-characterized Drosophila RasGAP. However, because this annotation was already present in GOA via automated keyword mapping (IEA:UniProtKB-KW from the "GTPase activation" keyword), the prediction is classified as CNN (Correct but Not Novel) rather than COR. The sequence features driving the ProtNLM2 prediction are the same ones that generated the existing IEA annotation, indicating likely training data contamination. Notably, ProtNLM2 did not predict any additional GO terms that the multi-domain architecture strongly supports: the C2 domain (cd04013, SynGAP-like) and PH domain (cd13262, SynGAP-like) predict calcium-dependent and phosphoinositide-dependent membrane targeting respectively, and the DAB2P_C domain (IPR021887) suggests scaffold/adaptor functions. These domains collectively point to involvement in Ras protein signal transduction (GO:0007265) and negative regulation thereof (GO:0046580), as well as plasma membrane and cytosol localization -- none of which were captured by the model. |
| A0A2G9RZF1 | Aquarana catesbeiana | GO:0007338 single fertilization | GO_BP | UNC | 1 | | ProtNLM2 predicted GO:0007338 (single fertilization) for this CUB domain-containing protein. The prediction is biologically plausible: PANTHER classifies this protein in the OVOCHYMASE-RELATED family (PTHR24251), and ovochymases are serine proteases involved in egg envelope hardening during fertilization in amphibians. CUB domains are also found in amphibian egg envelope glycoproteins that mediate sperm-egg recognition (Hedrick 2008). However, A0A2G9RZF1 is only 156 aa (likely a protein fragment), contains a single CUB domain without any identifiable protease or catalytic domain, and UniProt notes it lacks conserved residues required for feature propagation. There are no GOA annotations, no expression data, and no direct experimental evidence linking this specific protein to fertilization. While the family context makes reproductive biology a reasonable hypothesis, the prediction cannot be validated or refuted with available evidence. |
| A0A2G9RZF1 | Aquarana catesbeiana | GO:0005576 extracellular region | GO_CC | COR | 2 | | ProtNLM2 predicted GO:0005576 (extracellular region) for this CUB domain-containing protein. This prediction is well-supported by convergent domain and family evidence. CUB domains are almost exclusively found in secreted or cell-surface proteins that function in the extracellular space (Thomas et al. 2024, Lin et al. 2023, Gonzalez-Calvo et al. 2022). The PANTHER family assignment (PTHR24251, OVOCHYMASE-RELATED) groups this with secreted extracellular proteases. The FunFam classification maps it to Procollagen C-endopeptidase enhancer 1 (PCPE1), which is a well-characterized secreted extracellular glycoprotein. UniProt keywords include Disulfide bond and Zymogen, both consistent with a secreted extracellular protein. No GOA annotations exist for this protein, so this represents a genuinely novel correct prediction rather than a rediscovery of existing annotation. |
| A0A444Z7V7 | Arachis hypogaea | No predictions | |||||
| F4JLB7 | Arabidopsis thaliana | GO:0016310 phosphorylation | GO_BP | PLI | 0 | PARALOG_OVERANNOTATION | GO:0016310 (phosphorylation) predicts that RIC7 is involved in a phosphorylation process, implying it acts as or with a kinase. This is incorrect. RIC7 is a ROP GTPase effector protein whose characterized biological roles are negative regulation of stomatal opening (GO:1902457) and regulation of stomatal movement (GO:0010119), mediated by binding activated ROP2 via its CRIB domain and inhibiting Exo70B1. RIC7 belongs to the receptor-like protein (RLP) family, which by definition lacks the intracellular kinase domain present in the related receptor-like kinase (RLK) family. The UniProt entry for F4JLB7 lists no kinase-related domains, keywords, or functions. InterPro annotations show only LRR domains (IPR001611, IPR032675) and a stomatal development/plant interaction regulator domain (IPR052941), with no protein kinase domain. The FunFam classification with the ERECTA kinase superfamily (3.80.10.10:FF:000041) reflects shared LRR structural domains, not shared kinase function -- ERECTA is an RLK with a kinase domain, while RIC7 is an RLP without one. No experimental evidence from Wu et al. 2001 (PMID:11752391) or subsequent functional studies supports any role for RIC7 in phosphorylation. This prediction is a paralog overannotation error arising from failure to distinguish kinase-containing from kinase-lacking members of the LRR superfamily. |
| F4JLB7 | Arabidopsis thaliana | GO:0016301 kinase activity | GO_MF | PLI | 0 | PARALOG_OVERANNOTATION | GO:0016301 (kinase activity) predicts that RIC7 catalyzes the transfer of a phosphate group to a substrate. This is incorrect. RIC7's characterized molecular function is small GTPase binding (GO:0031267), not kinase activity. The protein functions as a signaling adaptor downstream of ROP2, not as a catalytic enzyme. Structurally, RIC7 contains a CRIB (Cdc42/Rac-interactive binding) motif for GTPase interaction and LRR domains for protein-protein interactions, but no kinase domain of any type. The Pfam annotations for F4JLB7 are exclusively LRR_1 (PF00560, 3 copies) and LRR_8 (PF13855, 1 copy), with no Pkinase (PF00069) or Pkinase_Tyr (PF07714) domains. This distinguishes RIC7 from LRR-RLK proteins like ERECTA (AT2G26330), which share the LRR extracellular domain but additionally possess a cytoplasmic kinase domain. The curated ai-review explicitly notes that RIC7 "lacks a kinase domain and functions as a cytoplasmic effector rather than a receptor." ProtNLM2 likely propagated kinase activity from LRR-RLK superfamily members in its training data to this kinase-lacking RLP family member, a classic Type 6 paralog overannotation error where the model fails to distinguish nonisofunctional members of a protein superfamily. |
| A0A2U1PS28 | Artemisia annua | GO:0009507 chloroplast | GO_CC | UNC | 1 | | ProtNLM2 predicted chloroplast localization for this plant GUF1 homolog. All existing GOA annotations place this protein in the mitochondrion (GO:0005743 mitochondrial inner membrane, GO:0005759 mitochondrial matrix), based on HAMAP-Rule MF_03137 and UniProtKB-UniRule. However, the PANTHER subfamily classification (PTHR43512:SF4) explicitly labels this protein as "TRANSLATION FACTOR GUF1 HOMOLOG, CHLOROPLASTIC," providing independent support for chloroplast targeting. In plants, the LepA/EF-4 family includes paralogs targeted to different organelles: chloroplastic cpLepA functions in plastid translation while mitochondrial GUF1 operates in mitochondrial translation. Both organelles maintain their own translation machinery and require elongation factor 4 for translational quality control. The N-terminal sequence of A0A2U1PS28 contains features (disordered polar-rich region, residues 1-25) that could constitute either a mitochondrial or chloroplast transit peptide, and distinguishing these computationally is notoriously difficult in plants. The existing AI review explicitly flags the question of whether this protein localizes to mitochondria, chloroplasts, or both as an unresolved issue requiring fluorescent tagging and confocal microscopy with organelle markers. Without such experimental data, this prediction remains genuinely uncertain -- it is biologically plausible given the PANTHER classification and the known diversity of organellar EF-4 targeting in plants, but it contradicts the HAMAP-based annotations that form the basis of the current GOA record. |
| Q2U1U6 | Aspergillus oryzae | GO:0000272 polysaccharide catabolic process | GO_BP | COR | 2 | | GO:0000272 (polysaccharide catabolic process) is a correct novel prediction for Q2U1U6. The protein contains a Chondroitin_lyas domain (IPR008929) and matches the Chondroitin AC/alginate lyase SUPFAM fold (SSF48230), both of which are diagnostic for polysaccharide lyases that depolymerize glycosaminoglycan chains. The deep research report confirms that the protein is predicted to function as a polysaccharide lyase degrading GAG substrates (chondroitin sulfate, dermatan sulfate, and possibly hyaluronic acid) through a beta-elimination mechanism. This activity falls squarely within polysaccharide catabolic process. The term is somewhat broad -- a more precise annotation might be glycosaminoglycan catabolic process (GO:0006027) -- but GO:0000272 is not wrong and represents a biologically meaningful novel prediction for a protein with no existing GOA annotations. Aspergillus species encode extensive CAZyme repertoires including polysaccharide lyases, and the comparative genomics of section Flavi Aspergilli supports the plausibility of such activity in A. oryzae. |
| Q2U1U6 | Aspergillus oryzae | GO:0004553 hydrolase activity, hydrolyzing O-glycosyl compounds | GO_MF | NPI | 0 | FREQUENCY_BIAS | GO:0004553 (hydrolase activity, hydrolyzing O-glycosyl compounds) is mechanistically incorrect for Q2U1U6. The protein's only domain annotation is Chondroitin_lyas (IPR008929) with a structural match to Chondroitin AC/alginate lyase (SSF48230), placing it firmly in the polysaccharide lyase (PL) superfamily, not the glycoside hydrolase (GH) superfamily. These are fundamentally different enzyme classes in the CAZy classification system. Polysaccharide lyases cleave glycosidic bonds via a beta-elimination mechanism, abstracting a proton from C5 of a hexuronic acid residue and eliminating across C4-O4 to generate products with a characteristic delta-4,5-unsaturated bond. Glycoside hydrolases, by contrast, cleave glycosidic bonds through hydrolysis, adding water across the bond. The reaction mechanisms, active site architectures, and products are categorically different. ProtNLM2 appears to have conflated the general concept of polysaccharide-degrading activity with glycoside hydrolase activity, likely because glycoside hydrolases are the most frequently annotated polysaccharide-degrading enzymes in training data. The correct molecular function term would be in the polysaccharide lyase activity branch, not the hydrolase branch. |
| A0A8B8WEG2 | Balaenoptera musculus | No predictions | |||||
| Q7VZI5 | Bordetella pertussis | No predictions | |||||
| E1BL04 | Bos taurus | GO:0030036 actin cytoskeleton organization | GO_BP | LSP | 2 | | ProtNLM2 predicted GO:0030036 (actin cytoskeleton organization), which is the direct parent of the existing IBA annotation GO:0007015 (actin filament organization). XIRP2 organizes actin filaments specifically within the sarcomere via its 26 Xin repeats that bind F-actin, and mouse knockout studies demonstrate disrupted actin filament architecture in cardiomyocytes. The existing annotation at GO:0007015 is more informative because XIRP2 acts directly on actin filaments rather than the broader actin cytoskeleton (which includes non-filament structures such as the Arp2/3 branched network). The ai-review itself marks GO:0030036 as MARK_AS_OVER_ANNOTATED relative to GO:0007015. The prediction is biologically correct but adds no precision beyond what IBA phylogenetic inference already captures. |
| E1BL04 | Bos taurus | GO:0030054 cell junction | GO_CC | LSP | 2 | | ProtNLM2 predicted GO:0030054 (cell junction), which is a broad ancestor of multiple more specific GOA annotations: GO:0005925 (focal adhesion, IBA/IEA), GO:0005911 (cell-cell junction, IEA), and GO:0070161 (anchoring junction, IEA). XIRP2 localizes to intercalated discs in cardiomyocytes -- specialized cell-cell junctions containing adherens junctions -- and colocalizes with focal adhesions (or their muscle-equivalent costameres) in non-muscle overexpression assays. The UniProt subcellular location annotation already lists "Cell junction" (ARBA), and InterPro2GO maps the Xin repeat domain to this same broad term. The prediction is correct but at the least informative level of the GO hierarchy; the more specific terms already in GOA (focal adhesion, cell-cell junction, anchoring junction) better capture XIRP2 biology. The ai-review marks GO:0030054 as MARK_AS_OVER_ANNOTATED. |
| E1BL04 | Bos taurus | GO:0003779 actin binding | GO_MF | LSP | 2 | | ProtNLM2 predicted GO:0003779 (actin binding), the direct parent of the existing IBA and IEA annotation GO:0051015 (actin filament binding). XIRP2 contains 26 Xin repeats (PROSITE) / 18 Pfam Xin domains that specifically bind F-actin (filamentous actin), not monomeric G-actin. The distinction matters: GO:0003779 encompasses both G-actin and F-actin binding, while GO:0051015 correctly restricts to the filamentous form that the Xin repeat domain engages. The ai-review explicitly marks GO:0003779 as MARK_AS_OVER_ANNOTATED and identifies GO:0051015 as the core molecular function. Combined IEA methods (GO_REF:0000120) already assign GO:0003779 via InterPro, so this prediction recapitulates an existing automated annotation at a less informative level than the best available term. |
| A0A061AL94 mcm-4 | Caenorhabditis elegans | GO:0006367 transcription initiation at RNA polymerase II promoter | GO_BP | NPI | 0 | FREQUENCY_BIAS | MCM-4 is a subunit of the MCM2-7 replicative DNA helicase complex and has no known role in transcription initiation at RNA polymerase II promoters. The protein's sole domain in this 74 AA fragment is the winged-helix domain WHD_MCM4 (PF21128), which is structurally related to winged-helix domains found in some transcription factors. This structural similarity likely caused ProtNLM2 to predict a transcription-related function. However, the MCM4 WHD is specifically involved in DNA binding during replication origin licensing and replication fork progression, not in transcription. All established functions of MCM-4 -- DNA helicase activity, DNA replication initiation, and participation in the MCM complex -- are in the DNA replication pathway, not the transcription pathway. No literature or ortholog evidence supports a direct role for any MCM4 subunit in RNA polymerase II transcription initiation. |
| A0A061AL94 mcm-4 | Caenorhabditis elegans | GO:0005634 nucleus | GO_CC | COR | 2 | | Nuclear localization is well-established for MCM complex subunits across eukaryotes, and this prediction is biologically correct. The MCM2-7 complex, of which MCM-4 is a constitutive subunit, is loaded onto chromatin at replication origins in the nucleus during late mitosis and G1 phase and operates at replication forks during S phase. In C. elegans, MCM-4::mCherry fusion proteins have been observed associated with chromosomes in live-cell imaging, directly confirming nuclear localization. The AI review of this protein independently proposed nucleus (GO:0005634) as a NEW annotation with ISS evidence. Since this accession has no existing GOA annotations, this ProtNLM2 prediction represents a correct novel prediction that is consistent with both the known biology of MCM helicase subunits and direct experimental observations in C. elegans. |
| A0A4W3GVU1 | Callorhinchus milii | GO:0005634 nucleus | GO_CC | UNC | 1 | | ProtNLM2 predicted subcellular location 'Nucleus', mapped to GO:0005634 (nucleus). This is a subcellular location prediction rather than a direct GO term prediction. |
| A0A8I3PI07 CNNM4 | Canis lupus familiaris | GO:0005886 plasma membrane | GO_CC | LSP | 2 | | GO:0005886 (plasma membrane) is already present as an existing GOA annotation for CNNM4 (both IBA via GO_Central and IEA via Ensembl/PANTHER). The prediction is correct -- CNNM4 is an integral plasma membrane protein with a signal peptide and four transmembrane helices in the CNNM domain (residues 178-358). However, this term is less precise than the most informative cellular component annotation already in GOA: basolateral plasma membrane (GO:0016323), which was transferred from the mouse ortholog Q69ZF7 via Ensembl Compara. The basolateral localization is biologically critical because CNNM4 mediates vectorial Mg2+ efflux from the cytoplasm into the interstitial space at the basolateral surface of intestinal epithelial cells. This polarized localization is essential for transcellular magnesium absorption -- apical entry via TRPM6/TRPM7 channels followed by basolateral exit via CNNM4. The generic plasma membrane term fails to capture this functionally important membrane domain specificity. The prediction also qualifies as CNN (correct but not novel) since GO:0005886 is already annotated in GOA, but LSP is the more informative assessment given the availability of the more specific GO:0016323 annotation. |
| A0A8I3PI07 CNNM4 | Canis lupus familiaris | GO:0022857 transmembrane transporter activity | GO_MF | LSP | 2 | FREQUENCY_BIAS | GO:0022857 (transmembrane transporter activity) is already present as an existing GOA annotation for CNNM4 (IEA via PANTHER/UniRule combined annotation, GO_REF:0000120), and was marked KEEP_AS_NON_CORE in the curated review precisely because more specific molecular function annotations exist. The protein's core molecular function is magnesium ion transmembrane transporter activity (GO:0015095), established by IBA and IEA annotations and confirmed by the curated review as the primary function. CNNM4 mediates Mg2+ efflux regulated by intracellular Mg2+-ATP binding to its CBS domain pair. GO:0022857 is an ancestor of GO:0015095 in the GO hierarchy, so while technically correct, it adds no functional information beyond what a sequence-based prediction of "this is some kind of transporter" would provide. The prediction fails to distinguish CNNM4 from any of the hundreds of other transmembrane transporters encoded in mammalian genomes, and does not capture the magnesium specificity, efflux directionality, or ATP-sensing regulatory mechanism that define this protein. |
| A0A8I3PI07 CNNM4 | Canis lupus familiaris | GO:0006811 monoatomic ion transport | GO_BP | LSP | 2 | FREQUENCY_BIAS | GO:0006811 (monoatomic ion transport) is a broad biological process term that encompasses many specific ion transport processes. CNNM4 has multiple more specific BP annotations already in GOA: magnesium ion transport (GO:0015693, IBA and IEA), magnesium ion transmembrane transport (GO:1903830, IEA via logical inference), and magnesium ion homeostasis (GO:0010960, IBA and IEA). The task description notes this as an EXACT match to GO:0035725 (sodium ion transmembrane transport), but the curated review marks the sodium transport annotations as UNDECIDED because the Na+ transport activity of CNNM4 is less well established than its primary Mg2+ efflux function -- the sodium transport may reflect a coupled or secondary mechanism rather than an independent transport activity. Regardless, GO:0006811 is a parent term of all these more specific processes and fails to capture the defining biology of CNNM4: its role as a magnesium efflux transporter critical for systemic Mg2+ homeostasis, whose loss of function causes Jalili syndrome (cone-rod dystrophy with amelogenesis imperfecta) in humans. The prediction is uninformatively generic. |
| Q7NUH2 | Chromobacterium violaceum | No predictions | |||||
| A0A2I0M3K7 | Columba livia | GO:0106029 tRNA pseudouridine synthase activity | GO_MF | CNN | 2 | | ProtNLM2 predicted GO:0106029 (tRNA pseudouridine synthase activity), which is a child term of the existing GOA annotation GO:0009982 (pseudouridine synthase activity). The prediction is biologically correct: TRUB2 belongs to the TruB family of pseudouridine synthases and specifically catalyzes uridine-to-pseudouridine isomerization in tRNA substrates. In mammals, TRUB2 acts as a mitochondrial tRNA Psi55 synthase, modifying the conserved U55 position in the TΨC loop of select mitochondrial tRNAs, and the Columba livia ortholog contains the conserved TruB N-terminal domain (PF01509) and is classified under InterPro family IPR039048 (Trub2), supporting functional equivalence. However, this is scored CNN (correct but not novel) rather than COR because the tRNA substrate specificity is already implicit in the InterPro-derived annotation: IPR039048 specifically identifies the protein as Trub2 (a known tRNA pseudouridine synthase), and the existing GO:0009982 annotation was assigned through that same InterPro mapping. The ProtNLM2 prediction thus refines granularity but does not provide genuinely new functional insight beyond what domain-based inference already established. Notably, an even more specific term GO:0160148 (tRNA pseudouridine(55) synthase activity) would better reflect TRUB2's known positional specificity at U55 in the TΨC loop. |
| A0A8C2TBA7 PAM | Coturnix japonica | GO:0004598 peptidylamidoglycolate lyase activity | GO_MF | CNN | 2 | | GO:0004598 (peptidylamidoglycolate lyase activity) corresponds to the enzymatic activity of the C-terminal PAL domain of PAM, which cleaves the peptidyl-alpha-hydroxyglycine intermediate produced by the PHM domain to yield the mature alpha-amidated peptide and glyoxylate (EC:4.3.2.5). This prediction is biologically correct: PAM is a bifunctional enzyme and the PAL lyase activity is one of its two core catalytic functions. However, GO:0004598 is already directly annotated in GOA via EC number mapping (GO_REF:0000003, with/from EC:4.3.2.5), making this a correct but not novel prediction. The automated overlap analysis flagged this as matching the parent term GO:0003824 (catalytic activity), but the exact term GO:0004598 is itself present in GOA. The UniProt entry explicitly lists EC=4.3.2.5 and documents the PAL reaction. |
| A0A8C2TBA7 PAM | Coturnix japonica | GO:0031418 L-ascorbic acid binding | GO_MF | COR | 2 | | GO:0031418 (L-ascorbic acid binding) is a biologically correct novel prediction for PAM. The N-terminal PHM (peptidylglycine alpha-hydroxylating monooxygenase) domain is a copper-dependent monooxygenase that requires L-ascorbate as an obligate electron donor for its catalytic mechanism. The UniProt catalytic activity record explicitly shows "2 L-ascorbate" as a substrate in the PHM reaction, and the deep research confirms that "reduced ascorbate as an electron donor" is one of three essential cofactors, with "one mole of ascorbate consumed per mole of amidated product formed." While GOA contains GO:0016715 (oxidoreductase activity, acting on paired donors, with reduced ascorbate as one donor), which implicitly references ascorbate in the reaction mechanism, no explicit ascorbate binding term is present in the curated annotations. GO:0031418 captures a genuine molecular interaction -- the PHM domain must physically bind L-ascorbate to accept electrons for the copper center reduction -- that is not redundant with existing annotations. The InterPro domain signatures (Cu2_ascorb_mOase_N, Cu2_ascorb_mOase_CS-1/2) further confirm ascorbate dependence as a defining feature of this enzyme family. This represents a meaningful addition to the functional annotation of PAM. |
| A0A1S3BTE3 | Cucumis melo | No predictions | |||||
| A0A8M9QG43 dnajc6 | Danio rerio | GO:0016311 dephosphorylation | GO_BP | NPI | 0 | FREQUENCY_BIAS | Auxilin/DNAJC6 contains a PTEN-like phosphatase domain (residues 109-276), which ProtNLM2 likely used to predict involvement in dephosphorylation. However, the phosphatase domain of auxilin has only "probable" phosphatase activity per UniProt characterization. Its experimentally established role is phosphoinositide binding for membrane targeting during clathrin-mediated endocytosis, not catalytic dephosphorylation of substrates. The curated ai-review independently marked the related GOA annotation for phosphoprotein phosphatase activity (GO:0004721) as MARK_AS_OVER_ANNOTATED and hydrolase activity (GO:0016787) as REMOVE, noting that these sequence-feature-based annotations overstate the functional evidence. The core molecular function of auxilin is as a J-domain co-chaperone that recruits and stimulates HSC70 ATPase activity to disassemble clathrin coats from newly formed vesicles. Loss-of-function phenotypes in human (PARK19 Parkinson's disease) and zebrafish (impaired Notch signaling) are attributable to defective clathrin uncoating, not loss of phosphatase activity. This prediction reflects the model's reliance on the PTEN-like domain fold without accounting for the divergent functional role of this domain in the auxilin protein context. |
| Q9RSY6 | Deinococcus radiodurans | GO:0003676 nucleic acid binding | GO_MF | LSP | 2 | | Less precise than the existing curated annotation. ProtNLM2 predicted GO:0003676 (nucleic acid binding), which is a high-level ancestor of GO:0003729 (mRNA binding) already present in GOA via both IBA (GO_REF:0000033, inferred from E. coli RpsA P0AG67) and IEA (GO_REF:0000117, ARBA). Ribosomal protein bS1 specifically binds mRNA 5-prime UTRs through its five tandem S1/OB-fold domains (residues 122-539) to recruit messages to the 30S subunit for translation initiation -- this is a defined and well-characterized molecular activity, not generic nucleic acid binding. The ai-review itself marked the existing GO:0003676 IEA annotation (from InterPro2GO mapping of the S1 domain IPR003029) as MARK_AS_OVER_ANNOTATED because GO:0003729 already captures the function at higher specificity. The ProtNLM2 prediction thus recapitulates an annotation already flagged as uninformatively broad, adding no biological insight beyond what the curated mRNA binding annotation provides. |
| Q9RSY6 | Deinococcus radiodurans | GO:0005840 ribosome | GO_CC | LSP | 2 | | Less precise than the existing curated annotation. ProtNLM2 predicted GO:0005840 (ribosome), which is a direct parent of GO:0022627 (cytosolic small ribosomal subunit) already annotated via IBA (GO_REF:0000033). As a canonical bacterial ribosomal protein, bS1 is specifically a component of the 30S (small) ribosomal subunit -- not the 50S large subunit or the assembled 70S ribosome in general. D. radiodurans bS1 belongs to the bacterial ribosomal protein bS1 family (COG0539) and its name explicitly identifies it as the small subunit protein. The prediction correctly places the protein in a ribosomal context but fails to resolve which ribosomal subunit it occupies, information that is both biologically important (bS1 functions exclusively in the 30S subunit during mRNA recruitment) and already captured by the existing IBA annotation to GO:0022627. |
| A0A6I8W8A2 | Drosophila pseudoobscura pseudoobscura |
GO:0016874 ligase activity |
GO_MF | NPI | 0 | FREQUENCY_BIAS | GO:0016874 (ligase activity) is incorrect for this protein. The prediction appears driven by the automated RefSeq protein name "Probable E3 ubiquitin-protein ligase HERC3 isoform X3" rather than the actual domain content. HERC family E3 ligases require a C-terminal HECT (Homologous to E6-AP Carboxyl Terminus) domain to catalyze the transfer of ubiquitin from an E2 conjugating enzyme to substrate proteins. This 169-residue protein is far too short to contain a HECT domain (typically 350+ residues) and its entire domain architecture consists exclusively of two RCC1 repeats (positions 32-87 and 88-142), as confirmed by PROSITE, Pfam (PF00415 RCC1, PF13540 RCC1_2), InterPro (IPR000408), and Gene3D (2.130.10.30). The RCC1 repeat region in HERC proteins functions in substrate recognition and protein-protein interactions, not in catalytic ubiquitin ligation. This isoform X3 likely represents a truncated splice variant retaining only the N-terminal substrate-binding region of the full-length HERC3 ortholog. An appropriate molecular function annotation for this fragment would fall under protein binding (e.g., contributing to substrate recognition in a complex), not ligase activity. The protein has no curated GOA annotations and is unreviewed in UniProt (PE 4, predicted), further indicating that no experimental or curated evidence supports ligase activity for this specific gene product. |
| B4MAQ2 | Drosophila virilis |
GO:0005737 cytoplasm |
GO_CC | UNC | 1 | ProtNLM2 predicted subcellular location 'Cytoplasm', mapped to GO:0005737 (cytoplasm). This is a subcellular location prediction rather than a direct GO term prediction. |
|
|
A0A8C5FPT8
tbc1d14 |
Gadus morhua |
GO:0005776 autophagosome |
GO_CC | NPI | 0 | ProtNLM2 predicted GO:0005776 (autophagosome) as a more specific replacement for the existing GOA annotation GO:0005773 (vacuole). This prediction is incorrect. TBC1D14 is a negative regulator of macroautophagy that acts by controlling ATG9 vesicle trafficking from recycling endosomes, but it does not localize to autophagosomes themselves. Detailed studies of mammalian TBC1D14 (Lamb et al. 2016) demonstrate localization to RAB11-positive recycling endosomes, the Golgi complex, and tubulo-vesicular transport intermediates between these compartments. When TBC1D14 is overexpressed, it causes tubulation of recycling endosomes and sequesters ULK1 and ATG9 away from autophagosome formation sites, thereby inhibiting autophagy -- but TBC1D14 itself remains on the endosomal compartment, not on the autophagosome. The existing vacuole annotation (GO:0005773) was already flagged as MARK_AS_OVER_ANNOTATED in the curated review because no evidence supports vacuolar or lysosomal localization. The ProtNLM2 prediction of autophagosome appears to conflate functional involvement in autophagy regulation (a biological process) with physical residence at the autophagosome (a cellular component), a category error. The correct cellular component annotations for TBC1D14 would be recycling endosome (GO:0055037) and Golgi apparatus (GO:0005794), neither of which was predicted. |
|
| S0EDH7 | Gibberella fujikuroi |
GO:0006468 protein phosphorylation |
GO_BP | COR | 2 | Protein phosphorylation is the canonical biological process carried out by protein kinases. S0EDH7 contains a kinase-like domain fold confirmed by three independent domain classification methods: InterPro IPR011009 (Kinase-like domain superfamily), Gene3D 1.10.510.10 (Transferase(Phosphotransferase) domain 1), and SUPFAM SSF56112 (Protein kinase-like). There are no existing GOA annotations for this protein, making this a genuinely novel prediction. While no experimental evidence exists (PE level 4), the convergent domain evidence strongly supports protein kinase activity. In Fusarium fujikuroi (the anamorph name of Gibberella fujikuroi), protein kinases are central to signaling pathways regulating growth, secondary metabolism, and pathogenicity, though S0EDH7 itself has not been assigned to any specific kinase subfamily or signaling module. Assessed as COR (correct novel) because the domain architecture robustly supports this function despite the absence of direct experimental validation. |
|
| S0EDH7 | Gibberella fujikuroi |
GO:0005524 ATP binding |
GO_MF | COR | 2 | ATP binding is the essential molecular function underpinning protein kinase catalysis, as protein kinases use ATP as the phosphoryl group donor for substrate phosphorylation. The Gene3D classification of S0EDH7 as containing a Transferase(Phosphotransferase) domain (1.10.510.10) directly implies ATP-dependent phosphotransferase activity. The SUPFAM classification (SSF56112, Protein kinase-like) and InterPro (IPR011009, Kinase-like domain superfamily) further corroborate a fold architecture that accommodates ATP binding. No GOA annotations exist for this protein, so this is a novel prediction. The prediction is biologically coherent with the protein phosphorylation prediction (GO:0006468) -- together they describe the expected enzymatic mechanism of a protein kinase. Assessed as COR (correct novel) based on strong convergent structural domain evidence for kinase-like fold and phosphotransferase activity. |
|
| A0A2I4G8T1 | Juglans regia | No predictions | |||||
| A0A2K5UJ34 | Macaca fascicularis |
GO:0061371 determination of heart left/right asymmetry |
GO_BP | COR | 2 | ProtNLM2 predicted GO:0061371 (determination of heart left/right asymmetry), a biological process term with no overlap in the existing GOA annotations for this protein (which contain only GO:0060271 cilium assembly and GO:0032474 otolith morphogenesis, both IEA/TreeGrafter). This prediction is assessed as correct and novel based on converging evidence. First, the zebrafish ortholog of TTC39C (Q1LXE6) has direct experimental evidence (IMP) for involvement in determination of heart left/right asymmetry, establishing that this function is genuinely associated with TTC39C orthologs. Second, heart left/right asymmetry determination is a cilium-dependent process in vertebrate development: motile cilia at the embryonic node generate leftward fluid flow that breaks bilateral symmetry and initiates the Nodal signaling cascade. Third, TTC39C has been experimentally confirmed to localize to cilia in C. elegans sensory neurons (Pir et al. 2024, Ciliogenics study), and TPR domain proteins are well-established ciliary scaffolds. Unlike otolith morphogenesis (which is taxonomically inappropriate for a primate), heart left/right asymmetry determination via nodal cilia is a conserved developmental mechanism in mammals, making this prediction biologically appropriate for Macaca fascicularis. The prediction adds a specific cilium-dependent developmental outcome beyond the generic cilium assembly term already in GOA, representing genuinely informative functional annotation. |
|
| A0A804UIX9 | Zea mays | No predictions | |||||
| A0A8B6BFL6 | Mytilus galloprovincialis |
GO:0003677 DNA binding |
GO_MF | LSP | 2 | DNA binding is technically correct for a reverse transcriptase domain-containing protein from a DIRS1-type retrotransposon, since the protein must interact with DNA during reverse transcription (cDNA synthesis) and integration. However, GO:0003677 is a very broad molecular function term that fails to capture the actual enzymatic activity. The primary molecular function of this protein is RNA-directed DNA polymerase activity (GO:0003964), which is far more informative. Additionally, DIRS1 elements encode a tyrosine recombinase domain for integration that also binds DNA, but again, the specific catalytic function is more informative than generic DNA binding. Assessed as LSP because the prediction is correct at a high level but substantially less precise than what domain architecture alone would support. |
|
| A0A8B6BFL6 | Mytilus galloprovincialis |
GO:0006310 DNA recombination |
GO_BP | COR | 2 | DNA recombination is a correct novel prediction for a DIRS1-type retrotransposon protein. Unlike LINE retrotransposons that use target-primed reverse transcription (TPRT) for integration, DIRS1 elements integrate into the host genome via tyrosine recombinase-mediated site-specific recombination. The CDD domain RNase_HI_RT_DIRS1 (cd09275) specifically identifies this protein as belonging to the DIRS1 class, whose integration mechanism is fundamentally recombination-based. DIRS1 elements produce circular DNA intermediates that are then integrated through recombination at their inverted terminal repeats. This term is absent from GOA for this protein and represents a biologically accurate prediction supported by the known mechanism of DIRS1 element propagation. |
|
| A0A8B6BFL6 | Mytilus galloprovincialis |
GO:0015074 DNA integration |
GO_BP | COR | 2 | DNA integration is a correct novel prediction for a retrotransposon-encoded reverse transcriptase. Retrotransposons replicate via a copy-and-paste mechanism in which the element is transcribed to RNA, reverse-transcribed to DNA, and then integrated into new genomic loci. For DIRS1-type elements (as indicated by the RNase_HI_RT_DIRS1 CDD domain cd09275), integration proceeds through tyrosine recombinase-mediated insertion of circular DNA intermediates. The protein's domain architecture (RT domain plus DIRS1-associated RNase H) directly supports involvement in the retrotransposition cycle that culminates in genomic integration. This is a core biological process for any active retrotransposon and is not present in GOA for this uncharacterized protein from M. galloprovincialis, making it a genuinely informative novel prediction. |
|
| A0A8B6GS20 | Mytilus galloprovincialis |
GO:0004438 phosphatidylinositol-3-phosphate phosphatase activity |
GO_MF | NPI | 0 | PARALOG_OVERANNOTATION | ProtNLM2 predicted GO:0004438 (phosphatidylinositol-3-phosphate phosphatase activity) for MTMR9, but this is a catalytic activity that MTMR9 cannot perform. MTMR9 is a well-characterized catalytically inactive pseudophosphatase within the myotubularin subfamily. It retains the overall myotubularin phosphatase domain fold (Pfam PF06602, Myotub-related; PROSITE PS51339, PPASE_MYOTUBULARIN) but lacks the critical catalytic cysteine in the conserved CX5R motif required for phosphoinositide dephosphorylation. The existing GOA annotations reflect this biology correctly: MTMR9 is annotated with protein phosphatase binding (GO:0019903), capturing its role as a scaffold that heterodimerizes with catalytically active family members (MTMR6, MTMR7, MTMR8), and its involvement in phosphatidylinositol dephosphorylation (GO:0046856) was flagged in the AI review as requiring modification to the regulatory term GO:0060304 (regulation of phosphatidylinositol dephosphorylation). The ProtNLM2 error is a classic paralog overannotation (Type 6): the sequence-based model recognized the conserved myotubularin domain architecture and predicted the catalytic activity of the active subfamily members (MTMR1-4, MTMR6-8) without detecting the degenerate active site that distinguishes the pseudophosphatase branch (MTMR5, MTMR9, MTMR10-13). This is precisely the kind of subfamily-level functional divergence that sequence similarity methods struggle to capture, as the overall domain architecture is preserved despite loss of catalytic competence. PI(3)P phosphatase activity is the canonical substrate specificity of the active myotubularins, making it a plausible but biologically incorrect prediction for this catalytically dead family member. |
| B8BAB0 | Oryza sativa subsp. indica |
GO:0010152 pollen maturation |
GO_BP | UNC | 1 | GO:0010152 (pollen maturation) is a biological process prediction for a BURP domain-containing protein with no existing GOA annotations and no direct experimental characterization (UniProt PE level 4). The prediction has some biological plausibility because the BURP protein family includes members with roles in reproductive development: BNM2 (the founding BNM2-like subfamily member) is linked to pollen grain embryogenesis, and OsRAFTIN1, a rice BURP protein, is specifically expressed in anthers during microspore development and is required for male fertility. However, B8BAB0 is classified by PANTHER (PTHR31458:SF2) as a PG1beta-like BURP protein, not a BNM2-like protein. The PG1beta-like subfamily has well-characterized members in rice (OsBURP14, OsBURP16) that function as non-catalytic beta subunits of polygalacturonase isozyme 1, participating in cell wall pectin degradation under ethylene/ABA stress signaling rather than pollen-specific processes. Furthermore, B8BAB0 contains an N-terminal signal peptide (aa 1-21) and a C-terminal BURP domain (aa 384-595) consistent with a secreted protein involved in extracellular matrix or cell wall modification, but its variable internal region and overall domain architecture align with PG-associated function (IPR051897, PG-associated_BURP) rather than reproductive-specific roles. The prediction may reflect ProtNLM2 conflating BURP family-level associations with reproductive biology (driven by BNM2-like and RAFTIN members in the training data) and applying them indiscriminately across the family. Without expression data, mutant phenotypes, or protein interaction studies for B8BAB0, the prediction cannot be confirmed or refuted. Pollen maturation does involve cell wall remodeling processes that could theoretically engage PG-associated proteins, but this indirect reasoning is insufficient for a confident assignment. |
|
| Q6YYC5 | Oryza sativa subsp. japonica |
GO:0070534 protein K63-linked ubiquitination |
GO_BP | UNC | 1 | ProtNLM2 predicted GO:0070534 (protein K63-linked ubiquitination) as a more specific refinement of the existing GOA annotation GO:0016567 (protein ubiquitination). The prediction is biologically plausible but uncertain. The Arabidopsis ortholog RGLG2 (AT3G01650/Q9LY87), one of the source genes for the IBA transfer to Q6YYC5, has been experimentally shown to catalyze K63-linked polyubiquitin chain formation. However, ubiquitin chain-type linkage specificity is primarily determined by the E2 conjugating enzyme partner, not the E3 ligase itself. RGLG2 catalyzes K63-linked chains when paired with UBC35 (a group-III E2), but may produce different linkage types with other E2s. No experimental data exist for Q6YYC5, and the specific E2 partner(s) of this uncharacterized rice RGLG protein are unknown. Furthermore, while Q6YYC5 is classified in the RGLG family (PTHR45751:SF16, RGLG4), not all RGLG family members necessarily share K63 linkage specificity -- the rice RGLG family includes members (OsRGLG5, OsRGLG6) that target substrates for 26S proteasomal degradation, which typically involves K48-linked chains. The prediction cannot be confirmed or refuted without biochemical characterization of Q6YYC5 with its cognate E2 enzyme(s). |
|
| Q6YYC5 | Oryza sativa subsp. japonica |
GO:0061630 ubiquitin protein ligase activity |
GO_MF | CNN | 2 | ProtNLM2 predicted GO:0061630 (ubiquitin protein ligase activity) as a more specific refinement of the existing GOA annotation GO:0004842 (ubiquitin-protein transferase activity). This is correct: GO:0004842 encompasses both E2 conjugating enzymes and E3 ligases, while GO:0061630 specifically denotes E3 ligase activity. Q6YYC5 contains a canonical C-terminal RING finger zinc-binding domain (IPR001841, PROSITE PS50089, residues 356-389), which is the hallmark catalytic domain of RING-type E3 ubiquitin ligases. PANTHER classifies it as PTHR45751:SF16 (E3 ubiquitin-protein ligase RGLG4), and related rice RGLG proteins (OsRGLG5, OsRGLG6) as well as Arabidopsis orthologs (RGLG1/RGLG2) have confirmed E3 ligase activity. However, this refinement is assessed as CNN (correct but not novel) rather than COR because the E3 vs E2 distinction is already apparent from domain architecture alone -- the RING domain is universally recognized as an E3 ligase signature -- and the AI review independently proposed the identical MODIFY action from GO:0004842 to GO:0061630 based on this same reasoning. The prediction confirms existing domain-based inference rather than providing genuinely novel functional insight. |
|
| A0A2R9CAF4 | Pan paniscus |
GO:0008509 monoatomic anion transmembrane transporter activity |
GO_MF | LSP | 2 | GO:0008509 (monoatomic anion transmembrane transporter activity) is already present as an existing GOA annotation for this protein. As a broad parent term, it correctly captures that SLC26A11 transports monoatomic anions (sulfate, chloride, and others), but it is far less informative than the specific molecular functions already annotated. The protein's core activity is secondary active sulfate transmembrane transporter activity (GO:0008271), demonstrated by reconstitution studies showing proton-coupled sulfate/chloride exchange with a Km of approximately 40 µM for sulfate. It also possesses a distinct chloride channel activity (GO:0005254) confirmed by electrophysiology. The task description notes an EXACT match to GO:0140900 (chloride:bicarbonate antiporter activity), but the curated review flags GO:0140900 for modification because recent biochemical reconstitution data show bicarbonate has minimal competition for the SLC26A11 substrate binding site -- the actual exchange mechanism is proton-coupled sulfate/chloride antiport, not chloride/bicarbonate exchange. ProtNLM2 failed to predict the more specific and biologically accurate transport activities that define this protein's function. |
|
| A0A2R9CAF4 | Pan paniscus |
GO:0016020 membrane |
GO_CC | LSP | 2 | FREQUENCY_BIAS | GO:0016020 (membrane) is trivially correct for SLC26A11, which has 10-14 transmembrane helices and is unambiguously an integral membrane protein. However, this prediction is uninformative and already marked as MARK_AS_OVER_ANNOTATED in the curated review. The biologically meaningful localization is the lysosomal membrane (GO:0005765), where SLC26A11 functions as the primary sulfate exporter using the lysosomal proton gradient. Confocal microscopy with Lamp1 co-staining shows Manders coefficients of 0.45-0.50 for lysosomal overlap across multiple mammalian cell types (HEK293T, COS1, CHO, renal intercalated cells). By contrast, overlap with ER markers is minimal (0.09-0.12). The generic membrane term fails to distinguish this specific lysosomal residence from any other membrane protein. This type of over-generic cellular component prediction is characteristic of frequency bias, as membrane is among the most commonly assigned GO CC terms in training data. |
| A0BFB4 | Paramecium tetraurelia |
GO:0006468 protein phosphorylation |
GO_BP | CNN | 2 | ProtNLM2 predicted GO:0006468 (protein phosphorylation), the biological process of covalent addition of phosphate groups to amino acid residues in proteins. A0BFB4 contains a canonical protein kinase catalytic domain (Pfam PF00069, residues 96-348) with a conserved serine/threonine kinase active site (IPR008271) and ATP-binding site (IPR017441), and is already annotated in GOA with GO:0004674 (protein serine/threonine kinase activity) via both IBA and IEA evidence. Protein phosphorylation is the process directly enabled by protein kinase activity -- a kinase that catalyzes ATP-dependent transfer of phosphate to Ser/Thr residues is by definition involved in protein phosphorylation. The prediction is therefore correct but not novel: the biological process is logically entailed by the existing molecular function annotation. Additionally, GOA already includes GO:0005524 (ATP binding), which further corroborates the catalytic competence of this kinase. While the specific substrates and pathway context of A0BFB4 remain unknown (it belongs to a massively expanded kinome of 2606 kinases in P. tetraurelia), the general involvement in protein phosphorylation is unambiguous from its domain architecture. |
|
| B7FXQ8 | Phaeodactylum tricornutum |
GO:0009651 response to salt stress |
GO_BP | UNC | 1 | While sHSPs can be induced by multiple abiotic stresses beyond heat, there is no direct evidence that HSP20A in P. tricornutum is specifically involved in the response to salt stress. The deep research report focuses exclusively on thermal stress roles for this protein family in diatoms, with no mention of salinity-induced expression. P. tricornutum is a marine diatom and does experience osmotic stress, but the expanded HSF regulatory network described in this organism (Huang et al. 2025, Lin et al. 2024) is characterized in the context of thermal tolerance, not salt acclimation. Some plant sHSPs are salt-inducible, but extrapolating this to a diatom HSP20 without organism-specific evidence is speculative. ProtNLM2 may have learned a general association between stress-response proteins and salt stress from plant training data, but this remains unvalidated for HSP20A. Cannot be confirmed or refuted without transcriptomic or genetic evidence under salinity stress conditions. |
|
| B7FXQ8 | Phaeodactylum tricornutum |
GO:0051259 protein complex oligomerization |
GO_BP | COR | 2 | This is a correct novel prediction. Oligomerization is a defining and functionally essential feature of the HSP20/sHSP family. Small heat shock proteins assemble into dynamic oligomeric structures ranging from dimers to large complexes of 24 or more subunits, with dimers serving as building blocks that associate through their N-terminal and C-terminal regions to form higher-order assemblies (Sprague-Piercy et al. 2021, Gu et al. 2023). The oligomeric state is functionally significant: dimers often represent the active chaperone form, while larger oligomers may serve as inactive storage pools. HSP20A contains the conserved alpha-crystallin domain (residues 47-155) that mediates dimerization, and its variable terminal extensions regulate higher-order oligomerization. Since B7FXQ8 has no existing GOA annotations, this represents a genuine novel prediction consistent with the well-established structural biology of the sHSP family. |
|
| B7FXQ8 | Phaeodactylum tricornutum |
GO:0006457 protein folding |
GO_BP | LSP | 2 | This prediction is broadly correct but less precise than the actual biological role. sHSPs like HSP20A do not actively fold proteins; they function as holdase chaperones that prevent irreversible aggregation of partially unfolded or misfolded proteins and maintain them in a folding-competent state (Mitra et al. 2022, Albinhassan et al. 2025). The actual refolding is carried out by ATP-dependent chaperones (HSP70, HSP100) to which sHSPs hand off their client proteins. A more precise annotation would be GO:0061077 (chaperone-mediated protein folding) or terms related to the prevention of protein aggregation, such as GO:0051085 (chaperone cofactor-dependent protein refolding) or the broader protein quality control pathway. GO:0006457 (protein folding) is a general term that encompasses de novo folding, which is not the primary role of sHSPs. The prediction captures the correct functional domain (proteostasis) but at insufficient specificity. |
|
| B7FXQ8 | Phaeodactylum tricornutum |
GO:0009408 response to heat |
GO_BP | COR | 2 | This is a correct novel prediction and arguably the best supported of all six predictions for this protein. HSP20A is named as a heat shock protein and belongs to the sHSP/HSP20 family, whose defining biological role is the cellular response to heat stress. In P. tricornutum specifically, recent research has established that the organism possesses an exceptionally expanded heat shock transcription factor (HSF) repertoire (69 HSF genes, 44.2% of all transcription factors) that controls thermal tolerance programs, with HSP proteins as downstream effectors (Huang et al. 2025, Lin et al. 2024). In the marine dinoflagellate Scrippsiella trochoidea, HSP20 transcripts are strongly upregulated under heat stress (Deng et al. 2020). The UniProt entry for B7FXQ8 carries a keyword annotation for stress response, and the ARBA automated rule (ARBA00023016) supports this assignment. Since P. tricornutum thrives across temperatures from 5 to 28 °C in diverse marine environments, heat shock proteins are critical for its thermal adaptability. This is a high-confidence correct prediction. |
|
| B7FXQ8 | Phaeodactylum tricornutum |
GO:0042542 response to hydrogen peroxide |
GO_BP | UNC | 1 | There is no direct evidence that HSP20A in P. tricornutum is involved in the response to hydrogen peroxide. While some sHSPs in other organisms (particularly plant and mammalian systems) have been shown to confer protection against oxidative stress, and oxidative stress and heat stress response pathways can overlap, the deep research report for this protein does not mention any oxidative stress role. The alpha-crystallin domain in vertebrate lens crystallins does protect against oxidative damage, but this function has not been demonstrated for diatom HSP20 proteins. ProtNLM2 may have learned an association between sHSPs and oxidative stress responses from well-characterized plant or mammalian training examples, but without P. tricornutum-specific evidence (e.g., transcriptomic upregulation under H2O2 treatment or genetic perturbation data), this prediction cannot be validated or refuted. |
|
| B7FXQ8 | Phaeodactylum tricornutum |
GO:0051082 unfolded protein binding |
GO_MF | COR | 2 | This is a correct novel prediction of the core molecular function of sHSPs. The alpha-crystallin domain of HSP20 family proteins mediates direct binding to partially unfolded, misfolded, or aggregation-prone client proteins through recognition of exposed hydrophobic surface regions that are normally buried in properly folded structures (Sprague-Piercy et al. 2021, Mitra et al. 2022, Gu et al. 2023). This holdase activity -- binding non-native proteins to prevent their irreversible aggregation -- is the primary molecular function of sHSPs and is mediated by the conserved alpha-crystallin domain present in HSP20A (residues 47-155, InterPro IPR002068, Pfam PF00011). The CDD annotation ACD_sHsps-like (cd06464) on this protein further confirms the presence of a functional alpha-crystallin domain competent for substrate binding. sHSPs exhibit broad, promiscuous substrate specificity for non-native proteins rather than targeting specific individual clients. Since B7FXQ8 has no existing GOA annotations, this is a genuinely novel and well-supported prediction. |
|
|
G1TUN6
UBE2L6 |
Oryctolagus cuniculus |
GO:0016740 transferase activity |
GO_MF | LSP | 2 | ProtNLM2 predicted GO:0016740 (transferase activity) for rabbit UBE2L6, a prediction that is biologically correct but substantially less precise than the existing curated annotations. UBE2L6 is an E2 ubiquitin-conjugating enzyme (EC 2.3.2.23) whose primary physiological role is as the dedicated E2 for ISG15 conjugation (ISGylation), an interferon-stimulated post-translational modification system central to innate antiviral defense. The E2 reaction mechanism is a transthioesterification -- the activated ubiquitin-like modifier (ISG15 or ubiquitin) is transferred from the E1 thioester to the conserved active-site cysteine (Cys86) of the E2 via a new thioester bond -- placing UBE2L6 squarely in the transferase catalytic class. However, GO:0016740 (transferase activity) is a very high-level term that encompasses all enzymes transferring any functional group; the GOA already contains the far more informative descendant terms GO:0019787 (ubiquitin-like protein transferase activity) and GO:0042296 (ISG15 transferase activity), both transferred via Ensembl Compara from the experimentally characterized human ortholog (O14933). GO:0016740 is an ancestor of both these terms in the GO molecular function hierarchy, so the ProtNLM2 prediction adds no novel information and is less informative than what existing automated methods have already assigned. This is scored LSP (less precise than existing annotation) rather than CNN because the prediction does not match an existing annotation at the same granularity -- it is a strict generalization that would never be annotated alongside the more specific terms under standard GO annotation practice. |
|
| C6T1A2 | Glycine max |
GO:0009788 negative regulation of abscisic acid-activated signaling pathway |
GO_BP | UNC | 1 | ProtNLM2 predicted GO:0009788 (negative regulation of abscisic acid-activated signaling pathway) for C6T1A2, a soybean C2H2-type zinc finger protein with no curated GO annotations in GOA. This prediction is biologically plausible but uncertain. The most likely basis for this prediction is sequence similarity to Arabidopsis ZFP7 (Q39266), which shares the same IPR053266 (Zinc_finger_protein_7) family membership and single-C2H2 domain architecture, and has been experimentally shown to negatively regulate ABA-activated signaling during seed germination. However, several factors limit confidence in this functional transfer: (1) the C2H2 zinc finger superfamily is one of the largest transcription factor families in plants, with members participating in diverse biological processes including cold stress, flower development, trichome initiation, and photomorphogenesis -- sharing a C2H2 domain does not predict specific pathway involvement; (2) no gene-specific experimental studies exist for C6T1A2 / Glyma17g18110.1, and the protein was only identified as part of a soybean transcription factor ORFeome cloning effort (PMID:26268547) without individual functional characterization; (3) while soybean does possess ABA signaling pathways involved in drought responses, the specific negative regulatory role of ZFP7-family members may not be conserved across the ~90 million year divergence between Arabidopsis and Glycine max, given extensive C2H2 gene family expansion and subfunctionalization in legumes. The prediction cannot be confirmed or refuted without experimental evidence such as ABA-responsive expression profiling or overexpression/knockout phenotyping in soybean. |
|
| C6T1A2 | Glycine max |
GO:0005634 nucleus |
GO_CC | CNN | 2 | ProtNLM2 predicted GO:0005634 (nucleus) for C6T1A2, a soybean C2H2-type zinc finger transcription factor. This prediction is biologically correct: C2H2-type zinc finger transcription factors are canonical nuclear proteins that must localize to the nucleus to bind DNA and regulate transcription. Numerous soybean C2H2 zinc finger proteins have been experimentally confirmed as nuclear-localized (e.g., GmZFP3, GmZF1), and nuclear localization is essentially a defining characteristic of the functional class. However, this is assessed as CNN (correct but not novel) rather than COR because the prediction provides no functional insight beyond what is already trivially derivable from the protein's domain architecture and family classification. The InterPro annotation (IPR053266, Zinc_finger_protein_7; IPR036236, Znf_C2H2_sf) and PANTHER classification (PTHR47593, ZINC FINGER PROTEIN 4-LIKE) both implicitly predict nuclear localization. The AI review independently proposed this same annotation as a NEW ISS-level annotation based on domain architecture, confirming that the ProtNLM2 prediction is redundant with standard domain-based inference. While formally correct, a model that predicts nucleus for a zinc finger transcription factor demonstrates no discriminative power beyond family-level annotation transfer. |
|
| Q9KZ33 | Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145) |
GO:0006352 DNA-templated transcription initiation |
GO_BP | LSP | 2 | ProtNLM2 predicted GO:0006352 (DNA-templated transcription initiation) for Q9KZ33/SCO7099, a predicted ECF sigma factor in S. coelicolor. While this prediction is biologically sound -- sigma factors are essential components of the bacterial transcription initiation complex, binding the RNA polymerase core enzyme and directing it to specific promoter sequences -- it is less precise than the existing GOA annotation GO:2000142 (regulation of DNA-templated transcription initiation, IEA via GO_REF:0000108). The distinction matters: sigma factors do not perform the catalytic step of transcription initiation (phosphodiester bond formation by the beta/beta-prime subunits of RNAP core); rather, they regulate which promoters are recognized and thus which genes undergo transcription initiation. GO:2000142 correctly captures this regulatory role -- the sigma factor modulates the specificity of the initiation event rather than being the initiation machinery per se. Additionally, Q9KZ33 carries a second GOA annotation for GO:0016987 (sigma factor activity, IBA via PANTHER PTN001249270), which is the molecular function from which GO:2000142 is logically derived. The predicted GO:0006352 is the process being regulated, not the regulatory function itself, making it a less precise annotation for an ECF sigma factor. The prediction is concordant with the known biology (sigma factors are intimately involved in transcription initiation) but adds no new information beyond what the existing, more precise GOA annotations already convey. |
|
|
Q9L243
SCO2678 |
Streptomyces coelicolor |
GO:0008253 5'-nucleotidase activity |
GO_MF | COR | 2 | ProtNLM2 predicted GO:0008253 (5'-nucleotidase activity), the hydrolysis of a 5'-ribonucleotide or 5'-deoxyribonucleotide to a ribonucleoside or deoxyribonucleoside and orthophosphate. This prediction is well-supported by multiple independent lines of evidence. SCO2678 contains a HAD_SAK_2 domain (PF18143), placing it in the haloacid dehalogenase (HAD) superfamily, which catalyzes phosphate ester hydrolysis via a conserved nucleophilic aspartate mechanism. The protein is classified in eggNOG COG1877 (5'-nucleotidase/2',3'-cyclic phosphodiesterase and related esterases), which directly supports assignment of 5'-nucleotidase activity within the broader HAD phosphatase family. A characterized S. coelicolor homolog, SCO4152, is a PhoP-regulated extracellular 5'-nucleotidase involved in phosphate scavenging, providing direct functional precedent in this organism. S. coelicolor lacks organic phosphate transporters (uhp-type systems), necessitating extracellular dephosphorylation of nucleotides before phosphate uptake via PstSCAB or PitH transporters, which provides strong biological rationale for secreted nucleotidase activity. The UniProt entry designates SCO2678 as a secreted protein, consistent with an extracellular phosphatase role. GOA contains no curated annotations for this protein, so this prediction is genuinely novel. The independent AI review of this gene also proposed GO:0008253 as a new annotation based on the same convergent evidence. While the exact substrate specificity of SCO2678 has not been experimentally determined and HAD superfamily members can exhibit substrate promiscuity across nucleotides, sugar phosphates, and other phosphomonoesters, the COG1877 classification specifically favors 5'-nucleotidase over other HAD activities. |
|
|
Q9L243
SCO2678 |
Streptomyces coelicolor |
GO:0009264 deoxyribonucleotide catabolic process |
GO_BP | UNC | 1 | ProtNLM2 predicted GO:0009264 (deoxyribonucleotide catabolic process), the chemical reactions resulting in the breakdown of deoxyribonucleotides. This prediction is plausible but insufficiently supported. If SCO2678 indeed possesses 5'-nucleotidase activity (as predicted above), hydrolysis of deoxyribonucleotides would fall within the scope of that activity, and participation in deoxyribonucleotide catabolism would logically follow. However, the prediction is problematic for two reasons. First, it is overly specific: no available evidence distinguishes deoxyribonucleotide substrates from ribonucleotide substrates for this enzyme. HAD superfamily members, including characterized 5'-nucleotidases, typically hydrolyze both ribo- and deoxyribonucleotides, and the deep research on SCO2678 consistently references broad substrate classes (nucleotides, sugar phosphates, glycerophosphodiesters) without singling out deoxyribonucleotides. Second, the biological context of S. coelicolor phosphate scavenging does not specifically implicate deoxyribonucleotide catabolism over general nucleotide catabolism -- the organism's need is for inorganic phosphate release from whatever organophosphates are available in the soil environment. A more appropriate biological process annotation would be GO:0006796 (phosphate-containing compound metabolic process) or GO:0009166 (nucleotide catabolic process), which are agnostic to the deoxy/ribo distinction. Without experimental substrate profiling demonstrating a preference for deoxyribonucleotides, this prediction cannot be confirmed or refuted. |
|
|
A0A674PKV4
gas7a |
Takifugu rubripes |
GO:0005737 cytoplasm |
GO_CC | LSP | 2 | FREQUENCY_BIAS | GO:0005737 (cytoplasm) is already present as an IEA annotation in GOA for this protein, transferred via TreeGrafter from the PANTHER GAS7 subfamily (PTHR23065:SF57). While technically correct -- GAS7a is a cytoplasmic protein that peripherally associates with membranes via its F-BAR domain -- cytoplasm is a highly generic cellular component term that conveys almost no functional information. The curated review appropriately marks this annotation as KEEP_AS_NON_CORE because the biologically meaningful localizations for gas7a are the plasma membrane (GO:0005886) and clathrin-coated pit (GO:0005905), where the crescent-shaped F-BAR dimer (residues 121-381, CDD: cd07649 F-BAR_GAS7) senses and induces membrane curvature during the invagination step of clathrin-mediated endocytosis. ProtNLM2 failed to predict any of the six more specific and informative annotations already in GOA, including the clathrin-coated pit localization, the clathrin-dependent endocytosis process, and the neuron projection morphogenesis role that is one of the best-characterized functions of the mammalian GAS7 family. This pattern of predicting only a broad parent term while missing all specific functional annotations is characteristic of frequency bias, as cytoplasm is among the most commonly assigned GO CC terms in training data. |
| A0A1S3Y076 | Nicotiana tabacum |
GO:0008033 tRNA processing |
GO_BP | LSP | 2 | ProtNLM2 predicted GO:0008033 (tRNA processing) for this PRORP enzyme. The prediction is biologically correct: A0A1S3Y076 is a proteinaceous RNase P that catalyzes endonucleolytic cleavage of 5' leader sequences from precursor tRNAs (EC 3.1.26.5), which is indeed a form of tRNA processing. However, the prediction is less precise than the existing GOA annotation GO:0001682 (tRNA 5'-leader removal), which is a direct child of GO:0008033 and names the exact processing step carried out by RNase P. The existing IBA annotation for GO:0001682 was inferred from experimentally characterized Arabidopsis PRORP orthologs (PRORP1/AT2G16650, PRORP2/AT2G32230, PRORP3/AT4G21900) and is well supported by the conserved PPR + NYN metallonuclease domain architecture of this protein. The prediction is classified as LSP rather than CNN because the model failed to resolve the specific tRNA processing step despite the clear domain signature pointing to RNase P cleavage activity. |
|
| A2FPI7 | Trichomonas vaginalis |
GO:0003677 DNA binding |
GO_MF | COR | 2 | ProtNLM2 predicted GO:0003677 (DNA binding) for A2FPI7, a 129-amino-acid protein with no existing GOA annotations (match category NOT_IN_GOA). This prediction is assessed as correct and novel based on strong domain-level evidence. A2FPI7 contains a single KilA-N domain (PF04383/IPR017880, residues 19-124, identified by PROSITE PS51301) that spans nearly the entire protein. The KilA-N domain belongs to the KilA-N/APSES helix-turn-helix superfamily (InterPro IPR018004), which has been experimentally validated as a DNA-binding fold in two distinct biological contexts: (1) bacteriophage regulatory proteins, where KilA-N was originally characterized as mediating DNA binding for transcriptional control, and (2) fungal APSES transcription factors, whose DNA-binding domain is structurally homologous to KilA-N and has been shown to bind DNA sequence-specifically. UniProt has assigned the recommended name "KilA-N domain-containing protein" based on this domain (ECO:0000259). The T. vaginalis genome (~160 Mb) is known to harbor numerous laterally transferred genes of viral and bacterial origin, consistent with the presence of a KilA-N domain protein of likely viral ancestry. Given that DNA binding is the defining functional property of the KilA-N/APSES HTH superfamily, and the domain occupies the vast majority of this small protein (106 of 129 residues), ProtNLM2 has correctly identified the most parsimonious molecular function. No experimental data exist for this PE4-level protein, but the domain-based evidence is unambiguous. |
|
| A0A3B6GK97 | Triticum aestivum |
GO:0016298 lipase activity |
GO_MF | LSP | 2 | ProtNLM2 predicted GO:0016298 (lipase activity) for this patatin/PNPLA domain-containing protein. The prediction is biologically sound: the patatin family comprises non-specific lipid acyl hydrolases that cleave acyl-ester bonds of glycerolipids using a Ser-Asp catalytic dyad (PMID:12779324), and lipase activity is a correct descriptor at the family level. However, this protein already carries more specific IBA (phylogenetic) annotations from GO_Central: GO:0047372 monoacylglycerol lipase activity and GO:0004620 glycerophospholipase activity (GO_REF:0000033). These were assigned by manual phylogenetic inference and provide substrate-resolved specificity that the ProtNLM2 prediction lacks. Additionally, the ai-review proposed GO:0052689 (carboxylic ester hydrolase activity) as an informative unifying parent. Therefore, while correct, GO:0016298 is less precise than the existing GO:0047372 and GO:0004620 annotations and adds no information beyond what they already capture. |
|
| A0A3B6GK97 | Triticum aestivum |
GO:0016042 lipid catabolic process |
GO_BP | CNN | 2 | ProtNLM2 predicted GO:0016042 (lipid catabolic process), which is a child of the existing GOA annotation GO:0006629 (lipid metabolic process, IEA via InterPro2GO from the PNPLA domain IPR002641). While the patatin family does participate in lipid catabolism (e.g., lipid mobilization during seed germination and storage-oil breakdown), the ai-review deliberately retained the broader GO:0006629 rather than sharpening to lipid catabolic process, because patatin-family acyl hydrolases also act in membrane phospholipid remodeling and lipid-based defense signaling -- roles that are not purely catabolic. The deep-research report supports diverse patatin functions including membrane phospholipid turnover and remodeling of lipid-droplet surfaces. Furthermore, the pPLAII subfamily placement (from bioinformatics analysis) associates this protein with the defense/stress response clade rather than a dedicated degradative role. The prediction is therefore correct in that lipid catabolism is one component of patatin function, but it is not novel relative to the existing broader annotation and arguably narrows the functional scope beyond what is justified by available evidence for this uncharacterized protein. |
|
| A0A3B6NKR6 | Triticum aestivum |
GO:0016310 phosphorylation |
GO_BP | LSP | 2 | GO:0016310 (phosphorylation) is a correct but uninformative prediction for this protein. A0A3B6NKR6 contains GHMP kinase N-terminal (IPR006204) and C-terminal (IPR013750) domains together with a glucuronokinase-like domain (IPR053034), and its closest characterized ortholog is Arabidopsis GLAK2 (Q9LY82, EC 2.7.1.43). Glucuronokinases catalyze the ATP-dependent phosphorylation of D-glucuronic acid to D-glucuronate-1-phosphate, so phosphorylation is technically correct at the broadest level. However, the prediction is far too generic: GO:0016310 is a high-level biological process term that encompasses all kinase reactions and fails to capture the specific pathway context. The informative annotation would be involvement in UDP-glucuronate biosynthetic process (GO:0006065) via the myo-inositol oxidation pathway, with glucuronokinase activity (GO:0047940) as the molecular function. Additionally, GO:0016310 does not overlap with the existing GOA annotations, which include ATP binding (IEA) and cytosol localization (IBA), though it is semantically consistent with the kinase function implied by the ATP binding annotation. Scored as LSP because the prediction is correct at a coarse granularity but does not add biological insight beyond what is already implied by the GHMP kinase domain assignment. |
|
| A0A3B6RKV1 | Triticum aestivum |
GO:0010099 regulation of photomorphogenesis |
GO_BP | UNC | 1 | The closest characterized ortholog in Arabidopsis (AT5G06550, JMJ22/PKDM7D) participates in photomorphogenesis through histone demethylation at target gene loci, making this prediction biologically plausible at the ortholog level. However, no experimental evidence exists for photomorphogenesis regulation by any wheat JmjC family member. Photomorphogenesis regulatory networks differ between monocots and dicots, and wheat KDM5/JARID1 members show dynamic expression under drought stress rather than light-responsive regulation in published studies (Wang et al. 2022). The protein has PE level 3 (inferred from homology) with no direct functional characterization. This prediction likely reflects training data from the well-characterized Arabidopsis JMJ22 ortholog rather than wheat-specific evidence. |
|
| A0A3B6RKV1 | Triticum aestivum |
GO:0010476 gibberellin mediated signaling pathway |
GO_BP | UNC | 1 | Arabidopsis JMJ22/PKDM7D demethylates histones at GA biosynthesis gene loci, linking it to gibberellin-mediated signaling. Wheat JmjC gene promoters contain GA-responsive cis-elements (Wang et al. 2022; Ma et al. 2022), providing indirect support for GA pathway involvement. However, the presence of hormone-responsive promoter elements is common across many gene families and does not establish direct involvement in GA signaling. No experimental evidence demonstrates that this specific wheat protein regulates GA biosynthesis or signaling. The prediction is plausible through orthology to JMJ22 but remains unverifiable without wheat-specific functional data. |
|
| A0A3B6RKV1 | Triticum aestivum |
GO:0040029 epigenetic regulation of gene expression |
GO_BP | COR | 2 | This is a correct and novel prediction. A0A3B6RKV1 contains a JmjC catalytic domain (residues 285-445) and belongs to the KDM5/JARID1 histone demethylase subfamily, which catalyzes Fe(II)- and 2-oxoglutarate-dependent oxidative removal of methyl groups from histone H3K4me1/2/3 marks. Histone demethylation is by definition a mechanism of epigenetic gene expression regulation. Phylogenetic analysis across 21 plant species confirms that KDM5/JARID1 subfamily members function as H3K4 demethylases (Ma et al. 2022), and 18 of 24 wheat JmjC family members localize to the nucleus, consistent with chromatin-associated function (Wang et al. 2022). The ai-review independently lists this term as a core function. This GO term was not present in the existing GOA annotations, making it a genuinely novel and correct prediction that follows directly from the established enzymatic activity of the protein family. |
|
| A0A3B6RKV1 | Triticum aestivum |
GO:0010114 response to red light |
GO_BP | UNC | 1 | Red light response is mechanistically linked to photomorphogenesis and phytochrome signaling, and some Arabidopsis JmjC proteins participate in light-mediated developmental pathways. However, this is a more specific prediction than regulation of photomorphogenesis, and the evidence supporting it is weaker. No published study demonstrates response to red light for any wheat KDM5/JARID1 member, nor has the Arabidopsis ortholog JMJ22 been specifically characterized as red-light responsive (its photomorphogenesis role involves broader light signaling). The deep research report for A0A3B6RKV1 does not mention red light or phytochrome signaling among the biological processes of wheat JmjC proteins. This prediction may reflect frequency bias from ProtNLM2 associating photomorphogenesis-related terms as a cluster for JmjC-like sequences in the training data. |
|
| A0A3B6RKV1 | Triticum aestivum |
GO:0010030 positive regulation of seed germination |
GO_BP | UNC | 1 | Arabidopsis JMJ22/PKDM7D participates in seed germination through histone arginine demethylation at GA biosynthesis gene loci, providing ortholog-based support for this prediction. GA signaling is a key regulator of seed germination across angiosperms, and the mechanistic link between histone demethylation and GA-dependent germination is well-established in Arabidopsis. However, the specific polarity of regulation (positive vs. negative) has not been established for this wheat protein. KDM5/JARID1 demethylases remove activating H3K4 marks, which typically represses transcription, so the prediction of positive regulation of germination requires that the demethylase targets negative regulators of germination (an indirect mechanism). Without wheat-specific experimental data, both the involvement in germination and the direction of regulation remain unverified. |
|
| F6LAX4 | Triticum aestivum |
GO:0046982 protein heterodimerization activity |
GO_MF | NPI | 0 | FREQUENCY_BIAS | GO:0046982 (protein heterodimerization activity) is a generic protein-protein interaction term that technically applies to any protein forming a heterodimer. While the PP2A A subunit does form an A-C heterodimer with the catalytic C subunit, this term is uninformative and misleading: the A subunit's molecular function is not generic heterodimerization but rather a highly specific scaffolding/regulatory role captured by GO:0019888 (protein phosphatase regulator activity), which is already annotated with IBA evidence. The A subunit's HEAT-repeat solenoid architecture provides a platform for both the catalytic C subunit and a regulatory B subunit, forming a heterotrimer rather than a simple heterodimer. Annotating this protein with GO:0046982 would obscure its specific phosphatase-regulatory scaffold function and replace it with a term that applies to thousands of unrelated proteins. This prediction likely reflects frequency bias, as protein heterodimerization activity is among the most commonly assigned MF terms in GO training data and is frequently predicted for any protein with protein-protein interaction domains such as HEAT repeats. |
| F6LAX4 | Triticum aestivum |
GO:0043025 neuronal cell body |
GO_CC | NPI | 0 | PATHWAY_CONTEXT_IGNORED | GO:0043025 (neuronal cell body) is an animal-specific cellular component term that refers to the soma of a neuron. Triticum aestivum is a monocotyledonous plant (Poaceae) that entirely lacks neurons, a nervous system, and any neuronal cell types. This prediction is biologically impossible for a plant protein. The error likely arises because mammalian PP2A orthologs (PPP2R1A/PPP2R1B) are abundantly expressed in neurons and are annotated to neuronal compartments in human and mouse GOA; ProtNLM2 appears to have transferred these animal-specific localization annotations across kingdoms without regard to the organism's biology. The PP2A A subunit in wheat functions in cytoplasm and nucleus (as annotated via IBA and ARBA), consistent with its role in plant hormone signaling (auxin transport, brassinosteroid signaling) rather than any neural function. |
| F6LAX4 | Triticum aestivum |
GO:0007059 chromosome segregation |
GO_BP | NPI | 0 | FREQUENCY_BIAS | GO:0007059 (chromosome segregation) is predicted based on the known role of animal PP2A holoenzymes at kinetochores and centromeres during mitosis, where PP2A-B56 complexes dephosphorylate cohesin protectors (e.g., shugoshin-bound substrates) to regulate sister chromatid cohesion and chromosome segregation. However, this role is mediated by specific B56 regulatory subunits that recruit PP2A to centromeric substrates -- the A subunit itself is the generic scaffold present in all PP2A holoenzymes regardless of substrate. More critically, while PP2A catalytic activity is conserved in plants, the specific centromeric/kinetochore PP2A-B56 chromosome segregation pathway characterized in animal cells has not been demonstrated for plant PP2A A subunits. The existing reviewed annotations for F6LAX4 deliberately exclude chromosome segregation terms: the AI review notes that GO_Central correctly did NOT propagate the lineage-specific GO:0051225 (spindle assembly) and GO:0051754 (meiotic sister chromatid cohesion) annotations from human PPP2R1A to this plant ortholog. This prediction represents frequency-biased cross-kingdom transfer of an animal-specific PP2A role. |
| F6LAX4 | Triticum aestivum |
GO:0043005 neuron projection |
GO_CC | NPI | 0 | PATHWAY_CONTEXT_IGNORED | GO:0043005 (neuron projection) refers to neurites (axons, dendrites) of neuronal cells. Like GO:0043025 (neuronal cell body), this is an animal-specific cellular component that is biologically impossible for a plant protein. Triticum aestivum has no neurons or neuron projections. The prediction stems from the same cross-kingdom mis-transfer as GO:0043025: mammalian PP2A is localized to neuronal projections where it regulates synaptic signaling and cytoskeletal dynamics, but these annotations are entirely irrelevant to a wheat scaffolding subunit. The wheat PP2A A subunit localizes to cytoplasm and nucleus, where it assembles PP2A holoenzymes for plant-specific signaling pathways (auxin, brassinosteroid, stress responses). |
| F6LAX4 | Triticum aestivum |
GO:0000775 chromosome, centromeric region |
GO_CC | NPI | 0 | FREQUENCY_BIAS | GO:0000775 (chromosome, centromeric region) is predicted based on the well-characterized role of animal PP2A-B56 holoenzymes at centromeres, where they dephosphorylate cohesin protectors (shugoshin/Sgo1) to maintain sister chromatid cohesion until anaphase onset. In animal cells, the PP2A A subunit is recruited to centromeres as part of these specific holoenzymes. However, centromeric localization of PP2A in plants is not established. The A subunit is a generic scaffold present in all PP2A complexes, and its localization is determined by the B regulatory subunit it assembles with. No evidence places the wheat A subunit specifically at centromeres; its characterized plant roles center on cytoplasmic/nuclear signaling (hormone transport, stress responses). This prediction represents frequency-biased transfer from animal PP2A annotations where centromeric localization is well-documented but lineage-specific. |
| F6LAX4 | Triticum aestivum |
GO:1990405 protein antigen binding |
GO_MF | NPI | 0 | PATHWAY_CONTEXT_IGNORED | GO:1990405 (protein antigen binding) is an immune-system-specific molecular function term describing the binding of protein antigens by antibodies, T-cell receptors, or antigen-presenting molecules (MHC). Plants lack an adaptive immune system, do not produce antibodies or T-cell receptors, and have no MHC-based antigen presentation machinery. The PP2A A subunit is a HEAT-repeat scaffolding protein with no structural or functional relationship to antigen-binding proteins. This prediction is biologically nonsensical for any plant protein, let alone a PP2A scaffold. The HEAT-repeat solenoid domain found in the PP2A A subunit mediates specific protein-protein interactions within the PP2A holoenzyme (binding the C and B subunits), not antigen recognition. This represents a severe cross-kingdom pathway context error, likely arising from spurious sequence or term co-occurrence patterns in ProtNLM2 training data. |
|
Q8P365
btuE |
Xanthomonas campestris pv. campestris |
GO:0006979 response to oxidative stress |
GO_BP | CNN | 2 | TRAINING_DATA_CONTAMINATION | GO:0006979 (response to oxidative stress) is already present as an existing IEA annotation (via InterPro2GO, GO_REF:0000002) for this protein, making this prediction correct but not novel. Furthermore, the protein also carries a phylogenetically inferred IBA annotation to the more specific child term GO:0034599 (cellular response to oxidative stress), which implies the parent term GO:0006979 and provides greater biological precision. BtuE (XCC4213) is classified in the glutathione peroxidase family by multiple independent lines of evidence: the GSHPx Pfam domain (PF00255), the glutathione peroxidase InterPro family (IPR000889), the GPX active-site motif (IPR029759) with a conserved catalytic residue at position 37, and the thioredoxin-like structural fold (IPR036249). Bacterial glutathione peroxidases reduce hydrogen peroxide and organic hydroperoxides using thiol-based reductants, directly protecting the cell from reactive oxygen species -- the biochemical basis of the oxidative stress response. The curated review accepted GO:0034599 as representing a core function and marked GO:0006979 as KEEP_AS_NON_CORE precisely because the broader term is redundant with the more specific IBA annotation. ProtNLM2 recapitulated the less informative of the two existing process annotations without adding any new biological insight. The prediction also missed the molecular function (GO:0004602, glutathione peroxidase activity) and the process term GO:0098869 (cellular oxidant detoxification) that more precisely capture the enzymatic mechanism by which BtuE contributes to oxidative stress defense. |
| D3VIU4 | Xenorhabdus nematophila |
GO:0015276 ligand-gated monoatomic ion channel activity |
GO_MF | NPI | 0 | FREQUENCY_BIAS | GO:0015276 (ligand-gated monoatomic ion channel activity) is unequivocally wrong for D3VIU4. This protein is FliY, the periplasmic substrate-binding component of the FliY-YecSC (TcyJLN) ABC transporter for L-cystine import. It belongs to bacterial solute-binding protein family 3 (PF00497, IPR001638, CDD cd13711 PBP2_Ngo0372_TcyA), a well-characterized family of soluble periplasmic proteins that bind amino acids or compatible solutes and deliver them to cognate ABC transporter permeases. The protein has a cleavable signal peptide (residues 1-28) for periplasmic secretion and no transmembrane helices whatsoever. Ligand-gated ion channels, by contrast, are integral membrane proteins with multiple transmembrane segments that form a selective ion-conducting pore opened by ligand binding -- a completely different protein architecture and mechanism. No domain hit (InterPro, Pfam, CDD, PANTHER, PROSITE), no eggNOG orthologous group (COG0834, which covers extracellular solute-binding proteins), and no literature on FliY orthologs in E. coli or other Enterobacterales supports any ion channel activity. The known molecular function of FliY is amino acid binding (GO:0016597) in the context of L-cystine transport (GO:0015811). This prediction appears to be a frequency-biased misassignment, possibly driven by superficial sequence features that the model incorrectly associates with channel activity rather than substrate binding. |
| A0A8J0SCI2 | Xenopus tropicalis |
GO:0000978 RNA polymerase II cis-regulatory region sequence-specific DNA binding |
GO_MF | CNN | 2 | This prediction correctly identifies sequence-specific DNA binding at RNA polymerase II cis-regulatory regions, which is well-supported by the presence of 8 tandem C2H2 zinc finger domains spanning residues 37-260. Each zinc finger module makes base-specific contacts in the DNA major groove, enabling sequence-specific recognition of regulatory DNA elements. However, this is not a novel prediction: GO:0000978 is already present as an IBA annotation from PANTHER phylogenetic inference (GO_Central, GO_REF:0000033), and the protein is classified within the Krueppel C2H2-type zinc finger family where this function is expected. The prediction exactly matches an existing GOA annotation: although GO:0003677 (DNA binding) is the stated comparator, the more specific GO:0000978 is itself already annotated. |
|
| A0A8J0SCI2 | Xenopus tropicalis |
GO:0006357 regulation of transcription by RNA polymerase II |
GO_BP | CNN | 2 | Regulation of transcription by RNA polymerase II is the expected core biological process for a Krueppel-type C2H2 zinc finger transcription factor. The protein has the canonical domain architecture (tandem C2H2 zinc fingers) and UniProt notes it "may be involved in transcriptional regulation." This prediction is correct but not novel: GO:0006357 is already present as an IBA annotation from PANTHER phylogenetic inference (GO_Central, GO_REF:0000033). The existing annotation GO:0000981 (DNA-binding TF activity, RNA pol II-specific) captures the molecular function aspect; GO:0006357 captures the biological process. Both are already in GOA. |
|
| A0A8J0SCI2 | Xenopus tropicalis |
GO:0005634 nucleus |
GO_CC | CNN | 2 | Nuclear localization is strongly supported for a C2H2 zinc finger transcription factor whose function requires binding genomic DNA in the nucleus. The protein has 8 tandem C2H2 zinc finger domains that mediate sequence-specific DNA binding, necessitating nuclear localization. UniProt ARBA annotation independently predicts nucleus (ARBA00004123). This prediction is correct but not novel: GO:0005634 is already annotated twice in GOA -- once via IBA (PANTHER, GO_REF:0000033 with qualifier is_active_in) and once via IEA (UniProt subcellular location mapping, GO_REF:0000044 with qualifier located_in). |
|
| A0A8J0SCI2 | Xenopus tropicalis |
GO:0001228 DNA-binding transcription activator activity, RNA polymerase II-specific |
GO_MF | NPI | 0 | PARALOG_OVERANNOTATION | This prediction incorrectly specifies transcription activator activity for a protein where no evidence supports activator versus repressor function. A0A8J0SCI2 is a pure zinc finger array protein with 8 tandem C2H2 zinc finger domains spanning nearly its entire length (residues 37-260) and a short N-terminal disordered region (residues 1-27). Critically, it lacks any known effector domains -- no KRAB repressor domain, no SCAN dimerization domain, no BTB/POZ domain, and no identifiable activation domain. Without an effector domain, the directionality of transcriptional regulation (activation vs. repression) cannot be inferred from sequence alone. The existing IBA annotation appropriately uses the parent term GO:0000981 (DNA-binding transcription factor activity, RNA polymerase II-specific), which is agnostic to activator/repressor function. As noted in the gene review, ProtNLM2 likely derived this prediction from sequence similarity to KRAB-ZNF proteins (CATH FunFam assignments include ZNF1184 and ZNF527/577, which are KRAB-containing repressors), but this protein lacks the KRAB domain entirely, making the activator/repressor specificity prediction unreliable. The error is classified as PARALOG_OVERANNOTATION because the prediction inappropriately transfers a specific functional characteristic (activator activity) from distantly related zinc finger proteins with different domain architectures. |
| A0A8J1IYX6 | Xenopus tropicalis | No predictions | |||||
| F6WPT1 | Xenopus tropicalis | No predictions | |||||