ARGO-ProtNLM-50 Prediction Evaluation

Proteins: 50
Predictions: 77
COR: 17
CNN: 11
LSP: 18
UNC: 15
PLI: 2
NPI: 14
Mean CS: 1.4
COR — Correct novel
CNN — Correct, not novel
LSP — Less precise
UNC — Uncertain
PLI — Paralog incorrect
NPI — Nonparalog incorrect
REP — Frequency bias
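The reported mean CS can be cross-checked against the category counts in the summary above. The sketch below assumes each category carries a fixed score (2 for COR, CNN, and LSP; 1 for UNC; 0 for PLI and NPI). That mapping is inferred from the rows visible in this table and is not stated by the report itself, so this is a consistency check under that assumption rather than the report's own scoring code.

```python
# Category counts as reported in the summary.
counts = {"COR": 17, "CNN": 11, "LSP": 18, "UNC": 15, "PLI": 2, "NPI": 14}

# ASSUMPTION: per-category CS values inferred from the visible table rows.
cs_by_category = {"COR": 2, "CNN": 2, "LSP": 2, "UNC": 1, "PLI": 0, "NPI": 0}

total_predictions = sum(counts.values())
total_cs = sum(counts[cat] * cs_by_category[cat] for cat in counts)
mean_cs = total_cs / total_predictions

print(total_predictions)   # 77
print(round(mean_cs, 1))   # 1.4
```

Under this mapping the counts sum to the 77 reported predictions and the mean rounds to the reported 1.4.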
Protein Organism Predicted Term Type Assessment CS Error Summary
A0A8C9H4D2
OLFML2A
Piliocolobus tephrosceles GO:0031012
extracellular matrix
GO_CC COR 2
ProtNLM2 predicted GO:0031012 (extracellular matrix), a cellular component term that is not present in the existing GOA annotations for this protein. The existing CC annotation is GO:0005615 (extracellular space, IEA:TreeGrafter), which is a broader localization. This prediction is assessed as correct and novel based on multiple lines of evidence. OLFML2A is a secreted glycoprotein (UniProt keyword "Secreted", ARBA evidence) containing a C-terminal olfactomedin-like domain (IPR003112, Pfam PF02191). Olfactomedin-family proteins are characteristically matricellular glycoproteins that function within the extracellular matrix rather than simply being soluble in the extracellular space. The AI review of this protein explicitly identifies it as a "matricellular regulatory protein involved in cell-extracellular matrix communication, cell adhesion, and modulation of cell migration," and the core function annotation assigns GO:0005201 (extracellular matrix structural constituent) as the molecular function. ECM localization is the natural compartment for a protein with this domain architecture and functional profile. GO:0031012 (extracellular matrix) is a child of GO:0005576 (extracellular region, the existing GOA annotation from GO_REF:0000044) and provides a more precise and informative localization, making this a genuinely novel and useful prediction not captured by existing annotations.
A0A8C9H4D2
OLFML2A
Piliocolobus tephrosceles GO:0030198
extracellular matrix organization
GO_BP COR 2
ProtNLM2 predicted GO:0030198 (extracellular matrix organization), a biological process term absent from the existing GOA annotations for this protein. The only existing BP annotation is GO:0007165 (signal transduction, IEA:TreeGrafter), which the AI review flagged as an over-annotation because OLFML2A's influence on signaling is indirect, mediated through ECM interactions rather than conventional signal transduction. This prediction is assessed as correct and novel. OLFML2A is a secreted olfactomedin-family glycoprotein whose core function is described as a "matricellular regulator" that "contributes to extracellular matrix organization, cell adhesion, and modulation of cell migration." The protein contains the olfactomedin-like domain (IPR003112), which adopts a five-bladed beta-propeller fold mediating protein-protein interactions in the extracellular space -- a structural basis consistent with a role in organizing ECM components. ECM organization is a well-established biological process for secreted matricellular proteins that modulate the structural and signaling properties of the matrix without being classical structural components like collagens. This prediction captures a biologically informative process that is more specific and accurate than the existing signal transduction annotation, and fills a gap in the current functional annotation of OLFML2A.
A0A8B8L1Z3 Abrus precatorius GO:0005783
endoplasmic reticulum
GO_CC UNC 1 FREQUENCY_BIAS
GO:0005783 (endoplasmic reticulum) is predicted for a J domain-containing protein that has no existing GO annotations. While a subset of J-domain (DnaJ/Hsp40) proteins are ER-resident co-chaperones -- notably ERdj1 through ERdj8 in mammals and their plant homologs, which partner with BiP (ER-luminal Hsp70) in nascent chain translocation, protein folding, and ER-associated degradation -- ER-targeted J-domain proteins characteristically possess an N-terminal signal peptide or transmembrane anchor that directs them to the ER membrane or lumen. This protein lacks both: UniProt reports no signal peptide, no transmembrane domain, and the C-terminal sequence (ending ...VGDDKVKGH) contains no KDEL/HDEL ER-retention signal. The domain architecture consists of a single J domain at positions 70-135 followed by large intrinsically disordered regions (residues 138-196, 312-408) with proline-rich, basic/acidic, and polar compositional biases. This disordered-rich architecture is more typical of cytoplasmic or nuclear J-domain proteins involved in transcriptional regulation or chromatin remodeling than of ER-luminal chaperones. Additionally, the InterPro classification IPR053052 (Imprinting Balance Regulator) suggests homology to nuclear/cytoplasmic regulatory proteins rather than ER-resident chaperones. In plants, cytoplasmic and chloroplastic J-domain proteins substantially outnumber ER-targeted ones. Without a signal peptide, transmembrane anchor, or ER-retention motif, there is no sequence-based evidence supporting ER localization. The prediction likely reflects frequency bias in ProtNLM2 training data, where endoplasmic reticulum is over-represented among chaperone-domain-containing proteins due to the well-studied ER protein quality control machinery.
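One of the three checks described above, the absence of a C-terminal KDEL/HDEL retention signal, can be sketched in a few lines. The function name is hypothetical, and the input is only the C-terminal fragment quoted in the assessment, not the full sequence. Signal peptide and transmembrane checks need dedicated predictors and are not covered here.

```python
# Canonical ER-retention tetrapeptides named in the assessment.
ER_RETENTION_MOTIFS = ("KDEL", "HDEL")

def has_er_retention_motif(sequence: str) -> bool:
    """Return True if the sequence ends with a KDEL/HDEL motif."""
    return sequence.upper().endswith(ER_RETENTION_MOTIFS)

# C-terminal fragment quoted in the assessment (not the full protein).
c_terminal_tail = "VGDDKVKGH"
print(has_er_retention_motif(c_terminal_tail))  # False

# Hypothetical sequence used only to show a positive case.
print(has_er_retention_motif("MKTAYKDEL"))      # True
```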
A0A6I8TLE4 Aedes aegypti GO:0005096
GTPase activator activity
GO_MF CNN 2 TRAINING_DATA_CONTAMINATION
GO:0005096 (GTPase activator activity) is an exact match to the existing GOA annotation for this protein (IEA:UniProtKB-KW). The prediction is biologically correct: A0A6I8TLE4 contains a canonical RasGAP domain (IPR001936, Pfam PF00616) with the conserved catalytic arginine finger motif detected by PROSITE PS00509, which is the hallmark of proteins that stimulate the intrinsic GTPase activity of Ras-family small GTPases. The CDD match to cd05136 (RasGAP_DAB2IP) further specifies this as a DAB2IP/SynGAP-subfamily RasGAP, and PANTHER classifies it as PTHR10194:SF60 (raskol), indicating orthology to a well-characterized Drosophila RasGAP. However, because this annotation was already present in GOA via automated keyword mapping (IEA:UniProtKB-KW from the "GTPase activation" keyword), the prediction is classified as CNN (Correct but Not Novel) rather than COR. The sequence features driving the ProtNLM2 prediction are the same ones that generated the existing IEA annotation, indicating likely training data contamination. Notably, ProtNLM2 did not predict any additional GO terms that the multi-domain architecture strongly supports: the C2 domain (cd04013, SynGAP-like) and PH domain (cd13262, SynGAP-like) predict calcium-dependent and phosphoinositide-dependent membrane targeting respectively, and the DAB2P_C domain (IPR021887) suggests scaffold/adaptor functions. These domains collectively point to involvement in Ras protein signal transduction (GO:0007265) and negative regulation thereof (GO:0046580), as well as plasma membrane and cytosol localization -- none of which were captured by the model.
A0A2G9RZF1 Aquarana catesbeiana GO:0007338
single fertilization
GO_BP UNC 1
ProtNLM2 predicted GO:0007338 (single fertilization) for this CUB domain-containing protein. The prediction is biologically plausible: PANTHER classifies this protein in the OVOCHYMASE-RELATED family (PTHR24251), and ovochymases are serine proteases involved in egg envelope hardening during fertilization in amphibians. CUB domains are also found in amphibian egg envelope glycoproteins that mediate sperm-egg recognition (Hedrick 2008). However, A0A2G9RZF1 is only 156 aa (likely a protein fragment), contains a single CUB domain without any identifiable protease or catalytic domain, and UniProt notes it lacks conserved residues required for feature propagation. There are no GOA annotations, no expression data, and no direct experimental evidence linking this specific protein to fertilization. While the family context makes reproductive biology a reasonable hypothesis, the prediction cannot be validated or refuted with available evidence.
A0A2G9RZF1 Aquarana catesbeiana GO:0005576
extracellular region
GO_CC COR 2
ProtNLM2 predicted GO:0005576 (extracellular region) for this CUB domain-containing protein. This prediction is well-supported by convergent domain and family evidence. CUB domains are almost exclusively found in secreted or cell-surface proteins that function in the extracellular space (Thomas et al. 2024, Lin et al. 2023, Gonzalez-Calvo et al. 2022). The PANTHER family assignment (PTHR24251, OVOCHYMASE-RELATED) groups this with secreted extracellular proteases. The FunFam classification maps it to Procollagen C-endopeptidase enhancer 1 (PCPE1), which is a well-characterized secreted extracellular glycoprotein. UniProt keywords include Disulfide bond and Zymogen, both consistent with a secreted extracellular protein. No GOA annotations exist for this protein, so this represents a genuinely novel correct prediction rather than a rediscovery of existing annotation.
A0A444Z7V7 Arachis hypogaea No predictions
F4JLB7 Arabidopsis thaliana GO:0016310
phosphorylation
GO_BP PLI 0 PARALOG_OVERANNOTATION
GO:0016310 (phosphorylation) predicts that RIC7 is involved in a phosphorylation process, implying it acts as or with a kinase. This is incorrect. RIC7 is a ROP GTPase effector protein whose characterized biological roles are negative regulation of stomatal opening (GO:1902457) and regulation of stomatal movement (GO:0010119), mediated by binding activated ROP2 via its CRIB domain and inhibiting Exo70B1. RIC7 belongs to the receptor-like protein (RLP) family, which by definition lacks the intracellular kinase domain present in the related receptor-like kinase (RLK) family. The UniProt entry for F4JLB7 lists no kinase-related domains, keywords, or functions. InterPro annotations show only LRR domains (IPR001611, IPR032675) and a stomatal development/plant interaction regulator domain (IPR052941), with no protein kinase domain. The FunFam classification with the ERECTA kinase superfamily (3.80.10.10:FF:000041) reflects shared LRR structural domains, not shared kinase function -- ERECTA is an RLK with a kinase domain, while RIC7 is an RLP without one. No experimental evidence from Wu et al. 2001 (PMID:11752391) or subsequent functional studies supports any role for RIC7 in phosphorylation. This prediction is a paralog overannotation error arising from failure to distinguish kinase-containing from kinase-lacking members of the LRR superfamily.
F4JLB7 Arabidopsis thaliana GO:0016301
kinase activity
GO_MF PLI 0 PARALOG_OVERANNOTATION
GO:0016301 (kinase activity) predicts that RIC7 catalyzes the transfer of a phosphate group to a substrate. This is incorrect. RIC7's characterized molecular function is small GTPase binding (GO:0031267), not kinase activity. The protein functions as a signaling adaptor downstream of ROP2, not as a catalytic enzyme. Structurally, RIC7 contains a CRIB (Cdc42/Rac-interactive binding) motif for GTPase interaction and LRR domains for protein-protein interactions, but no kinase domain of any type. The Pfam annotations for F4JLB7 are exclusively LRR_1 (PF00560, 3 copies) and LRR_8 (PF13855, 1 copy), with no Pkinase (PF00069) or Pkinase_Tyr (PF07714) domains. This distinguishes RIC7 from LRR-RLK proteins like ERECTA (AT2G26330), which share the LRR extracellular domain but additionally possess a cytoplasmic kinase domain. The curated ai-review explicitly notes that RIC7 "lacks a kinase domain and functions as a cytoplasmic effector rather than a receptor." ProtNLM2 likely propagated kinase activity from LRR-RLK superfamily members in its training data to this kinase-lacking RLP family member, a classic Type 6 paralog overannotation error where the model fails to distinguish nonisofunctional members of a protein superfamily.
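The domain argument in the two RIC7 rows reduces to a set intersection: does the Pfam annotation list contain any protein kinase family? A minimal sketch follows, using only the Pfam accessions named in the text. The function name and the second example list are hypothetical.

```python
# Pfam kinase families named in the assessment: Pkinase and Pkinase_Tyr.
KINASE_PFAM = {"PF00069", "PF07714"}

def kinase_domains(pfam_ids):
    """Return the sorted kinase Pfam accessions present in an annotation list."""
    return sorted(KINASE_PFAM & set(pfam_ids))

# Pfam annotations reported for F4JLB7 (LRR_1 and LRR_8 only).
f4jlb7_pfam = ["PF00560", "PF13855"]
print(kinase_domains(f4jlb7_pfam))  # []

# Hypothetical LRR-RLK-style architecture, for contrast.
print(kinase_domains(["PF13855", "PF00069"]))  # ['PF00069']
```

An empty result is what the assessment relies on: with no kinase family in the domain list, kinase activity has no domain-level support.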
A0A2U1PS28 Artemisia annua GO:0009507
chloroplast
GO_CC UNC 1
ProtNLM2 predicted chloroplast localization for this plant GUF1 homolog. All existing GOA annotations place this protein in the mitochondrion (GO:0005743 mitochondrial inner membrane, GO:0005759 mitochondrial matrix), based on HAMAP-Rule MF_03137 and UniProtKB-UniRule. However, the PANTHER subfamily classification (PTHR43512:SF4) explicitly labels this protein as "TRANSLATION FACTOR GUF1 HOMOLOG, CHLOROPLASTIC," providing independent support for chloroplast targeting. In plants, the LepA/EF-4 family includes paralogs targeted to different organelles: chloroplastic cpLepA functions in plastid translation while mitochondrial GUF1 operates in mitochondrial translation. Both organelles maintain their own translation machinery and require elongation factor 4 for translational quality control. The N-terminal sequence of A0A2U1PS28 contains features (disordered polar-rich region, residues 1-25) that could constitute either a mitochondrial or chloroplast transit peptide, and distinguishing these computationally is notoriously difficult in plants. The existing AI review explicitly flags the question of whether this protein localizes to mitochondria, chloroplasts, or both as an unresolved issue requiring fluorescent tagging and confocal microscopy with organelle markers. Without such experimental data, this prediction remains genuinely uncertain -- it is biologically plausible given the PANTHER classification and the known diversity of organellar EF-4 targeting in plants, but it contradicts the HAMAP-based annotations that form the basis of the current GOA record.
Q2U1U6 Aspergillus oryzae GO:0000272
polysaccharide catabolic process
GO_BP COR 2
GO:0000272 (polysaccharide catabolic process) is a correct novel prediction for Q2U1U6. The protein contains a Chondroitin_lyas domain (IPR008929) and matches the Chondroitin AC/alginate lyase SUPFAM fold (SSF48230), both of which are diagnostic for polysaccharide lyases that depolymerize glycosaminoglycan chains. The deep research report confirms that the protein is predicted to function as a polysaccharide lyase degrading GAG substrates (chondroitin sulfate, dermatan sulfate, and possibly hyaluronic acid) through a beta-elimination mechanism. This activity falls squarely within polysaccharide catabolic process. The term is somewhat broad -- a more precise annotation might be glycosaminoglycan catabolic process (GO:0006027) -- but GO:0000272 is not wrong and represents a biologically meaningful novel prediction for a protein with no existing GOA annotations. Aspergillus species encode extensive CAZyme repertoires including polysaccharide lyases, and the comparative genomics of section Flavi Aspergilli supports the plausibility of such activity in A. oryzae.
Q2U1U6 Aspergillus oryzae GO:0004553
hydrolase activity, hydrolyzing O-glycosyl compounds
GO_MF NPI 0 FREQUENCY_BIAS
GO:0004553 (hydrolase activity, hydrolyzing O-glycosyl compounds) is mechanistically incorrect for Q2U1U6. The protein's only domain annotation is Chondroitin_lyas (IPR008929) with a structural match to Chondroitin AC/alginate lyase (SSF48230), placing it firmly in the polysaccharide lyase (PL) superfamily, not the glycoside hydrolase (GH) superfamily. These are fundamentally different enzyme classes in the CAZy classification system. Polysaccharide lyases cleave glycosidic bonds via a beta-elimination mechanism, abstracting a proton from C5 of a hexuronic acid residue and eliminating across C4-O4 to generate products with a characteristic delta-4,5-unsaturated bond. Glycoside hydrolases, by contrast, cleave glycosidic bonds through hydrolysis, adding water across the bond. The reaction mechanisms, active site architectures, and products are categorically different. ProtNLM2 appears to have conflated the general concept of polysaccharide-degrading activity with glycoside hydrolase activity, likely because glycoside hydrolases are the most frequently annotated polysaccharide-degrading enzymes in training data. The correct molecular function term would be in the polysaccharide lyase activity branch, not the hydrolase branch.
A0A8B8WEG2 Balaenoptera musculus No predictions
Q7VZI5 Bordetella pertussis No predictions
E1BL04 Bos taurus GO:0030036
actin cytoskeleton organization
GO_BP LSP 2
ProtNLM2 predicted GO:0030036 (actin cytoskeleton organization), which is the direct parent of the existing IBA annotation GO:0007015 (actin filament organization). XIRP2 organizes actin filaments specifically within the sarcomere via its 26 Xin repeats that bind F-actin, and mouse knockout studies demonstrate disrupted actin filament architecture in cardiomyocytes. The existing annotation at GO:0007015 is more informative because XIRP2 acts directly on actin filaments rather than the broader actin cytoskeleton (which includes non-filament structures such as the Arp2/3 branched network). The ai-review itself marks GO:0030036 as MARK_AS_OVER_ANNOTATED relative to GO:0007015. The prediction is biologically correct but adds no precision beyond what IBA phylogenetic inference already captures.
E1BL04 Bos taurus GO:0030054
cell junction
GO_CC LSP 2
ProtNLM2 predicted GO:0030054 (cell junction), which is a broad ancestor of multiple more specific GOA annotations: GO:0005925 (focal adhesion, IBA/IEA), GO:0005911 (cell-cell junction, IEA), and GO:0070161 (anchoring junction, IEA). XIRP2 localizes to intercalated discs in cardiomyocytes -- specialized cell-cell junctions containing adherens junctions -- and colocalizes with focal adhesions (or their muscle-equivalent costameres) in non-muscle overexpression assays. The UniProt subcellular location annotation already lists "Cell junction" (ARBA), and InterPro2GO maps the Xin repeat domain to this same broad term. The prediction is correct but at the least informative level of the GO hierarchy; the more specific terms already in GOA (focal adhesion, cell-cell junction, anchoring junction) better capture XIRP2 biology. The ai-review marks GO:0030054 as MARK_AS_OVER_ANNOTATED.
E1BL04 Bos taurus GO:0003779
actin binding
GO_MF LSP 2
ProtNLM2 predicted GO:0003779 (actin binding), the direct parent of the existing IBA and IEA annotation GO:0051015 (actin filament binding). XIRP2 contains 26 Xin repeats (PROSITE) / 18 Pfam Xin domains that specifically bind F-actin (filamentous actin), not monomeric G-actin. The distinction matters: GO:0003779 encompasses both G-actin and F-actin binding, while GO:0051015 correctly restricts to the filamentous form that the Xin repeat domain engages. The ai-review explicitly marks GO:0003779 as MARK_AS_OVER_ANNOTATED and identifies GO:0051015 as the core molecular function. Combined IEA methods (GO_REF:0000120) already assign GO:0003779 via InterPro, so this prediction recapitulates an existing automated annotation at a less informative level than the best available term.
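The "less precise" reasoning in the E1BL04 rows rests on one structural test: the predicted term is an ancestor of a term already annotated. The sketch below illustrates that test with a toy parent map containing only the two parent relationships stated in these rows. A real check would load the full ontology, and the function names here are hypothetical. The final LSP or CNN label in the report also involves curator judgment that this code does not capture.

```python
# Toy is_a map (child -> parents), limited to relationships stated in the text.
PARENTS = {
    "GO:0007015": {"GO:0030036"},  # actin filament organization -> actin cytoskeleton organization
    "GO:0051015": {"GO:0003779"},  # actin filament binding -> actin binding
}

def ancestors(term):
    """Collect all ancestors of a term by walking the parent map."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def relation_to_existing(predicted, existing):
    """Describe how a predicted term relates to the existing annotations."""
    if predicted in existing:
        return "exact match"
    if any(predicted in ancestors(term) for term in existing):
        return "less precise (ancestor of existing term)"
    return "not covered by existing terms"

print(relation_to_existing("GO:0030036", {"GO:0007015"}))
print(relation_to_existing("GO:0003779", {"GO:0051015"}))
```

Both calls report the prediction as an ancestor of an existing term, which is the hierarchy relation behind the LSP assessments in these rows.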
A0A061AL94
mcm-4
Caenorhabditis elegans GO:0006367
transcription initiation at RNA polymerase II promoter
GO_BP NPI 0 FREQUENCY_BIAS
MCM-4 is a subunit of the MCM2-7 replicative DNA helicase complex and has no known role in transcription initiation at RNA polymerase II promoters. The protein's sole domain in this 74 AA fragment is the winged-helix domain WHD_MCM4 (PF21128), which is structurally related to winged-helix domains found in some transcription factors. This structural similarity likely caused ProtNLM2 to predict a transcription-related function. However, the MCM4 WHD is specifically involved in DNA binding during replication origin licensing and replication fork progression, not in transcription. All established functions of MCM-4 -- DNA helicase activity, DNA replication initiation, and participation in the MCM complex -- are in the DNA replication pathway, not the transcription pathway. No literature or ortholog evidence supports a direct role for any MCM4 subunit in RNA polymerase II transcription initiation.
A0A061AL94
mcm-4
Caenorhabditis elegans GO:0005634
nucleus
GO_CC COR 2
Nuclear localization is well-established for MCM complex subunits across eukaryotes, and this prediction is biologically correct. The MCM2-7 complex, of which MCM-4 is a constitutive subunit, is loaded onto chromatin at replication origins in the nucleus during late mitosis and G1 phase and operates at replication forks during S phase. In C. elegans, MCM-4::mCherry fusion proteins have been observed associated with chromosomes in live-cell imaging, directly confirming nuclear localization. The AI review of this protein independently proposed nucleus (GO:0005634) as a NEW annotation with ISS evidence. Since this accession has no existing GOA annotations, this ProtNLM2 prediction represents a correct novel prediction that is consistent with both the known biology of MCM helicase subunits and direct experimental observations in C. elegans.
A0A4W3GVU1 Callorhinchus milii GO:0005634
nucleus
GO_CC UNC 1
ProtNLM2 predicted subcellular location 'Nucleus', mapped to GO:0005634 (nucleus). This is a subcellular location prediction rather than a direct GO term prediction.
A0A8I3PI07
CNNM4
Canis lupus familiaris GO:0005886
plasma membrane
GO_CC LSP 2
GO:0005886 (plasma membrane) is already present as an existing GOA annotation for CNNM4 (both IBA via GO_Central and IEA via Ensembl/PANTHER). The prediction is correct -- CNNM4 is an integral plasma membrane protein with a signal peptide and four transmembrane helices in the CNNM domain (residues 178-358). However, this term is less precise than the most informative cellular component annotation already in GOA: basolateral plasma membrane (GO:0016323), which was transferred from the mouse ortholog Q69ZF7 via Ensembl Compara. The basolateral localization is biologically critical because CNNM4 mediates vectorial Mg2+ efflux from the cytoplasm into the interstitial space at the basolateral surface of intestinal epithelial cells. This polarized localization is essential for transcellular magnesium absorption -- apical entry via TRPM6/TRPM7 channels followed by basolateral exit via CNNM4. The generic plasma membrane term fails to capture this functionally important membrane domain specificity. The prediction also qualifies as CNN (correct but not novel) since GO:0005886 is already annotated in GOA, but LSP is the more informative assessment given the availability of the more specific GO:0016323 annotation.
A0A8I3PI07
CNNM4
Canis lupus familiaris GO:0022857
transmembrane transporter activity
GO_MF LSP 2 FREQUENCY_BIAS
GO:0022857 (transmembrane transporter activity) is already present as an existing GOA annotation for CNNM4 (IEA via PANTHER/UniRule combined annotation, GO_REF:0000120), and was marked KEEP_AS_NON_CORE in the curated review precisely because more specific molecular function annotations exist. The protein's core molecular function is magnesium ion transmembrane transporter activity (GO:0015095), established by IBA and IEA annotations and confirmed by the curated review as the primary function. CNNM4 mediates Mg2+ efflux regulated by intracellular Mg2+-ATP binding to its CBS domain pair. GO:0022857 is an ancestor of GO:0015095 in the GO hierarchy, so while technically correct, it adds no functional information beyond what a sequence-based prediction of "this is some kind of transporter" would provide. The prediction fails to distinguish CNNM4 from any of the hundreds of other transmembrane transporters encoded in mammalian genomes, and does not capture the magnesium specificity, efflux directionality, or ATP-sensing regulatory mechanism that define this protein.
A0A8I3PI07
CNNM4
Canis lupus familiaris GO:0006811
monoatomic ion transport
GO_BP LSP 2 FREQUENCY_BIAS
GO:0006811 (monoatomic ion transport) is a broad biological process term that encompasses many specific ion transport processes. CNNM4 has multiple more specific BP annotations already in GOA: magnesium ion transport (GO:0015693, IBA and IEA), magnesium ion transmembrane transport (GO:1903830, IEA via logical inference), and magnesium ion homeostasis (GO:0010960, IBA and IEA). The task description notes this as an EXACT match to GO:0035725 (sodium ion transmembrane transport), but the curated review marks the sodium transport annotations as UNDECIDED because the Na+ transport activity of CNNM4 is less well established than its primary Mg2+ efflux function -- the sodium transport may reflect a coupled or secondary mechanism rather than an independent transport activity. Regardless, GO:0006811 is a parent term of all these more specific processes and fails to capture the defining biology of CNNM4: its role as a magnesium efflux transporter critical for systemic Mg2+ homeostasis, whose loss of function causes Jalili syndrome (cone-rod dystrophy with amelogenesis imperfecta) in humans. The prediction is uninformatively generic.
Q7NUH2 Chromobacterium violaceum No predictions
A0A2I0M3K7 Columba livia GO:0106029
tRNA pseudouridine synthase activity
GO_MF CNN 2
ProtNLM2 predicted GO:0106029 (tRNA pseudouridine synthase activity), which is a child term of the existing GOA annotation GO:0009982 (pseudouridine synthase activity). The prediction is biologically correct: TRUB2 belongs to the TruB family of pseudouridine synthases and specifically catalyzes uridine-to-pseudouridine isomerization in tRNA substrates. In mammals, TRUB2 acts as a mitochondrial tRNA Psi55 synthase, modifying the conserved U55 position in the TΨC loop of select mitochondrial tRNAs, and the Columba livia ortholog contains the conserved TruB N-terminal domain (PF01509) and is classified under InterPro family IPR039048 (Trub2), supporting functional equivalence. However, this is scored CNN (correct but not novel) rather than COR because the tRNA substrate specificity is already implicit in the InterPro-derived annotation: IPR039048 specifically identifies the protein as Trub2 (a known tRNA pseudouridine synthase), and the existing GO:0009982 annotation was assigned through that same InterPro mapping. The ProtNLM2 prediction thus refines granularity but does not provide genuinely new functional insight beyond what domain-based inference already established. Notably, an even more specific term GO:0160148 (tRNA pseudouridine(55) synthase activity) would better reflect TRUB2's known positional specificity at U55 in the TΨC loop.
A0A8C2TBA7
PAM
Coturnix japonica GO:0004598
peptidylamidoglycolate lyase activity
GO_MF CNN 2
GO:0004598 (peptidylamidoglycolate lyase activity) corresponds to the enzymatic activity of the C-terminal PAL domain of PAM, which cleaves the peptidyl-alpha-hydroxyglycine intermediate produced by the PHM domain to yield the mature alpha-amidated peptide and glyoxylate (EC:4.3.2.5). This prediction is biologically correct: PAM is a bifunctional enzyme and the PAL lyase activity is one of its two core catalytic functions. However, GO:0004598 is already directly annotated in GOA via EC number mapping (GO_REF:0000003, with/from EC:4.3.2.5), making this a correct but not novel prediction. The automated overlap analysis flagged this as matching the parent term GO:0003824 (catalytic activity), but the exact term GO:0004598 is itself present in GOA. The UniProt entry explicitly lists EC=4.3.2.5 and documents the PAL reaction.
A0A8C2TBA7
PAM
Coturnix japonica GO:0031418
L-ascorbic acid binding
GO_MF COR 2
GO:0031418 (L-ascorbic acid binding) is a biologically correct novel prediction for PAM. The N-terminal PHM (peptidylglycine alpha-hydroxylating monooxygenase) domain is a copper-dependent monooxygenase that requires L-ascorbate as an obligate electron donor for its catalytic mechanism. The UniProt catalytic activity record explicitly shows "2 L-ascorbate" as a substrate in the PHM reaction, and the deep research confirms that "reduced ascorbate as an electron donor" is one of three essential cofactors, with "one mole of ascorbate consumed per mole of amidated product formed." While GOA contains GO:0016715 (oxidoreductase activity, acting on paired donors, with reduced ascorbate as one donor), which implicitly references ascorbate in the reaction mechanism, no explicit ascorbate binding term is present in the curated annotations. GO:0031418 captures a genuine molecular interaction -- the PHM domain must physically bind L-ascorbate to accept electrons for the copper center reduction -- that is not redundant with existing annotations. The InterPro domain signatures (Cu2_ascorb_mOase_N, Cu2_ascorb_mOase_CS-1/2) further confirm ascorbate dependence as a defining feature of this enzyme family. This represents a meaningful addition to the functional annotation of PAM.
A0A1S3BTE3 Cucumis melo No predictions
A0A8M9QG43
dnajc6
Danio rerio GO:0016311
dephosphorylation
GO_BP NPI 0 FREQUENCY_BIAS
Auxilin/DNAJC6 contains a PTEN-like phosphatase domain (residues 109-276), which ProtNLM2 likely used to predict involvement in dephosphorylation. However, the phosphatase domain of auxilin has only "probable" phosphatase activity per UniProt characterization. Its experimentally established role is phosphoinositide binding for membrane targeting during clathrin-mediated endocytosis, not catalytic dephosphorylation of substrates. The curated ai-review independently marked the related GOA annotation for phosphoprotein phosphatase activity (GO:0004721) as MARK_AS_OVER_ANNOTATED and hydrolase activity (GO:0016787) as REMOVE, noting that these sequence-feature-based annotations overstate the functional evidence. The core molecular function of auxilin is as a J-domain co-chaperone that recruits and stimulates HSC70 ATPase activity to disassemble clathrin coats from newly formed vesicles. Loss-of-function phenotypes in human (PARK19 Parkinson's disease) and zebrafish (impaired Notch signaling) are attributable to defective clathrin uncoating, not loss of phosphatase activity. This prediction reflects the model's reliance on the PTEN-like domain fold without accounting for the divergent functional role of this domain in the auxilin protein context.
Q9RSY6 Deinococcus radiodurans GO:0003676
nucleic acid binding
GO_MF LSP 2
Less precise than the existing curated annotation. ProtNLM2 predicted GO:0003676 (nucleic acid binding), which is a high-level ancestor of GO:0003729 (mRNA binding) already present in GOA via both IBA (GO_REF:0000033, inferred from E. coli RpsA P0AG67) and IEA (GO_REF:0000117, ARBA). Ribosomal protein bS1 specifically binds mRNA 5-prime UTRs through its five tandem S1/OB-fold domains (residues 122-539) to recruit messages to the 30S subunit for translation initiation -- this is a defined and well-characterized molecular activity, not generic nucleic acid binding. The ai-review itself marked the existing GO:0003676 IEA annotation (from InterPro2GO mapping of the S1 domain IPR003029) as MARK_AS_OVER_ANNOTATED because GO:0003729 already captures the function at higher specificity. The ProtNLM2 prediction thus recapitulates an annotation already flagged as uninformatively broad, adding no biological insight beyond what the curated mRNA binding annotation provides.
Q9RSY6 Deinococcus radiodurans GO:0005840
ribosome
GO_CC LSP 2
Less precise than the existing curated annotation. ProtNLM2 predicted GO:0005840 (ribosome), which is a direct parent of GO:0022627 (cytosolic small ribosomal subunit) already annotated via IBA (GO_REF:0000033). As a canonical bacterial ribosomal protein, bS1 is specifically a component of the 30S (small) ribosomal subunit -- not the 50S large subunit or the assembled 70S ribosome in general. D. radiodurans bS1 belongs to the bacterial ribosomal protein bS1 family (COG0539) and its name explicitly identifies it as the small subunit protein. The prediction correctly places the protein in a ribosomal context but fails to resolve which ribosomal subunit it occupies, information that is both biologically important (bS1 functions exclusively in the 30S subunit during mRNA recruitment) and already captured by the existing IBA annotation to GO:0022627.
A0A6I8W8A2 Drosophila pseudoobscura pseudoobscura GO:0016874
ligase activity
GO_MF NPI 0 FREQUENCY_BIAS
GO:0016874 (ligase activity) is incorrect for this protein. The prediction appears driven by the automated RefSeq protein name "Probable E3 ubiquitin-protein ligase HERC3 isoform X3" rather than the actual domain content. HERC family E3 ligases require a C-terminal HECT (Homologous to E6-AP Carboxyl Terminus) domain to catalyze the transfer of ubiquitin from an E2 conjugating enzyme to substrate proteins. This 169 AA protein is far too short to contain a HECT domain (typically 350+ AA) and its entire domain architecture consists exclusively of two RCC1 repeats (positions 32-87 and 88-142), as confirmed by PROSITE, Pfam (PF00415 RCC1, PF13540 RCC1_2), InterPro (IPR000408), and Gene3D (2.130.10.30). The RCC1 repeat region in HERC proteins functions in substrate recognition and protein-protein interactions, not in catalytic ubiquitin ligation. This isoform X3 likely represents a truncated splice variant retaining only the N-terminal substrate-binding region of the full-length HERC3 ortholog. An appropriate molecular function annotation for this fragment would be in the protein binding branch of the ontology (e.g., contributing to substrate recognition in a complex), not ligase activity. The protein has no curated GOA annotations and is unreviewed in UniProt (PE 4, predicted), further indicating that no experimental or curated evidence supports ligase activity for this specific gene product.
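The length argument above is simple arithmetic and can be checked directly. This sketch uses the protein length and RCC1 repeat coordinates quoted in the assessment. The 350-residue threshold is taken from the "typically 350+ AA" figure in the text and is an approximation, and the function name is hypothetical.

```python
# ASSUMPTION: threshold taken from the "typically 350+ AA" figure in the text.
MIN_HECT_LENGTH = 350

def free_residues(protein_length, domains):
    """Count residues not covered by any annotated domain (1-based, inclusive)."""
    covered = set()
    for start, end in domains:
        covered.update(range(start, end + 1))
    return protein_length - len(covered)

protein_length = 169
rcc1_repeats = [(32, 87), (88, 142)]  # coordinates quoted in the assessment

unannotated = free_residues(protein_length, rcc1_repeats)
print(unannotated)                        # 58
print(protein_length >= MIN_HECT_LENGTH)  # False
```

Neither the whole protein nor the 58 residues outside the RCC1 repeats come close to the size of a HECT domain, which is the basis for rejecting ligase activity here.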
B4MAQ2 Drosophila virilis GO:0005737
cytoplasm
GO_CC UNC 1
ProtNLM2 predicted subcellular location 'Cytoplasm', mapped to GO:0005737 (cytoplasm). This is a subcellular location prediction rather than a direct GO term prediction.
A0A8C5FPT8
tbc1d14
Gadus morhua GO:0005776
autophagosome
GO_CC NPI 0
ProtNLM2 predicted GO:0005776 (autophagosome) as a more specific replacement for the existing GOA annotation GO:0005773 (vacuole). This prediction is incorrect. TBC1D14 is a negative regulator of macroautophagy that acts by controlling ATG9 vesicle trafficking from recycling endosomes, but it does not localize to autophagosomes themselves. Detailed studies of mammalian TBC1D14 (Lamb et al. 2016) demonstrate localization to RAB11-positive recycling endosomes, the Golgi complex, and tubulo-vesicular transport intermediates between these compartments. When TBC1D14 is overexpressed, it causes tubulation of recycling endosomes and sequesters ULK1 and ATG9 away from autophagosome formation sites, thereby inhibiting autophagy -- but TBC1D14 itself remains on the endosomal compartment, not on the autophagosome. The existing vacuole annotation (GO:0005773) was already flagged as MARK_AS_OVER_ANNOTATED in the curated review because no evidence supports vacuolar or lysosomal localization. The ProtNLM2 prediction of autophagosome appears to conflate functional involvement in autophagy regulation (a biological process) with physical residence at the autophagosome (a cellular component), a category error. The correct cellular component annotations for TBC1D14 would be recycling endosome (GO:0055037) and Golgi apparatus (GO:0005794), neither of which was predicted.
S0EDH7 Gibberella fujikuroi GO:0006468
protein phosphorylation
GO_BP COR 2
Protein phosphorylation is the canonical biological process catalyzed by protein kinases. S0EDH7 contains a kinase-like domain fold confirmed by three independent domain classification methods: InterPro IPR011009 (Kinase-like domain superfamily), Gene3D 1.10.510.10 (Transferase(Phosphotransferase) domain 1), and SUPFAM SSF56112 (Protein kinase-like). There are no existing GOA annotations for this protein, making this a genuinely novel prediction. While no experimental evidence exists (PE level 4), the convergent domain evidence strongly supports protein kinase activity. In Fusarium fujikuroi (teleomorph: Gibberella fujikuroi), protein kinases are central to signaling pathways regulating growth, secondary metabolism, and pathogenicity, though S0EDH7 itself has not been assigned to any specific kinase subfamily or signaling module. Assessed as COR (correct novel) because the domain architecture robustly supports this function despite the absence of direct experimental validation.
S0EDH7 Gibberella fujikuroi GO:0005524
ATP binding
GO_MF COR 2
ATP binding is the essential molecular function underpinning protein kinase catalysis, as protein kinases use ATP as the phosphoryl group donor for substrate phosphorylation. The Gene3D classification of S0EDH7 as containing a Transferase(Phosphotransferase) domain (1.10.510.10) directly implies ATP-dependent phosphotransferase activity. The SUPFAM classification (SSF56112, Protein kinase-like) and InterPro (IPR011009, Kinase-like domain superfamily) further corroborate a fold architecture that accommodates ATP binding. No GOA annotations exist for this protein, so this is a novel prediction. The prediction is biologically coherent with the protein phosphorylation prediction (GO:0006468) -- together they describe the expected enzymatic mechanism of a protein kinase. Assessed as COR (correct novel) based on strong convergent structural domain evidence for kinase-like fold and phosphotransferase activity.
A0A2I4G8T1 Juglans regia No predictions
A0A2K5UJ34 Macaca fascicularis GO:0061371
determination of heart left/right asymmetry
GO_BP COR 2
ProtNLM2 predicted GO:0061371 (determination of heart left/right asymmetry), a biological process term with no overlap in the existing GOA annotations for this protein (which contain only GO:0060271 cilium assembly and GO:0032474 otolith morphogenesis, both IEA/TreeGrafter). This prediction is assessed as correct and novel based on converging evidence. First, the zebrafish ortholog of TTC39C (Q1LXE6) has direct experimental evidence (IMP) for involvement in determination of heart left/right asymmetry, establishing that this function is genuinely associated with TTC39C orthologs. Second, heart left/right asymmetry determination is a cilium-dependent process in vertebrate development: motile cilia at the embryonic node generate leftward fluid flow that breaks bilateral symmetry and initiates the Nodal signaling cascade. Third, TTC39C has been experimentally confirmed to localize to cilia in C. elegans sensory neurons (Pir et al. 2024, Ciliogenics study), and TPR domain proteins are well-established ciliary scaffolds. Unlike otolith morphogenesis (which is taxonomically inappropriate for a primate), heart left/right asymmetry determination via nodal cilia is a conserved developmental mechanism in mammals, making this prediction biologically appropriate for Macaca fascicularis. The prediction adds a specific cilium-dependent developmental outcome beyond the generic cilium assembly term already in GOA, representing genuinely informative functional annotation.
A0A804UIX9 Zea mays No predictions
A0A8B6BFL6 Mytilus galloprovincialis GO:0003677
DNA binding
GO_MF LSP 2
DNA binding is technically correct for a reverse transcriptase domain-containing protein from a DIRS1-type retrotransposon, since the protein must interact with DNA during reverse transcription (cDNA synthesis) and integration. However, GO:0003677 is a very broad molecular function term that fails to capture the actual enzymatic activity. The primary molecular function of this protein is RNA-directed DNA polymerase activity (GO:0003964), which is far more informative. Additionally, DIRS1 elements encode a tyrosine recombinase domain for integration that also binds DNA, but again, the specific catalytic function is more informative than generic DNA binding. Assessed as LSP because the prediction is correct at a high level but substantially less precise than what domain architecture alone would support.
A0A8B6BFL6 Mytilus galloprovincialis GO:0006310
DNA recombination
GO_BP COR 2
DNA recombination is a correct novel prediction for a DIRS1-type retrotransposon protein. Unlike LINE retrotransposons that use target-primed reverse transcription (TPRT) for integration, DIRS1 elements integrate into the host genome via tyrosine recombinase-mediated site-specific recombination. The CDD domain RNase_HI_RT_DIRS1 (cd09275) specifically identifies this protein as belonging to the DIRS1 class, whose integration mechanism is fundamentally recombination-based. DIRS1 elements produce circular DNA intermediates that are then integrated through recombination at their inverted terminal repeats. This term is absent from GOA for this protein and represents a biologically accurate prediction supported by the known mechanism of DIRS1 element propagation.
A0A8B6BFL6 Mytilus galloprovincialis GO:0015074
DNA integration
GO_BP COR 2
DNA integration is a correct novel prediction for a retrotransposon-encoded reverse transcriptase. Retrotransposons replicate via a copy-and-paste mechanism in which the element is transcribed to RNA, reverse-transcribed to DNA, and then integrated into new genomic loci. For DIRS1-type elements (as indicated by the RNase_HI_RT_DIRS1 CDD domain cd09275), integration proceeds through tyrosine recombinase-mediated insertion of circular DNA intermediates. The protein's domain architecture (RT domain plus DIRS1-associated RNase H) directly supports involvement in the retrotransposition cycle that culminates in genomic integration. This is a core biological process for any active retrotransposon and is not present in GOA for this uncharacterized protein from M. galloprovincialis, making it a genuinely informative novel prediction.
A0A8B6GS20 Mytilus galloprovincialis GO:0004438
phosphatidylinositol-3-phosphate phosphatase activity
GO_MF NPI 0 PARALOG_OVERANNOTATION
ProtNLM2 predicted GO:0004438 (phosphatidylinositol-3-phosphate phosphatase activity) for MTMR9, but this is a catalytic activity that MTMR9 cannot perform. MTMR9 is a well-characterized catalytically inactive pseudophosphatase within the myotubularin subfamily. It retains the overall myotubularin phosphatase domain fold (Pfam PF06602, Myotub-related; PROSITE PS51339, PPASE_MYOTUBULARIN) but lacks the critical catalytic cysteine in the conserved CX5R motif required for phosphoinositide dephosphorylation. The existing GOA annotations reflect this biology correctly: MTMR9 is annotated with protein phosphatase binding (GO:0019903), capturing its role as a scaffold that heterodimerizes with catalytically active family members (MTMR6, MTMR7, MTMR8), and its involvement in phosphatidylinositol dephosphorylation (GO:0046856) was flagged in the ai-review as requiring modification to the regulatory term GO:0060304 (regulation of phosphatidylinositol dephosphorylation). The ProtNLM2 error is a classic paralog overannotation (Type 6): the sequence-based model recognized the conserved myotubularin domain architecture and predicted the catalytic activity of the active subfamily members (MTMR1-4, MTMR6-8) without detecting the degenerate active site that distinguishes the pseudophosphatase branch (MTMR5, MTMR9, MTMR10-13). This is precisely the kind of subfamily-level functional divergence that sequence similarity methods struggle to capture, as the overall domain architecture is preserved despite loss of catalytic competence. PI(3)P phosphatase activity is the canonical substrate specificity of the active myotubularins, making it a plausible but biologically incorrect prediction for this catalytically dead family member.
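The active-site check described above can be illustrated with a simple motif scan. This is a rough sketch, assuming the CX5R motif can be approximated by the regular expression `C.{5}R`. The two peptides are toy strings invented for illustration and are not the real MTMR9 or MTMR6 sequences.

```python
import re
from typing import List, Tuple

# CX5R: a cysteine, any five residues, then an arginine.
CX5R = re.compile(r"C.{5}R")


def find_cx5r(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for each non-overlapping CX5R hit."""
    return [(m.start() + 1, m.group()) for m in CX5R.finditer(seq.upper())]


# Hypothetical peptides: the second differs only by replacing the cysteine.
ACTIVE_LIKE = "VHCSDGWDRTAQ"
INACTIVE_LIKE = "VHGSDGWDRTAQ"

print(find_cx5r(ACTIVE_LIKE))    # [(3, 'CSDGWDR')]
print(find_cx5r(INACTIVE_LIKE))  # []
```

A hit only shows that the motif string is present. It does not demonstrate catalytic competence, and a real check would also need to confirm that the match lies inside the phosphatase domain.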
B8BAB0 Oryza sativa subsp. indica GO:0010152
pollen maturation
GO_BP UNC 1
GO:0010152 (pollen maturation) is a biological process prediction for a BURP domain-containing protein with no existing GOA annotations and no direct experimental characterization (UniProt PE level 4). The prediction has some biological plausibility because the BURP protein family includes members with roles in reproductive development: BNM2 (the founding BNM2-like subfamily member) is linked to pollen grain embryogenesis, and OsRAFTIN1, a rice BURP protein, is specifically expressed in anthers during microspore development and is required for male fertility. However, B8BAB0 is classified by PANTHER (PTHR31458:SF2) as a PG1beta-like BURP protein, not a BNM2-like protein. The PG1beta-like subfamily has well-characterized members in rice (OsBURP14, OsBURP16) that function as non-catalytic beta subunits of polygalacturonase isozyme 1, participating in cell wall pectin degradation under ethylene/ABA stress signaling rather than pollen-specific processes. Furthermore, B8BAB0 contains an N-terminal signal peptide (aa 1-21) and a C-terminal BURP domain (aa 384-595) consistent with a secreted protein involved in extracellular matrix or cell wall modification, but its variable internal region and overall domain architecture align with PG-associated function (IPR051897, PG-associated_BURP) rather than reproductive-specific roles. The prediction may reflect ProtNLM2 conflating BURP family-level associations with reproductive biology (driven by BNM2-like and RAFTIN members in the training data) and applying them indiscriminately across the family. Without expression data, mutant phenotypes, or protein interaction studies for B8BAB0, the prediction cannot be confirmed or refuted. Pollen maturation does involve cell wall remodeling processes that could theoretically engage PG-associated proteins, but this indirect reasoning is insufficient for a confident assignment.
Q6YYC5 Oryza sativa subsp. japonica GO:0070534
protein K63-linked ubiquitination
GO_BP UNC 1
ProtNLM2 predicted GO:0070534 (protein K63-linked ubiquitination) as a more specific refinement of the existing GOA annotation GO:0016567 (protein ubiquitination). The prediction is biologically plausible but uncertain. The Arabidopsis ortholog RGLG2 (AT5G14420/Q9LY87), one of the source genes for the IBA transfer to Q6YYC5, has been experimentally shown to catalyze K63-linked polyubiquitin chain formation. However, ubiquitin chain linkage specificity is primarily determined by the E2 conjugating enzyme partner, not the E3 ligase itself. RGLG2 catalyzes K63-linked chains when paired with UBC35 (a group-III E2), but may produce different linkage types with other E2s. No experimental data exist for Q6YYC5, and the specific E2 partner(s) of this uncharacterized rice RGLG protein are unknown. Furthermore, while Q6YYC5 is classified in the RGLG family (PTHR45751:SF16, RGLG4), not all RGLG family members necessarily share K63 linkage specificity -- the rice RGLG family includes members (OsRGLG5, OsRGLG6) that target substrates for 26S proteasomal degradation, which typically involves K48-linked chains. The prediction cannot be confirmed or refuted without biochemical characterization of Q6YYC5 with its cognate E2 enzyme(s).
Q6YYC5 Oryza sativa subsp. japonica GO:0061630
ubiquitin protein ligase activity
GO_MF CNN 2
ProtNLM2 predicted GO:0061630 (ubiquitin protein ligase activity) as a more specific refinement of the existing GOA annotation GO:0004842 (ubiquitin-protein transferase activity). This is correct: GO:0004842 encompasses both E2 conjugating enzymes and E3 ligases, while GO:0061630 specifically denotes E3 ligase activity. Q6YYC5 contains a canonical C-terminal RING finger zinc-binding domain (IPR001841, PROSITE PS50089, residues 356-389) which is the hallmark catalytic domain of RING-type E3 ubiquitin ligases. PANTHER classifies it as PTHR45751:SF16 (E3 ubiquitin-protein ligase RGLG4), and related rice RGLG proteins (OsRGLG5, OsRGLG6) as well as Arabidopsis orthologs (RGLG1/RGLG2) have confirmed E3 ligase activity. However, this refinement is assessed as CNN (correct but not novel) rather than COR because the E3 vs E2 distinction is already apparent from domain architecture alone -- the RING domain is universally recognized as an E3 ligase signature -- and the main ai-review independently proposed the identical MODIFY action from GO:0004842 to GO:0061630 based on this same reasoning. The prediction confirms existing domain-based inference rather than providing genuinely novel functional insight.
A0A2R9CAF4 Pan paniscus GO:0008509
monoatomic anion transmembrane transporter activity
GO_MF LSP 2
GO:0008509 (monoatomic anion transmembrane transporter activity) is already present in GOA for this protein. As a broad parent term, it correctly captures that SLC26A11 transports monoatomic anions (sulfate, chloride, and others), but it is far less informative than the specific molecular functions already annotated. The protein's core activity is secondary active sulfate transmembrane transporter activity (GO:0008271), demonstrated by reconstitution studies showing proton-coupled sulfate/chloride exchange with a KM of approximately 40 µM for sulfate. It also possesses a distinct chloride channel activity (GO:0005254) confirmed by electrophysiology. The task description notes an EXACT match to GO:0140900 (chloride:bicarbonate antiporter activity), but the curated review flags GO:0140900 for modification because recent biochemical reconstitution data show that bicarbonate competes only minimally for the SLC26A11 substrate binding site -- the actual exchange mechanism is proton-coupled sulfate/chloride antiport, not chloride/bicarbonate exchange. ProtNLM2 failed to predict the more specific and biologically accurate transport activities that define this protein's function.
A0A2R9CAF4 Pan paniscus GO:0016020
membrane
GO_CC LSP 2 FREQUENCY_BIAS
GO:0016020 (membrane) is trivially correct for SLC26A11, which has 10-14 transmembrane helices and is unambiguously an integral membrane protein. However, the prediction is uninformative, and the term is already marked as MARK_AS_OVER_ANNOTATED in the curated review. The biologically meaningful localization is the lysosomal membrane (GO:0005765), where SLC26A11 functions as the primary sulfate exporter using the lysosomal proton gradient. Confocal microscopy with Lamp1 co-staining shows Manders coefficients of 0.45-0.50 for lysosomal overlap across multiple mammalian cell types (HEK293T, COS1, CHO, renal intercalated cells). By contrast, overlap with ER markers is minimal (0.09-0.12). The generic membrane term cannot distinguish this lysosomal residence from the localization of any other membrane protein. This type of over-generic cellular component prediction is characteristic of frequency bias, as membrane is among the most commonly assigned GO CC terms in training data.
A0BFB4 Paramecium tetraurelia GO:0006468
protein phosphorylation
GO_BP CNN 2
ProtNLM2 predicted GO:0006468 (protein phosphorylation), the biological process of covalent addition of phosphate groups to amino acid residues in proteins. A0BFB4 contains a canonical protein kinase catalytic domain (Pfam PF00069, residues 96-348) with a conserved serine/threonine kinase active site (IPR008271) and ATP-binding site (IPR017441), and is already annotated in GOA with GO:0004674 (protein serine/threonine kinase activity) via both IBA and IEA evidence. Protein phosphorylation is the process directly enabled by protein kinase activity -- a kinase that catalyzes ATP-dependent transfer of phosphate to Ser/Thr residues is by definition involved in protein phosphorylation. The prediction is therefore correct but not novel: the biological process is logically entailed by the existing molecular function annotation. Additionally, GOA already includes GO:0005524 (ATP binding), which further corroborates the catalytic competence of this kinase. While the specific substrates and pathway context of A0BFB4 remain unknown (it belongs to a massively expanded kinome of 2606 kinases in P. tetraurelia), the general involvement in protein phosphorylation is unambiguous from its domain architecture.
B7FXQ8 Phaeodactylum tricornutum GO:0009651
response to salt stress
GO_BP UNC 1
While sHSPs can be induced by multiple abiotic stresses beyond heat, there is no direct evidence that HSP20A in P. tricornutum is specifically involved in the response to salt stress. The deep research report focuses exclusively on thermal stress roles for this protein family in diatoms, with no mention of salinity-induced expression. P. tricornutum is a marine diatom and does experience osmotic stress, but the expanded HSF regulatory network described in this organism (Huang et al. 2025, Lin et al. 2024) is characterized in the context of thermal tolerance, not salt acclimation. Some plant sHSPs are salt-inducible, but extrapolating this to a diatom HSP20 without organism-specific evidence is speculative. ProtNLM2 may have learned a general association between stress-response proteins and salt stress from plant training data, but this remains unvalidated for HSP20A. Cannot be confirmed or refuted without transcriptomic or genetic evidence under salinity stress conditions.
B7FXQ8 Phaeodactylum tricornutum GO:0051259
protein complex oligomerization
GO_BP COR 2
This is a correct novel prediction. Oligomerization is a defining and functionally essential feature of the HSP20/sHSP family. Small heat shock proteins assemble into dynamic oligomeric structures ranging from dimers to large complexes of 24 or more subunits, with dimers serving as building blocks that associate through their N-terminal and C-terminal regions to form higher-order assemblies (Sprague-Piercy et al. 2021, Gu et al. 2023). The oligomeric state is functionally significant: dimers often represent the active chaperone form, while larger oligomers may serve as inactive storage pools. HSP20A contains the conserved alpha-crystallin domain (residues 47-155) that mediates dimerization, and its variable terminal extensions regulate higher-order oligomerization. Since B7FXQ8 has no existing GOA annotations, this represents a genuine novel prediction consistent with the well-established structural biology of the sHSP family.
B7FXQ8 Phaeodactylum tricornutum GO:0006457
protein folding
GO_BP LSP 2
This prediction is broadly correct but less precise than the actual biological role. sHSPs like HSP20A do not actively fold proteins; they function as holdase chaperones that prevent irreversible aggregation of partially unfolded or misfolded proteins and maintain them in a folding-competent state (Mitra et al. 2022, Albinhassan et al. 2025). The actual refolding is carried out by ATP-dependent chaperones (HSP70, HSP100) to which sHSPs hand off their client proteins. A more precise annotation would be GO:0061077 (chaperone-mediated protein folding) or terms related to the prevention of protein aggregation, such as GO:0051085 (chaperone cofactor-dependent protein refolding) or the broader protein quality control pathway. GO:0006457 (protein folding) is a general term that encompasses de novo folding, which is not the primary role of sHSPs. The prediction captures the correct functional domain (proteostasis) but at insufficient specificity.
B7FXQ8 Phaeodactylum tricornutum GO:0009408
response to heat
GO_BP COR 2
This is a correct novel prediction and arguably the best supported of all six predictions for this protein. HSP20A is named as a heat shock protein and belongs to the sHSP/HSP20 family, whose defining biological role is the cellular response to heat stress. In P. tricornutum specifically, recent research has established that the organism possesses an exceptionally expanded heat shock transcription factor (HSF) repertoire (69 HSF genes, 44.2% of all transcription factors) that controls thermal tolerance programs, with HSPs as downstream effectors (Huang et al. 2025, Lin et al. 2024). In another marine microalga, the dinoflagellate Scrippsiella trochoidea, HSP20 transcripts are strongly upregulated under heat stress (Deng et al. 2020). The UniProt entry for B7FXQ8 carries a keyword annotation for stress response, and the ARBA automated rule (ARBA00023016) supports this assignment. Since P. tricornutum thrives across temperatures from 5 to 28 °C in diverse marine environments, heat shock proteins are critical for its thermal adaptability. This is a high-confidence correct prediction.
B7FXQ8 Phaeodactylum tricornutum GO:0042542
response to hydrogen peroxide
GO_BP UNC 1
There is no direct evidence that HSP20A in P. tricornutum is involved in the response to hydrogen peroxide. While some sHSPs in other organisms (particularly plant and mammalian systems) have been shown to confer protection against oxidative stress, and oxidative stress and heat stress response pathways can overlap, the deep research report for this protein does not mention any oxidative stress role. The alpha-crystallin domain in vertebrate lens crystallins does protect against oxidative damage, but this function has not been demonstrated for diatom HSP20 proteins. ProtNLM2 may have learned an association between sHSPs and oxidative stress responses from well-characterized plant or mammalian training examples, but without P. tricornutum-specific evidence (e.g., transcriptomic upregulation under H2O2 treatment or genetic perturbation data), this prediction cannot be validated or refuted.
B7FXQ8 Phaeodactylum tricornutum GO:0051082
unfolded protein binding
GO_MF COR 2
This is a correct novel prediction of the core molecular function of sHSPs. The alpha-crystallin domain of HSP20 family proteins mediates direct binding to partially unfolded, misfolded, or aggregation-prone client proteins through recognition of exposed hydrophobic surface regions that are normally buried in properly folded structures (Sprague-Piercy et al. 2021, Mitra et al. 2022, Gu et al. 2023). This holdase activity -- binding non-native proteins to prevent their irreversible aggregation -- is the primary molecular function of sHSPs and is mediated by the conserved alpha-crystallin domain present in HSP20A (residues 47-155, InterPro IPR002068, Pfam PF00011). The CDD annotation ACD_sHsps-like (cd06464) on this protein further confirms the presence of a functional alpha-crystallin domain competent for substrate binding. sHSPs exhibit broad, promiscuous substrate specificity for non-native proteins rather than targeting specific individual clients. Since B7FXQ8 has no existing GOA annotations, this is a genuinely novel and well-supported prediction.
G1TUN6
UBE2L6
Oryctolagus cuniculus GO:0016740
transferase activity
GO_MF LSP 2
ProtNLM2 predicted GO:0016740 (transferase activity) for rabbit UBE2L6, a prediction that is biologically correct but substantially less precise than the existing annotations. UBE2L6 is an E2 ubiquitin-conjugating enzyme (EC 2.3.2.23) whose primary physiological role is as the dedicated E2 for ISG15 conjugation (ISGylation), an interferon-stimulated post-translational modification system central to innate antiviral defense. The E2 reaction mechanism is a transthioesterification -- the activated ubiquitin-like modifier (ISG15 or ubiquitin) is transferred from the E1 thioester to the conserved active-site cysteine (Cys86) of the E2 via a new thioester bond -- placing UBE2L6 squarely in the transferase catalytic class. However, GO:0016740 (transferase activity) is a very high-level term that encompasses all enzymes transferring any functional group; GOA already contains the far more informative descendant terms GO:0019787 (ubiquitin-like protein transferase activity) and GO:0042296 (ISG15 transferase activity), both transferred via Ensembl Compara from the experimentally characterized human ortholog (O14933). GO:0016740 is an ancestor of both these terms in the GO molecular function hierarchy, so the ProtNLM2 prediction adds no novel information and is less informative than what existing automated methods have already assigned. This is scored LSP (less precise than existing annotation) rather than CNN because the prediction does not match an existing annotation at the same granularity -- it is a strict generalization that would never be annotated alongside the more specific terms under standard GO annotation practice.
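The LSP-versus-CNN distinction drawn above depends on where the predicted term sits relative to the existing annotations in the ontology graph. Below is a minimal sketch of that comparison, assuming a hand-written toy parent map that contains only the three terms named in the summary. Intermediate GO terms are omitted, and the function names and relation labels are hypothetical. A real check would load the full ontology.

```python
from typing import Dict, Set

# Toy fragment of the GO molecular function graph (child -> parents).
# Assumption: edges are collapsed so that only terms from the summary appear.
PARENTS: Dict[str, Set[str]] = {
    "GO:0042296": {"GO:0019787"},  # ISG15 transferase activity
    "GO:0019787": {"GO:0016740"},  # ubiquitin-like protein transferase activity
    "GO:0016740": set(),           # transferase activity
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable by following parent edges upward from `term`."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def relation_to_existing(predicted: str, existing: Set[str],
                         parents: Dict[str, Set[str]]) -> str:
    """Classify a predicted term relative to the existing annotation set."""
    if predicted in existing:
        return "exact_match"
    if any(predicted in ancestors(term, parents) for term in existing):
        return "ancestor_of_existing"    # strict generalization
    if any(term in ancestors(predicted, parents) for term in existing):
        return "descendant_of_existing"  # refinement of an existing term
    return "unrelated"


existing_goa = {"GO:0019787", "GO:0042296"}
print(relation_to_existing("GO:0016740", existing_goa, PARENTS))  # ancestor_of_existing
```

Under the scoring described in the summary, an "ancestor_of_existing" result corresponds to the LSP case. The other labels are only a starting point for manual assessment, not a replacement for it.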
C6T1A2 Glycine max GO:0009788
negative regulation of abscisic acid-activated signaling pathway
GO_BP UNC 1
ProtNLM2 predicted GO:0009788 (negative regulation of abscisic acid-activated signaling pathway) for C6T1A2, a soybean C2H2-type zinc finger protein with no curated GO annotations in GOA. This prediction is biologically plausible but uncertain. The most likely basis for this prediction is sequence similarity to Arabidopsis ZFP7 (Q39266), which shares the same IPR053266 (Zinc_finger_protein_7) family membership and single-C2H2 domain architecture, and has been experimentally shown to negatively regulate ABA-activated signaling during seed germination. However, several factors limit confidence in this functional transfer: (1) the C2H2 zinc finger superfamily is one of the largest transcription factor families in plants, with members participating in diverse biological processes including cold stress, flower development, trichome initiation, and photomorphogenesis -- sharing a C2H2 domain does not predict specific pathway involvement; (2) no gene-specific experimental studies exist for C6T1A2 / Glyma17g18110.1, and the protein was only identified as part of a soybean transcription factor ORFeome cloning effort (PMID:26268547) without individual functional characterization; (3) while soybean does possess ABA signaling pathways involved in drought responses, the specific negative regulatory role of ZFP7-family members may not be conserved across the ~90 million year divergence between Arabidopsis and Glycine max, given extensive C2H2 gene family expansion and subfunctionalization in legumes. The prediction cannot be confirmed or refuted without experimental evidence such as ABA-responsive expression profiling or overexpression/knockout phenotyping in soybean.
C6T1A2 Glycine max GO:0005634
nucleus
GO_CC CNN 2
ProtNLM2 predicted GO:0005634 (nucleus) for C6T1A2, a soybean C2H2-type zinc finger transcription factor. This prediction is biologically correct: C2H2-type zinc finger transcription factors are canonical nuclear proteins that must localize to the nucleus to bind DNA and regulate transcription. Numerous soybean C2H2 zinc finger proteins have been experimentally confirmed as nuclear-localized (e.g., GmZFP3, GmZF1), and nuclear localization is essentially a defining characteristic of the functional class. However, this is assessed as CNN (correct but not novel) rather than COR because the prediction provides no functional insight beyond what is already trivially derivable from the protein's domain architecture and family classification. The InterPro annotation (IPR053266, Zinc_finger_protein_7; IPR036236, Znf_C2H2_sf) and PANTHER classification (PTHR47593, ZINC FINGER PROTEIN 4-LIKE) both implicitly predict nuclear localization. The ai-review independently proposed this same annotation as a NEW ISS-level annotation based on domain architecture, confirming that the ProtNLM2 prediction is redundant with standard domain-based inference. While formally correct, a model that predicts nucleus for a zinc finger transcription factor demonstrates no discriminative power beyond family-level annotation transfer.
Q9KZ33 Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145) GO:0006352
DNA-templated transcription initiation
GO_BP LSP 2
ProtNLM2 predicted GO:0006352 (DNA-templated transcription initiation) for Q9KZ33/SCO7099, a predicted ECF sigma factor in S. coelicolor. While this prediction is biologically sound -- sigma factors are essential components of the bacterial transcription initiation complex, binding the RNA polymerase core enzyme and directing it to specific promoter sequences -- it is less precise than the existing GOA annotation GO:2000142 (regulation of DNA-templated transcription initiation, IEA via GO_REF:0000108). The distinction matters: sigma factors do not perform the catalytic step of transcription initiation (phosphodiester bond formation by the beta/beta-prime subunits of RNAP core); rather, they regulate which promoters are recognized and thus which genes undergo transcription initiation. GO:2000142 correctly captures this regulatory role -- the sigma factor modulates the specificity of the initiation event rather than being the initiation machinery per se. Additionally, Q9KZ33 carries a second GOA annotation for GO:0016987 (sigma factor activity, IBA via PANTHER PTN001249270), which is the molecular function from which GO:2000142 is logically derived. The predicted GO:0006352 is the process being regulated, not the regulatory function itself, making it a less precise annotation for an ECF sigma factor. The prediction is concordant with the known biology (sigma factors are intimately involved in transcription initiation) but adds no new information beyond what the existing, more precise GOA annotations already convey.
Q9L243
SCO2678
Streptomyces coelicolor GO:0008253
5'-nucleotidase activity
GO_MF COR 2
ProtNLM2 predicted GO:0008253 (5'-nucleotidase activity), the hydrolysis of a 5'-ribonucleotide or 5'-deoxyribonucleotide to a ribonucleoside or deoxyribonucleoside and orthophosphate. This prediction is well-supported by multiple independent lines of evidence. SCO2678 contains a HAD_SAK_2 domain (PF18143), placing it in the haloacid dehalogenase (HAD) superfamily, which catalyzes phosphate ester hydrolysis via a conserved nucleophilic aspartate mechanism. The protein is classified in eggNOG COG1877 (5'-nucleotidase/2',3'-cyclic phosphodiesterase and related esterases), which directly supports assignment of 5'-nucleotidase activity within the broader HAD phosphatase family. A characterized S. coelicolor homolog, SCO4152, is a PhoP-regulated extracellular 5'-nucleotidase involved in phosphate scavenging, providing direct functional precedent in this organism. S. coelicolor lacks organic phosphate transporters (uhp-type systems), necessitating extracellular dephosphorylation of nucleotides before phosphate uptake via PstSCAB or PitH transporters, which provides strong biological rationale for secreted nucleotidase activity. The UniProt entry designates SCO2678 as a secreted protein, consistent with an extracellular phosphatase role. GOA contains no curated annotations for this protein, so this prediction is genuinely novel. The independent AI review of this gene also proposed GO:0008253 as a new annotation based on the same convergent evidence. While the exact substrate specificity of SCO2678 has not been experimentally determined and HAD superfamily members can exhibit substrate promiscuity across nucleotides, sugar phosphates, and other phosphomonoesters, the COG1877 classification specifically favors 5'-nucleotidase over other HAD activities.
Q9L243
SCO2678
Streptomyces coelicolor GO:0009264
deoxyribonucleotide catabolic process
GO_BP UNC 1
ProtNLM2 predicted GO:0009264 (deoxyribonucleotide catabolic process), the chemical reactions resulting in the breakdown of deoxyribonucleotides. This prediction is plausible but insufficiently supported. If SCO2678 indeed possesses 5'-nucleotidase activity (as predicted above), hydrolysis of deoxyribonucleotides would fall within the scope of that activity, and participation in deoxyribonucleotide catabolism would logically follow. However, the prediction is problematic for two reasons. First, it is overly specific: no available evidence distinguishes deoxyribonucleotide substrates from ribonucleotide substrates for this enzyme. HAD superfamily members, including characterized 5'-nucleotidases, typically hydrolyze both ribo- and deoxyribonucleotides, and the deep research on SCO2678 consistently references broad substrate classes (nucleotides, sugar phosphates, glycerophosphodiesters) without singling out deoxyribonucleotides. Second, the biological context of S. coelicolor phosphate scavenging does not specifically implicate deoxyribonucleotide catabolism over general nucleotide catabolism -- the organism's need is for inorganic phosphate release from whatever organophosphates are available in the soil environment. A more appropriate biological process annotation would be GO:0006796 (phosphate-containing compound metabolic process) or GO:0009166 (nucleotide catabolic process), which are agnostic to the deoxy/ribo distinction. Without experimental substrate profiling demonstrating a preference for deoxyribonucleotides, this prediction cannot be confirmed or refuted.
A0A674PKV4
gas7a
Takifugu rubripes GO:0005737
cytoplasm
GO_CC LSP 2 FREQUENCY_BIAS
GO:0005737 (cytoplasm) is already present as an IEA annotation in GOA for this protein, transferred via TreeGrafter from the PANTHER GAS7 subfamily (PTHR23065:SF57). While technically correct -- GAS7a is a cytoplasmic protein that peripherally associates with membranes via its F-BAR domain -- cytoplasm is a highly generic cellular component term that conveys almost no functional information. The curated review appropriately marks this annotation as KEEP_AS_NON_CORE because the biologically meaningful localizations for gas7a are the plasma membrane (GO:0005886) and clathrin-coated pit (GO:0005905), where the crescent-shaped F-BAR dimer (residues 121-381, CDD: cd07649 F-BAR_GAS7) senses and induces membrane curvature during the invagination step of clathrin-mediated endocytosis. ProtNLM2 failed to predict any of the six more specific and informative annotations already in GOA, including the clathrin-coated pit localization, the clathrin-dependent endocytosis process, and the neuron projection morphogenesis role that is one of the best-characterized functions of the mammalian GAS7 family. This pattern of predicting only a broad parent term while missing all specific functional annotations is characteristic of frequency bias, as cytoplasm is among the most commonly assigned GO CC terms in training data.
A0A1S3Y076 Nicotiana tabacum GO:0008033
tRNA processing
GO_BP LSP 2
ProtNLM2 predicted GO:0008033 (tRNA processing) for this PRORP enzyme. The prediction is biologically correct: A0A1S3Y076 is a proteinaceous RNase P that catalyzes endonucleolytic cleavage of 5' leader sequences from precursor tRNAs (EC 3.1.26.5), which is indeed a form of tRNA processing. However, the prediction is less precise than the existing GOA annotation GO:0001682 (tRNA 5'-leader removal), which is a direct child of GO:0008033 and specifically names the exact processing step carried out by RNase P. The existing IBA annotation for GO:0001682 was inferred from experimentally characterized Arabidopsis PRORP orthologs (PRORP1/AT2G16650, PRORP2/AT2G32230, PRORP3/AT4G21900) and is well supported by the conserved PPR + NYN metallonuclease domain architecture of this protein. The prediction is classified as LSP rather than CNN because the model failed to resolve the specific tRNA processing step despite the clear domain signature pointing to RNase P cleavage activity.
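The parent/child comparison behind this assessment can be sketched in a few lines. This is a minimal illustration, not the evaluation pipeline: the `PARENTS` table holds only the single is_a edge stated in the summary above, and the function names and relation labels ("exact", "broader", "narrower", "unrelated") are hypothetical.

```python
# Toy is_a table (child -> parents). It contains only the one edge
# stated in the summary above; a real check would load the full ontology.
PARENTS = {
    "GO:0001682": {"GO:0008033"},  # tRNA 5'-leader removal is_a tRNA processing
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def relation(predicted, existing, parents=PARENTS):
    """Describe how a predicted term relates to an existing annotation."""
    if predicted == existing:
        return "exact"
    if predicted in ancestors(existing, parents):
        return "broader"   # prediction is an ancestor of the existing term
    if existing in ancestors(predicted, parents):
        return "narrower"  # prediction is a descendant of the existing term
    return "unrelated"


print(relation("GO:0008033", "GO:0001682"))  # broader
```

A "broader" result corresponds to the situation described for this row. How each relation maps onto an assessment category remains a curatorial judgment that the sketch does not attempt to encode.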
A2FPI7 Trichomonas vaginalis GO:0003677
DNA binding
GO_MF COR 2
ProtNLM2 predicted GO:0003677 (DNA binding) for A2FPI7, a 129-amino-acid protein with no existing GOA annotations (match category NOT_IN_GOA). This prediction is assessed as correct and novel based on strong domain-level evidence. A2FPI7 contains a single KilA-N domain (PF04383/IPR017880, residues 19-124, identified by PROSITE PS51301) that spans nearly the entire protein. The KilA-N domain belongs to the KilA-N/APSES helix-turn-helix superfamily (InterPro IPR018004), which has been experimentally validated as a DNA-binding fold in two distinct biological contexts: (1) bacteriophage regulatory proteins, where KilA-N was originally characterized as mediating DNA binding for transcriptional control, and (2) fungal APSES transcription factors, whose DNA-binding domain is structurally homologous to KilA-N and has been shown to bind DNA sequence-specifically. UniProt has assigned the recommended name "KilA-N domain-containing protein" based on this domain (ECO:0000259). The T. vaginalis genome (~160 Mb) is known to harbor numerous laterally transferred genes of viral and bacterial origin, consistent with the presence of a KilA-N domain protein of likely viral ancestry. Given that DNA binding is the defining functional property of the KilA-N/APSES HTH superfamily, and the domain occupies the vast majority of this small protein (106 of 129 residues), ProtNLM2 has correctly identified the most parsimonious molecular function. No experimental data exist for this PE4-level protein, but the domain-based evidence is unambiguous.
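The coverage figure quoted above (106 of 129 residues) follows from the inclusive residue range 19-124. A small sketch of that arithmetic, with hypothetical helper names:

```python
def domain_length(start, end):
    """Length of a domain given inclusive 1-based residue coordinates."""
    return end - start + 1


def coverage(start, end, protein_length):
    """Fraction of the protein covered by the domain."""
    return domain_length(start, end) / protein_length


# KilA-N domain of A0A2FPI7-style input: residues 19-124 of a 129-residue protein
length = domain_length(19, 124)
fraction = coverage(19, 124, 129)
print(length, round(fraction, 2))  # 106 0.82
```

At roughly 82% coverage, the domain accounts for most of the protein, which is the basis for treating the domain's function as the protein's most parsimonious function.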
A0A3B6GK97 Triticum aestivum GO:0016298
lipase activity
GO_MF LSP 2
ProtNLM2 predicted GO:0016298 (lipase activity) for this patatin/PNPLA domain-containing protein. The prediction is biologically sound: the patatin family comprises non-specific lipid acyl hydrolases that cleave acyl-ester bonds of glycerolipids using a Ser-Asp catalytic dyad (PMID:12779324), and lipase activity is a correct descriptor at the family level. However, this protein already carries more specific IBA (phylogenetic) annotations from GO_Central: GO:0047372 monoacylglycerol lipase activity and GO:0004620 glycerophospholipase activity (GO_REF:0000033). These were assigned by manual phylogenetic inference and provide substrate-resolved specificity that the ProtNLM2 prediction lacks. Additionally, the ai-review proposed GO:0052689 (carboxylic ester hydrolase activity) as an informative unifying parent. Therefore, while correct, GO:0016298 is less precise than the existing GO:0047372 and adds no information beyond what is already captured.
A0A3B6GK97 Triticum aestivum GO:0016042
lipid catabolic process
GO_BP CNN 2
ProtNLM2 predicted GO:0016042 (lipid catabolic process), which is a child of the existing GOA annotation GO:0006629 (lipid metabolic process, IEA via InterPro2GO from the PNPLA domain IPR002641). While the patatin family does participate in lipid catabolism (e.g., lipid mobilization during seed germination and storage-oil breakdown), the ai-review deliberately retained the broader GO:0006629 rather than sharpening to lipid catabolic process, because patatin-family acyl hydrolases also act in membrane phospholipid remodeling and lipid-based defense signaling -- roles that are not purely catabolic. The deep-research report supports diverse patatin functions including membrane phospholipid turnover and remodeling of lipid-droplet surfaces. Furthermore, the pPLAII subfamily placement (from bioinformatics analysis) associates this protein with the defense/stress response clade rather than a dedicated degradative role. The prediction is therefore correct in that lipid catabolism is one component of patatin function, but it is not novel relative to the existing broader annotation and arguably narrows the functional scope beyond what is justified by available evidence for this uncharacterized protein.
A0A3B6NKR6 Triticum aestivum GO:0016310
phosphorylation
GO_BP LSP 2
GO:0016310 (phosphorylation) is a correct but uninformative prediction for this protein. A0A3B6NKR6 contains GHMP kinase N-terminal (IPR006204) and C-terminal (IPR013750) domains together with a glucuronokinase-like domain (IPR053034), and its closest characterized ortholog is Arabidopsis GLAK2 (Q9LY82, EC 2.7.1.43). Glucuronokinases catalyze the ATP-dependent phosphorylation of D-glucuronic acid to D-glucuronate-1-phosphate, so phosphorylation is technically correct at the broadest level. However, the prediction is far too generic: GO:0016310 is a high-level biological process term that encompasses all kinase reactions and fails to capture the specific pathway context. The informative annotation would be involvement in UDP-glucuronate biosynthetic process (GO:0006065) via the myo-inositol oxidation pathway, with glucuronokinase activity (GO:0047940) as the molecular function. Additionally, GO:0016310 does not overlap with the existing GOA annotations, which include ATP binding (IEA) and cytosol localization (IBA), though it is semantically consistent with the kinase function implied by the ATP binding annotation. Scored as LSP because the prediction is correct at a coarse granularity but does not add biological insight beyond what is already implied by the GHMP kinase domain assignment.
A0A3B6RKV1 Triticum aestivum GO:0010099
regulation of photomorphogenesis
GO_BP UNC 1
The closest characterized ortholog in Arabidopsis (AT5G06550, JMJ22/PKDM7D) participates in photomorphogenesis through histone demethylation at target gene loci, making this prediction biologically plausible at the ortholog level. However, no experimental evidence exists for photomorphogenesis regulation by any wheat JmjC family member. Photomorphogenesis regulatory networks differ between monocots and dicots, and wheat KDM5/JARID1 members show dynamic expression under drought stress rather than light-responsive regulation in published studies (Wang et al. 2022). The protein has PE level 3 (inferred from homology) with no direct functional characterization. This prediction likely reflects training data from the well-characterized Arabidopsis JMJ22 ortholog rather than wheat-specific evidence.
A0A3B6RKV1 Triticum aestivum GO:0010476
gibberellin mediated signaling pathway
GO_BP UNC 1
Arabidopsis JMJ22/PKDM7D demethylates histones at GA biosynthesis gene loci, linking it to gibberellin-mediated signaling. Wheat JmjC gene promoters contain GA-responsive cis-elements (Wang et al. 2022; Ma et al. 2022), providing indirect support for GA pathway involvement. However, the presence of hormone-responsive promoter elements is common across many gene families and does not establish direct involvement in GA signaling. No experimental evidence demonstrates that this specific wheat protein regulates GA biosynthesis or signaling. The prediction is plausible through orthology to JMJ22 but remains unverifiable without wheat-specific functional data.
A0A3B6RKV1 Triticum aestivum GO:0040029
epigenetic regulation of gene expression
GO_BP COR 2
This is a correct novel prediction. A0A3B6RKV1 contains a JmjC catalytic domain (residues 285-445) and belongs to the KDM5/JARID1 histone demethylase subfamily, which catalyzes Fe(II)- and 2-oxoglutarate-dependent oxidative removal of methyl groups from histone H3K4me1/2/3 marks. Histone demethylation is by definition a mechanism of epigenetic gene expression regulation. Phylogenetic analysis across 21 plant species confirms that KDM5/JARID subfamily members function as H3K4 demethylases (Ma et al. 2022), and 18 of 24 wheat JmjC family members localize to the nucleus consistent with chromatin-associated function (Wang et al. 2022). The ai-review independently lists this term as a core function. This GO term was not present in the existing GOA annotations, making it a genuinely novel and correct prediction that follows directly from the established enzymatic activity of the protein family.
A0A3B6RKV1 Triticum aestivum GO:0010114
response to red light
GO_BP UNC 1
Red light response is mechanistically linked to photomorphogenesis and phytochrome signaling, and some Arabidopsis JmjC proteins participate in light-mediated developmental pathways. However, this is a more specific prediction than regulation of photomorphogenesis, and the evidence supporting it is weaker. No published study demonstrates response to red light for any wheat KDM5/JARID1 member, nor has the Arabidopsis ortholog JMJ22 been specifically characterized as red-light responsive (its photomorphogenesis role involves broader light signaling). The deep research report for A0A3B6RKV1 does not mention red light or phytochrome signaling among the biological processes of wheat JmjC proteins. This prediction may reflect frequency bias from ProtNLM2 associating photomorphogenesis-related terms as a cluster for JmjC-like sequences in the training data.
A0A3B6RKV1 Triticum aestivum GO:0010030
positive regulation of seed germination
GO_BP UNC 1
Arabidopsis JMJ22/PKDM7D participates in seed germination through histone arginine demethylation at GA biosynthesis gene loci, providing ortholog-based support for this prediction. GA signaling is a key regulator of seed germination across angiosperms, and the mechanistic link between histone demethylation and GA-dependent germination is well-established in Arabidopsis. However, the specific polarity of regulation (positive vs. negative) has not been established for this wheat protein. KDM5/JARID1 demethylases remove activating H3K4 marks, which typically represses transcription, so the prediction of positive regulation of germination requires that the demethylase targets negative regulators of germination (an indirect mechanism). Without wheat-specific experimental data, both the involvement in germination and the direction of regulation remain unverified.
F6LAX4 Triticum aestivum GO:0046982
protein heterodimerization activity
GO_MF NPI 0 FREQUENCY_BIAS
GO:0046982 (protein heterodimerization activity) is a generic protein-protein interaction term that technically applies to any protein forming a heterodimer. While the PP2A A subunit does form an A-C heterodimer with the catalytic C subunit, this term is uninformative and misleading: the A subunit's molecular function is not generic heterodimerization but rather a highly specific scaffolding/regulatory role captured by GO:0019888 (protein phosphatase regulator activity), which is already annotated with IBA evidence. The A subunit's HEAT-repeat solenoid architecture provides a platform for both the catalytic C subunit and a regulatory B subunit, forming a heterotrimer rather than a simple heterodimer. Annotating this protein with GO:0046982 would obscure its specific phosphatase-regulatory scaffold function and replace it with a term that applies to thousands of unrelated proteins. This prediction likely reflects frequency bias, as protein heterodimerization activity is among the most commonly assigned MF terms in GO training data and is frequently predicted for any protein with protein-protein interaction domains such as HEAT repeats.
F6LAX4 Triticum aestivum GO:0043025
neuronal cell body
GO_CC NPI 0 PATHWAY_CONTEXT_IGNORED
GO:0043025 (neuronal cell body) is an animal-specific cellular component term that refers to the soma of a neuron. Triticum aestivum is a monocotyledonous plant (Poaceae) that entirely lacks neurons, a nervous system, and any neuronal cell types. This prediction is biologically impossible for a plant protein. The error likely arises because mammalian PP2A orthologs (PPP2R1A/PPP2R1B) are abundantly expressed in neurons and are annotated to neuronal compartments in human and mouse GOA; ProtNLM2 appears to have transferred these animal-specific localization annotations across kingdoms without regard to the organism's biology. The PP2A A subunit in wheat functions in cytoplasm and nucleus (as annotated via IBA and ARBA), consistent with its role in plant hormone signaling (auxin transport, brassinosteroid signaling) rather than any neural function.
F6LAX4 Triticum aestivum GO:0007059
chromosome segregation
GO_BP NPI 0 FREQUENCY_BIAS
GO:0007059 (chromosome segregation) is predicted based on the known role of animal PP2A holoenzymes at kinetochores and centromeres during mitosis, where PP2A-B56 complexes dephosphorylate cohesin protectors (e.g., shugoshin-bound substrates) to regulate sister chromatid cohesion and chromosome segregation. However, this role is mediated by specific B56 regulatory subunits that recruit PP2A to centromeric substrates -- the A subunit itself is the generic scaffold present in all PP2A holoenzymes regardless of substrate. More critically, while PP2A catalytic activity is conserved in plants, the specific centromeric/kinetochore PP2A-B56 chromosome segregation pathway characterized in animal cells has not been demonstrated for plant PP2A A subunits. The existing reviewed annotations for F6LAX4 deliberately exclude chromosome segregation terms: the AI review notes that GO_Central correctly did NOT propagate the lineage-specific GO:0051225 (spindle assembly) and GO:0051754 (meiotic sister chromatid cohesion) annotations from human PPP2R1A to this plant ortholog. This prediction represents frequency-biased cross-kingdom transfer of an animal-specific PP2A role.
F6LAX4 Triticum aestivum GO:0043005
neuron projection
GO_CC NPI 0 PATHWAY_CONTEXT_IGNORED
GO:0043005 (neuron projection) refers to neurites (axons, dendrites) of neuronal cells. Like GO:0043025 (neuronal cell body), this is an animal-specific cellular component that is biologically impossible for a plant protein. Triticum aestivum has no neurons or neuron projections. The prediction stems from the same cross-kingdom mis-transfer as GO:0043025: mammalian PP2A is localized to neuronal projections where it regulates synaptic signaling and cytoskeletal dynamics, but these annotations are entirely irrelevant to a wheat scaffolding subunit. The wheat PP2A A subunit localizes to cytoplasm and nucleus, where it assembles PP2A holoenzymes for plant-specific signaling pathways (auxin, brassinosteroid, stress responses).
F6LAX4 Triticum aestivum GO:0000775
chromosome, centromeric region
GO_CC NPI 0 FREQUENCY_BIAS
GO:0000775 (chromosome, centromeric region) is predicted based on the well-characterized role of animal PP2A-B56 holoenzymes at centromeres, where they dephosphorylate cohesin protectors (shugoshin/Sgo1) to maintain sister chromatid cohesion until anaphase onset. In animal cells, the PP2A A subunit is recruited to centromeres as part of these specific holoenzymes. However, centromeric localization of PP2A in plants is not established. The A subunit is a generic scaffold present in all PP2A complexes, and its localization is determined by the B regulatory subunit it assembles with. No evidence places the wheat A subunit specifically at centromeres; its characterized plant roles center on cytoplasmic/nuclear signaling (hormone transport, stress responses). This prediction represents frequency-biased transfer from animal PP2A annotations where centromeric localization is well-documented but lineage-specific.
F6LAX4 Triticum aestivum GO:1990405
protein antigen binding
GO_MF NPI 0 PATHWAY_CONTEXT_IGNORED
GO:1990405 (protein antigen binding) is an immune-system-specific molecular function term describing the binding of protein antigens by antibodies, T-cell receptors, or antigen-presenting molecules (MHC). Plants lack an adaptive immune system, do not produce antibodies or T-cell receptors, and have no MHC-based antigen presentation machinery. The PP2A A subunit is a HEAT-repeat scaffolding protein with no structural or functional relationship to antigen-binding proteins. This prediction is biologically nonsensical for any plant protein, let alone a PP2A scaffold. The HEAT-repeat solenoid domain found in the PP2A A subunit mediates specific protein-protein interactions within the PP2A holoenzyme (binding the C and B subunits), not antigen recognition. This represents a severe cross-kingdom pathway context error, likely arising from spurious sequence or term co-occurrence patterns in ProtNLM2 training data.
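The cross-kingdom errors among the F6LAX4 predictions are the kind that a simple taxon filter could flag before manual review. The sketch below is illustrative only: the `NEVER_IN` table is hand-written from the summaries above rather than read from the Gene Ontology taxon-constraint files, and the lineage set and function name are hypothetical.

```python
# Hypothetical, hand-written exclusions: GO term -> taxon in which the term
# cannot apply. These paraphrase the summaries above; they are NOT loaded
# from GO's own taxon-constraint data.
NEVER_IN = {
    "GO:0043025": "Viridiplantae",  # neuronal cell body
    "GO:0043005": "Viridiplantae",  # neuron projection
    "GO:1990405": "Viridiplantae",  # protein antigen binding
}

# Simplified lineage for Triticum aestivum (assumed for illustration).
WHEAT_LINEAGE = {"Eukaryota", "Viridiplantae", "Poaceae", "Triticum aestivum"}

# The six terms predicted for F6LAX4 in this report.
F6LAX4_PREDICTIONS = [
    "GO:0046982",  # protein heterodimerization activity
    "GO:0043025",  # neuronal cell body
    "GO:0007059",  # chromosome segregation
    "GO:0043005",  # neuron projection
    "GO:0000775",  # chromosome, centromeric region
    "GO:1990405",  # protein antigen binding
]


def taxon_violations(predictions, lineage, never_in=NEVER_IN):
    """Return predicted terms whose excluded taxon appears in the lineage."""
    return [term for term in predictions if never_in.get(term) in lineage]


print(taxon_violations(F6LAX4_PREDICTIONS, WHEAT_LINEAGE))
# ['GO:0043025', 'GO:0043005', 'GO:1990405']
```

The filter catches only the three terms judged biologically impossible for a plant. The remaining three F6LAX4 errors concern lineage-specific roles or uninformative terms, which a taxon table of this kind would not detect.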
Q8P365
btuE
Xanthomonas campestris pv. campestris GO:0006979
response to oxidative stress
GO_BP CNN 2 TRAINING_DATA_CONTAMINATION
GO:0006979 (response to oxidative stress) is already present as an existing IEA annotation (via InterPro2GO, GO_REF:0000002) for this protein, making this prediction correct but not novel. Furthermore, the protein also carries a phylogenetically inferred IBA annotation to the more specific child term GO:0034599 (cellular response to oxidative stress), which implies the parent GO:0006979 and provides greater biological precision. BtuE (XCC4213) is classified in the glutathione peroxidase family by multiple independent lines of evidence: the GSHPx Pfam domain (PF00255), the glutathione peroxidase InterPro family (IPR000889), the GPX active-site motif (IPR029759) with a conserved catalytic residue at position 37, and the thioredoxin-like structural fold (IPR036249). Bacterial glutathione peroxidases reduce hydrogen peroxide and organic hydroperoxides using thiol-based reductants, directly protecting the cell from reactive oxygen species -- the biochemical basis of the oxidative stress response. The curated review accepted GO:0034599 as representing a core function and marked GO:0006979 as KEEP_AS_NON_CORE precisely because the broader term is redundant with the more specific IBA annotation. ProtNLM2 recapitulated the less informative of the two existing process annotations without adding any new biological insight. The prediction also missed the molecular function (GO:0004602, glutathione peroxidase activity) and the process term GO:0098869 (cellular oxidant detoxification) that more precisely capture the enzymatic mechanism by which BtuE contributes to oxidative stress defense.
D3VIU4 Xenorhabdus nematophila GO:0015276
ligand-gated monoatomic ion channel activity
GO_MF NPI 0 FREQUENCY_BIAS
GO:0015276 (ligand-gated monoatomic ion channel activity) is unequivocally wrong for D3VIU4. This protein is FliY, the periplasmic substrate-binding component of the FliY-YecSC (TcyJLN) ABC transporter for L-cystine import. It belongs to bacterial solute-binding protein family 3 (PF00497, IPR001638, CDD cd13711 PBP2_Ngo0372_TcyA), a well-characterized family of soluble periplasmic proteins that bind amino acids or compatible solutes and deliver them to cognate ABC transporter permeases. The protein has a cleavable signal peptide (residues 1-28) for periplasmic secretion and no transmembrane helices whatsoever. Ligand-gated ion channels, by contrast, are integral membrane proteins with multiple transmembrane segments that form a selective ion-conducting pore opened by ligand binding -- a completely different protein architecture and mechanism. No domain hit (InterPro, Pfam, CDD, PANTHER, PROSITE), no eggNOG orthologous group (COG0834, which covers extracellular solute-binding proteins), and no literature on FliY orthologs in E. coli or other Enterobacterales supports any ion channel activity. The known molecular function of FliY is amino acid binding (GO:0016597) in the context of L-cystine transport (GO:0015811). This prediction appears to be a frequency-biased misassignment, possibly driven by superficial sequence features that the model incorrectly associates with channel activity rather than substrate binding.
A0A8J0SCI2 Xenopus tropicalis GO:0000978
RNA polymerase II cis-regulatory region sequence-specific DNA binding
GO_MF CNN 2
This prediction correctly identifies sequence-specific DNA binding at RNA polymerase II cis-regulatory regions, which is well-supported by the presence of 8 tandem C2H2 zinc finger domains spanning residues 37-260. Each zinc finger module makes base-specific contacts in the DNA major groove, enabling sequence-specific recognition of regulatory DNA elements. However, this is not a novel prediction: GO:0000978 is already present as an IBA annotation from PANTHER phylogenetic inference (GO_Central, GO_REF:0000033), and the protein is classified within the Krueppel C2H2-type zinc finger family where this function is expected. The prediction exactly matches an existing GOA annotation: although GO:0003677 (DNA binding) is the stated comparator, the annotation actually present in GOA is the more specific GO:0000978 itself.
A0A8J0SCI2 Xenopus tropicalis GO:0006357
regulation of transcription by RNA polymerase II
GO_BP CNN 2
Regulation of transcription by RNA polymerase II is the expected core biological process for a Krueppel-type C2H2 zinc finger transcription factor. The protein has the canonical domain architecture (tandem C2H2 zinc fingers) and UniProt notes it "may be involved in transcriptional regulation." This prediction is correct but not novel: GO:0006357 is already present as an IBA annotation from PANTHER phylogenetic inference (GO_Central, GO_REF:0000033). The existing annotation GO:0000981 (DNA-binding TF activity, RNA pol II-specific) captures the molecular function aspect; GO:0006357 captures the biological process. Both are already in GOA.
A0A8J0SCI2 Xenopus tropicalis GO:0005634
nucleus
GO_CC CNN 2
Nuclear localization is strongly supported for a C2H2 zinc finger transcription factor whose function requires binding genomic DNA in the nucleus. The protein has 8 tandem C2H2 zinc finger domains that mediate sequence-specific DNA binding, necessitating nuclear localization. UniProt ARBA annotation independently predicts nucleus (ARBA00004123). This prediction is correct but not novel: GO:0005634 is already annotated twice in GOA -- once via IBA (PANTHER, GO_REF:0000033 with qualifier is_active_in) and once via IEA (UniProt subcellular location mapping, GO_REF:0000044 with qualifier located_in).
A0A8J0SCI2 Xenopus tropicalis GO:0001228
DNA-binding transcription activator activity, RNA polymerase II-specific
GO_MF NPI 0 PARALOG_OVERANNOTATION
This prediction incorrectly specifies transcription activator activity for a protein where no evidence supports activator versus repressor function. A0A8J0SCI2 is a pure zinc finger array protein with 8 tandem C2H2 zinc finger domains spanning nearly its entire length (residues 37-260) and a short N-terminal disordered region (residues 1-27). Critically, it lacks any known effector domains -- no KRAB repressor domain, no SCAN dimerization domain, no BTB/POZ domain, and no identifiable activation domain. Without an effector domain, the directionality of transcriptional regulation (activation vs. repression) cannot be inferred from sequence alone. The existing IBA annotation appropriately uses the parent term GO:0000981 (DNA-binding transcription factor activity, RNA polymerase II-specific), which is agnostic to activator/repressor function. As noted in the gene review, ProtNLM2 likely derived this prediction from sequence similarity to KRAB-ZNF proteins (CATH FunFam assignments include ZNF1184 and ZNF527/577, which are KRAB-containing repressors), but this protein lacks the KRAB domain entirely, making the activator/repressor specificity prediction unreliable. The error is classified as PARALOG_OVERANNOTATION because the prediction inappropriately transfers a specific functional characteristic (activator activity) from distantly related zinc finger proteins with different domain architectures.
A0A8J1IYX6 Xenopus tropicalis No predictions
F6WPT1 Xenopus tropicalis No predictions
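As a consistency check on the headline figures at the top of this report, the mean CS can be recomputed from the category counts. This is a sketch under stated assumptions: the per-category scores are inferred from the CS values shown in the rows (COR, CNN and LSP score 2, UNC scores 1, NPI scores 0), and the score for PLI is assumed to be 0 because no PLI row appears in this part of the table.

```python
# Category counts from the summary at the top of the report.
COUNTS = {"COR": 17, "CNN": 11, "LSP": 18, "UNC": 15, "PLI": 2, "NPI": 14}

# Assumed per-category CS, inferred from the rows of the table.
# The PLI value is a guess; no PLI row is visible in this section.
SCORES = {"COR": 2, "CNN": 2, "LSP": 2, "UNC": 1, "PLI": 0, "NPI": 0}

total = sum(COUNTS.values())
mean_cs = sum(COUNTS[cat] * SCORES[cat] for cat in COUNTS) / total

print(total, round(mean_cs, 1))  # 77 1.4
```

Under these assumptions the counts sum to the reported 77 predictions and the mean rounds to the reported 1.4. Scoring PLI as 1 instead of 0 would still round to 1.4, so the check does not depend on that guess.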