NCGR_LOCUS10166 is an unreviewed TrEMBL entry (A0A811MX19) from Miscanthus lutarioriparius annotated as "Protein ARV". It is a 719 amino acid protein containing two completely unrelated domains: an N-terminal histidinol dehydrogenase (HDH, EC 1.1.1.23) domain and a C-terminal ARV1 sterol homeostasis domain. This combination is almost certainly a gene prediction artifact, as the same organism encodes properly-sized separate genes for both domains at adjacent loci (NCGR_LOCUS4558/A0A811MHP4 for HDH at 478 aa, and NCGR_LOCUS4557/A0A811MIX4 for ARV1 at 230 aa). The two domains have incompatible subcellular localizations: HDH is a soluble chloroplast stroma enzyme while ARV1 is an ER multi-pass membrane protein. All GO annotations are IEA-based and reflect the conflation of two separate gene products. The genome sequence is derived from preliminary WGS data.
| GO Term | Evidence | Action | Reason |
|---|---|---|---|
| GO:0004399 histidinol dehydrogenase activity | IEA GO_REF:0000120 | MARK AS OVER ANNOTATED | Summary: Histidinol dehydrogenase activity (EC 1.1.1.23) is correctly predicted for the N-terminal HDH domain of this protein based on HAMAP rule MF_01024. The domain contains CDD cd06572 (Histidinol_dh), Pfam PF00815, and the PROSITE active site PS00611. However, UniProt notes this entry "lacks conserved residue(s) required for the propagation of feature annotation," raising questions about catalytic competence. The same organism has a properly-sized separate HDH (A0A811MHP4, NCGR_LOCUS4558, 478 aa) that likely represents the true functional HDH gene. Reason: While the HDH domain signature is present in the N-terminal region, this annotation is misleading because it is applied to a probable gene prediction artifact. The entry likely represents two incorrectly merged adjacent genes. The real HDH in this organism is NCGR_LOCUS4558 (A0A811MHP4). Additionally, UniProt flags missing conserved residues needed for feature propagation. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "The same M. lutarioriparius genome encodes properly-sized separate proteins for both domains: NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa. The sequential locus numbering (4557 and 4558) indicates these are adjacent genes that were correctly predicted in one genomic region but incorrectly fused in another." |
| GO:0016491 oxidoreductase activity | IEA GO_REF:0000002 | MARK AS OVER ANNOTATED | Summary: Oxidoreductase activity is a parent term of histidinol dehydrogenase activity and is mapped from InterPro IPR016161 (Ald_DH/histidinol_DH). This is a broad classification that applies to the N-terminal HDH domain. Reason: This is a very general parent term that adds no specificity beyond what GO:0004399 (histidinol dehydrogenase activity) already provides. More importantly, it is applied to a probable gene prediction artifact. The annotation is technically correct for the HDH domain but redundant and misleading in this context. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "NCGR_LOCUS10166 represents a gene model error in the M. lutarioriparius genome annotation. The two domains should be treated as separate gene products." |
| GO:0016616 oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor | IEA GO_REF:0000002 | MARK AS OVER ANNOTATED | Summary: This term is mapped from InterPro IPR012131 (Hstdl_DH) and correctly describes the reaction mechanism of histidinol dehydrogenase using NAD+ as acceptor. It applies to the N-terminal HDH domain. Reason: While technically correct for the HDH domain, this is an intermediate-specificity term that is redundant with the more specific GO:0004399 (histidinol dehydrogenase activity). Applied to a probable gene prediction artifact where annotations from two separate gene products are conflated. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "All electronic annotations are technically correct for their respective domains but misleading when combined on a single entry." |
| GO:0046872 metal ion binding | IEA GO_REF:0000002 | MARK AS OVER ANNOTATED | Summary: Metal ion binding is mapped from InterPro IPR001692 (Histidinol_DH_CS), reflecting the zinc cofactor requirement of histidinol dehydrogenase. The ARV1 domain also contains a cysteine-rich subdomain with a putative zinc-binding motif. Both domains likely bind metal ions but for completely different purposes. Reason: This is a very general term. For the HDH domain, the more informative annotation would be zinc ion binding (GO:0008270) specifically as a catalytic cofactor. For the ARV1 domain, zinc binding relates to the structural zinc-binding motif in the AHD domain. The term is too vague and applied to a probable gene prediction artifact combining two unrelated proteins. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER membrane protein with transmembrane domains. A single protein cannot function in both compartments simultaneously." |
| GO:0051287 NAD binding | IEA GO_REF:0000002 | MARK AS OVER ANNOTATED | Summary: NAD binding is mapped from InterPro IPR012131 (Hstdl_DH) and correctly reflects the cofactor requirement of histidinol dehydrogenase, which uses 2 NAD+ molecules per catalytic cycle. This applies exclusively to the N-terminal HDH domain. Reason: While technically correct for the HDH domain, this annotation is applied to a probable gene prediction artifact. NAD binding is part of the histidinol dehydrogenase activity (GO:0004399) and somewhat redundant. The real HDH gene in this organism is NCGR_LOCUS4558. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa represents the true HDH gene." |
| GO:0000105 L-histidine biosynthetic process | IEA GO_REF:0000120 | MARK AS OVER ANNOTATED | Summary: L-histidine biosynthetic process annotation derives from HAMAP rule MF_01024 and UniPathway (UPA00031, step 9/9). HDH catalyzes the final step of histidine biosynthesis. In plants, this pathway operates in the chloroplast. This annotation applies exclusively to the N-terminal HDH domain. Reason: While the HDH domain is correctly associated with histidine biosynthesis, this annotation is on a probable gene prediction artifact. The true histidine biosynthesis HDH in this organism is NCGR_LOCUS4558 (A0A811MHP4, 478 aa). UniProt also notes missing conserved residues, questioning whether this particular copy is catalytically active even if the gene model were correct. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "The same M. lutarioriparius genome encodes properly-sized separate proteins for both domains: NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa." |
| GO:0006665 sphingolipid metabolic process | IEA GO_REF:0000104 | MARK AS OVER ANNOTATED | Summary: Sphingolipid metabolic process is assigned via UniRule RU368065 for the ARV1 family. ARV1 proteins regulate sphingolipid metabolism in yeast and plants. Arabidopsis ARV isoforms (AtArv1p, AtArv2p) are ER-localized and modulate both sterol and sphingolipid homeostasis [PMID:16725371]. This annotation applies to the C-terminal ARV1 domain. Reason: While this annotation is correct for the ARV1 domain, it is applied to a probable gene prediction artifact that conflates two unrelated proteins. The true ARV1 gene in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa). Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene." PMID:16725371: "Arv1p is involved in the regulation of cellular lipid homeostasis in the yeast Saccharomyces cerevisiae. Here, we report the characterization of the two Arabidopsis thaliana ARV genes and the encoded proteins, AtArv1p and AtArv2p." |
| GO:0016125 sterol metabolic process | IEA GO_REF:0000104 | MARK AS OVER ANNOTATED | Summary: Sterol metabolic process is assigned via UniRule RU368065 for the ARV1 family. ARV1 proteins are mediators of sterol homeostasis, involved in sterol uptake, trafficking, and distribution into membranes. This applies to the C-terminal ARV1 domain. Reason: Correct for the ARV1 domain but applied to a gene prediction artifact. The true ARV1 in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa). The annotation conflates sterol metabolism (ARV1 domain) with histidine biosynthesis (HDH domain) on a single entry. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "ARV1 family proteins are ER membrane proteins that mediate sterol homeostasis: sterol uptake, trafficking, and distribution into membranes." |
| GO:0032366 intracellular sterol transport | IEA GO_REF:0000120 | MARK AS OVER ANNOTATED | Summary: Intracellular sterol transport is a core function of ARV1 proteins, which mediate sterol transport from ER to plasma membrane. In yeast, ARV1 deletion leads to altered intracellular sterol distribution with decreased plasma membrane sterols and elevated ER/vacuolar sterols. This applies to the C-terminal ARV1 domain. Reason: Correct for the ARV1 domain but applied to a gene prediction artifact. The true ARV1 in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa). Supporting Evidence: PMID:16725371: "Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved in the regulation of cellular lipid homeostasis." file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene." |
| GO:0097036 regulation of plasma membrane sterol distribution | IEA GO_REF:0000104 | MARK AS OVER ANNOTATED | Summary: Regulation of plasma membrane sterol distribution is assigned via UniRule RU368065. This is a specific biological process for ARV1 proteins, which regulate how sterols are distributed between the ER and plasma membrane. This applies to the C-terminal ARV1 domain. Reason: Correct for the ARV1 domain but applied to a gene prediction artifact combining unrelated HDH and ARV1 domains. The true ARV1 gene is NCGR_LOCUS4557 (A0A811MIX4). Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene." |
| GO:0005737 cytoplasm | IEA GO_REF:0000118 | MARK AS OVER ANNOTATED | Summary: Cytoplasm localization is assigned by TreeGrafter/PANTHER (PTHR21256). This likely reflects the HDH domain, as plant HDH is synthesized in the cytoplasm before chloroplast import. However, this is a very broad localization term. Reason: The term is too general. For the HDH domain, the correct localization is chloroplast stroma (GO:0009570), which is already annotated. For the ARV1 domain, the correct localization is ER membrane (GO:0005789). Applied to a gene prediction artifact. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "HDH is a soluble enzyme localized to chloroplast stroma in all characterized plants. ARV1 is an ER membrane protein with multiple transmembrane domains." |
| GO:0005783 endoplasmic reticulum | IEA GO_REF:0000117 | MARK AS OVER ANNOTATED | Summary: Endoplasmic reticulum localization is assigned via UniRule RU368065 for the ARV1 family. ARV1 proteins are ER-localized membrane proteins. In Arabidopsis, both AtArv1p and AtArv2p are exclusively targeted to the ER [PMID:16725371]. This applies to the C-terminal ARV1 domain. Reason: Correct for the ARV1 domain but applied to a gene prediction artifact. This localization is incompatible with the chloroplast stroma localization of the HDH domain, which is the strongest evidence this is an incorrectly merged gene model. Supporting Evidence: PMID:16725371: "both proteins are exclusively targeted to the endoplasmic reticulum" file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "A single polypeptide cannot simultaneously function as a chloroplast stroma enzyme and an ER membrane protein." |
| GO:0005789 endoplasmic reticulum membrane | IEA GO_REF:0000120 | MARK AS OVER ANNOTATED | Summary: ER membrane localization derives from UniProtKB-SubCell and reflects the multi-pass transmembrane topology of the ARV1 domain (transmembrane helices at aa 588-607 and 619-639). This applies exclusively to the C-terminal ARV1 domain. Reason: Correct for the ARV1 domain but applied to a gene prediction artifact. The ER membrane localization directly contradicts the chloroplast stroma localization predicted for the HDH domain. The true ARV1 gene is NCGR_LOCUS4557 (A0A811MIX4). Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "ARV1 is an ER membrane protein with multiple transmembrane domains. A single protein cannot function in both compartments simultaneously." |
| GO:0005829 cytosol | IEA GO_REF:0000118 | MARK AS OVER ANNOTATED | Summary: Cytosol localization is assigned by TreeGrafter/PANTHER. This may reflect the HDH domain prior to chloroplast import, or it may be a generic assignment from the PANTHER family classification. Reason: For the HDH domain, the mature protein is chloroplast stroma-localized, not cytosolic. For the ARV1 domain, the protein is ER membrane-localized. Cytosol is not the functional localization for either domain. Applied to a gene prediction artifact. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "HDH is a soluble enzyme localized to chloroplast stroma in all characterized plants." |
| GO:0009507 chloroplast | IEA GO_REF:0000044 | MARK AS OVER ANNOTATED | Summary: Chloroplast localization derives from UniProtKB-SubCell vocabulary mapping. Plant HDH is nuclear-encoded but chloroplast-targeted via a transit peptide. This applies to the N-terminal HDH domain. In plants, the entire histidine biosynthesis pathway operates in the chloroplast. Reason: While correct for the HDH domain (chloroplast stroma is the more specific term, already annotated as GO:0009570), this annotation is on a gene prediction artifact. The true HDH gene is NCGR_LOCUS4558 (A0A811MHP4). Additionally, chloroplast localization is incompatible with the ER membrane localization of the C-terminal ARV1 domain. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "In plants, HDH is expressed as a nuclear encoded protein precursor which is exported to the chloroplast. HDH has been immunolocalized to the chloroplast." |
| GO:0009570 chloroplast stroma | IEA GO_REF:0000118 | MARK AS OVER ANNOTATED | Summary: Chloroplast stroma localization is assigned by TreeGrafter/PANTHER. This is the correct subcellular localization for plant HDH, which operates as a soluble enzyme in the chloroplast stroma. Structural studies of HDH from Medicago truncatula, a legume only distantly related to Miscanthus, are consistent with the stroma localization. Reason: While this is the most accurate localization for the HDH domain, it is applied to a gene prediction artifact. The true HDH gene is NCGR_LOCUS4558 (A0A811MHP4). The chloroplast stroma localization is fundamentally incompatible with the ER membrane localization of the ARV1 domain on the same polypeptide. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER membrane protein with transmembrane domains. A single protein cannot function in both compartments simultaneously. The N-terminal transit peptide (for chloroplast import) would prevent ER membrane insertion." |
| GO:0016020 membrane | IEA GO_REF:0000104 | MARK AS OVER ANNOTATED | Summary: Membrane localization is assigned via UniRule RU368065 for the ARV1 family, reflecting the transmembrane topology of the ARV1 domain. The C-terminal region contains two predicted transmembrane helices (aa 588-607 and 619-639). Reason: This is a very general term (parent of ER membrane). While the ARV1 domain is indeed a membrane protein, this annotation is on a gene prediction artifact. The more specific GO:0005789 (ER membrane) is already annotated. The true ARV1 gene is NCGR_LOCUS4557. Supporting Evidence: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md: "ARV1 is an ER membrane protein with 2+ transmembrane helices." |
Q: Is NCGR_LOCUS10166 a genuine fusion gene or a gene prediction artifact? The presence of separate, properly-sized genes for both domains (NCGR_LOCUS4557 for ARV1, NCGR_LOCUS4558 for HDH) at adjacent loci, combined with incompatible subcellular localizations (chloroplast stroma vs ER membrane), strongly suggests a gene model error. RNA-seq or proteomics evidence of a full-length 719 aa protein would be needed to support a real fusion.
Q: Should UniProt flag A0A811MX19 for review or suppression? The entry combines two unrelated protein families with contradictory biology. The preliminary WGS-derived genome annotation may have incorrectly merged two adjacent reading frames.
Experiment: Validate the gene model using RNA-seq data from Miscanthus lutarioriparius to determine whether NCGR_LOCUS10166 is transcribed as a single mRNA or represents two separate transcripts that were incorrectly merged in the genome annotation.
Hypothesis: RNA-seq will show two separate transcripts corresponding to the HDH and ARV1 domains, confirming this is a gene prediction artifact rather than a genuine fusion gene.
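One way to make this hypothesis testable is to count reads that cross the HDH/ARV1 boundary. The sketch below is a toy illustration only: it assumes reads have already been aligned to the predicted NCGR_LOCUS10166 coding sequence and reduced to (start, end) intervals. The function name, the 10-base overhang, and the example intervals are hypothetical, and the boundary at nucleotide 1440 simply converts the approximate ARV1 start (~aa 480) to codon coordinates. A real analysis would use a splice-aware aligner on the genome and inspect junction reads.

```python
def count_boundary_spanning(alignments, boundary, min_overhang=10):
    """Count alignments that cover `boundary` with at least
    `min_overhang` bases on each side.

    `alignments` is an iterable of (start, end) half-open intervals in the
    coordinates of the predicted coding sequence (an assumption of this sketch).
    """
    spanning = 0
    for start, end in alignments:
        if start <= boundary - min_overhang and end >= boundary + min_overhang:
            spanning += 1
    return spanning


# Toy intervals, not real data. Boundary: ~aa 480 * 3 = nt 1440.
toy_alignments = [
    (100, 250),    # entirely within the HDH region
    (1300, 1435),  # ends just before the boundary
    (1420, 1570),  # crosses the boundary
    (1450, 1600),  # entirely within the ARV1 region
    (1380, 1500),  # crosses the boundary
]
n_spanning = count_boundary_spanning(toy_alignments, boundary=1440)
print(n_spanning)
```

Few or no boundary-spanning reads would be consistent with two separate transcripts, but could also reflect low expression, so coverage on either side of the boundary should be checked as well.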
Experiment: Compare syntenic regions across related Poaceae genomes (Sorghum bicolor, Saccharum, Zea mays) to determine whether HDH and ARV1 genes are consistently found as separate adjacent genes, supporting the gene prediction artifact hypothesis.
Hypothesis: In all related grass genomes, HDH and ARV1 will be encoded as separate genes, confirming that the fusion in NCGR_LOCUS10166 is an artifact of genome annotation.
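The synteny comparison can be reduced to a simple adjacency test once each genome's gene order and family assignments are available. The following is a minimal sketch under that assumption. The gene identifiers, the family labels, and the function name are all hypothetical placeholders, not real Sorghum, Saccharum, or Zea accessions.

```python
def adjacent_as_separate_genes(gene_order, families):
    """Return True if an HDH gene and an ARV1 gene are neighbours.

    gene_order: gene IDs in chromosomal order for one region.
    families:   dict mapping gene ID -> family label ("HDH", "ARV1", ...).
    Genes missing from `families` are treated as unrelated.
    """
    labels = [families.get(gene) for gene in gene_order]
    return any({a, b} == {"HDH", "ARV1"} for a, b in zip(labels, labels[1:]))


# Hypothetical region where the two families sit side by side as separate genes.
separate = adjacent_as_separate_genes(
    ["geneA", "geneB", "geneC", "geneD"],
    {"geneB": "ARV1", "geneC": "HDH"},
)

# Hypothetical region with a single merged model and no separate neighbours.
merged = adjacent_as_separate_genes(
    ["geneA", "geneX", "geneD"],
    {"geneX": "HDH+ARV1"},
)
print(separate, merged)
```

Running this over each related grass genome would show whether the separate-adjacent arrangement is the consistent pattern, which is what the hypothesis predicts.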
NCGR_LOCUS10166 (UniProt: A0A811MX19) is a 719 amino acid unreviewed TrEMBL entry from
Miscanthus lutarioriparius annotated as "Protein ARV". It contains two completely
unrelated protein domains: an N-terminal histidinol dehydrogenase (HDH, EC 1.1.1.23)
domain and a C-terminal ARV1 sterol homeostasis domain. This combination is almost
certainly a gene prediction artifact.
The same M. lutarioriparius genome encodes properly-sized separate proteins for both domains:
- NCGR_LOCUS4558 (A0A811MHP4): Histidinol dehydrogenase, chloroplastic - 478 aa
- NCGR_LOCUS4557 (A0A811MIX4): Protein ARV - 230 aa
The sequential locus numbering (4557 and 4558) indicates these are adjacent genes that
were correctly predicted in one genomic region but incorrectly fused in another (LOCUS10166).
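The two separate proteins sum to 478 + 230 = 708 aa, close to but not identical to the
719 aa of NCGR_LOCUS10166. Equivalently, the region following a 478 aa HDH in the fused
model is about 241 aa, slightly longer than the 230 aa ARV entry.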
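As a quick arithmetic check, the lengths quoted in this review can be compared directly. The snippet uses only the three lengths given above, and the variable names are illustrative.

```python
# Lengths (aa) quoted in this review for the three UniProt entries.
LENGTHS = {
    "NCGR_LOCUS10166": 719,  # putative merged model (A0A811MX19)
    "NCGR_LOCUS4558": 478,   # separate HDH (A0A811MHP4)
    "NCGR_LOCUS4557": 230,   # separate ARV1 (A0A811MIX4)
}

# Sum of the two separate proteins.
combined = LENGTHS["NCGR_LOCUS4558"] + LENGTHS["NCGR_LOCUS4557"]
# How much longer the merged model is than that sum.
excess = LENGTHS["NCGR_LOCUS10166"] - combined
# What remains of the merged model after a 478 aa HDH.
c_terminal_remainder = LENGTHS["NCGR_LOCUS10166"] - LENGTHS["NCGR_LOCUS4558"]

print(f"HDH + ARV1 = {combined} aa")
print(f"merged model exceeds that sum by {excess} aa")
print(f"region after a 478 aa HDH = {c_terminal_remainder} aa")
```

The 11 aa difference means the merged model is close to, but not exactly, the sum of the two separate entries.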
UniProt notes: "The sequence shown here is derived from an EMBL/GenBank/DDBJ whole genome
shotgun (WGS) entry which is preliminary data." Also: "Lacks conserved residue(s) required
for the propagation of feature annotation."
HDH catalyzes the final step (step 9/9) of L-histidine biosynthesis:
L-histidinol + 2 NAD+ + H2O -> L-histidine + 2 NADH + 3 H+
ARV1 family proteins are ER membrane proteins that mediate sterol homeostasis:
- Sterol uptake, trafficking, and distribution into membranes
- Regulation of sphingolipid metabolism
- Sterol transport from ER to plasma membrane
All 17 GO annotations on A0A811MX19 are IEA (electronic), sourced from:
- InterPro2GO mappings (GO_REF:0000002)
- UniProtKB-UniRule (GO_REF:0000104)
- Combined automated annotation using multiple IEA methods (GO_REF:0000120)
- UniProtKB-SubCell (GO_REF:0000044)
- TreeGrafter/PANTHER (GO_REF:0000118)
- ARBA machine-learning models (GO_REF:0000117)
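The source breakdown can be tallied from the annotation table. The sketch below hard-codes the 17 (GO term, GO_REF) pairs exactly as they appear in the table of this review and counts them per reference. It is a bookkeeping aid, not a query against UniProt or the GO database.

```python
from collections import Counter

# (GO term, GO_REF) pairs copied from the annotation table in this review.
ANNOTATIONS = [
    ("GO:0004399", "GO_REF:0000120"),
    ("GO:0016491", "GO_REF:0000002"),
    ("GO:0016616", "GO_REF:0000002"),
    ("GO:0046872", "GO_REF:0000002"),
    ("GO:0051287", "GO_REF:0000002"),
    ("GO:0000105", "GO_REF:0000120"),
    ("GO:0006665", "GO_REF:0000104"),
    ("GO:0016125", "GO_REF:0000104"),
    ("GO:0032366", "GO_REF:0000120"),
    ("GO:0097036", "GO_REF:0000104"),
    ("GO:0005737", "GO_REF:0000118"),
    ("GO:0005783", "GO_REF:0000117"),
    ("GO:0005789", "GO_REF:0000120"),
    ("GO:0005829", "GO_REF:0000118"),
    ("GO:0009507", "GO_REF:0000044"),
    ("GO:0009570", "GO_REF:0000118"),
    ("GO:0016020", "GO_REF:0000104"),
]

# Count annotations per GO_REF source.
by_source = Counter(ref for _, ref in ANNOTATIONS)
for ref, count in sorted(by_source.items()):
    print(ref, count)
```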
The annotations are internally contradictory because they combine two unrelated proteins:
- Chloroplast stroma + ER membrane localizations are mutually exclusive
- Histidine biosynthesis + sterol transport are unrelated pathways
- Oxidoreductase activity + lipid transport are unrelated functions
NCGR_LOCUS10166 represents a gene model error in the M. lutarioriparius genome annotation.
The two domains should be treated as separate gene products. All electronic annotations are
technically correct for their respective domains but misleading when combined on a single entry.
This protein contains TWO completely unrelated functional domains:
- HDH domain (N-terminal): soluble enzyme, chloroplast stroma localized in plants
- ARV1 domain (C-terminal, ~aa 480-719)
STRONG EVIDENCE: The same organism has separate, properly-sized genes for both domains:
- NCGR_LOCUS4558 (A0A811MHP4): Histidinol dehydrogenase, chloroplastic (478 aa) - normal HDH
- NCGR_LOCUS4557 (A0A811MIX4): Protein ARV (230 aa) - normal ARV1
Note that LOCUS4557 and LOCUS4558 are sequential/adjacent loci, strongly suggesting the
gene predictor incorrectly merged two adjacent genes in a different genomic region.
Biological incompatibility: HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER
membrane protein with transmembrane domains. A single protein cannot function in both
compartments simultaneously. The N-terminal transit peptide (for chloroplast import) would
prevent ER membrane insertion, and conversely the transmembrane domains would interfere
with chloroplast import.
Genome quality: The M. lutarioriparius genome is derived from WGS preliminary data
(noted in UniProt CAUTION field). The sequence originated from CAJGYO010000002.
Plant histidinol dehydrogenase catalyzes the final step of histidine biosynthesis:
L-histidinol + 2 NAD+ + H2O -> L-histidine + 2 NADH + 3 H+
ARV1 family proteins mediate sterol homeostasis:
- ER membrane-localized with 2+ transmembrane helices
- Involved in sterol transport from ER to plasma membrane
- Contains Arv1 homology domain (AHD) with cysteine-rich zinc-binding motif
- Arabidopsis has two ARV isoforms (AtArv1p, AtArv2p), both ER-localized [PMID:16725371]
- Loss of ARV1 alters sterol distribution and sphingolipid metabolism
- Regulates sphingolipid metabolism
All GO annotations on this entry reflect the conflation of two separate gene products:
- HDH-related annotations (GO:0004399, GO:0016491, GO:0016616, GO:0051287, GO:0000105, GO:0009507, GO:0009570) come from the HDH domain
- ARV1-related annotations (GO:0032366, GO:0097036, GO:0006665, GO:0016125, GO:0005789, GO:0005783, GO:0016020) come from the ARV1 domain
- Some annotations are not tied to a single domain: GO:0046872 (both domains may bind metal ions), and GO:0005737 and GO:0005829 (broad PANTHER-derived localizations that probably reflect HDH)
The correct approach would be to split the annotations by domain, but since this is a
single UniProt entry, the review should note that annotations are domain-specific and
that this appears to be a gene model error.
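A domain-wise split of the annotations could be recorded as a simple mapping. In the sketch below the domain labels follow the per-term review summaries in the annotation table. The "ambiguous" label is an assumption of this sketch for terms whose summaries do not tie them to one domain only, and the function name is illustrative.

```python
# Domain assignment per GO term, following the per-term review summaries.
# "ambiguous" marks terms not tied to a single domain in those summaries.
DOMAIN_OF = {
    "GO:0004399": "HDH",
    "GO:0016491": "HDH",
    "GO:0016616": "HDH",
    "GO:0051287": "HDH",
    "GO:0000105": "HDH",
    "GO:0009507": "HDH",
    "GO:0009570": "HDH",
    "GO:0006665": "ARV1",
    "GO:0016125": "ARV1",
    "GO:0032366": "ARV1",
    "GO:0097036": "ARV1",
    "GO:0005783": "ARV1",
    "GO:0005789": "ARV1",
    "GO:0016020": "ARV1",
    "GO:0046872": "ambiguous",  # both domains may bind metal ions
    "GO:0005737": "ambiguous",  # broad localization, probably HDH
    "GO:0005829": "ambiguous",  # broad localization, probably HDH
}


def split_by_domain(domain_of):
    """Group GO IDs by their assigned domain label."""
    groups = {}
    for go_id, domain in domain_of.items():
        groups.setdefault(domain, []).append(go_id)
    return {domain: sorted(ids) for domain, ids in groups.items()}


groups = split_by_domain(DOMAIN_OF)
for domain, ids in groups.items():
    print(domain, len(ids), ", ".join(ids))
```

Keeping the ambiguous terms in their own group avoids forcing a domain assignment that the evidence does not support.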
id: A0A811MX19
gene_symbol: NCGR_LOCUS10166
product_type: PROTEIN
status: COMPLETE
tags:
- gene-prediction-artifact
- fusion-protein
- plant
taxon:
id: NCBITaxon:422564
label: Miscanthus lutarioriparius
description: >-
NCGR_LOCUS10166 is an unreviewed TrEMBL entry (A0A811MX19) from Miscanthus lutarioriparius
annotated as "Protein ARV". It is a 719 amino acid protein containing two completely unrelated
domains: an N-terminal histidinol dehydrogenase (HDH, EC 1.1.1.23) domain and a C-terminal
ARV1 sterol homeostasis domain. This combination is almost certainly a gene prediction artifact,
as the same organism encodes properly-sized separate genes for both domains at adjacent loci
(NCGR_LOCUS4558/A0A811MHP4 for HDH at 478 aa, and NCGR_LOCUS4557/A0A811MIX4 for ARV1 at
230 aa). The two domains have incompatible subcellular localizations: HDH is a soluble
chloroplast stroma enzyme while ARV1 is an ER multi-pass membrane protein. All GO annotations
are IEA-based and reflect the conflation of two separate gene products. The genome sequence
is derived from preliminary WGS data.
existing_annotations:
- term:
id: GO:0004399
label: histidinol dehydrogenase activity
evidence_type: IEA
original_reference_id: GO_REF:0000120
review:
summary: >-
Histidinol dehydrogenase activity (EC 1.1.1.23) is correctly predicted for the N-terminal
HDH domain of this protein based on HAMAP rule MF_01024. The domain contains CDD cd06572
(Histidinol_dh), Pfam PF00815, and the PROSITE active site PS00611. However, UniProt notes
this entry "lacks conserved residue(s) required for the propagation of feature annotation,"
raising questions about catalytic competence. The same organism has a properly-sized separate
HDH (A0A811MHP4, NCGR_LOCUS4558, 478 aa) that likely represents the true functional HDH gene.
action: MARK_AS_OVER_ANNOTATED
reason: >-
While the HDH domain signature is present in the N-terminal region, this annotation is
misleading because it is applied to a probable gene prediction artifact. The entry likely
represents two incorrectly merged adjacent genes. The real HDH in this organism is
NCGR_LOCUS4558 (A0A811MHP4). Additionally, UniProt flags missing conserved residues
needed for feature propagation.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
The same M. lutarioriparius genome encodes properly-sized separate proteins for both
domains: NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa.
The sequential locus numbering (4557 and 4558) indicates these are adjacent genes that
were correctly predicted in one genomic region but incorrectly fused in another.
- term:
id: GO:0016491
label: oxidoreductase activity
evidence_type: IEA
original_reference_id: GO_REF:0000002
review:
summary: >-
Oxidoreductase activity is a parent term of histidinol dehydrogenase activity and is
mapped from InterPro IPR016161 (Ald_DH/histidinol_DH). This is a broad classification
that applies to the N-terminal HDH domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
This is a very general parent term that adds no specificity beyond what GO:0004399
(histidinol dehydrogenase activity) already provides. More importantly, it is applied
to a probable gene prediction artifact. The annotation is technically correct for the
HDH domain but redundant and misleading in this context.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
NCGR_LOCUS10166 represents a gene model error in the M. lutarioriparius genome annotation.
The two domains should be treated as separate gene products.
- term:
id: GO:0016616
label: oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor
evidence_type: IEA
original_reference_id: GO_REF:0000002
review:
summary: >-
This term is mapped from InterPro IPR012131 (Hstdl_DH) and correctly describes the
reaction mechanism of histidinol dehydrogenase using NAD+ as acceptor. It applies to
the N-terminal HDH domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
While technically correct for the HDH domain, this is an intermediate-specificity term
that is redundant with the more specific GO:0004399 (histidinol dehydrogenase activity).
Applied to a probable gene prediction artifact where annotations from two separate
gene products are conflated.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
All electronic annotations are technically correct for their respective domains but
misleading when combined on a single entry.
- term:
id: GO:0046872
label: metal ion binding
evidence_type: IEA
original_reference_id: GO_REF:0000002
review:
summary: >-
Metal ion binding is mapped from InterPro IPR001692 (Histidinol_DH_CS), reflecting
the zinc cofactor requirement of histidinol dehydrogenase. The ARV1 domain also
contains a cysteine-rich subdomain with a putative zinc-binding motif. Both domains
likely bind metal ions but for completely different purposes.
action: MARK_AS_OVER_ANNOTATED
reason: >-
This is a very general term. For the HDH domain, the more informative annotation would
be zinc ion binding (GO:0008270) specifically as a catalytic cofactor. For the ARV1
domain, zinc binding relates to the structural zinc-binding motif in the AHD domain.
The term is too vague and applied to a probable gene prediction artifact combining
two unrelated proteins.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER membrane protein with
transmembrane domains. A single protein cannot function in both compartments
simultaneously.
- term:
id: GO:0051287
label: NAD binding
evidence_type: IEA
original_reference_id: GO_REF:0000002
review:
summary: >-
NAD binding is mapped from InterPro IPR012131 (Hstdl_DH) and correctly reflects the
cofactor requirement of histidinol dehydrogenase, which uses 2 NAD+ molecules per
catalytic cycle. This applies exclusively to the N-terminal HDH domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
While technically correct for the HDH domain, this annotation is applied to a probable
gene prediction artifact. NAD binding is part of the histidinol dehydrogenase activity
(GO:0004399) and somewhat redundant. The real HDH gene in this organism is NCGR_LOCUS4558.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa
represents the true HDH gene.
- term:
id: GO:0000105
label: L-histidine biosynthetic process
evidence_type: IEA
original_reference_id: GO_REF:0000120
review:
summary: >-
L-histidine biosynthetic process annotation derives from HAMAP rule MF_01024 and
UniPathway (UPA00031, step 9/9). HDH catalyzes the final step of histidine biosynthesis.
In plants, this pathway operates in the chloroplast. This annotation applies exclusively
to the N-terminal HDH domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
While the HDH domain is correctly associated with histidine biosynthesis, this annotation
is on a probable gene prediction artifact. The true histidine biosynthesis HDH in this
organism is NCGR_LOCUS4558 (A0A811MHP4, 478 aa). UniProt also notes missing conserved
residues, questioning whether this particular copy is catalytically active even if the
gene model were correct.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
The same M. lutarioriparius genome encodes properly-sized separate proteins for both
domains: NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa.
- term:
id: GO:0006665
label: sphingolipid metabolic process
evidence_type: IEA
original_reference_id: GO_REF:0000104
review:
summary: >-
Sphingolipid metabolic process is assigned via UniRule RU368065 for the ARV1 family.
ARV1 proteins regulate sphingolipid metabolism in yeast and plants. Arabidopsis ARV
isoforms (AtArv1p, AtArv2p) are ER-localized and modulate both sterol and sphingolipid
homeostasis [PMID:16725371]. This annotation applies to the C-terminal ARV1 domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
While this annotation is correct for the ARV1 domain, it is applied to a probable gene
prediction artifact that conflates two unrelated proteins. The true ARV1 gene in this
organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa).
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.
- reference_id: PMID:16725371
supporting_text: >-
Arv1p is involved in the regulation of cellular lipid homeostasis in the yeast
Saccharomyces cerevisiae. Here, we report the characterization of the two
Arabidopsis thaliana ARV genes and the encoded proteins, AtArv1p and AtArv2p.
- term:
id: GO:0016125
label: sterol metabolic process
evidence_type: IEA
original_reference_id: GO_REF:0000104
review:
summary: >-
Sterol metabolic process is assigned via UniRule RU368065 for the ARV1 family. ARV1
proteins are mediators of sterol homeostasis, involved in sterol uptake, trafficking,
and distribution into membranes. This applies to the C-terminal ARV1 domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
Correct for the ARV1 domain but applied to a gene prediction artifact. The true ARV1
in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa). The entry conflates sterol
metabolism (ARV1 domain) with histidine biosynthesis (HDH domain) in a single gene model.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
ARV1 family proteins are ER membrane proteins that mediate sterol homeostasis:
sterol uptake, trafficking, and distribution into membranes.
- term:
id: GO:0032366
label: intracellular sterol transport
evidence_type: IEA
original_reference_id: GO_REF:0000120
review:
summary: >-
Intracellular sterol transport is a core function of ARV1 proteins, which mediate
sterol transport from ER to plasma membrane. In yeast, ARV1 deletion leads to altered
intracellular sterol distribution with decreased plasma membrane sterols and elevated
ER/vacuolar sterols. This applies to the C-terminal ARV1 domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
Correct for the ARV1 domain but applied to a gene prediction artifact. The true ARV1
in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa).
supported_by:
- reference_id: PMID:16725371
supporting_text: >-
Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved
in the regulation of cellular lipid homeostasis.
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.
- term:
id: GO:0097036
label: regulation of plasma membrane sterol distribution
evidence_type: IEA
original_reference_id: GO_REF:0000104
review:
summary: >-
Regulation of plasma membrane sterol distribution is assigned via UniRule RU368065.
This is a specific biological process for ARV1 proteins, which regulate how sterols
are distributed between the ER and plasma membrane. This applies to the C-terminal
ARV1 domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
Correct for the ARV1 domain but applied to a gene prediction artifact combining
unrelated HDH and ARV1 domains. The true ARV1 gene is NCGR_LOCUS4557 (A0A811MIX4).
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.
- term:
id: GO:0005737
label: cytoplasm
evidence_type: IEA
original_reference_id: GO_REF:0000118
review:
summary: >-
Cytoplasm localization is assigned by TreeGrafter/PANTHER (PTHR21256). This likely
reflects the HDH domain, as plant HDH is synthesized in the cytoplasm before chloroplast
import. However, this is a very broad localization term.
action: MARK_AS_OVER_ANNOTATED
reason: >-
The term is too general. For the HDH domain, the correct localization is chloroplast
stroma (GO:0009570), which is already annotated. For the ARV1 domain, the correct
localization is ER membrane (GO:0005789). Applied to a gene prediction artifact.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
HDH is a soluble enzyme localized to chloroplast stroma in all characterized plants.
ARV1 is an ER membrane protein with multiple transmembrane domains.
- term:
id: GO:0005783
label: endoplasmic reticulum
evidence_type: IEA
original_reference_id: GO_REF:0000117
review:
summary: >-
Endoplasmic reticulum localization is assigned via UniRule RU368065 for the ARV1 family.
ARV1 proteins are ER-localized membrane proteins. In Arabidopsis, both AtArv1p and
AtArv2p are exclusively targeted to the ER [PMID:16725371]. This applies to the
C-terminal ARV1 domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
Correct for the ARV1 domain but applied to a gene prediction artifact. This localization
is incompatible with the chloroplast stroma localization of the HDH domain, which is
the strongest evidence this is an incorrectly merged gene model.
supported_by:
- reference_id: PMID:16725371
supporting_text: >-
both proteins are exclusively targeted to the endoplasmic reticulum
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
A single polypeptide cannot simultaneously function as a chloroplast stroma enzyme
and an ER membrane protein.
- term:
id: GO:0005789
label: endoplasmic reticulum membrane
evidence_type: IEA
original_reference_id: GO_REF:0000120
review:
summary: >-
ER membrane localization derives from UniProtKB-SubCell and reflects the multi-pass
transmembrane topology of the ARV1 domain (transmembrane helices at aa 588-607 and
619-639). This applies exclusively to the C-terminal ARV1 domain.
action: MARK_AS_OVER_ANNOTATED
reason: >-
Correct for the ARV1 domain but applied to a gene prediction artifact. The ER membrane
localization directly contradicts the chloroplast stroma localization predicted for the
HDH domain. The true ARV1 gene is NCGR_LOCUS4557 (A0A811MIX4).
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
ARV1 is an ER membrane protein with multiple transmembrane domains. A single protein
cannot function in both compartments simultaneously.
- term:
id: GO:0005829
label: cytosol
evidence_type: IEA
original_reference_id: GO_REF:0000118
review:
summary: >-
Cytosol localization is assigned by TreeGrafter/PANTHER. This may reflect the HDH
domain prior to chloroplast import, or it may be a generic assignment from the
PANTHER family classification.
action: MARK_AS_OVER_ANNOTATED
reason: >-
For the HDH domain, the mature protein is chloroplast stroma-localized, not cytosolic.
For the ARV1 domain, the protein is ER membrane-localized. Cytosol is not the functional
localization for either domain. Applied to a gene prediction artifact.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
HDH is a soluble enzyme localized to chloroplast stroma in all characterized plants.
- term:
id: GO:0009507
label: chloroplast
evidence_type: IEA
original_reference_id: GO_REF:0000044
review:
summary: >-
Chloroplast localization derives from UniProtKB-SubCell vocabulary mapping. Plant HDH
is nuclear-encoded but chloroplast-targeted via a transit peptide. This applies to the
N-terminal HDH domain. In plants, the entire histidine biosynthesis pathway operates
in the chloroplast.
action: MARK_AS_OVER_ANNOTATED
reason: >-
While correct for the HDH domain (chloroplast stroma is the more specific term, already
annotated as GO:0009570), this annotation is on a gene prediction artifact. The true
HDH gene is NCGR_LOCUS4558 (A0A811MHP4). Additionally, chloroplast localization is
incompatible with the ER membrane localization of the C-terminal ARV1 domain.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
In plants, HDH is expressed as a nuclear encoded protein precursor which is exported
to the chloroplast. HDH has been immunolocalized to the chloroplast.
- term:
id: GO:0009570
label: chloroplast stroma
evidence_type: IEA
original_reference_id: GO_REF:0000118
review:
summary: >-
Chloroplast stroma localization is assigned by TreeGrafter/PANTHER. This is the correct
subcellular localization for plant HDH, which operates as a soluble enzyme in the
chloroplast stroma. Structural studies in the related legume Medicago truncatula
confirm the stroma localization.
action: MARK_AS_OVER_ANNOTATED
reason: >-
While this is the most accurate localization for the HDH domain, it is applied to a
gene prediction artifact. The true HDH gene is NCGR_LOCUS4558 (A0A811MHP4). The
chloroplast stroma localization is fundamentally incompatible with the ER membrane
localization of the ARV1 domain on the same polypeptide.
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER membrane protein with
transmembrane domains. A single protein cannot function in both compartments
simultaneously. The N-terminal transit peptide (for chloroplast import) would
prevent ER membrane insertion.
- term:
id: GO:0016020
label: membrane
evidence_type: IEA
original_reference_id: GO_REF:0000104
review:
summary: >-
Membrane localization is assigned via UniRule RU368065 for the ARV1 family, reflecting
the transmembrane topology of the ARV1 domain. The C-terminal region contains two
predicted transmembrane helices (aa 588-607 and 619-639).
action: MARK_AS_OVER_ANNOTATED
reason: >-
This is a very general term (an ancestor of GO:0005789, endoplasmic reticulum membrane).
While the ARV1 domain is indeed membrane-embedded, this annotation is on a gene prediction
artifact, and the more specific GO:0005789 is already annotated. The true ARV1 gene is
NCGR_LOCUS4557 (A0A811MIX4).
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
ARV1 is an ER membrane protein with 2+ transmembrane helices.
references:
- id: GO_REF:0000002
title: >-
Gene Ontology annotation through association of InterPro records with GO terms.
findings: []
- id: GO_REF:0000044
title: >-
Gene Ontology annotation based on UniProtKB/Swiss-Prot Subcellular Location vocabulary
mapping, accompanied by conservative changes to GO terms applied by UniProt.
findings: []
- id: GO_REF:0000104
title: >-
Gene Ontology annotation based on UniProtKB/Swiss-Prot keyword mapping, accompanied by
conservative changes to GO terms applied by UniProt.
findings: []
- id: GO_REF:0000117
title: >-
Gene Ontology annotation by automatic transfer of UniProtKB UniRule annotation.
findings: []
- id: GO_REF:0000118
title: >-
Gene Ontology annotation by TreeGrafter/PANTHER phylogenetic inference.
findings: []
- id: GO_REF:0000120
title: >-
Gene Ontology annotation by HAMAP-Rule UniProtKB automatic annotation.
findings: []
- id: PMID:16725371
title: >-
Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved in
the regulation of cellular lipid homeostasis.
findings:
- statement: Both Arabidopsis ARV isoforms are exclusively ER-localized
supporting_text: >-
both proteins are exclusively targeted to the endoplasmic reticulum
- statement: ARV proteins contain the bipartite Arv1 homology domain with zinc-binding motif
supporting_text: >-
Both Arabidopsis proteins contain the bipartite Arv1 homology domain (AHD), which consists
of an NH2-terminal cysteine-rich subdomain with a putative zinc-binding motif followed by
a C-terminal subdomain.
- id: PMID:33911077
title: >-
Chromosome-scale assembly and analysis of biomass crop Miscanthus lutarioriparius genome.
findings:
- statement: >-
M. lutarioriparius genome was assembled to chromosome scale from WGS data with contig N50
of 1.71 Mb covering 96.64% of the genome.
- id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
title: >-
Manual deep research on NCGR_LOCUS10166 identifying gene prediction artifact.
findings:
- statement: >-
NCGR_LOCUS10166 is a fusion of two unrelated domains (HDH + ARV1) from incorrectly
merged adjacent genes.
- statement: >-
Separate properly-sized genes exist: NCGR_LOCUS4558 (HDH, 478 aa) and NCGR_LOCUS4557
(ARV1, 230 aa).
- statement: >-
The two domains have incompatible subcellular localizations (chloroplast stroma vs
ER membrane).
core_functions:
- molecular_function:
id: GO:0004399
label: histidinol dehydrogenase activity
description: >-
The N-terminal domain (~aa 1-480) of this predicted protein encodes a histidinol
dehydrogenase that catalyzes the final step of L-histidine biosynthesis (EC 1.1.1.23).
However, this is almost certainly a gene prediction artifact: the same organism has a
separate, properly-sized HDH gene (NCGR_LOCUS4558, A0A811MHP4, 478 aa) that is the
true functional HDH. This entry should not be considered a real gene product.
directly_involved_in:
- id: GO:0000105
label: L-histidine biosynthetic process
locations:
- id: GO:0009570
label: chloroplast stroma
supported_by:
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa
represents the true HDH gene.
- molecular_function:
id: GO:0032934
label: sterol binding
description: >-
The C-terminal domain (~aa 480-719) encodes an ARV1 family protein involved in sterol
homeostasis, sterol transport from ER to plasma membrane, and sphingolipid metabolism
regulation. However, this is almost certainly a gene prediction artifact: the same
organism has a separate, properly-sized ARV1 gene (NCGR_LOCUS4557, A0A811MIX4, 230 aa)
that is the true functional ARV1. This entry should not be considered a real gene product.
directly_involved_in:
- id: GO:0016125
label: sterol metabolic process
- id: GO:0006665
label: sphingolipid metabolic process
locations:
- id: GO:0005789
label: endoplasmic reticulum membrane
supported_by:
- reference_id: PMID:16725371
supporting_text: >-
Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved
in the regulation of cellular lipid homeostasis.
- reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
supporting_text: >-
NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.
proposed_new_terms: []
suggested_questions:
- question: >-
Is NCGR_LOCUS10166 a genuine fusion gene or a gene prediction artifact? The presence of
separate, properly-sized genes for both domains (NCGR_LOCUS4557 for ARV1, NCGR_LOCUS4558
for HDH) at adjacent loci, combined with incompatible subcellular localizations (chloroplast
stroma vs ER membrane), strongly suggests a gene model error. RNA-seq or proteomics evidence
of a full-length 719 aa protein would be needed to support a real fusion.
- question: >-
Should UniProt flag A0A811MX19 for review or suppression? The entry combines two unrelated
protein families with contradictory biology. The preliminary WGS-derived genome annotation
may have incorrectly merged two adjacent genes into a single gene model.
suggested_experiments:
- description: >-
Validate the gene model using RNA-seq data from Miscanthus lutarioriparius to determine
whether NCGR_LOCUS10166 is transcribed as a single mRNA or represents two separate
transcripts that were incorrectly merged in the genome annotation.
hypothesis: >-
RNA-seq will show two separate transcripts corresponding to the HDH and ARV1 domains,
confirming this is a gene prediction artifact rather than a genuine fusion gene.
- description: >-
Compare syntenic regions across related Poaceae genomes (Sorghum bicolor, Saccharum,
Zea mays) to determine whether HDH and ARV1 genes are consistently found as separate
adjacent genes, supporting the gene prediction artifact hypothesis.
hypothesis: >-
In all related grass genomes, HDH and ARV1 will be encoded as separate genes, confirming
that the fusion in NCGR_LOCUS10166 is an artifact of genome annotation.