NCGR_LOCUS10166

UniProt ID: A0A811MX19
Organism: Miscanthus lutarioriparius
Review Status: COMPLETE

Gene Description

NCGR_LOCUS10166 is an unreviewed TrEMBL entry (A0A811MX19) from Miscanthus lutarioriparius annotated as "Protein ARV". It is a 719 amino acid protein containing two completely unrelated domains: an N-terminal histidinol dehydrogenase (HDH, EC 1.1.1.23) domain and a C-terminal ARV1 sterol homeostasis domain. This combination is almost certainly a gene prediction artifact, as the same organism encodes properly-sized separate genes for both domains at adjacent loci (NCGR_LOCUS4558/A0A811MHP4 for HDH at 478 aa, and NCGR_LOCUS4557/A0A811MIX4 for ARV1 at 230 aa). The two domains have incompatible subcellular localizations: HDH is a soluble chloroplast stroma enzyme while ARV1 is an ER multi-pass membrane protein. All GO annotations are IEA-based and reflect the conflation of two separate gene products. The genome sequence is derived from preliminary WGS data.

Existing Annotations Review

GO Term Evidence Action Reason
GO:0004399 histidinol dehydrogenase activity
IEA
GO_REF:0000120
MARK AS OVER ANNOTATED
Summary: Histidinol dehydrogenase activity (EC 1.1.1.23) is correctly predicted for the N-terminal HDH domain of this protein based on HAMAP rule MF_01024. The domain contains CDD cd06572 (Histidinol_dh), Pfam PF00815, and the PROSITE active site PS00611. However, UniProt notes this entry "lacks conserved residue(s) required for the propagation of feature annotation," raising questions about catalytic competence. The same organism has a properly-sized separate HDH (A0A811MHP4, NCGR_LOCUS4558, 478 aa) that likely represents the true functional HDH gene.
Reason: While the HDH domain signature is present in the N-terminal region, this annotation is misleading because it is applied to a probable gene prediction artifact. The entry likely represents two incorrectly merged adjacent genes. The real HDH in this organism is NCGR_LOCUS4558 (A0A811MHP4). Additionally, UniProt flags missing conserved residues needed for feature propagation.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
The same M. lutarioriparius genome encodes properly-sized separate proteins for both domains: NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa. The sequential locus numbering (4557 and 4558) indicates these are adjacent genes that were correctly predicted in one genomic region but incorrectly fused in another.
GO:0016491 oxidoreductase activity
IEA
GO_REF:0000002
MARK AS OVER ANNOTATED
Summary: Oxidoreductase activity is a parent term of histidinol dehydrogenase activity and is mapped from InterPro IPR016161 (Ald_DH/histidinol_DH). This is a broad classification that applies to the N-terminal HDH domain.
Reason: This is a very general parent term that adds no specificity beyond what GO:0004399 (histidinol dehydrogenase activity) already provides. More importantly, it is applied to a probable gene prediction artifact. The annotation is technically correct for the HDH domain but redundant and misleading in this context.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
NCGR_LOCUS10166 represents a gene model error in the M. lutarioriparius genome annotation. The two domains should be treated as separate gene products.
GO:0016616 oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor
IEA
GO_REF:0000002
MARK AS OVER ANNOTATED
Summary: This term is mapped from InterPro IPR012131 (Hstdl_DH) and correctly describes the reaction mechanism of histidinol dehydrogenase using NAD+ as acceptor. It applies to the N-terminal HDH domain.
Reason: While technically correct for the HDH domain, this is an intermediate-specificity term that is redundant with the more specific GO:0004399 (histidinol dehydrogenase activity). Applied to a probable gene prediction artifact where annotations from two separate gene products are conflated.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
All electronic annotations are technically correct for their respective domains but misleading when combined on a single entry.
GO:0046872 metal ion binding
IEA
GO_REF:0000002
MARK AS OVER ANNOTATED
Summary: Metal ion binding is mapped from InterPro IPR001692 (Histidinol_DH_CS), reflecting the zinc cofactor requirement of histidinol dehydrogenase. The ARV1 domain also contains a cysteine-rich subdomain with a putative zinc-binding motif. Both domains likely bind metal ions but for completely different purposes.
Reason: This is a very general term. For the HDH domain, the more informative annotation would be zinc ion binding (GO:0008270), specifically as a catalytic cofactor. For the ARV1 domain, zinc binding relates to the putative structural zinc-binding motif in the Arv1 homology domain (AHD). The term is too vague, and it is applied to a probable gene prediction artifact combining two unrelated proteins.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER membrane protein with transmembrane domains. A single protein cannot function in both compartments simultaneously.
GO:0051287 NAD binding
IEA
GO_REF:0000002
MARK AS OVER ANNOTATED
Summary: NAD binding is mapped from InterPro IPR012131 (Hstdl_DH) and correctly reflects the cofactor requirement of histidinol dehydrogenase, which uses 2 NAD+ molecules per catalytic cycle. This applies exclusively to the N-terminal HDH domain.
Reason: While technically correct for the HDH domain, this annotation is applied to a probable gene prediction artifact. NAD binding is part of the histidinol dehydrogenase activity (GO:0004399) and somewhat redundant. The real HDH gene in this organism is NCGR_LOCUS4558.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa represents the true HDH gene.
GO:0000105 L-histidine biosynthetic process
IEA
GO_REF:0000120
MARK AS OVER ANNOTATED
Summary: L-histidine biosynthetic process annotation derives from HAMAP rule MF_01024 and UniPathway (UPA00031, step 9/9). HDH catalyzes the final step of histidine biosynthesis. In plants, this pathway operates in the chloroplast. This annotation applies exclusively to the N-terminal HDH domain.
Reason: While the HDH domain is correctly associated with histidine biosynthesis, this annotation is on a probable gene prediction artifact. The true histidine biosynthesis HDH in this organism is NCGR_LOCUS4558 (A0A811MHP4, 478 aa). UniProt also notes missing conserved residues, questioning whether this particular copy is catalytically active even if the gene model were correct.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
The same M. lutarioriparius genome encodes properly-sized separate proteins for both domains: NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa.
GO:0006665 sphingolipid metabolic process
IEA
GO_REF:0000104
MARK AS OVER ANNOTATED
Summary: Sphingolipid metabolic process is assigned via UniRule RU368065 for the ARV1 family. ARV1 proteins regulate sphingolipid metabolism in yeast and plants. Arabidopsis ARV isoforms (AtArv1p, AtArv2p) are ER-localized and modulate both sterol and sphingolipid homeostasis [PMID:16725371]. This annotation applies to the C-terminal ARV1 domain.
Reason: While this annotation is correct for the ARV1 domain, it is applied to a probable gene prediction artifact that conflates two unrelated proteins. The true ARV1 gene in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa).
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.
PMID:16725371
Arv1p is involved in the regulation of cellular lipid homeostasis in the yeast Saccharomyces cerevisiae. Here, we report the characterization of the two Arabidopsis thaliana ARV genes and the encoded proteins, AtArv1p and AtArv2p.
GO:0016125 sterol metabolic process
IEA
GO_REF:0000104
MARK AS OVER ANNOTATED
Summary: Sterol metabolic process is assigned via UniRule RU368065 for the ARV1 family. ARV1 proteins are mediators of sterol homeostasis, involved in sterol uptake, trafficking, and distribution into membranes. This applies to the C-terminal ARV1 domain.
Reason: Correct for the ARV1 domain but applied to a gene prediction artifact. The true ARV1 in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa). The annotation conflates sterol metabolism (ARV1 domain) with histidine biosynthesis (HDH domain) on a single entry.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
ARV1 family proteins are ER membrane proteins that mediate sterol homeostasis: sterol uptake, trafficking, and distribution into membranes.
GO:0032366 intracellular sterol transport
IEA
GO_REF:0000120
MARK AS OVER ANNOTATED
Summary: Intracellular sterol transport is a core function of ARV1 proteins, which mediate sterol transport from ER to plasma membrane. In yeast, ARV1 deletion leads to altered intracellular sterol distribution with decreased plasma membrane sterols and elevated ER/vacuolar sterols. This applies to the C-terminal ARV1 domain.
Reason: Correct for the ARV1 domain but applied to a gene prediction artifact. The true ARV1 in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa).
Supporting Evidence:
PMID:16725371
Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved in the regulation of cellular lipid homeostasis.
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.
GO:0097036 regulation of plasma membrane sterol distribution
IEA
GO_REF:0000104
MARK AS OVER ANNOTATED
Summary: Regulation of plasma membrane sterol distribution is assigned via UniRule RU368065. This is a specific biological process for ARV1 proteins, which regulate how sterols are distributed between the ER and plasma membrane. This applies to the C-terminal ARV1 domain.
Reason: Correct for the ARV1 domain but applied to a gene prediction artifact combining unrelated HDH and ARV1 domains. The true ARV1 gene is NCGR_LOCUS4557 (A0A811MIX4).
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.
GO:0005737 cytoplasm
IEA
GO_REF:0000118
MARK AS OVER ANNOTATED
Summary: Cytoplasm localization is assigned by TreeGrafter/PANTHER (PTHR21256). This likely reflects the HDH domain, as plant HDH is synthesized in the cytoplasm before chloroplast import. However, this is a very broad localization term.
Reason: The term is too general. For the HDH domain, the correct localization is chloroplast stroma (GO:0009570), which is already annotated. For the ARV1 domain, the correct localization is ER membrane (GO:0005789). Applied to a gene prediction artifact.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
HDH is a soluble enzyme localized to chloroplast stroma in all characterized plants. ARV1 is an ER membrane protein with multiple transmembrane domains.
GO:0005783 endoplasmic reticulum
IEA
GO_REF:0000117
MARK AS OVER ANNOTATED
Summary: Endoplasmic reticulum localization is assigned via UniRule RU368065 for the ARV1 family. ARV1 proteins are ER-localized membrane proteins. In Arabidopsis, both AtArv1p and AtArv2p are exclusively targeted to the ER [PMID:16725371]. This applies to the C-terminal ARV1 domain.
Reason: Correct for the ARV1 domain but applied to a gene prediction artifact. This localization is incompatible with the chloroplast stroma localization of the HDH domain, which is the strongest evidence this is an incorrectly merged gene model.
Supporting Evidence:
PMID:16725371
both proteins are exclusively targeted to the endoplasmic reticulum
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
A single polypeptide cannot simultaneously function as a chloroplast stroma enzyme and an ER membrane protein.
GO:0005789 endoplasmic reticulum membrane
IEA
GO_REF:0000120
MARK AS OVER ANNOTATED
Summary: ER membrane localization derives from UniProtKB-SubCell and reflects the multi-pass transmembrane topology of the ARV1 domain (transmembrane helices at aa 588-607 and 619-639). This applies exclusively to the C-terminal ARV1 domain.
Reason: Correct for the ARV1 domain but applied to a gene prediction artifact. The ER membrane localization directly contradicts the chloroplast stroma localization predicted for the HDH domain. The true ARV1 gene is NCGR_LOCUS4557 (A0A811MIX4).
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
ARV1 is an ER membrane protein with multiple transmembrane domains. A single protein cannot function in both compartments simultaneously.
GO:0005829 cytosol
IEA
GO_REF:0000118
MARK AS OVER ANNOTATED
Summary: Cytosol localization is assigned by TreeGrafter/PANTHER. This may reflect the HDH domain prior to chloroplast import, or it may be a generic assignment from the PANTHER family classification.
Reason: For the HDH domain, the mature protein is chloroplast stroma-localized, not cytosolic. For the ARV1 domain, the protein is ER membrane-localized. Cytosol is not the functional localization for either domain. Applied to a gene prediction artifact.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
HDH is a soluble enzyme localized to chloroplast stroma in all characterized plants.
GO:0009507 chloroplast
IEA
GO_REF:0000044
MARK AS OVER ANNOTATED
Summary: Chloroplast localization derives from UniProtKB-SubCell vocabulary mapping. Plant HDH is nuclear-encoded but chloroplast-targeted via a transit peptide. This applies to the N-terminal HDH domain. In plants, the entire histidine biosynthesis pathway operates in the chloroplast.
Reason: While correct for the HDH domain (chloroplast stroma is the more specific term, already annotated as GO:0009570), this annotation is on a gene prediction artifact. The true HDH gene is NCGR_LOCUS4558 (A0A811MHP4). Additionally, chloroplast localization is incompatible with the ER membrane localization of the C-terminal ARV1 domain.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
In plants, HDH is expressed as a nuclear encoded protein precursor which is exported to the chloroplast. HDH has been immunolocalized to the chloroplast.
GO:0009570 chloroplast stroma
IEA
GO_REF:0000118
MARK AS OVER ANNOTATED
Summary: Chloroplast stroma localization is assigned by TreeGrafter/PANTHER. This is the correct subcellular localization for plant HDH, which operates as a soluble enzyme in the chloroplast stroma. Structural studies of HDH from the legume Medicago truncatula (PMC5585171) support the stroma localization.
Reason: While this is the most accurate localization for the HDH domain, it is applied to a gene prediction artifact. The true HDH gene is NCGR_LOCUS4558 (A0A811MHP4). The chloroplast stroma localization is fundamentally incompatible with the ER membrane localization of the ARV1 domain on the same polypeptide.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER membrane protein with transmembrane domains. A single protein cannot function in both compartments simultaneously. The N-terminal transit peptide (for chloroplast import) would prevent ER membrane insertion.
GO:0016020 membrane
IEA
GO_REF:0000104
MARK AS OVER ANNOTATED
Summary: Membrane localization is assigned via UniRule RU368065 for the ARV1 family, reflecting the transmembrane topology of the ARV1 domain. The C-terminal region contains two predicted transmembrane helices (aa 588-607 and 619-639).
Reason: This is a very general term (parent of ER membrane). While the ARV1 domain is indeed a membrane protein, this annotation is on a gene prediction artifact. The more specific GO:0005789 (ER membrane) is already annotated. The true ARV1 gene is NCGR_LOCUS4557.
Supporting Evidence:
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
ARV1 is an ER membrane protein with 2+ transmembrane helices.

Core Functions

The N-terminal domain (~aa 1-480) of this predicted protein encodes a histidinol dehydrogenase that catalyzes the final step of L-histidine biosynthesis (EC 1.1.1.23). However, this is almost certainly a gene prediction artifact: the same organism has a separate, properly-sized HDH gene (NCGR_LOCUS4558, A0A811MHP4, 478 aa) that is the true functional HDH. This entry should not be considered a real gene product.

Cellular Locations:
Supporting Evidence:
  • file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
    NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa represents the true HDH gene.

The C-terminal domain (~aa 480-719) encodes an ARV1 family protein involved in sterol homeostasis, sterol transport from ER to plasma membrane, and sphingolipid metabolism regulation. However, this is almost certainly a gene prediction artifact: the same organism has a separate, properly-sized ARV1 gene (NCGR_LOCUS4557, A0A811MIX4, 230 aa) that is the true functional ARV1. This entry should not be considered a real gene product.

Supporting Evidence:
  • PMID:16725371
    Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved in the regulation of cellular lipid homeostasis.
  • file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
    NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.

References

Gene Ontology annotation through association of InterPro records with GO terms.
Gene Ontology annotation based on UniProtKB/Swiss-Prot Subcellular Location vocabulary mapping, accompanied by conservative changes to GO terms applied by UniProt.
Gene Ontology annotation based on UniProtKB/Swiss-Prot keyword mapping, accompanied by conservative changes to GO terms applied by UniProt.
Gene Ontology annotation by automatic transfer of UniProtKB UniRule annotation.
Gene Ontology annotation by TreeGrafter/PANTHER phylogenetic inference.
Gene Ontology annotation by HAMAP-Rule UniProtKB automatic annotation.
PMID:16725371
Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved in the regulation of cellular lipid homeostasis.
  • Both Arabidopsis ARV isoforms are exclusively ER-localized
    "both proteins are exclusively targeted to the endoplasmic reticulum"
  • ARV proteins contain the bipartite Arv1 homology domain with zinc-binding motif
    "Both Arabidopsis proteins contain the bipartite Arv1 homology domain (AHD), which consists of an NH2-terminal cysteine-rich subdomain with a putative zinc-binding motif followed by a C-terminal subdomain."
PMID:33911077
Chromosome-scale assembly and analysis of biomass crop Miscanthus lutarioriparius genome.
  • M. lutarioriparius genome was assembled to chromosome scale from WGS data with contig N50 of 1.71 Mb covering 96.64% of the genome.
file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
Manual deep research on NCGR_LOCUS10166 identifying gene prediction artifact.
  • NCGR_LOCUS10166 is a fusion of two unrelated domains (HDH + ARV1) from incorrectly merged adjacent genes.
  • Separate properly-sized genes exist: NCGR_LOCUS4558 (HDH, 478 aa) and NCGR_LOCUS4557 (ARV1, 230 aa).
  • The two domains have incompatible subcellular localizations (chloroplast stroma vs ER membrane).

Suggested Questions for Experts

Q: Is NCGR_LOCUS10166 a genuine fusion gene or a gene prediction artifact? The presence of separate, properly-sized genes for both domains (NCGR_LOCUS4557 for ARV1, NCGR_LOCUS4558 for HDH) at adjacent loci, combined with incompatible subcellular localizations (chloroplast stroma vs ER membrane), strongly suggests a gene model error. RNA-seq or proteomics evidence of a full-length 719 aa protein would be needed to support a real fusion.

Q: Should UniProt flag A0A811MX19 for review or suppression? The entry combines two unrelated protein families with contradictory biology. The preliminary WGS-derived genome annotation may have incorrectly merged two adjacent reading frames.

Suggested Experiments

Experiment: Validate the gene model using RNA-seq data from Miscanthus lutarioriparius to determine whether NCGR_LOCUS10166 is transcribed as a single mRNA or represents two separate transcripts that were incorrectly merged in the genome annotation.

Hypothesis: RNA-seq will show two separate transcripts corresponding to the HDH and ARV1 domains, confirming this is a gene prediction artifact rather than a genuine fusion gene.
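
One way to turn this hypothesis into a concrete test is to count RNA-seq alignments that cross the HDH/ARV1 boundary of the predicted model. The sketch below is illustrative only: it assumes reads are aligned to the predicted coding sequence of NCGR_LOCUS10166, that the boundary sits near residue 480 (so roughly nucleotide 1440), and that a 10 nt overhang on each side is a reasonable threshold. The function names and the toy alignments are hypothetical and are not data from any experiment.

```python
# Approximate CDS coordinate of the domain boundary (~aa 480), an assumption.
JUNCTION_NT = 480 * 3

def spans_junction(start: int, end: int,
                   junction: int = JUNCTION_NT,
                   min_overhang: int = 10) -> bool:
    """Return True if the alignment [start, end) covers the junction
    with at least min_overhang nucleotides on both sides."""
    return start <= junction - min_overhang and end >= junction + min_overhang

def count_spanning(alignments, junction: int = JUNCTION_NT,
                   min_overhang: int = 10) -> int:
    """Count alignments (start, end) that span the junction."""
    return sum(spans_junction(s, e, junction, min_overhang) for s, e in alignments)

# Toy alignments, invented purely to exercise the function.
toy_alignments = [(1300, 1400), (1420, 1520), (1435, 1535), (1500, 1600)]
n_spanning = count_spanning(toy_alignments)
print(n_spanning)
```

Few or no spanning alignments, despite good coverage on both sides of the boundary, would be consistent with two separate transcripts; many spanning alignments would argue for a single mRNA.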

Experiment: Compare syntenic regions across related Poaceae genomes (Sorghum bicolor, Saccharum, Zea mays) to determine whether HDH and ARV1 genes are consistently found as separate adjacent genes, supporting the gene prediction artifact hypothesis.

Hypothesis: In all related grass genomes, HDH and ARV1 will be encoded as separate genes, confirming that the fusion in NCGR_LOCUS10166 is an artifact of genome annotation.

Tags

gene-prediction-artifact fusion-protein plant

Deep Research

Manual

(NCGR_LOCUS10166-deep-research-manual.md)
Deep Research: NCGR_LOCUS10166 (A0A811MX19) - Miscanthus lutarioriparius

Summary

NCGR_LOCUS10166 (UniProt: A0A811MX19) is a 719 amino acid unreviewed TrEMBL entry from
Miscanthus lutarioriparius annotated as "Protein ARV". It contains two completely
unrelated protein domains: an N-terminal histidinol dehydrogenase (HDH, EC 1.1.1.23)
domain and a C-terminal ARV1 sterol homeostasis domain. This combination is almost
certainly a gene prediction artifact.

Evidence for Gene Prediction Artifact

Separate genes exist in the same organism

The same M. lutarioriparius genome encodes properly-sized separate proteins for both domains:
- NCGR_LOCUS4558 (A0A811MHP4): Histidinol dehydrogenase, chloroplastic - 478 aa
- NCGR_LOCUS4557 (A0A811MIX4): Protein ARV - 230 aa

The sequential locus numbering (4557 and 4558) indicates these are adjacent genes that
were correctly predicted in one genomic region but incorrectly fused in another (LOCUS10166).
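The combined size of the two separate proteins (478 + 230 = 708 aa) is close to the 719 aa of
NCGR_LOCUS10166, and the ~241 residues that follow a 478 aa HDH portion are comparable to the
230 aa ARV1 protein.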
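
The size argument can be checked with simple arithmetic on the lengths quoted in this document. This is a minimal Python sketch; the function name `size_comparison` and its output keys are illustrative and not part of any annotation pipeline.

```python
# Protein lengths (aa) as quoted in this document.
FUSED_LEN = 719   # NCGR_LOCUS10166 (A0A811MX19)
HDH_LEN = 478     # NCGR_LOCUS4558 (A0A811MHP4)
ARV1_LEN = 230    # NCGR_LOCUS4557 (A0A811MIX4)

def size_comparison(fused: int, n_term: int, c_term: int) -> dict:
    """Compare a fused gene model with the sum of two separate proteins."""
    combined = n_term + c_term
    return {
        "combined": combined,                  # length of HDH + ARV1
        "excess_in_fused": fused - combined,   # residues not accounted for
        "c_terminal_remainder": fused - n_term,  # what is left after the HDH part
    }

result = size_comparison(FUSED_LEN, HDH_LEN, ARV1_LEN)
print(result)
```

The fused model is 11 residues longer than the two separate proteins combined, and the C-terminal remainder (241 aa) is within about a dozen residues of the 230 aa ARV1 protein.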

Incompatible subcellular localizations

  • HDH is a soluble enzyme localized to chloroplast stroma in all characterized plants
  • ARV1 is an ER membrane protein with multiple transmembrane domains
  • A single polypeptide cannot simultaneously function as a chloroplast stroma enzyme
    and an ER membrane protein

Genome quality caveats

UniProt notes: "The sequence shown here is derived from an EMBL/GenBank/DDBJ whole genome
shotgun (WGS) entry which is preliminary data." Also: "Lacks conserved residue(s) required
for the propagation of feature annotation."

Histidinol Dehydrogenase Domain

Function

HDH catalyzes the final step (step 9/9) of L-histidine biosynthesis:
L-histidinol + 2 NAD+ + H2O -> L-histidine + 2 NADH + 3 H+

Plant Biology

  • Nuclear-encoded, targeted to chloroplast stroma via transit peptide
  • Essential for plant viability; Arabidopsis knockout shows ovule abortion
  • Zinc metalloenzyme
  • Structure solved in the legume Medicago truncatula (PMC5585171)
  • Potential herbicide target (doi:10.1021/acs.jafc.5c01206)
  • Conserved across plants, bacteria, and fungi

Domain Signatures

  • Pfam PF00815 (Histidinol_dh)
  • InterPro IPR012131 (Hstdl_DH)
  • HAMAP MF_01024 (HisD)
  • CDD cd06572 (Histidinol_dh)
  • PANTHER PTHR21256:SF2 (Histidine biosynthesis trifunctional protein)

ARV1 Domain

Function

ARV1 family proteins are ER membrane proteins that mediate sterol homeostasis:
- Sterol uptake, trafficking, and distribution into membranes
- Regulation of sphingolipid metabolism
- Sterol transport from ER to plasma membrane

Plant Biology

  • Arabidopsis expresses two functional ARV isoforms (AtArv1p, AtArv2p) PMID:16725371
  • Both exclusively localized to the endoplasmic reticulum
  • Contain Arv1 homology domain (AHD) with cysteine-rich zinc-binding motif
  • Conserved across yeast, plants, and mammals
  • Loss of function alters sterol distribution and sphingolipid levels

Domain Signatures

  • Pfam PF04161 (Arv1)
  • InterPro IPR007290 (Arv1 family)
  • UniRule RU368065

GO Annotation Analysis

All 17 GO annotations on A0A811MX19 are IEA (electronic), sourced from:
- InterPro2GO mappings (GO_REF:0000002)
- UniProtKB-UniRule (GO_REF:0000104, GO_REF:0000120)
- UniProtKB-SubCell (GO_REF:0000044)
- TreeGrafter/PANTHER (GO_REF:0000118)
- UniProtKB keyword mapping (GO_REF:0000117)

The annotations are internally contradictory because they combine two unrelated proteins:
- Chloroplast stroma + ER membrane localizations are mutually exclusive
- Histidine biosynthesis + sterol transport are unrelated pathways
- Oxidoreductase activity + lipid transport are unrelated functions
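
The tally and the contradiction check described above can be sketched in a few lines of Python. The GO term to GO_REF pairs are copied from the annotation review in this document; the list of mutually exclusive compartments (`EXCLUSIVE_PAIRS`) encodes this document's own argument and is an assumption of the sketch, not a rule taken from the Gene Ontology.

```python
from collections import Counter

# GO term -> GO_REF, as listed in the annotation review of A0A811MX19.
ANNOTATIONS = {
    "GO:0004399": "GO_REF:0000120",
    "GO:0016491": "GO_REF:0000002",
    "GO:0016616": "GO_REF:0000002",
    "GO:0046872": "GO_REF:0000002",
    "GO:0051287": "GO_REF:0000002",
    "GO:0000105": "GO_REF:0000120",
    "GO:0006665": "GO_REF:0000104",
    "GO:0016125": "GO_REF:0000104",
    "GO:0032366": "GO_REF:0000120",
    "GO:0097036": "GO_REF:0000104",
    "GO:0005737": "GO_REF:0000118",
    "GO:0005783": "GO_REF:0000117",
    "GO:0005789": "GO_REF:0000120",
    "GO:0005829": "GO_REF:0000118",
    "GO:0009507": "GO_REF:0000044",
    "GO:0009570": "GO_REF:0000118",
    "GO:0016020": "GO_REF:0000104",
}

# Compartment pairs treated as mutually exclusive for one polypeptide
# (assumption drawn from this document: chloroplast stroma vs ER membrane).
EXCLUSIVE_PAIRS = [("GO:0009570", "GO:0005789")]

# Number of annotations contributed by each electronic source.
by_source = Counter(ANNOTATIONS.values())

# Pairs for which both terms are present on the same entry.
conflicts = [pair for pair in EXCLUSIVE_PAIRS
             if all(term in ANNOTATIONS for term in pair)]

print(len(ANNOTATIONS), dict(by_source), conflicts)
```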

Conclusion

NCGR_LOCUS10166 represents a gene model error in the M. lutarioriparius genome annotation.
The two domains should be treated as separate gene products. All electronic annotations are
technically correct for their respective domains but misleading when combined on a single entry.

References

  • Arabidopsis ARV isoforms: PMID:16725371
  • Medicago HDH structure: PMC5585171
  • M. lutarioriparius genome: PMID:33911077
  • HDH as herbicide target: doi:10.1021/acs.jafc.5c01206

📚 Additional Documentation

Notes

(NCGR_LOCUS10166-notes.md)

NCGR_LOCUS10166 Notes

Gene Identity

  • UniProt: A0A811MX19 (unreviewed TrEMBL)
  • Organism: Miscanthus lutarioriparius (NCBITaxon:422564)
  • Named: Protein ARV
  • Length: 719 aa
  • Evidence level: PE 3 (inferred from homology)

Domain Architecture - CRITICAL: Likely Gene Prediction Artifact

This protein contains TWO completely unrelated functional domains:

  1. Histidinol dehydrogenase (HDH) domain (N-terminal, ~aa 1-480)
     • Pfam: PF00815 (Histidinol_dh)
     • InterPro: IPR012131, IPR001692, IPR016161
     • EC 1.1.1.23
     • Soluble enzyme, chloroplast stroma localized in plants

  2. ARV1 domain (C-terminal, ~aa 480-719)
     • Pfam: PF04161 (Arv1)
     • InterPro: IPR007290
     • Multi-pass ER membrane protein for sterol homeostasis
     • Contains 2 transmembrane helices (aa 588-607, 619-639)
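
The domain layout above can be written as a small data structure to confirm that both predicted transmembrane helices fall inside the C-terminal ARV1 region and none falls in the HDH region. This is a sketch under the approximate boundaries given in these notes (~aa 480); the `Region` class and `locate` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def contains(self, other: "Region") -> bool:
        """True if `other` lies entirely within this region."""
        return self.start <= other.start and other.end <= self.end

# Approximate domain boundaries from the notes above.
HDH = Region("HDH", 1, 480)
ARV1 = Region("ARV1", 480, 719)

# Predicted transmembrane helices quoted in the notes.
TM_HELICES = [Region("TM1", 588, 607), Region("TM2", 619, 639)]

def locate(helix: Region, domains: list) -> list:
    """Return the names of the domains that fully contain the helix."""
    return [d.name for d in domains if d.contains(helix)]

placement = {h.name: locate(h, [HDH, ARV1]) for h in TM_HELICES}
print(placement)
```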

Evidence this is a gene prediction artifact

STRONG EVIDENCE: The same organism has separate, properly-sized genes for both domains:
- NCGR_LOCUS4558 (A0A811MHP4): Histidinol dehydrogenase, chloroplastic (478 aa) - normal HDH
- NCGR_LOCUS4557 (A0A811MIX4): Protein ARV (230 aa) - normal ARV1

Note that LOCUS4557 and LOCUS4558 are sequential/adjacent loci, strongly suggesting the
gene predictor incorrectly merged two adjacent genes in a different genomic region.

Biological incompatibility: HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER
membrane protein with transmembrane domains. A single protein cannot function in both
compartments simultaneously. The N-terminal transit peptide (for chloroplast import) would
prevent ER membrane insertion, and conversely the transmembrane domains would interfere
with chloroplast import.

Genome quality: The M. lutarioriparius genome is derived from WGS preliminary data
(noted in UniProt CAUTION field). The sequence originated from CAJGYO010000002.

HDH Function (from the real gene product)

Plant histidinol dehydrogenase catalyzes the final step of histidine biosynthesis:
L-histidinol + 2 NAD+ + H2O -> L-histidine + 2 NADH + 3 H+

  • Nuclear-encoded, chloroplast-targeted in plants
  • Immunolocalized to chloroplast stroma [structural studies in Medicago truncatula, PMC5585171]
  • Essential enzyme; knockout causes ovule abortion in Arabidopsis
  • Requires Zn2+ cofactor
  • Potential herbicide target

ARV1 Function (from the real gene product)

ARV1 family proteins mediate sterol homeostasis:
- ER membrane-localized with 2+ transmembrane helices
- Involved in sterol transport from ER to plasma membrane
- Contains Arv1 homology domain (AHD) with cysteine-rich zinc-binding motif
- Arabidopsis has two ARV isoforms (AtArv1p, AtArv2p), both ER-localized PMID:16725371
- Loss of ARV1 alters sterol distribution and sphingolipid levels
- Regulates sphingolipid metabolism

Annotation Implications

All GO annotations on this entry reflect the conflation of two separate gene products:
- HDH-related annotations (GO:0004399, GO:0000105, GO:0051287, GO:0046872, GO:0009570, GO:0005829) come from the HDH domain
- ARV1-related annotations (GO:0032366, GO:0097036, GO:0006665, GO:0016125, GO:0005789, GO:0005783, GO:0016020) come from the ARV1 domain
- Some annotations (GO:0005737, GO:0016491, GO:0016616, GO:0009507) could apply to either

The correct approach would be to split the annotations by domain, but since this is a
single UniProt entry, the review should note that annotations are domain-specific and
that this appears to be a gene model error.
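
A domain-wise split of the annotations could look like the following Python sketch. The groupings are taken verbatim from the three lists in these notes; the names `DOMAIN_TERMS`, `domain_of`, and `split_by_domain` are hypothetical and exist only for this illustration.

```python
# Groupings exactly as listed in these notes.
DOMAIN_TERMS = {
    "HDH": {"GO:0004399", "GO:0000105", "GO:0051287",
            "GO:0046872", "GO:0009570", "GO:0005829"},
    "ARV1": {"GO:0032366", "GO:0097036", "GO:0006665", "GO:0016125",
             "GO:0005789", "GO:0005783", "GO:0016020"},
    "either": {"GO:0005737", "GO:0016491", "GO:0016616", "GO:0009507"},
}

def domain_of(go_id: str) -> str:
    """Return the domain group a GO term is assigned to in the notes."""
    for domain, terms in DOMAIN_TERMS.items():
        if go_id in terms:
            return domain
    return "unassigned"

def split_by_domain(go_ids) -> dict:
    """Partition GO IDs into per-domain lists, preserving input order."""
    groups = {}
    for go_id in go_ids:
        groups.setdefault(domain_of(go_id), []).append(go_id)
    return groups

example = split_by_domain(["GO:0004399", "GO:0005789", "GO:0005737"])
print(example)
```

Keeping an explicit "either" group avoids forcing ambiguous terms onto one domain; a reviewer can then resolve those case by case.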

📄 View Raw YAML

id: A0A811MX19
gene_symbol: NCGR_LOCUS10166
product_type: PROTEIN
status: COMPLETE
tags:
- gene-prediction-artifact
- fusion-protein
- plant
taxon:
  id: NCBITaxon:422564
  label: Miscanthus lutarioriparius
description: >-
  NCGR_LOCUS10166 is an unreviewed TrEMBL entry (A0A811MX19) from Miscanthus lutarioriparius
  annotated as "Protein ARV". It is a 719 amino acid protein containing two completely unrelated
  domains: an N-terminal histidinol dehydrogenase (HDH, EC 1.1.1.23) domain and a C-terminal
  ARV1 sterol homeostasis domain. This combination is almost certainly a gene prediction artifact,
  as the same organism encodes properly-sized separate genes for both domains at adjacent loci
  (NCGR_LOCUS4558/A0A811MHP4 for HDH at 478 aa, and NCGR_LOCUS4557/A0A811MIX4 for ARV1 at
  230 aa). The two domains have incompatible subcellular localizations: HDH is a soluble
  chloroplast stroma enzyme while ARV1 is an ER multi-pass membrane protein. All GO annotations
  are IEA-based and reflect the conflation of two separate gene products. The genome sequence
  is derived from preliminary WGS data.

existing_annotations:
- term:
    id: GO:0004399
    label: histidinol dehydrogenase activity
  evidence_type: IEA
  original_reference_id: GO_REF:0000120
  review:
    summary: >-
      Histidinol dehydrogenase activity (EC 1.1.1.23) is correctly predicted for the N-terminal
      HDH domain of this protein based on HAMAP rule MF_01024. The domain contains CDD cd06572
      (Histidinol_dh), Pfam PF00815, and the PROSITE active site PS00611. However, UniProt notes
      this entry "lacks conserved residue(s) required for the propagation of feature annotation,"
      raising questions about catalytic competence. The same organism has a properly-sized separate
      HDH (A0A811MHP4, NCGR_LOCUS4558, 478 aa) that likely represents the true functional HDH gene.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      While the HDH domain signature is present in the N-terminal region, this annotation is
      misleading because it is applied to a probable gene prediction artifact. The entry likely
      represents two incorrectly merged adjacent genes. The real HDH in this organism is
      NCGR_LOCUS4558 (A0A811MHP4). Additionally, UniProt flags missing conserved residues
      needed for feature propagation.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        The same M. lutarioriparius genome encodes properly-sized separate proteins for both
        domains: NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa.
        The sequential locus numbering (4557 and 4558) indicates these are adjacent genes that
        were correctly predicted in one genomic region but incorrectly fused in another.

- term:
    id: GO:0016491
    label: oxidoreductase activity
  evidence_type: IEA
  original_reference_id: GO_REF:0000002
  review:
    summary: >-
      Oxidoreductase activity is a parent term of histidinol dehydrogenase activity and is
      mapped from InterPro IPR016161 (Ald_DH/histidinol_DH). This is a broad classification
      that applies to the N-terminal HDH domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      This is a very general parent term that adds no specificity beyond what GO:0004399
      (histidinol dehydrogenase activity) already provides. More importantly, it is applied
      to a probable gene prediction artifact. The annotation is technically correct for the
      HDH domain but redundant and misleading in this context.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        NCGR_LOCUS10166 represents a gene model error in the M. lutarioriparius genome annotation.
        The two domains should be treated as separate gene products.

- term:
    id: GO:0016616
    label: oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor
  evidence_type: IEA
  original_reference_id: GO_REF:0000002
  review:
    summary: >-
      This term is mapped from InterPro IPR012131 (Hstdl_DH) and correctly describes the
      reaction mechanism of histidinol dehydrogenase using NAD+ as acceptor. It applies to
      the N-terminal HDH domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      While technically correct for the HDH domain, this is an intermediate-specificity term
      that is redundant with the more specific GO:0004399 (histidinol dehydrogenase activity).
      It is also applied to a probable gene prediction artifact in which annotations from two
      separate gene products are conflated.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        All electronic annotations are technically correct for their respective domains but
        misleading when combined on a single entry.

- term:
    id: GO:0046872
    label: metal ion binding
  evidence_type: IEA
  original_reference_id: GO_REF:0000002
  review:
    summary: >-
      Metal ion binding is mapped from InterPro IPR001692 (Histidinol_DH_CS), reflecting
      the zinc cofactor requirement of histidinol dehydrogenase. The ARV1 domain also
      contains a cysteine-rich subdomain with a putative zinc-binding motif. Both domains
      likely bind metal ions but for completely different purposes.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      This is a very general term. For the HDH domain, the more informative annotation would
      be zinc ion binding (GO:0008270) specifically as a catalytic cofactor. For the ARV1
      domain, zinc binding relates to the putative zinc-binding motif in the Arv1 homology
      domain (AHD). The term is too vague and is applied to a probable gene prediction
      artifact combining two unrelated proteins.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER membrane protein with
        transmembrane domains. A single protein cannot function in both compartments
        simultaneously.

- term:
    id: GO:0051287
    label: NAD binding
  evidence_type: IEA
  original_reference_id: GO_REF:0000002
  review:
    summary: >-
      NAD binding is mapped from InterPro IPR012131 (Hstdl_DH) and correctly reflects the
      cofactor requirement of histidinol dehydrogenase, which uses two NAD+ molecules per
      catalytic cycle. This applies exclusively to the N-terminal HDH domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      While technically correct for the HDH domain, this annotation is applied to a probable
      gene prediction artifact. NAD binding is implied by histidinol dehydrogenase activity
      (GO:0004399) and is therefore redundant. The real HDH gene in this organism is NCGR_LOCUS4558.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa
        represents the true HDH gene.

- term:
    id: GO:0000105
    label: L-histidine biosynthetic process
  evidence_type: IEA
  original_reference_id: GO_REF:0000120
  review:
    summary: >-
      L-histidine biosynthetic process annotation derives from HAMAP rule MF_01024 and
      UniPathway (UPA00031, step 9/9). HDH catalyzes the final step of histidine biosynthesis.
      In plants, this pathway operates in the chloroplast. This annotation applies exclusively
      to the N-terminal HDH domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      While the HDH domain is correctly associated with histidine biosynthesis, this annotation
      is on a probable gene prediction artifact. The true histidine biosynthesis HDH in this
      organism is NCGR_LOCUS4558 (A0A811MHP4, 478 aa). UniProt also notes missing conserved
      residues, questioning whether this particular copy is catalytically active even if the
      gene model were correct.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        The same M. lutarioriparius genome encodes properly-sized separate proteins for both
        domains: NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa.

- term:
    id: GO:0006665
    label: sphingolipid metabolic process
  evidence_type: IEA
  original_reference_id: GO_REF:0000104
  review:
    summary: >-
      Sphingolipid metabolic process is assigned via UniRule RU368065 for the ARV1 family.
      ARV1 proteins regulate sphingolipid metabolism in yeast and plants. Arabidopsis ARV
      isoforms (AtArv1p, AtArv2p) are ER-localized and modulate both sterol and sphingolipid
      homeostasis [PMID:16725371]. This annotation applies to the C-terminal ARV1 domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      While this annotation is correct for the ARV1 domain, it is applied to a probable gene
      prediction artifact that conflates two unrelated proteins. The true ARV1 gene in this
      organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa).
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.
    - reference_id: PMID:16725371
      supporting_text: >-
        Arv1p is involved in the regulation of cellular lipid homeostasis in the yeast
        Saccharomyces cerevisiae. Here, we report the characterization of the two
        Arabidopsis thaliana ARV genes and the encoded proteins, AtArv1p and AtArv2p.

- term:
    id: GO:0016125
    label: sterol metabolic process
  evidence_type: IEA
  original_reference_id: GO_REF:0000104
  review:
    summary: >-
      Sterol metabolic process is assigned via UniRule RU368065 for the ARV1 family. ARV1
      proteins are mediators of sterol homeostasis, involved in sterol uptake, trafficking,
      and distribution into membranes. This applies to the C-terminal ARV1 domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      Correct for the ARV1 domain but applied to a gene prediction artifact. The true ARV1
      in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa). The annotation conflates
      sterol metabolism (ARV1 domain) with histidine biosynthesis (HDH domain) on a single entry.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        ARV1 family proteins are ER membrane proteins that mediate sterol homeostasis:
        sterol uptake, trafficking, and distribution into membranes.

- term:
    id: GO:0032366
    label: intracellular sterol transport
  evidence_type: IEA
  original_reference_id: GO_REF:0000120
  review:
    summary: >-
      Intracellular sterol transport is a core function of ARV1 proteins, which mediate
      sterol transport from ER to plasma membrane. In yeast, ARV1 deletion leads to altered
      intracellular sterol distribution with decreased plasma membrane sterols and elevated
      ER/vacuolar sterols. This applies to the C-terminal ARV1 domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      Correct for the ARV1 domain but applied to a gene prediction artifact. The true ARV1
      in this organism is NCGR_LOCUS4557 (A0A811MIX4, 230 aa).
    supported_by:
    - reference_id: PMID:16725371
      supporting_text: >-
        Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved
        in the regulation of cellular lipid homeostasis.
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.

- term:
    id: GO:0097036
    label: regulation of plasma membrane sterol distribution
  evidence_type: IEA
  original_reference_id: GO_REF:0000104
  review:
    summary: >-
      Regulation of plasma membrane sterol distribution is assigned via UniRule RU368065.
      This is a specific biological process for ARV1 proteins, which regulate how sterols
      are distributed between the ER and plasma membrane. This applies to the C-terminal
      ARV1 domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      Correct for the ARV1 domain but applied to a gene prediction artifact combining
      unrelated HDH and ARV1 domains. The true ARV1 gene is NCGR_LOCUS4557 (A0A811MIX4).
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.

- term:
    id: GO:0005737
    label: cytoplasm
  evidence_type: IEA
  original_reference_id: GO_REF:0000118
  review:
    summary: >-
      Cytoplasm localization is assigned by TreeGrafter/PANTHER (PTHR21256). This likely
      reflects the HDH domain, as plant HDH is synthesized in the cytoplasm before chloroplast
      import. However, this is a very broad localization term.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      The term is too general. For the HDH domain, the correct localization is chloroplast
      stroma (GO:0009570), which is already annotated. For the ARV1 domain, the correct
      localization is ER membrane (GO:0005789). The term is also applied to a gene prediction
      artifact.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        HDH is a soluble enzyme localized to chloroplast stroma in all characterized plants.
        ARV1 is an ER membrane protein with multiple transmembrane domains.

- term:
    id: GO:0005783
    label: endoplasmic reticulum
  evidence_type: IEA
  original_reference_id: GO_REF:0000117
  review:
    summary: >-
      Endoplasmic reticulum localization is assigned via UniRule RU368065 for the ARV1 family.
      ARV1 proteins are ER-localized membrane proteins. In Arabidopsis, both AtArv1p and
      AtArv2p are exclusively targeted to the ER [PMID:16725371]. This applies to the
      C-terminal ARV1 domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      Correct for the ARV1 domain but applied to a gene prediction artifact. This localization
      is incompatible with the chloroplast stroma localization of the HDH domain, which is
      the strongest evidence this is an incorrectly merged gene model.
    supported_by:
    - reference_id: PMID:16725371
      supporting_text: >-
        both proteins are exclusively targeted to the endoplasmic reticulum
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        A single polypeptide cannot simultaneously function as a chloroplast stroma enzyme
        and an ER membrane protein.

- term:
    id: GO:0005789
    label: endoplasmic reticulum membrane
  evidence_type: IEA
  original_reference_id: GO_REF:0000120
  review:
    summary: >-
      ER membrane localization derives from UniProtKB-SubCell and reflects the multi-pass
      transmembrane topology of the ARV1 domain (transmembrane helices at aa 588-607 and
      619-639). This applies exclusively to the C-terminal ARV1 domain.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      Correct for the ARV1 domain but applied to a gene prediction artifact. The ER membrane
      localization directly contradicts the chloroplast stroma localization predicted for the
      HDH domain. The true ARV1 gene is NCGR_LOCUS4557 (A0A811MIX4).
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        ARV1 is an ER membrane protein with multiple transmembrane domains. A single protein
        cannot function in both compartments simultaneously.

- term:
    id: GO:0005829
    label: cytosol
  evidence_type: IEA
  original_reference_id: GO_REF:0000118
  review:
    summary: >-
      Cytosol localization is assigned by TreeGrafter/PANTHER. This may reflect the HDH
      domain prior to chloroplast import, or it may be a generic assignment from the
      PANTHER family classification.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      For the HDH domain, the mature protein is chloroplast stroma-localized, not cytosolic.
      For the ARV1 domain, the protein is ER membrane-localized. Cytosol is not the functional
      localization for either domain. The term is also applied to a gene prediction artifact.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        HDH is a soluble enzyme localized to chloroplast stroma in all characterized plants.

- term:
    id: GO:0009507
    label: chloroplast
  evidence_type: IEA
  original_reference_id: GO_REF:0000044
  review:
    summary: >-
      Chloroplast localization derives from UniProtKB-SubCell vocabulary mapping. Plant HDH
      is nuclear-encoded but chloroplast-targeted via a transit peptide. This applies to the
      N-terminal HDH domain. In plants, the entire histidine biosynthesis pathway operates
      in the chloroplast.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      While correct for the HDH domain (chloroplast stroma is the more specific term, already
      annotated as GO:0009570), this annotation is on a gene prediction artifact. The true
      HDH gene is NCGR_LOCUS4558 (A0A811MHP4). Additionally, chloroplast localization is
      incompatible with the ER membrane localization of the C-terminal ARV1 domain.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        In plants, HDH is expressed as a nuclear encoded protein precursor which is exported
        to the chloroplast. HDH has been immunolocalized to the chloroplast.

- term:
    id: GO:0009570
    label: chloroplast stroma
  evidence_type: IEA
  original_reference_id: GO_REF:0000118
  review:
    summary: >-
      Chloroplast stroma localization is assigned by TreeGrafter/PANTHER. This is the correct
      subcellular localization for plant HDH, which operates as a soluble enzyme in the
      chloroplast stroma. Structural studies of HDH from the legume Medicago truncatula
      (a dicot, only distantly related to Miscanthus) support the stroma localization.
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      While this is the most accurate localization for the HDH domain, it is applied to a
      gene prediction artifact. The true HDH gene is NCGR_LOCUS4558 (A0A811MHP4). The
      chloroplast stroma localization is fundamentally incompatible with the ER membrane
      localization of the ARV1 domain on the same polypeptide.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        HDH is a soluble chloroplast stroma enzyme; ARV1 is an ER membrane protein with
        transmembrane domains. A single protein cannot function in both compartments
        simultaneously. The N-terminal transit peptide (for chloroplast import) would
        prevent ER membrane insertion.

- term:
    id: GO:0016020
    label: membrane
  evidence_type: IEA
  original_reference_id: GO_REF:0000104
  review:
    summary: >-
      Membrane localization is assigned via UniRule RU368065 for the ARV1 family, reflecting
      the transmembrane topology of the ARV1 domain. The C-terminal region contains two
      predicted transmembrane helices (aa 588-607 and 619-639).
    action: MARK_AS_OVER_ANNOTATED
    reason: >-
      This is a very general term (parent of ER membrane). While the ARV1 domain is indeed
      a membrane protein, this annotation is on a gene prediction artifact. The more specific
      GO:0005789 (ER membrane) is already annotated. The true ARV1 gene is NCGR_LOCUS4557.
    supported_by:
    - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
      supporting_text: >-
        ARV1 is an ER membrane protein with 2+ transmembrane helices.

references:
- id: GO_REF:0000002
  title: >-
    Gene Ontology annotation through association of InterPro records with GO terms.
  findings: []
- id: GO_REF:0000044
  title: >-
    Gene Ontology annotation based on UniProtKB/Swiss-Prot Subcellular Location vocabulary
    mapping, accompanied by conservative changes to GO terms applied by UniProt.
  findings: []
- id: GO_REF:0000104
  title: >-
    Electronic Gene Ontology annotations created by transferring manual GO annotations
    between related proteins based on shared sequence features.
  findings: []
- id: GO_REF:0000117
  title: >-
    Electronic Gene Ontology annotations created by ARBA machine learning models.
  findings: []
- id: GO_REF:0000118
  title: >-
    Gene Ontology annotation by TreeGrafter/PANTHER phylogenetic inference.
  findings: []
- id: GO_REF:0000120
  title: >-
    Combined Automated Annotation using Multiple IEA Methods.
  findings: []
- id: PMID:16725371
  title: >-
    Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved in
    the regulation of cellular lipid homeostasis.
  findings:
  - statement: Both Arabidopsis ARV isoforms are exclusively ER-localized
    supporting_text: >-
      both proteins are exclusively targeted to the endoplasmic reticulum
  - statement: ARV proteins contain the bipartite Arv1 homology domain with zinc-binding motif
    supporting_text: >-
      Both Arabidopsis proteins contain the bipartite Arv1 homology domain (AHD), which consists
      of an NH2-terminal cysteine-rich subdomain with a putative zinc-binding motif followed by
      a C-terminal subdomain.
- id: PMID:33911077
  title: >-
    Chromosome-scale assembly and analysis of biomass crop Miscanthus lutarioriparius genome.
  findings:
  - statement: >-
      M. lutarioriparius genome was assembled to chromosome scale from WGS data with contig N50
      of 1.71 Mb covering 96.64% of the genome.
- id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
  title: >-
    Manual deep research on NCGR_LOCUS10166 identifying gene prediction artifact.
  findings:
  - statement: >-
      NCGR_LOCUS10166 is a fusion of two unrelated domains (HDH + ARV1) from incorrectly
      merged adjacent genes.
  - statement: >-
      Separate properly-sized genes exist: NCGR_LOCUS4558 (HDH, 478 aa) and NCGR_LOCUS4557
      (ARV1, 230 aa).
  - statement: >-
      The two domains have incompatible subcellular localizations (chloroplast stroma vs
      ER membrane).

core_functions:
- molecular_function:
    id: GO:0004399
    label: histidinol dehydrogenase activity
  description: >-
    The N-terminal domain (~aa 1-480) of this predicted protein encodes a histidinol
    dehydrogenase that catalyzes the final step of L-histidine biosynthesis (EC 1.1.1.23).
    However, this is almost certainly a gene prediction artifact: the same organism has a
    separate, properly-sized HDH gene (NCGR_LOCUS4558, A0A811MHP4, 478 aa) that is the
    true functional HDH. This entry should not be considered a real gene product.
  directly_involved_in:
  - id: GO:0000105
    label: L-histidine biosynthetic process
  locations:
  - id: GO:0009570
    label: chloroplast stroma
  supported_by:
  - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
    supporting_text: >-
      NCGR_LOCUS4558 (A0A811MHP4) Histidinol dehydrogenase, chloroplastic - 478 aa
      represents the true HDH gene.

- molecular_function:
    id: GO:0032934
    label: sterol binding
  description: >-
    The C-terminal domain (~aa 481-719) encodes an ARV1 family protein involved in sterol
    homeostasis, sterol transport from ER to plasma membrane, and sphingolipid metabolism
    regulation. However, this is almost certainly a gene prediction artifact: the same
    organism has a separate, properly-sized ARV1 gene (NCGR_LOCUS4557, A0A811MIX4, 230 aa)
    that is the true functional ARV1. This entry should not be considered a real gene product.
  directly_involved_in:
  - id: GO:0016125
    label: sterol metabolic process
  - id: GO:0006665
    label: sphingolipid metabolic process
  locations:
  - id: GO:0005789
    label: endoplasmic reticulum membrane
  supported_by:
  - reference_id: PMID:16725371
    supporting_text: >-
      Arabidopsis thaliana expresses two functional isoforms of Arvp, a protein involved
      in the regulation of cellular lipid homeostasis.
  - reference_id: file:9POAL/NCGR_LOCUS10166/NCGR_LOCUS10166-deep-research-manual.md
    supporting_text: >-
      NCGR_LOCUS4557 (A0A811MIX4) Protein ARV - 230 aa represents the true ARV1 gene.

proposed_new_terms: []

suggested_questions:
- question: >-
    Is NCGR_LOCUS10166 a genuine fusion gene or a gene prediction artifact? The presence of
    separate, properly-sized genes for both domains (NCGR_LOCUS4557 for ARV1, NCGR_LOCUS4558
    for HDH) at adjacent loci, combined with incompatible subcellular localizations (chloroplast
    stroma vs ER membrane), strongly suggests a gene model error. RNA-seq or proteomics evidence
    of a full-length 719 aa protein would be needed to support a real fusion.
- question: >-
    Should UniProt flag A0A811MX19 for review or suppression? The entry combines two unrelated
    protein families with contradictory biology. The preliminary WGS-derived genome annotation
    may have incorrectly merged two adjacent reading frames.

suggested_experiments:
- description: >-
    Validate the gene model using RNA-seq data from Miscanthus lutarioriparius to determine
    whether NCGR_LOCUS10166 is transcribed as a single mRNA or represents two separate
    transcripts that were incorrectly merged in the genome annotation.
  hypothesis: >-
    RNA-seq will show two separate transcripts corresponding to the HDH and ARV1 domains,
    confirming this is a gene prediction artifact rather than a genuine fusion gene.
- description: >-
    Compare syntenic regions across related Poaceae genomes (Sorghum bicolor, Saccharum,
    Zea mays) to determine whether HDH and ARV1 genes are consistently found as separate
    adjacent genes, supporting the gene prediction artifact hypothesis.
  hypothesis: >-
    In all related grass genomes, HDH and ARV1 will be encoded as separate genes, confirming
    that the fusion in NCGR_LOCUS10166 is an artifact of genome annotation.