UniPathway Unique Terms Project

MATURE PIPELINEFLAGSHIP

UniPathway Unique Terms Project

Overview

This project reviews GO annotations derived from UniPathway vocabulary mapping
(GO_REF:0000041) where UniPathway provides the only non-redundant support for a
gene-term pair. It follows the same principle used for the SPKW project: a source
is only considered uniquely informative if the annotation is not already supported
by another source for the exact term, or by another source for a more specific
descendant term.

UniPathway-derived annotations are mostly biological-process pathway assertions.
Unlike SPKW, the strongest signal is not broad eukaryotic process conflation; it is
a mix of useful pathway gap-filling, broad parent-process assertions, and a small
number of source-pathway mapping errors where a pathway bucket is too coarse for a
specific gene product.

GO_REF:0000041 describes UniPathway as an archived, no-longer-maintained pathway
vocabulary. That makes this a legacy-source audit: useful annotations should be
retained, but broad or stale mappings should not be treated as automatically
current.

Key Findings

Methods

This project used local GOA DuckDB databases under ~/repos/go-db/db/.
UniPathway annotations were counted from non-negated GO_REF:0000041 rows.

A row was treated as TRUE unique only when no non-UniPathway source already
supported the same gene-term pair, and no non-UniPathway source supported a more
specific descendant term for the same gene. This prevents parent pathway terms
from looking novel when another source already supports a more specific process.

The nitrogen-cycle audit used the GO:0071941 nitrogen cycle metabolic process
closure. Exemplar rows were then reviewed through the gene-review workflow:
fetch gene data, edit the review YAML, validate, render the gene page, and link
the reviewed result from this project page.

For the pilot rows below, the YAML reviews now include explicit per-annotation
reasons and supported_by evidence. Where direct species-level experiments were
not available, the review states the inference basis: enzyme name and EC,
UniProt pathway/reaction text, InterPro/NCBIfam/PANTHER family placement, locus
context, or broader family literature.

Deep Research Highlights

The deep-research pass was most useful for checking whether broad family-level
summaries matched the specific propagated node, subfamily, or independent
enzyme evidence for each gene:

Cross-Organism Scan

Database UniPathway annotations UniPathway genes TRUE unique annotations TRUE unique genes TRUE unique terms
human (goa_human.ddb) 1,129 1,066 247 243 57
mouse (goa_mouse.ddb) 1,110 1,053 231 223 57
MGI (mgi.ddb) 1,110 1,053 227 220 57
RGD (rgd.ddb) 967 915 242 234 57
Xenopus (goa_xenopus.ddb) 1,687 1,611 484 462 54
ZFIN (zfin.ddb) 690 656 213 202 50
WormBase (wb.ddb) 476 459 123 118 41
ANOGA (ANOGA.ddb) 279 270 75 73 38
SGD (sgd.ddb) 551 524 124 116 41
PomBase (pombase.ddb) 386 373 69 67 24
Brachypodium (BRADI.ddb) 819 790 228 223 48
wheat (goa_wheat.ddb) 2,697 2,620 750 738 50
PSEPK (PSEPK.ddb) 426 395 51 45 30
bacteria aggregate 1,594,828 1,462,204 165,344 144,244 204
Pseudomonadota aggregate 667,184 611,258 66,978 60,017 154
Clostridium aggregate 94,078 86,152 9,210 8,039 88
fungi aggregate 233,164 223,751 51,093 48,303 117
archaea aggregate 33,403 30,381 3,140 2,615 63
virus aggregate 285 278 140 135 25

Broader Species and Clade Patterns

Scope Dominant TRUE unique terms Initial review implication
Plants: Brachypodium and wheat protein ubiquitination, pectin biosynthesis/catabolism, cinnamic acid biosynthesis, UDP-D-xylose biosynthesis, sucrose metabolism Plant scans reproduce the eukaryotic ubiquitination bucket, but also add plant-specific cell-wall and phenylpropanoid pathway terms that look more source-specific than the vertebrate lipid-parent terms.
Amphibian: Xenopus protein ubiquitination, cAMP catabolism, heparin proteoglycan biosynthesis, UMP/CTP salvage, fatty acid beta-oxidation Broadly resembles human and mouse. The top list also contains obsolete L-histidine catabolism terms, so Xenopus should be checked for stale GO mappings before accepting pathway rows.
Bacteria and Pseudomonadota oxidative phosphorylation, tetrahydrofolate biosynthesis, thiamine diphosphate biosynthesis, histidine catabolism, malonyl-CoA biosynthesis, Kdo2-lipid A biosynthesis Strongest positive signal. These terms are often close to operon/pathway membership and are good candidates for ACCEPT when gene-level function matches the pathway step.
Clostridium aggregate urea cycle plus core anaerobic amino-acid, cofactor, and energy-metabolism pathways Useful for clade-specific microbial metabolism, but nitrogen-related parent terms need pathway-context review rather than automatic environmental-process interpretation.
Fungi and archaea large numbers of true-unique metabolic-pathway rows, including nitrogen-related subsets Useful as a bridge between eukaryotic pathway assertions and microbial metabolism. Fungal edge cases should be checked carefully when the process is better known from bacteria or archaea.
Viruses protein ubiquitination, nucleotide biosynthesis/salvage, translational elongation, DNA modification Small but nontrivial unique set. Viral ubiquitination and nucleotide-process annotations should be reviewed for whether the viral product executes the process or modulates a host pathway.

Targeted Term Group: Nitrogen Cycle

The nitrogen-cycle audit used the GO:0071941 nitrogen cycle metabolic process
closure. In the scanned databases, UniPathway contributes rows for
denitrification pathway, nitrate assimilation, urea catabolic process, and
urea cycle. It did not contribute rows for nitrogen fixation or nitrogenase
activity terms in the scanned datasets, so this source is not currently filling
nitrogenase annotation gaps.

Scope GO term UniPathway rows TRUE unique rows TRUE unique taxa Initial assessment
bacteria GO:0019333 denitrification pathway 615 615 530 High-priority microbial set, but review by gene role. Enzymes such as nirK1, nirK2, and nosZ are strong positives; regulator rows such as norR1 and norR2 can be over-annotated if mapped directly to the pathway.
Pseudomonadota GO:0019333 denitrification pathway 528 528 446 Strong clade-specific signal; most bacterial denitrification true-unique rows are in this aggregate.
archaea GO:0019333 denitrification pathway 19 19 19 Plausible but smaller archaeal set; review gene context before accepting, especially for haloarchaeal rows.
fungi GO:0019333 denitrification pathway 1 1 1 Edge case. Fungal denitrification is biologically plausible in some taxa, but a singleton UniPathway row should be reviewed with high scrutiny.
bacteria GO:0042128 nitrate assimilation 2,767 14 14 Mostly redundant after closure filtering. The remaining true-unique examples include nrfA and related cytochrome nitrite reductases, which may indicate a mapping problem rather than assimilatory nitrate use.
fungi GO:0042128 nitrate assimilation 556 0 0 No unique UniPathway signal after closure filtering.
PSEPK GO:0042128 nitrate assimilation 1 0 0 Not a useful unique term in the P. putida pilot.
bacteria GO:0043419 urea catabolic process 6,688 44 33 Low unique fraction but likely useful for urease subunits when the gene product is directly part of urea hydrolysis.
archaea GO:0043419 urea catabolic process 130 10 9 Stronger targeted signal. Examples include ureC1, ureC2 in Nitrososphaera viennensis and ureC in other ammonia-oxidizing or nitrogen-metabolizing archaea.
bacteria GO:0000050 urea cycle 1,189 1,189 1,149 High-volume unique set, but not automatically an environmental nitrogen-cycling annotation. Review as arginine/ornithine/carbamoyl-phosphate pathway context.
fungi GO:0000050 urea cycle 790 682 646 Similar broad pathway-context issue; likely useful for core nitrogen/amino-acid metabolism but not necessarily a nitrogen-cycle headline.
Clostridium GO:0000050 urea cycle 49 49 49 Clade-specific positive candidates, subject to the same term-scope check.

Nitrogen-Cycle Pilot Assessments

Example UniPathway term Assessment
Rhodopseudomonas palustris CGA009 nirK1, nirK2, nosZ GO:0019333 denitrification pathway ACCEPT for all three reviewed genes. nirK1 and nirK2 are copper NO-forming nitrite reductases; nosZ is nitrous-oxide reductase. The support is mainly conserved enzyme/family evidence plus UniProt pathway mapping, not direct strain-specific biochemistry. The family research is useful here because it separates true NirK/NosZ pathway enzymes from broader multicopper-oxidase and CuA-like parent families.
Cupriavidus necator H16 norR1, norR2 GO:0019333 denitrification pathway MARK_AS_OVER_ANNOTATED for both reviewed genes. Falcon deep research plus primary NorR literature support an NO-responsive sigma-54 transcription activator role; the generic ATP-binding, regulation, and DNA-binding annotations were refined toward ATP hydrolysis, positive transcriptional regulation, and GO:0141097 ligand-modulated transcription activator activity.
Ferroglobus placidus Ferp_0128 GO:0019333 denitrification pathway ACCEPT. PANTHER places the archaeal protein in the nitrous-oxide reductase subfamily, and the UniProt EC/pathway text matches terminal denitrification. The cytochrome-c oxidase parent annotation was removed in favor of nitrous-oxide reductase activity after reviewing the NosZ/Cox2 family boundary.
Nitrososphaera viennensis EN76 ureC1, ureC2 GO:0043419 urea catabolic process ACCEPT for both reviewed genes. Both are urease catalytic/alpha-subunit proteins supported by EC 3.5.1.5 plus TIGR01792/PTHR43440 urease-family placement, so urea catabolism is direct reaction context. The reviews keep metal-binding statements tied to available evidence because rare iron ureases make blanket nickel propagation risky.
Desulfotalea psychrophila nrfA GO:0042128 nitrate assimilation REMOVE. nrfA is an ammonia-forming cytochrome c nitrite reductase; the better MF is GO:0042279 nitrite reductase (cytochrome, ammonia-forming) activity. The UniPathway nitrate-assimilation assertion does not match this dissimilatory nitrite-reduction enzyme, and the cytochrome c-552 family report confirms that broad family names should not be used as a shortcut here.
Polytolypa hystricis AJ80_06654 singleton fungal row GO:0019333 denitrification pathway UNDECIDED. The family evidence supports copper NO-forming nitrite reductase activity, but the fungal denitrification pathway assertion remains a singleton automated UniPathway row without organism-specific pathway evidence. Multicopper-oxidase family breadth makes this a pathway-context question, not just an enzyme-family question.

Plant Pathway Pilot Assessments

Example UniPathway term Assessment
Brachypodium distachyon BRADI_1g22147v3 GO:0045490 pectin catabolic process ACCEPT. The protein is a predicted pectate lyase, supported by EC 4.2.2.2 and PANTHER pectate lyase family placement, so the pathway term is direct cell-wall pectin-degradation context.
Brachypodium distachyon LOC100829928 GO:0009800 cinnamic acid biosynthetic process ACCEPT. The gene encodes phenylalanine ammonia-lyase; InterPro Phe_NH3-lyase plus the PAL reaction support direct trans-cinnamate biosynthesis rather than only a broad ammonia-lyase-family call.
Brachypodium distachyon BRADI_1g66227v3 GO:0033320 UDP-D-xylose biosynthetic process ACCEPT. The gene encodes UDP-glucuronate decarboxylase; PANTHER PTHR43078:SF51 and NAD(P)-binding domains support the one-step UDP-D-xylose biosynthesis annotation. Falcon found no direct BRADI_1g66227v3 paper, so the pathway call is orthology/family-supported and the Golgi localization is treated as predicted rather than experimentally established.

Top Human TRUE UniPathway-Unique Terms

GO term Label Unique genes Initial assessment
GO:0016567 protein ubiquitination 124 Useful, but only with explicit substrate-adaptor and UBL boundaries
GO:0016024 CDP-diacylglycerol biosynthetic process 10 Generally useful lipid-pathway gap filling
GO:0042572 retinol metabolic process 9 Generally useful for retinoid enzymes, but broad for xenobiotic CYPs
GO:0006198 cAMP catabolic process 8 Good for cAMP phosphodiesterases
GO:0008203 cholesterol metabolic process 6 Mixed; direct for steroidogenic components, broad for drug-metabolizing CYPs
GO:0006631 fatty acid metabolic process 6 Often correct but broad
GO:0019563 glycerol catabolic process 5 Correct for glycerol kinase/dehydrogenase pathway genes
GO:0016925 protein sumoylation 4 Mixed; inspect SUMO vs ubiquitin specificity
GO:0030210 heparin proteoglycan biosynthetic process 4 Correct pathway context for NDST enzymes
GO:0006506 GPI anchor biosynthetic process 4 Correct for GPI transamidase components

Protein Ubiquitination Boundary Review

UniPathway itself helps explain why this term is hard. UPA00143 is not just a
generic label; it has three child pathways for the ubiquitination cascade:

UniPathway term Scope
UPA00831 protein ubiquitination (ubiquitin activation) E1-dependent ubiquitin activation
UPA00832 protein ubiquitination (E2 ubiquitin transfer) transfer from E1 to E2
UPA00833 protein ubiquitination (ubiquitin transfer) E3-mediated transfer to substrates, with E3s described as substrate-recognition modules

That source boundary supports including E1, E2, and E3 machinery in
GO:0016567, and it also explains why substrate-recognition proteins can be
legitimate positives. It does not settle every GO curation boundary, because GO
annotations are made to individual gene products rather than to the pathway as a
whole.

The key boundary problem is therefore how broadly GO:0016567 should be read
when projected from the UniPathway parent:

Interpretation Curation consequence
Catalysis only Too narrow. It would incorrectly exclude substrate receptors such as SOCS4, SOCS5, KCTD11, and ZSWIM8 that directly recruit substrates to E3 ligase complexes.
Any regulatory connection to ubiquitin-dependent degradation Too broad. It would include upstream signaling regulators, ubiquitin-binding cargo handlers, deubiquitinases, and non-ubiquitin UBL enzymes.
Direct ubiquitination machinery or direct substrate recruitment Best boundary for these UniPathway rows. Accept GO:0016567 when the gene product is part of the ubiquitin-transfer machinery or directly brings a substrate together with ubiquitin ligase machinery.

Under this boundary, GO:0016567 remains useful as a biological-process
grouping even when we add molecular-function adaptor terms. The BP term says
the gene product is directly involved in the ubiquitination process; the MF term
says what it does in that process. For non-catalytic substrate receptors, the
preferred core MF is GO:1990756 ubiquitin-like ligase-substrate adaptor
activity, not a generic binding term and not an E3 catalytic activity term.

This means protein ubiquitination is not redundant with existing regulation
annotations such as positive regulation of proteasomal ubiquitin-dependent
protein catabolic process. The regulation terms describe the outcome or pathway
effect. GO:1990756 describes the substrate-adaptor activity. GO:0016567
describes direct participation in the ubiquitin-conjugation process.

Human GO:0016567 Triage

The 124 human TRUE UniPathway-unique GO:0016567 rows divide into the following
curation buckets:

Bucket Count Decision rule Examples
Substrate receptor or substrate adaptor 63 Usually ACCEPT for GO:0016567; add or prefer GO:1990756 for the core MF when evidence supports direct substrate-ligase bridging. ASB family, DCAF family, FBXO family, KCTD11, SOCS1, SOCS2, SOCS3, SOCS4, SOCS5, SOCS7
Catalytic or probable E3 ligase 45 ACCEPT; prefer catalytic MF terms such as GO:0061630 when present. BRCA1, RAD18, RNF2, SYVN1, TRIM38
E2 enzyme 1 ACCEPT; use E2 conjugating activity as core MF. UBE2Z
Non-ubiquitin UBL specificity 1 MODIFY to the specific UBL process. UBA7 to GO:0032020 ISG15-protein conjugation
Deubiquitinase 1 Usually MODIFY or REMOVE unless there is separate ubiquitin-transfer evidence. USP9Y
Ubiquitin-binding or cargo-handling protein 1 Usually not ACCEPT for GO:0016567 without direct transfer-machinery evidence; use binding, cargo transport, or catabolic-process terms as appropriate. HDAC6
Regulatory or complex-module inspect bucket 12 Inspect manually; accept direct CRL modules, but do not accept solely from pathway membership or downstream effects. ELOB, APPBP2, ARMC5, CDCA3, MED8, TULP4, ZBTB16

Re-review Outcomes

Gene set Boundary class GO:0016567 outcome Additional action
BRCA1, RAD18, SYVN1, PEX2, PEX10, PEX12 Catalytic E3 ligases or E3 ligase complex subunits with direct substrate ubiquitination evidence ACCEPT No adaptor MF needed for catalytic E3 cases; keep more specific monoubiquitination/polyubiquitination annotations where present.
KCTD11, ZSWIM8 Substrate adaptors already carrying the adaptor MF ACCEPT Keep GO:1990756 as the core MF.
SOCS4, SOCS5 Substrate-recognition SOCS-box adaptors ACCEPT Added GO:1990756 as NEW; changed core MF from substrate-binding-only terms to adaptor activity.
ELOB, ELOC Elongin BC module proteins ACCEPT Clarified that these are direct CRL modules but not substrate receptors, so GO:1990756 is not added. ELOB is in the TRUE UniPathway-unique scan; ELOC is a local non-unique comparator with both UniPathway and experimental rows.
UBA7 ISG15-specific E1 activating enzyme MODIFY Replace GO:0016567 with GO:0032020 ISG15-protein conjugation.

Reviewed Exemplars

Gene Organism UniPathway term Action Assessment
SOCS4 human GO:0016567 protein ubiquitination ACCEPT Useful unique annotation. SOCS4 acts as a SOCS-box substrate adaptor recruiting Elongin/Cullin ubiquitin ligase machinery to targets such as EGFR. Re-review added GO:1990756 ubiquitin-like ligase-substrate adaptor activity as the core MF.
SOCS5 human GO:0016567 protein ubiquitination ACCEPT Same boundary as SOCS4. SOCS5 is a substrate-recognition component of an Elongin BC-CUL2/5-SOCS-box E3 ligase complex; re-review added GO:1990756.
KCTD11 human GO:0016567 protein ubiquitination ACCEPT Useful unique annotation. KCTD11 functions as a CRL3 substrate adaptor and promotes HDAC1 ubiquitination/degradation. This shows that non-catalytic E3-complex adaptors can still be validly involved in ubiquitination.
ZSWIM8 human GO:0016567 protein ubiquitination ACCEPT Substrate-adaptor positive control. Existing review already has GO:1990756, matching the boundary used for SOCS4, SOCS5, and KCTD11.
ELOB, ELOC human GO:0016567 protein ubiquitination ACCEPT Direct CRL2/CRL5 module examples. These support the ubiquitination process but are not themselves substrate receptors, so the re-review explicitly does not add GO:1990756.
BRCA1, RAD18, SYVN1, PEX2, PEX10, PEX12 human GO:0016567 protein ubiquitination ACCEPT Catalytic E3 or E3-complex positive controls. The UniPathway process term is broad but correct and should coexist with more specific ubiquitin ligase, monoubiquitination, and polyubiquitination terms.
UBA7 human GO:0016567 protein ubiquitination MODIFY Mapping error. UBA7 is an ISG15-activating E1 enzyme; the correct process is GO:0032020 ISG15-protein conjugation, not generic protein ubiquitination. This is the clearest human pilot example where UPA00143 overgeneralizes a ubiquitin-like modifier pathway.
COX5B human GO:0006119 oxidative phosphorylation ACCEPT Useful unique annotation. COX5B is a structural Complex IV subunit, so UniPathway adds correct OXPHOS context beyond component/localization annotations.
GK5 human GO:0019563 glycerol catabolic process KEEP_AS_NON_CORE Correct but not core. GK5 phosphorylates glycerol to glycerol-3-phosphate; the pathway term captures catabolic context, while the core review is better centered on glycerol kinase activity and glycerol-3-phosphate biosynthesis.
PM20D1 human GO:0006631 fatty acid metabolic process KEEP_AS_NON_CORE Correct substrate context but broad. PM20D1 synthesizes/hydrolyzes N-fatty-acyl amino acids; it uses fatty acids but is not primarily a fatty-acid catabolic or biosynthetic enzyme.
LPCAT1 human GO:0006644 phospholipid metabolic process MARK_AS_OVER_ANNOTATED Broad parent term. The gene is better represented by specific phospholipid remodeling and phosphatidic-acid/phosphatidylcholine-related processes already present in the review.
algE P. putida KT2440 GO:0042121 alginic acid biosynthetic process ACCEPT Useful unique bacterial annotation. AlgE is an outer-membrane alginate pore/export component; the pathway term correctly captures its alginate production role.
catA P. putida KT2440 GO:0042952 beta-ketoadipate pathway ACCEPT Useful unique bacterial pathway annotation. CatA catalyzes catechol ring cleavage in the beta-ketoadipate pathway, and the UniPathway row captures the core aromatic-catabolism pathway context.

Pattern Analysis

1. Metabolic Enzyme in Pathway - Usually ACCEPT

Examples: COX5B in oxidative phosphorylation, catA in beta-ketoadipate pathway,
algE in alginate biosynthesis.

These are the strongest UniPathway successes. The source supplies a pathway-level
biological-process annotation that may be missing from InterPro, PAINT, or direct
manual curation, while staying close to the enzyme or complex role.

2. Protein Modification Buckets - Inspect Mechanistic Role

Examples: SOCS4, SOCS5, KCTD11, ZSWIM8, ELOB, ELOC, UBA7.

UPA00143 is valuable for E3 ligases, direct E3-complex modules, and
substrate-adaptor proteins, but it needs the boundary rules above. The key
curation split is catalytic ubiquitin-transfer activity versus substrate-adaptor
activity versus non-ubiquitin UBL specificity.

3. Broad Lipid Parent Terms - Often KEEP_AS_NON_CORE or MARK_AS_OVER_ANNOTATED

Examples: PM20D1, LPCAT1.

UniPathway often captures the broad lipid class correctly, but the broad term may
not be the best curated endpoint when specific lipid remodeling, acyltransferase,
or N-acyl-amino-acid chemistry is available.

4. Bacterial Pathway Coverage - Strong Positive Signal

The PSEPK pilot resembles the positive-control side of the SPKW bacterial work.
Top TRUE unique terms include beta-ketoadipate pathway, oxidative phosphorylation,
thiamine diphosphate biosynthesis, tetrahydrofolate biosynthesis, histidine
catabolism, and LPS/heptose biosynthesis. These generally map directly to operon
or enzyme-pathway membership and look curatorially useful.

5. Term-Group Audits - Better Than Source-Wide Sampling

The nitrogen-cycle scan shows why UniPathway should be reviewed by biologically
coherent term groups, not only by source-wide top terms. Denitrification is a
high-confidence microbial target, while nitrate assimilation is mostly redundant
and its remaining true-unique cases include likely non-assimilatory nrfA rows.
Urea-cycle terms are high volume but need careful scope interpretation in microbes.

6. Pathway Regulators - Do Not Treat as Pathway Enzymes

Examples: norR1, norR2.

UniPathway can correctly record that a regulator is associated with a pathway, but
the GO biological-process annotation may become too direct if the row is projected
as involved_in denitrification pathway. NorR1 and NorR2 regulate nitric oxide
reductase expression; they should not be curated as denitrification-pathway
enzymes.

Curation Recommendations

  1. Keep closure filtering as the default. Naive UniPathway-only queries overstate
    novelty by counting rows already supported by exact or more specific terms from
    other sources.
  2. Prioritize UPA00143 review. It is the dominant human unique source and has
    both good positives and clear UBL-specific mapping errors.
  3. Treat bacterial metabolic pathway rows as high-value candidates. They often
    add real pathway context absent from other GOA sources.
  4. Downgrade broad parent lipid terms when specific processes exist. Use
    KEEP_AS_NON_CORE, MARK_AS_OVER_ANNOTATED, or MODIFY depending on whether
    the broad term is merely contextual or actually misleading.
  5. Check modifier identity before accepting protein-modification terms. Do not
    assume that a UniPathway protein-modification bucket is specific enough for GO.
  6. Use targeted term-group audits for microbial processes. For nitrogen-cycle
    terms, denitrification and urease-linked urea catabolism are higher-value review
    targets than nitrate assimilation, which is mostly redundant or potentially
    mis-mapped in the true-unique tail.
  7. Separate pathway enzymes from pathway regulators. For UniPathway rows on
    transcriptional regulators, inspect whether the source is asserting direct
    pathway membership or only regulatory context.

Follow-Up Targets

Target Rationale
Human UPA00143 audit 124 TRUE unique genes; likely highest yield for both gap filling and error cleanup, using the protein-ubiquitination boundary rules above
Human retinol/cholesterol CYP subset Several true-unique rows may be broad xenobiotic/steroid context rather than core pathway function
PSEPK beta-ketoadipate set Positive-control set for validating UniPathway as bacterial pathway gap filler
Yeast protein glycosylation set Top SGD unique term (GO:0006486, 24 genes); likely useful but should be checked for specificity
Bacterial denitrification set 615 TRUE unique bacterial rows across 530 taxa; likely highest-yield microbial nitrogen-cycle target
Bacterial nrfA nitrate-assimilation tail 14 TRUE unique rows; likely mapping-risk set where GO:0042128 may not match dissimilatory nitrite reduction biology
Archaeal urease set 10 TRUE unique urea catabolic process rows across 9 taxa; good targeted archaea exemplar set
Plant cell-wall pathway set Brachypodium and wheat true-unique pectin and UDP-D-xylose rows; useful plant-specific contrast to vertebrate ubiquitination
Cupriavidus NorR regulator set Denitrification true-unique rows can over-annotate regulators as pathway enzymes; useful negative-control set

Project Status

Technical Appendix

The scan uses a closure-filtered uniqueness query. The important curation point is
that exact duplicate support and more-specific descendant support are both removed
before a UniPathway row is called uniquely informative.

Closure-filtering SQL used for source-wide counts
WITH upa AS (
    SELECT internal_id, db_object_id, db_object_symbol, ontology_class_ref, aspect
    FROM gaf_association
    WHERE supporting_references = 'GO_REF:0000041'
      AND is_negation = false
),
true_unique AS (
    SELECT u.*
    FROM upa u
    WHERE NOT EXISTS (
        SELECT 1
        FROM gaf_association o
        WHERE o.db_object_id = u.db_object_id
          AND o.ontology_class_ref = u.ontology_class_ref
          AND o.supporting_references != 'GO_REF:0000041'
          AND o.is_negation = false
    )
    AND NOT EXISTS (
        SELECT 1
        FROM isa_partof_closure ipc
        JOIN gaf_association o
          ON o.db_object_id = u.db_object_id
         AND o.ontology_class_ref = ipc.subject
         AND o.supporting_references != 'GO_REF:0000041'
         AND o.is_negation = false
        WHERE ipc.object = u.ontology_class_ref
    )
)
SELECT count(*) AS annotations,
       count(DISTINCT db_object_id) AS genes,
       count(DISTINCT ontology_class_ref) AS terms
FROM true_unique;