# CAZy family -> GO molecular-function mapping (SSSOM)  -- SEED
#
# Forward-propagation source for the Glycobiology project (see ../GLYCOBIOLOGY.md and
# GLYCOBIOLOGY-resource-reuse.md). Maps a CAZy carbohydrate-active-enzyme FAMILY to the GO
# molecular-function term that best describes its catalytic activity -- the glyco analogue of
# interpro2go / ec2go. CAZy reaches GO only indirectly today (via UniProt/EC/InterPro); there is no
# official cazy2go, so this curated set IS the mapping.
#
# This is a SEED: it covers only the five glycosyltransferase families that the project's exemplar
# genes belong to (each row backed by the corresponding completed gene review). Scaling to the full
# ~190 GT/GH/PL/CE families is the follow-up (seedable from the dbCAN-sub subfamily->EC->substrate
# table, then EC->ec2go).
#
# Predicate semantics (mirrors projects/INTERPRO/interpro2go.sssom.yaml):
#   * skos:exactMatch  -- the GO term holds for ~all (animal) members of the family; the family is
#                         mono-specific at this level of GO. Ready-to-use family->GO MF.
#   * skos:narrowMatch -- the GO term is NARROWER than the family: it describes the dominant
#                         subfamily but NOT all family members, so the family is too coarse and
#                         SUBFAMILY resolution (dbCAN-sub / EC) is needed before propagating.
#
# Redundancy filter (the "trivial" part): a row is REDUNDANT if the family's EC already maps to the
# same GO term via ec2go (GO_REF:0000003), or if an InterPro entry for the family maps to it via
# interpro2go (GO_REF:0000002). Redundant rows add no marginal annotation; the genuine contribution
# is rows whose GO term is NOT reachable that way. Where assessed, redundancy status is noted in
# the row's `comment` (qualitative here; compute live against the ec2go closure to confirm, as
# projects/RHEA does for EC).
#
# Provenance: every CAZy family + EC is from the CAZy/CAZypedia family page; every GO id/label is
# QuickGO-verified non-obsolete; every activity assignment is backed by the cited completed gene
# review in genes/human/<GENE>/.
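#
# Row shape, as an illustrative sketch only: angle-bracket values are placeholders, not real
# mappings, and the slot names are simply those used by the rows below.
#   - subject_id: CAZy:<FAMILY>
#     subject_label: "<CAZy family name>"
#     predicate_id: skos:exactMatch        # or skos:narrowMatch, per the predicate semantics above
#     predicate_label: exact match         # or "narrow match"
#     object_id: GO:<7-digit id>
#     object_label: <GO molecular-function label>
#     mapping_justification: semapv:ManualMappingCuration
#     comment: <backing gene review; taxon scope; redundancy status vs ec2go/interpro2go>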

curie_map:
  CAZy: http://www.cazy.org/
  GO: http://purl.obolibrary.org/obo/GO_
  skos: http://www.w3.org/2004/02/skos/core#
  semapv: https://w3id.org/semapv/vocab/
  sssom: https://w3id.org/sssom/

mapping_set_id: https://w3id.org/ai4curation/ai-gene-review/mappings/cazy2go-seed
mapping_set_title: CAZy family -> GO molecular function (Glycobiology seed)
mapping_set_description: >-
  Curated CAZy-family -> GO molecular-function mappings seeded from the Glycobiology project's
  exemplar glycosyltransferases. exactMatch rows are mono-specific families ready to propagate;
  narrowMatch rows flag families that are too coarse (the GO term applies only to a subfamily, so
  subfamily/EC resolution is required). Intended as a forward-propagation source analogous to
  interpro2go, with redundancy against ec2go/interpro2go to be closure-filtered before use.
license: https://creativecommons.org/licenses/by/4.0/
creator_label:
- AI Gene Review project

mappings:

# ===== exactMatch: mono-specific families, ready-to-use family->GO MF =====

- subject_id: CAZy:GT13
  subject_label: "Glycosyltransferase Family 13 (N-acetylglucosaminyltransferase I, GnT-I)"
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0003827
  object_label: alpha-1,3-mannosylglycoprotein 2-beta-N-acetylglucosaminyltransferase activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    GT13 is mono-specific (GnT-I, EC 2.4.1.101) -- the committed step of complex N-glycan synthesis.
    Backed by the MGAT1 (P26572) review (core MF = GO:0003827, exactMatch). LIKELY REDUNDANT with
    ec2go (EC 2.4.1.101 -> GO:0003827 is an established ec2go row); marginal value only where an
    entry lacks the EC.

- subject_id: CAZy:GT65
  subject_label: "Glycosyltransferase Family 65 (protein O-fucosyltransferase)"
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0046922
  object_label: peptide-O-fucosyltransferase activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    In animals GT65 = POFUT1 (GDP-fucose protein O-fucosyltransferase 1, EC 2.4.1.221), which adds
    O-fucose to folded EGF repeats in the ER. Backed by the POFUT1 (Q9H488) review. exactMatch holds
    for metazoa only: GT65 also contains a kinetoplastid alpha-1,2-fucosyltransferase, so outside
    metazoa resolve to the POFUT1 subfamily before propagating.

- subject_id: CAZy:GT29
  subject_label: "Glycosyltransferase Family 29 (sialyltransferases)"
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0008373
  object_label: sialyltransferase activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    GT29 is the sialyltransferase family (ST3/ST6/ST8 Gal/GalNAc subfamilies, EC 2.4.3.x). The family
    scope == the GO class 'sialyltransferase activity', so exactMatch at the CLASS level -- but the
    specific linkage child (e.g. GO:0003835 alpha-2,6 for ST6GAL1) needs subfamily/EC resolution.
    Backed by the ST6GAL1 (P15907) review. The class term is partly reachable via ec2go from the
    member ECs.

# ===== narrowMatch: poly-specific families -- GO term applies to a SUBFAMILY only =====

- subject_id: CAZy:GT7
  subject_label: "Glycosyltransferase Family 7 (beta-1,4-galactosyl-/N-acetylgalactosaminyltransferases)"
  predicate_id: skos:narrowMatch
  predicate_label: narrow match
  object_id: GO:0003831
  object_label: beta-N-acetylglucosaminylglycopeptide beta-1,4-galactosyltransferase activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    narrowMatch: GO:0003831 describes the B4GALT-type members (B4GALT1, EC 2.4.1.38; backed by the
    B4GALT1 P15291 review) but GT7 also contains beta-1,4-N-acetylgalactosaminyltransferases, so the
    term does NOT hold for the whole family. Family alone is too coarse -> resolve to subfamily/EC
    before propagating (the dbCAN-sub subfamily->EC table is the route).

- subject_id: CAZy:GT31
  subject_label: "Glycosyltransferase Family 31 (beta-1,3-galactosyl-/GalNAc-/GlcNAc-transferases, Fringe)"
  predicate_id: skos:narrowMatch
  predicate_label: narrow match
  object_id: GO:0008376
  object_label: acetylgalactosaminyltransferase activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    narrowMatch: GT31 is highly heterogeneous (B3GALT galactosyltransferases, B3GALNT
    N-acetylgalactosaminyltransferases, and Fringe beta-1,3-GlcNAc-transferases). GO:0008376 fits the
    B3GALNT subfamily (backed by the B3GALNT2 Q8NCR0 review, which itself proposed a more specific
    protein-O-mannose beta-1,3-GalNAc-T term) but not the galactosyl-/Fringe members -- the strongest
    case in this seed for subfamily resolution.
