# InterPro -> GO mapping review (SSSOM) -- proposed edits to interpro2go
#
# These rows review the InterPro2GO (https://www.ebi.ac.uk/GOA/InterPro2GO, GO_REF:0000002) mappings
# of the InterPro entries deep-researched in projects/INTERPRO (see INTERPRO.md and the per-entry
# interpro/interpro/<IPR>/<IPR>-deep-research-falcon.md reports). Each subject InterPro entry carries
# the listed GO term via interpro2go, except rows whose comment begins PROPOSED ADD (a GO term absent
# from interpro2go that the review suggests adding); this set records, per (entry, GO term), whether
# the mapping is sound, over-broad, or should be removed.
#
# Three predicate classes encode the verdict:
#   * skos:exactMatch                     -- the GO term holds for essentially ALL proteins the entry
#                                            matches; the interpro2go mapping is sound (ACCEPT).
#   * skos:broadMatch                     -- the mapping is retained, but the GO term is too generic
#                                            for the entry or over-annotates part of its membership;
#                                            recommend demoting it to a catalytically/functionally
#                                            homogeneous child entry or treating it as non-core at the
#                                            gene level
#                                            (MODIFY / KEEP_AS_NON_CORE / MARK_AS_OVER_ANNOTATED).
#   * skos:exactMatch + predicate_modifier: Not
#                                         -- the mapping is factually incorrect for the entry and is
#                                            proposed for REMOVAL from interpro2go (REMOVE).
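#
# Illustrative row shapes for the three verdict classes. This is a sketch only: <IPR> and <id> are
# hypothetical placeholders, and the remaining fields (subject_label, predicate_label, object_label,
# mapping_justification, comment) follow the rows under 'mappings:' below.
#
#   - subject_id: InterPro:<IPR>          # ACCEPT
#     predicate_id: skos:exactMatch
#     object_id: GO:<id>
#   - subject_id: InterPro:<IPR>          # MODIFY / KEEP_AS_NON_CORE / MARK_AS_OVER_ANNOTATED
#     predicate_id: skos:broadMatch
#     object_id: GO:<id>
#   - subject_id: InterPro:<IPR>          # REMOVE
#     predicate_id: skos:exactMatch
#     predicate_modifier: Not
#     object_id: GO:<id>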
#
# Provenance: every subject InterPro id, entry name, type, and GO-term set is from the cached
# interpro/interpro/<IPR>/<IPR>-metadata.yaml (InterPro API); every verdict + rationale is from the
# corresponding falcon/Edison family deep-research report. GO ids/labels are validated non-obsolete
# against the GO oak config. Validate with:  just validate-interpro-mappings
#   (linkml-validate against the SSSOM schema 'mapping set' class, then GO-label term validation of the
#   generated interpro2go.terms.yaml).

mapping_set_id: https://w3id.org/ai4curation/ai-gene-review/mappings/interpro2go-review
mapping_set_title: InterPro2GO mapping review (deep-researched families)
mapping_set_description: >-
  Curated review of the InterPro2GO mappings for the InterPro entries deep-researched in
  projects/INTERPRO. exactMatch rows endorse the interpro2go mapping; broadMatch rows retain it but
  flag it as over-broad for a structurally/functionally heterogeneous entry (demote to child entries
  or treat as non-core); exactMatch rows with predicate_modifier Not propose removing a factually
  incorrect mapping. Each verdict is backed by a family-level deep-research report
  (interpro/interpro/<IPR>/<IPR>-deep-research-falcon.md). The set seeds recommendations to
  InterPro2GO curators and supports quality checks of GO annotations on proteins matched by these
  signatures.
license: https://creativecommons.org/licenses/by/4.0/
creator_label:
- AI Gene Review project
mapping_date: "2026-06-20"
subject_source: InterPro
object_source: GO
curie_map:
  InterPro: https://www.ebi.ac.uk/interpro/entry/InterPro/
  GO: http://purl.obolibrary.org/obo/GO_
  skos: http://www.w3.org/2004/02/skos/core#
  semapv: https://w3id.org/semapv/vocab/
  sssom: https://w3id.org/sssom/

mappings:

# ===== IPR000719 Protein kinase domain (type: domain) -- REMOVE both (pseudokinases) =====
- subject_id: InterPro:IPR000719
  subject_label: Protein kinase domain
  predicate_id: skos:exactMatch
  predicate_label: exact match
  predicate_modifier: Not
  object_id: GO:0005524
  object_label: ATP binding
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    REMOVE. IPR000719 is a domain signature that also matches pseudokinases; Class I pseudokinases
    bind neither ATP nor metal, so ATP binding is not universal across the entry. Restrict to catalytic
    child entries with intact ATP-binding motifs. (IPR000719 deep research.)
- subject_id: InterPro:IPR000719
  subject_label: Protein kinase domain
  predicate_id: skos:exactMatch
  predicate_label: exact match
  predicate_modifier: Not
  object_id: GO:0006468
  object_label: protein phosphorylation
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    REMOVE at the domain level. Pseudokinases (~10% of the human kinome) lack catalytic residues and
    do not phosphorylate; some are repurposed (AMPylation, glutamylation). Restrict to catalytic child
    families and use specific MF terms (GO:0004674 / GO:0004713) rather than this BP term on a domain.

# ===== IPR001128 Cytochrome P450 (type: family) -- cofactor binding sound; activity over-broad =====
- subject_id: InterPro:IPR001128
  subject_label: Cytochrome P450
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0020037
  object_label: heme binding
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    ACCEPT. Heme (heme-thiolate) coordination is the defining, universally conserved feature of the
    P450 fold. (IPR001128 deep research.)
- subject_id: InterPro:IPR001128
  subject_label: Cytochrome P450
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0005506
  object_label: iron ion binding
  mapping_justification: semapv:ManualMappingCuration
  comment: ACCEPT. The central heme iron is conserved across all P450s.
- subject_id: InterPro:IPR001128
  subject_label: Cytochrome P450
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0004497
  object_label: monooxygenase activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    Over-broad. The family spans 819+ functionally diverse subfamilies (incl. non-monooxygenase
    activities and catalytically inert members); monooxygenase activity over-annotates at the family
    level. Demote to functionally homogeneous child entries.
- subject_id: InterPro:IPR001128
  subject_label: Cytochrome P450
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0016705
  object_label: oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen
  mapping_justification: semapv:ManualMappingCuration
  comment: Over-broad for the same reason as monooxygenase activity; demote to child entries.

# ===== IPR001424 Cu/Zn SOD domain (type: domain) =====
- subject_id: InterPro:IPR001424
  subject_label: Superoxide dismutase, copper/zinc binding domain
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0046872
  object_label: metal ion binding
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    KEEP_AS_NON_CORE. Generic and weakly informative; domain presence does not guarantee canonical
    Cu/Zn binding. (IPR001424 deep research.)
- subject_id: InterPro:IPR001424
  subject_label: Superoxide dismutase, copper/zinc binding domain
  predicate_id: skos:exactMatch
  predicate_label: exact match
  predicate_modifier: Not
  object_id: GO:0006801
  object_label: superoxide metabolic process
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    REMOVE from this domain-level entry. A BP term on a structural module; copper-only SODs and
    copper-chaperone members (e.g. CCS) carry the domain but do not dismutate superoxide, so the
    process term mis-annotates divergent members. Map process terms only onto bona fide SOD child
    entries.

# ===== IPR000276 GPCR, rhodopsin-like (type: family) =====
- subject_id: InterPro:IPR000276
  subject_label: G protein-coupled receptor, rhodopsin-like
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0004930
  object_label: G protein-coupled receptor activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    Over-broad. Atypical chemokine receptors (ACKRs) and some orphan receptors carry the Class A
    signature but lack canonical G-protein coupling (signal via arrestin), so GPCR activity
    over-annotates these exceptions. (IPR000276 deep research.)
- subject_id: InterPro:IPR000276
  subject_label: G protein-coupled receptor, rhodopsin-like
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0007186
  object_label: G protein-coupled receptor signaling pathway
  mapping_justification: semapv:ManualMappingCuration
  comment: Over-broad for the same ACKR/orphan exceptions as GPCR activity.
- subject_id: InterPro:IPR000276
  subject_label: G protein-coupled receptor, rhodopsin-like
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0016020
  object_label: membrane
  mapping_justification: semapv:ManualMappingCuration
  comment: KEEP_AS_NON_CORE. Universally true but generic; uninformative as a core annotation.

# ===== IPR001046 NRAMP / SLC11 metal transporter (type: family) -- broad terms sound =====
- subject_id: InterPro:IPR001046
  subject_label: NRAMP family
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0046873
  object_label: metal ion transmembrane transporter activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    ACCEPT as a broad family-level term. All members are integral-membrane transition-metal
    transporters; do not add more specific terms (e.g. iron transport) at this level, as substrate
    specificity diverges by subfamily. (IPR001046 deep research.)
- subject_id: InterPro:IPR001046
  subject_label: NRAMP family
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0030001
  object_label: metal ion transport
  mapping_justification: semapv:ManualMappingCuration
  comment: ACCEPT as a broad family-level process term.
- subject_id: InterPro:IPR001046
  subject_label: NRAMP family
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0016020
  object_label: membrane
  mapping_justification: semapv:ManualMappingCuration
  comment: KEEP_AS_NON_CORE. Trivially true but informationally useless as a core annotation.

# ===== IPR012724 Chaperone DnaJ (type: family) =====
- subject_id: InterPro:IPR012724
  subject_label: Chaperone DnaJ
  predicate_id: skos:exactMatch
  predicate_label: exact match
  predicate_modifier: Not
  object_id: GO:0005524
  object_label: ATP binding
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    REMOVE. Factually incorrect: ATP binding/hydrolysis is a property of the Hsp70 partner; J-domain
    proteins use the HPD motif to stimulate Hsp70 ATPase and lack an ATP-binding pocket. No evidence
    supports universal ATP binding by DnaJ family members. (IPR012724 deep research.)
- subject_id: InterPro:IPR012724
  subject_label: Chaperone DnaJ
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0006457
  object_label: protein folding
  mapping_justification: semapv:ManualMappingCuration
  comment: ACCEPT. J-domain proteins are co-chaperones that promote client folding; broadly appropriate.
- subject_id: InterPro:IPR012724
  subject_label: Chaperone DnaJ
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0009408
  object_label: response to heat
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    KEEP_AS_NON_CORE / demote. Overgeneralizes from heat-inducible canonical members to a functionally
    heterogeneous family (many JDPs are constitutive or organelle-specific). Restrict to
    stress-responsive subfamilies.

# ===== IPR007197 Radical SAM (type: domain) -- both sound despite genericity =====
- subject_id: InterPro:IPR007197
  subject_label: Radical SAM
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0003824
  object_label: catalytic activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    ACCEPT. Notable case: this is a top-level MF term (a direct child of the molecular_function root)
    and nearly maximally generic, yet it is the CORRECT entry-level annotation. Radical SAM members
    catalyze mechanistically diverse reactions (>100 families), so no specific catalytic term is
    universal; the conserved property really is just "is a catalytic enzyme". Replacing it with e.g.
    oxidoreductase activity would over-annotate. Optionally KEEP_AS_NON_CORE. (IPR007197 deep research.)
- subject_id: InterPro:IPR007197
  subject_label: Radical SAM
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0051536
  object_label: iron-sulfur cluster binding
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    ACCEPT. The [4Fe-4S] cluster that reductively cleaves SAM is the defining, universally conserved
    cofactor of the domain; substantially more informative than generic metal ion binding.

# ===== IPR020849 Small GTPase, Ras-type (type: family) =====
- subject_id: InterPro:IPR020849
  subject_label: Small GTPase, Ras-type
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0005525
  object_label: GTP binding
  mapping_justification: semapv:ManualMappingCuration
  comment: ACCEPT. The universal, defining molecular function of the family. (IPR020849 deep research.)
- subject_id: InterPro:IPR020849
  subject_label: Small GTPase, Ras-type
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0003924
  object_label: GTPase activity
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    PROPOSED ADD -- absent from interpro2go (annotation gain). The GTP-hydrolysis machinery (P-loop,
    Switch II catalytic Gln61, Mg2+ coordination) is described as universal and essential across
    Ras-type small GTPases; interpro2go maps the entry to GTP binding but not to the GTP hydrolysis
    that defines the family. Intrinsic hydrolysis is slow and GAP-accelerated, so flag for curator
    confirmation before adding.
- subject_id: InterPro:IPR020849
  subject_label: Small GTPase, Ras-type
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0007165
  object_label: signal transduction
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    Over-broad. True but too generic; assign specific child terms (e.g. GO:0007265 Ras protein signal
    transduction) at subfamily entries with pathway evidence rather than this broad BP term family-wide.
- subject_id: InterPro:IPR020849
  subject_label: Small GTPase, Ras-type
  predicate_id: skos:broadMatch
  predicate_label: broad match
  object_id: GO:0016020
  object_label: membrane
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    MARK_AS_OVER_ANNOTATED. Generic; specific membrane-compartment terms need subfamily evidence, and
    not all Ras-type members are membrane-associated. Assign compartment terms to child entries instead.

# ===== IPR002100 Transcription factor, MADS-box (type: domain) -- both sound =====
- subject_id: InterPro:IPR002100
  subject_label: Transcription factor, MADS-box
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0003677
  object_label: DNA binding
  mapping_justification: semapv:ManualMappingCuration
  comment: >-
    ACCEPT. Sequence-specific DNA binding is intrinsic to the MADS-box domain. Note: do NOT add
    DNA-binding transcription factor activity (GO:0003700) here -- the report finds TF/regulatory
    function is a WHOLE-PROTEIN property arising from the K/C domains and complex context, not the
    MADS domain itself, so it would over-annotate the domain entry. (IPR002100 deep research.)
- subject_id: InterPro:IPR002100
  subject_label: Transcription factor, MADS-box
  predicate_id: skos:exactMatch
  predicate_label: exact match
  object_id: GO:0046983
  object_label: protein dimerization activity
  mapping_justification: semapv:ManualMappingCuration
  comment: ACCEPT. MADS-box proteins bind DNA as dimers; dimerization is domain-intrinsic.
