ARBA00085883 proteoglycan catabolic process (GO:0030167)

Type: ARBA
Status: COMPLETE
Action: MODIFY
Confidence: 0.60

Description

Rule predicting proteoglycan catabolic process (GO:0030167) based on four condition sets: (1) beta-galactosidase in Eukaryota, (2) alpha-L-iduronidase, (3) beta-hexosaminidase in Mus, and (4) N-acetylglucosamine-6-sulfatase in Metazoa. These enzymes are lysosomal hydrolases involved in sequential degradation of glycosaminoglycan (GAG) chains that constitute proteoglycans. However, beta-galactosidase and beta-hexosaminidase are primarily known for ganglioside degradation, raising questions about annotation specificity.

Analysis Summary

Domain Pairs Analyzed: 5
Condition Sets: 4
Subset Relationships: 2
Redundant Annotations: 0
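
The count of five analyzed domain pairs is consistent with pairing only the FunFam conditions inside each condition set. The sketch below reproduces that arithmetic; excluding the taxon conditions from the pairing is an assumption inferred from the pairwise tables, not something the report states, and the variable names are illustrative.

```python
from itertools import combinations

# FunFam conditions per condition set, copied from the rule definition.
# Assumption: taxon conditions are not paired, only domain signatures.
funfams_by_set = {
    1: ["2.60.120.260:FF:000115", "3.20.20.80:FF:000017"],
    2: ["2.60.40.10:FF:001526", "2.60.40.1500:FF:000001", "3.20.20.80:FF:000059"],
    3: ["3.20.20.80:FF:000049", "3.30.379.10:FF:000001"],
    4: ["3.40.720.10:FF:000012"],
}

# All unordered pairs of FunFams within each condition set.
pairs_by_set = {
    number: list(combinations(funfams, 2))
    for number, funfams in funfams_by_set.items()
}

total_pairs = sum(len(pairs) for pairs in pairs_by_set.values())
print(total_pairs)  # 1 + 3 + 1 + 0 = 5
```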

Domain Overlap Analysis Table

Prediction matrix showing how row entries PREDICT column entries. Cell (i,j) shows what fraction of proteins with row domain i also have column domain j.

Row and column key (condition set, taxon scope, label, identifier, protein count):

A    CS 1 (Eukaryota)  Beta-galactosidase                2.60.120.260:FF:000115  (7)
B    CS 1 (Eukaryota)  Beta-galactosidase                3.20.20.80:FF:000017    (10)
C    CS 2              Alpha-L-iduronidase               2.60.40.10:FF:001526    (3)
D    CS 2              Alpha-L-iduronidase               2.60.40.1500:FF:000001  (3)
E    CS 2              Alpha-L-iduronidase               3.20.20.80:FF:000059    (3)
F    CS 3 (Mus)        Beta-hexosaminidase A             3.20.20.80:FF:000049    (9)
G    CS 3 (Mus)        Beta-hexosaminidase subunit beta  3.30.379.10:FF:000001   (7)
H    CS 4 (Metazoa)    N-acetylglucosamine-6-sulfatase   3.40.720.10:FF:000012   (4)
TGT  proteoglycan catabolic process                      GO:0030167              (55)

     A                B                C                D                E                F                G                H                TGT
A    100%             100% J:70% (7)   0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      43% J:5% (3)
B    70% J:70% (7)    100%             0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      30% J:5% (3)
C    0% J:0% (0)      0% J:0% (0)      100%             100% J:100% (3)  100% J:100% (3)  0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      100% J:5% (3)
D    0% J:0% (0)      0% J:0% (0)      100% J:100% (3)  100%             100% J:100% (3)  0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      100% J:5% (3)
E    0% J:0% (0)      0% J:0% (0)      100% J:100% (3)  100% J:100% (3)  100%             0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      100% J:5% (3)
F    0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      100%             78% J:78% (7)    0% J:0% (0)      78% J:12% (7)
G    0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      100% J:78% (7)   100%             0% J:0% (0)      86% J:11% (6)
H    0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      0% J:0% (0)      100%             50% J:4% (2)
TGT  5% J:5% (3)      5% J:5% (3)      5% J:5% (3)      5% J:5% (3)      5% J:5% (3)      13% J:12% (7)    11% J:11% (6)    4% J:4% (2)      100%

Legend: Each cell shows PREDICTS % (fraction of row entry proteins that also have column entry - row PREDICTS column), Jaccard similarity (J:%), and intersection count. CS = Condition Set(s), TGT = GO annotation target.
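
As a rough illustration of the legend, the three numbers in each cell can be recomputed from the two protein counts and the intersection count. This is a minimal sketch; the function name `overlap_metrics` and its return keys are hypothetical, not taken from the tool that produced the matrix.

```python
def overlap_metrics(count_a: int, count_b: int, intersection: int) -> dict:
    """Return containment in both directions and Jaccard similarity."""
    union = count_a + count_b - intersection
    return {
        # Fraction of A's proteins that also carry B ("A PREDICTS B").
        "a_in_b": intersection / count_a if count_a else 0.0,
        # Fraction of B's proteins that also carry A.
        "b_in_a": intersection / count_b if count_b else 0.0,
        # Intersection over union of the two protein sets.
        "jaccard": intersection / union if union else 0.0,
    }


# Beta-galactosidase FunFams in condition set 1: counts 7 and 10, 7 shared.
cs1 = overlap_metrics(7, 10, 7)

# 2.60.120.260:FF:000115 (7 proteins) versus GO:0030167 (55 proteins), 3 shared.
go = overlap_metrics(7, 55, 3)

print(cs1)
print({key: round(value, 3) for key, value in go.items()})
```

Because the GO term covers 55 proteins, Jaccard values against the target stay low even when every protein in a FunFam carries the annotation, so the PREDICTS percentage is the more informative figure in the TGT column.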

Review Summary

This rule captures four lysosomal hydrolases (three glycosidases and one sulfatase) that participate in proteoglycan catabolism by degrading glycosaminoglycan chains. Condition set 2 (alpha-L-iduronidase) and condition set 4 (N-acetylglucosamine-6-sulfatase) are strongly appropriate: these enzymes are exclusively or primarily involved in degrading dermatan sulfate, heparan sulfate, and related GAG chains, and their deficiencies cause well-characterized mucopolysaccharidosis phenotypes. Condition sets 1 (beta-galactosidase) and 3 (beta-hexosaminidase) are technically correct but biologically problematic: these enzymes do participate in keratan sulfate degradation, but their primary biological roles and clinical significance relate to ganglioside catabolism (GM1 and GM2 respectively). The taxonomic scope is inconsistent: condition set 1 uses the broad Eukaryota scope, condition set 2 has no taxonomic restriction, condition set 3 uses a narrow Mus restriction (creating false negatives), and condition set 4 uses an appropriate Metazoa scope. The rule shows a tension between biochemical accuracy (all four enzymes can cleave bonds in GAG chains) and biological interpretation (two are primarily ganglioside-degrading enzymes with secondary GAG functions). More specific annotations or a hierarchical annotation strategy would improve utility.

Action Rationale

The rule requires modification to improve biological accuracy and reduce potential misinterpretation. Three specific issues need addressing:
(1) Condition sets 1 (beta-galactosidase) and 3 (beta-hexosaminidase) should either be removed from this rule or receive additional primary annotations reflecting their ganglioside-degrading functions, e.g., "ganglioside catabolic process" (GO:0006689), or a more specific GM1 (for GLB1) or GM2 (for hexosaminidase) ganglioside term where one is available. A hierarchical annotation strategy in which proteins receive both primary (ganglioside) and secondary (proteoglycan) annotations would be most accurate.
(2) The Mus restriction in condition set 3 is unjustified and creates false negatives: beta-hexosaminidase function in keratan sulfate degradation is conserved at least across mammals, and likely across vertebrates. This scope should be expanded to at least Mammalia or Vertebrata.
(3) Consider splitting this rule: create a focused rule for core proteoglycan-degrading enzymes (IDUA, GNS, and other MPS-associated enzymes), separate from dual-function enzymes that also degrade gangliosides. This would improve both precision and biological interpretability while maintaining computational utility.

GO Annotations

GO:0030167 - proteoglycan catabolic process
Aspect: BP

Rule Definition

Condition Sets

Condition Set 1

3 condition(s)
Notes:

Beta-galactosidase (GLB1 in humans) is a lysosomal hydrolase that cleaves terminal beta-linked galactose residues from glycoconjugates. While it does participate in keratan sulfate degradation (a glycosaminoglycan component of proteoglycans), its primary clinical significance relates to GM1 ganglioside catabolism. GLB1 deficiency causes two distinct diseases: GM1 gangliosidosis (impaired ganglioside degradation) and mucopolysaccharidosis type IVB/Morquio B (impaired keratan sulfate degradation). The dual substrate specificity reflects the enzyme's ability to cleave β-galactose from structurally diverse molecules. While annotating GLB1 with proteoglycan catabolic process is technically correct for the keratan sulfate degradation function, this annotation obscures the enzyme's primary role in ganglioside metabolism.

Pairwise Overlap Analysis

Condition A             Condition B             Count A  Count B  Intersection  Jaccard  A in B  B in A  Interpretation
2.60.120.260:FF:000115  3.20.20.80:FF:000017    7        10       7             0.700    1.000   0.700   SUBSET

Condition Set 2

3 condition(s)
Notes:

Alpha-L-iduronidase (IDUA) is a lysosomal enzyme that cleaves terminal alpha-L-iduronic acid residues from dermatan sulfate and heparan sulfate, both of which are glycosaminoglycan chains attached to proteoglycan core proteins. IDUA deficiency causes mucopolysaccharidosis type I (MPS I), including Hurler syndrome and Scheie syndrome, characterized by accumulation of dermatan sulfate and heparan sulfate in multiple tissues. This is a quintessential proteoglycan catabolic enzyme with no known alternative primary function. The condition set appropriately targets this enzyme family, and the GO:0030167 annotation is highly appropriate. More than 200 IDUA mutations have been identified, and enzyme replacement therapy (Aldurazyme) is available for treatment.

Pairwise Overlap Analysis

Condition A             Condition B             Count A  Count B  Intersection  Jaccard  A in B  B in A  Interpretation
2.60.40.10:FF:001526    2.60.40.1500:FF:000001  3        3        3             1.000    1.000   1.000   REDUNDANT
2.60.40.10:FF:001526    3.20.20.80:FF:000059    3        3        3             1.000    1.000   1.000   REDUNDANT
2.60.40.1500:FF:000001  3.20.20.80:FF:000059    3        3        3             1.000    1.000   1.000   REDUNDANT

Condition Set 3

3 condition(s)
Notes:

Beta-hexosaminidase exists as two isoenzymes in mammals: Hex A (heterodimer of alpha/HEXA and beta/HEXB subunits) and Hex B (homodimer of beta subunits). While beta-hexosaminidase does participate in keratan sulfate degradation by removing terminal N-acetyl-D-glucosamine residues, its primary clinical and biological significance relates to GM2 ganglioside catabolism. HEXA deficiency causes Tay-Sachs disease and HEXB deficiency causes Sandhoff disease, both characterized primarily by GM2 ganglioside accumulation leading to neurodegeneration. The restriction to Mus (NCBITaxon:862507) is puzzling and appears arbitrary: beta-hexosaminidase function in proteoglycan catabolism is conserved across mammals and other vertebrates. This narrow taxonomic scope creates false negatives for the same enzyme function in closely related organisms. As with beta-galactosidase, the GO:0030167 annotation is technically correct but potentially misleading about the enzyme's primary biological role.

Pairwise Overlap Analysis

Condition A             Condition B             Count A  Count B  Intersection  Jaccard  A in B  B in A  Interpretation
3.20.20.80:FF:000049    3.30.379.10:FF:000001   9        7        7             0.778    0.778   1.000   SUBSET
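
The Interpretation column in the pairwise tables above appears to follow from the two containment values. The report does not state the exact decision rule, so the sketch below is an assumption: REDUNDANT when both protein sets are identical, SUBSET when one fully contains the other. The labels DISJOINT and PARTIAL are hypothetical placeholders for cases that do not occur in this rule.

```python
def interpret(count_a: int, count_b: int, intersection: int) -> str:
    """Classify the relationship between two protein sets from their counts."""
    if intersection == 0:
        return "DISJOINT"  # hypothetical label, not seen in the report
    a_inside_b = intersection == count_a  # every A protein also carries B
    b_inside_a = intersection == count_b  # every B protein also carries A
    if a_inside_b and b_inside_a:
        return "REDUNDANT"
    if a_inside_b or b_inside_a:
        return "SUBSET"
    return "PARTIAL"  # hypothetical label, not seen in the report


# Rows copied from the pairwise overlap tables: (count A, count B, intersection).
rows = {
    "CS 1": (7, 10, 7),
    "CS 2": (3, 3, 3),
    "CS 3": (9, 7, 7),
}
labels = {name: interpret(*counts) for name, counts in rows.items()}
print(labels)
```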

Condition Set 4

2 condition(s)
Notes:

N-acetylglucosamine-6-sulfatase (GNS) is a lysosomal enzyme that specifically removes 6-sulfate groups from N-acetylglucosamine residues in heparan sulfate, a glycosaminoglycan component of proteoglycans. GNS deficiency causes Sanfilippo syndrome type D (mucopolysaccharidosis type IIID), a neurodegenerative lysosomal storage disorder characterized by heparan sulfate accumulation. Unlike beta-galactosidase and beta-hexosaminidase, GNS is exclusively involved in glycosaminoglycan catabolism with no known role in ganglioside metabolism. The Metazoa restriction is appropriate and well-supported, as the enzyme functions in heparan sulfate degradation across animals. This is a strong, specific match for proteoglycan catabolic process. GNS is one of the four enzymes (types A-D) of heparan sulfate degradation whose deficiencies cause the established Sanfilippo syndrome subtypes.
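
To make the structure of the four condition sets concrete, the following sketch evaluates a protein against the rule. It assumes that all conditions within a set must hold together and that any one matching set is enough to trigger the annotation; the report does not state this evaluation logic, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class ConditionSet:
    number: int
    funfams: FrozenSet[str]
    taxon: Optional[str] = None  # None means no taxonomic restriction


# FunFam and taxon identifiers copied from the rule definition.
RULE = [
    ConditionSet(1, frozenset({"2.60.120.260:FF:000115", "3.20.20.80:FF:000017"}), "2759"),
    ConditionSet(2, frozenset({"2.60.40.10:FF:001526", "2.60.40.1500:FF:000001",
                               "3.20.20.80:FF:000059"})),
    ConditionSet(3, frozenset({"3.20.20.80:FF:000049", "3.30.379.10:FF:000001"}), "862507"),
    ConditionSet(4, frozenset({"3.40.720.10:FF:000012"}), "33208"),
]


def matching_sets(protein_funfams: Set[str], lineage: Set[str]) -> List[int]:
    """Return the numbers of the condition sets that a protein satisfies."""
    hits = []
    for cs in RULE:
        has_all_funfams = cs.funfams <= protein_funfams  # assumed AND within a set
        in_scope = cs.taxon is None or cs.taxon in lineage
        if has_all_funfams and in_scope:
            hits.append(cs.number)
    return hits  # assumed OR across sets: any hit triggers GO:0030167


# A eukaryotic protein carrying both beta-galactosidase FunFams.
both = matching_sets({"2.60.120.260:FF:000115", "3.20.20.80:FF:000017"}, {"2759"})

# A eukaryotic protein carrying only the 3.20.20.80 beta-galactosidase FunFam.
one = matching_sets({"3.20.20.80:FF:000017"}, {"2759"})

print(both, one)
```

Under this reading, the three Swiss-Prot proteins that carry 3.20.20.80:FF:000017 without 2.60.120.260:FF:000115 would not be matched by condition set 1.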

Assessments

REDUNDANT

The rule uses four condition sets but lacks parsimony in two respects. First, it conflates enzymes with distinct primary biological functions—dedicated GAG-degrading enzymes (IDUA, GNS) versus dual-function enzymes with primary ganglioside-degrading roles (GLB1, Hex A/B). This creates a heterogeneous rule that obscures important functional distinctions. Second, condition set 3 combines two beta-hexosaminidase FunFams with an arbitrary Mus-only restriction, while condition set 1 uses broad Eukaryota scope for structurally similar beta-galactosidase—this inconsistency suggests the taxonomic constraints were not systematically designed. A more parsimonious approach would split the rule into: (a) a focused rule for enzymes primarily involved in proteoglycan catabolism (IDUA, GNS, potentially expanded to other MPS enzymes), and (b) a separate rule or hierarchical annotation strategy for dual-function enzymes. Within each condition set, the use of multiple FunFam signatures for the same enzyme is appropriate for capturing sequence diversity, but the overall rule structure lacks biological parsimony.

STRONG

All four condition sets have extensive experimental support. Alpha-L-iduronidase (condition set 2): more than 200 IDUA mutations documented, enzyme replacement therapy (Aldurazyme) approved, crystal structures solved, dermatan sulfate and heparan sulfate accumulation demonstrated in MPS I patients. N-acetylglucosamine-6-sulfatase (condition set 4): GNS mutations cause Sanfilippo D with specific heparan sulfate accumulation, enzyme assays demonstrate substrate specificity, recombinant enzyme therapy under development. Beta-galactosidase (condition set 1): GLB1 mutations cause distinct phenotypes (GM1 gangliosidosis versus MPS IVB) depending on residual activity toward different substrates, crystal structures explain substrate binding, enzyme assays confirm dual specificity for GM1 ganglioside and keratan sulfate. Beta-hexosaminidase (condition set 3): HEXA and HEXB mutations cause Tay-Sachs and Sandhoff diseases respectively, crystal structures of Hex A and Hex B solved, substrate specificity for GM2 ganglioside and keratan sulfate experimentally validated. The literature strongly supports that all four enzymes can hydrolyze bonds in GAG chains, but also clearly demonstrates that beta-galactosidase and beta-hexosaminidase have primary roles in ganglioside rather than proteoglycan metabolism.

Supporting Evidence:

  • web:https://www.ncbi.nlm.nih.gov/books/NBK532261/: Hurler syndrome is caused by deficiency of alpha-L-iduronidase, resulting in accumulation of dermatan sulfate and heparan sulfate in multiple tissues with progressive deterioration.
  • web:https://pubmed.ncbi.nlm.nih.gov/6450420/: Sanfilippo disease type D is caused by deficiency of N-acetylglucosamine-6-sulfate sulfatase required for heparan sulfate degradation, with accumulation of excessive heparan sulfate.
  • web:https://medlineplus.gov/genetics/gene/glb1/: Beta-galactosidase is involved in metabolism of GM1 ganglioside and keratan sulfate. GLB1-related disorders include GM1 gangliosidosis and mucopolysaccharidosis type IVB with distinct phenotypes.
  • web:https://pmc.ncbi.nlm.nih.gov/articles/PMC2910754/: Crystal structure of human beta-hexosaminidase B explains substrate specificity. HEXB mutations cause Sandhoff disease characterized by GM2 ganglioside accumulation.
  • web:https://www.ncbi.nlm.nih.gov/books/NBK579925/: Degradation of GAG chains occurs via sulfatase-catalyzed removal of terminal sulfate groups and sequential action of exoglycosidases. At least 14 lysosomal storage diseases affect GAG catabolism.

NONE

The four condition sets target distinct enzyme families, and no protein is shared between FunFams of different condition sets (all cross-set intersections in the overlap matrix are zero). Condition set 1 targets beta-galactosidase (GLB1, GH35 family), condition set 2 targets alpha-L-iduronidase (IDUA, GH39 family), condition set 3 targets beta-hexosaminidase (HEXA/HEXB, GH20 family), and condition set 4 targets N-acetylglucosamine-6-sulfatase (GNS, sulfatase family). The three glycosidases each include a FunFam from the same CATH superfamily (3.20.20.80), so they share a fold at the superfamily level, but their FunFams are distinct. Mechanistically, the enzymes cleave different bonds: beta-galactosidase removes terminal beta-galactose, alpha-L-iduronidase removes alpha-L-iduronate, beta-hexosaminidase removes N-acetylhexosamine, and GNS removes 6-sulfate from N-acetylglucosamine. The enzymes act sequentially in lysosomal GAG catabolism pathways but do not overlap in substrate specificity. There is, however, overlap of a different kind: beta-galactosidase and beta-hexosaminidase both have primary roles in ganglioside catabolism, so they overlap in biological context rather than in molecular specificity.

TOO_BROAD

GO:0030167 (proteoglycan catabolic process) shows variable appropriateness across the four condition sets. For condition set 2 (alpha-L-iduronidase) and condition set 4 (N-acetylglucosamine-6-sulfatase), GO:0030167 is highly appropriate and could potentially be made more specific with child terms like "dermatan sulfate catabolic process" (GO:0030209) or "heparan sulfate proteoglycan catabolic process" (GO:0030200). For condition sets 1 (beta-galactosidase) and 3 (beta-hexosaminidase), GO:0030167 is technically accurate but too broad and potentially misleading, because it fails to capture the primary biological roles of these enzymes in ganglioside metabolism. A more appropriate primary annotation would be "ganglioside catabolic process" (GO:0006689), or a more specific GM1 (for GLB1) or GM2 (for hexosaminidase) ganglioside term where one is available. Under the current annotation strategy, users searching for proteoglycan-degrading enzymes will retrieve GLB1 and hexosaminidase, potentially without understanding that the primary functions of these enzymes lie elsewhere. A hierarchical annotation approach with both primary and secondary GO terms would better reflect the biological reality.

TOO_NARROW

The taxonomic scopes across condition sets are inconsistent and in one case appear arbitrary. Condition set 1 uses Eukaryota (NCBITaxon:2759), an extremely broad scope appropriate for the widespread distribution of beta-galactosidase across eukaryotes. Condition set 2 has no taxonomic restriction, which is appropriate given IDUA's conservation across animals and potentially other eukaryotes. Condition set 3 uses Mus (NCBITaxon:862507), restricting the annotation to mice; rats (genus Rattus) are already excluded. This is unjustifiably narrow and creates false negatives for the same enzyme function in other mammals, vertebrates, and potentially all animals. Beta-hexosaminidase function in keratan sulfate degradation is conserved across mammals at minimum, and the Mus restriction appears arbitrary or erroneous. Condition set 4 uses Metazoa (NCBITaxon:33208), which is appropriate for N-acetylglucosamine-6-sulfatase function in animals. The inconsistency suggests the rule was not designed with systematic consideration of taxonomic distribution. Condition set 3 should be expanded to at least Mammalia (NCBITaxon:40674) or Vertebrata (NCBITaxon:7742), and ideally the entire rule should use consistent taxonomic scope logic.
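
The effect of widening the scope of condition set 3 can be sketched by checking which species fall under each candidate taxon. The lineages below are heavily truncated and purely illustrative: they list only the taxon identifiers mentioned in this review and are an assumption about where each species sits, not a taxonomy lookup.

```python
# Truncated, illustrative lineages (assumption): only taxon IDs cited in this
# review are included. 2759 Eukaryota, 33208 Metazoa, 7742 Vertebrata,
# 40674 Mammalia, 862507 Mus.
LINEAGES = {
    "Mus musculus": {"2759", "33208", "7742", "40674", "862507"},
    "Rattus norvegicus": {"2759", "33208", "7742", "40674"},
    "Homo sapiens": {"2759", "33208", "7742", "40674"},
    "Danio rerio": {"2759", "33208", "7742"},
}

# Current scope of condition set 3 and the two proposed replacements.
SCOPES = {"Mus": "862507", "Mammalia": "40674", "Vertebrata": "7742"}


def covered(scope_taxon: str) -> list:
    """Return the species whose lineage contains the given scope taxon."""
    return sorted(name for name, lineage in LINEAGES.items() if scope_taxon in lineage)


coverage = {label: covered(taxon) for label, taxon in SCOPES.items()}
for label, species in coverage.items():
    print(f"{label}: {species}")
```

With these example lineages, only the mouse is covered by the current restriction, while the Mammalia and Vertebrata scopes progressively recover the rat, human, and zebrafish orthologs.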

Raw YAML

id: ARBA00085883
description: 'Rule predicting proteoglycan catabolic process (GO:0030167) based on
  four condition sets: (1) beta-galactosidase in Eukaryota, (2) alpha-L-iduronidase,
  (3) beta-hexosaminidase in Mus, and (4) N-acetylglucosamine-6-sulfatase in Metazoa.
  These enzymes are lysosomal hydrolases involved in sequential degradation of glycosaminoglycan
  (GAG) chains that constitute proteoglycans. However, beta-galactosidase and beta-hexosaminidase
  are primarily known for ganglioside degradation, raising questions about annotation
  specificity.'
status: COMPLETE
rule_type: ARBA
rule:
  rule_id: ARBA00085883
  condition_sets:
  - number: 1
    conditions:
    - condition_type: FUNFAM
      value: 2.60.120.260:FF:000115
      curie: CATH.FunFam:2.60.120.260:FF:000115
      label: Beta-galactosidase
      negated: false
    - condition_type: FUNFAM
      value: 3.20.20.80:FF:000017
      curie: CATH.FunFam:3.20.20.80:FF:000017
      label: Beta-galactosidase
      negated: false
    - condition_type: TAXON
      value: '2759'
      curie: NCBITaxon:2759
      label: Eukaryota
      negated: false
    notes: 'Beta-galactosidase (GLB1 in humans) is a lysosomal hydrolase that cleaves
      terminal beta-linked galactose residues from glycoconjugates. While it does
      participate in keratan sulfate degradation (a glycosaminoglycan component of
      proteoglycans), its primary clinical significance relates to GM1 ganglioside
      catabolism. GLB1 deficiency causes two distinct diseases: GM1 gangliosidosis
      (impaired ganglioside degradation) and mucopolysaccharidosis type IVB/Morquio
      B (impaired keratan sulfate degradation). The dual substrate specificity reflects
      the enzyme''s ability to cleave β-galactose from structurally diverse molecules.
      While annotating GLB1 with proteoglycan catabolic process is technically correct
      for the keratan sulfate degradation function, this annotation obscures the enzyme''s
      primary role in ganglioside metabolism.'
    pairwise_overlap:
    - condition_a: 2.60.120.260:FF:000115
      condition_b: 3.20.20.80:FF:000017
      protein_database: SWISSPROT
      count_a: 7
      count_b: 10
      intersection_count: 7
      a_minus_b_count: 0
      b_minus_a_count: 3
      jaccard_similarity: 0.7
      containment_a_in_b: 1.0
      containment_b_in_a: 0.7
      interpretation: SUBSET
  - number: 2
    conditions:
    - condition_type: FUNFAM
      value: 2.60.40.10:FF:001526
      curie: CATH.FunFam:2.60.40.10:FF:001526
      label: Alpha-L-iduronidase
      negated: false
    - condition_type: FUNFAM
      value: 2.60.40.1500:FF:000001
      curie: CATH.FunFam:2.60.40.1500:FF:000001
      label: Alpha-L-iduronidase
      negated: false
    - condition_type: FUNFAM
      value: 3.20.20.80:FF:000059
      curie: CATH.FunFam:3.20.20.80:FF:000059
      label: Alpha-L-iduronidase
      negated: false
    notes: Alpha-L-iduronidase (IDUA) is a lysosomal enzyme that cleaves terminal
      alpha-L-iduronic acid residues from dermatan sulfate and heparan sulfate, both
      of which are glycosaminoglycan chains attached to proteoglycan core proteins.
      IDUA deficiency causes mucopolysaccharidosis type I (MPS I), including Hurler
      syndrome and Scheie syndrome, characterized by accumulation of dermatan sulfate
      and heparan sulfate in multiple tissues. This is a quintessential proteoglycan
      catabolic enzyme with no known alternative primary function. The condition set
      appropriately targets this enzyme family and the GO:0030167 annotation is highly
      appropriate. Over 201 IDUA mutations have been identified, and enzyme replacement
      therapy (Aldurazyme) is available for treatment.
    pairwise_overlap:
    - condition_a: 2.60.40.10:FF:001526
      condition_b: 2.60.40.1500:FF:000001
      protein_database: SWISSPROT
      count_a: 3
      count_b: 3
      intersection_count: 3
      a_minus_b_count: 0
      b_minus_a_count: 0
      jaccard_similarity: 1.0
      containment_a_in_b: 1.0
      containment_b_in_a: 1.0
      interpretation: REDUNDANT
    - condition_a: 2.60.40.10:FF:001526
      condition_b: 3.20.20.80:FF:000059
      protein_database: SWISSPROT
      count_a: 3
      count_b: 3
      intersection_count: 3
      a_minus_b_count: 0
      b_minus_a_count: 0
      jaccard_similarity: 1.0
      containment_a_in_b: 1.0
      containment_b_in_a: 1.0
      interpretation: REDUNDANT
    - condition_a: 2.60.40.1500:FF:000001
      condition_b: 3.20.20.80:FF:000059
      protein_database: SWISSPROT
      count_a: 3
      count_b: 3
      intersection_count: 3
      a_minus_b_count: 0
      b_minus_a_count: 0
      jaccard_similarity: 1.0
      containment_a_in_b: 1.0
      containment_b_in_a: 1.0
      interpretation: REDUNDANT
  - number: 3
    conditions:
    - condition_type: FUNFAM
      value: 3.20.20.80:FF:000049
      curie: CATH.FunFam:3.20.20.80:FF:000049
      label: Beta-hexosaminidase A
      negated: false
    - condition_type: FUNFAM
      value: 3.30.379.10:FF:000001
      curie: CATH.FunFam:3.30.379.10:FF:000001
      label: Beta-hexosaminidase subunit beta
      negated: false
    - condition_type: TAXON
      value: '862507'
      curie: NCBITaxon:862507
      label: Mus
      negated: false
    notes: 'Beta-hexosaminidase exists as two isoenzymes in mammals: Hex A (heterodimer
      of alpha/HEXA and beta/HEXB subunits) and Hex B (homodimer of beta subunits).
      While beta-hexosaminidase does participate in keratan sulfate degradation by
      removing terminal N-acetyl-D-glucosamine residues, its primary clinical and
      biological significance relates to GM2 ganglioside catabolism. HEXA deficiency
      causes Tay-Sachs disease and HEXB deficiency causes Sandhoff disease, both characterized
      primarily by GM2 ganglioside accumulation leading to neurodegeneration. The
      restriction to Mus genus (862507) is puzzling and appears arbitrary—beta-hexosaminidase
      function in proteoglycan catabolism is conserved across mammals and other vertebrates.
      This narrow taxonomic scope creates false negatives for the same enzyme function
      in closely related organisms. Similar to beta-galactosidase, GO:0030167 annotation
      is technically correct but potentially misleading about the enzyme''s primary
      biological role.'
    pairwise_overlap:
    - condition_a: 3.20.20.80:FF:000049
      condition_b: 3.30.379.10:FF:000001
      protein_database: SWISSPROT
      count_a: 9
      count_b: 7
      intersection_count: 7
      a_minus_b_count: 2
      b_minus_a_count: 0
      jaccard_similarity: 0.7777777777777778
      containment_a_in_b: 0.7777777777777778
      containment_b_in_a: 1.0
      interpretation: SUBSET
  - number: 4
    conditions:
    - condition_type: FUNFAM
      value: 3.40.720.10:FF:000012
      curie: CATH.FunFam:3.40.720.10:FF:000012
      label: N-acetylglucosamine-6-sulfatase
      negated: false
    - condition_type: TAXON
      value: '33208'
      curie: NCBITaxon:33208
      label: Metazoa
      negated: false
    notes: N-acetylglucosamine-6-sulfatase (GNS) is a lysosomal enzyme that specifically
      removes 6-sulfate groups from N-acetylglucosamine residues in heparan sulfate,
      a glycosaminoglycan component of proteoglycans. GNS deficiency causes Sanfilippo
      syndrome type D (mucopolysaccharidosis type IIID), a neurodegenerative lysosomal
      storage disorder characterized by heparan sulfate accumulation. Unlike beta-galactosidase
      and beta-hexosaminidase, GNS is exclusively involved in glycosaminoglycan catabolism
      with no known role in ganglioside metabolism. The Metazoa restriction is appropriate
      and well-supported, as the enzyme functions in heparan sulfate degradation across
      animals. This is a strong, specific match for proteoglycan catabolic process.
      GNS is one of the four enzymes (types A-D) of heparan sulfate degradation
      whose deficiencies cause the established Sanfilippo syndrome subtypes.
  go_annotations:
  - go_id: GO:0030167
    go_label: proteoglycan catabolic process
    aspect: BP
  reviewed_protein_count: 0
  unreviewed_protein_count: 0
  created_date: '2025-03-21'
  modified_date: '2025-03-21'
  entries:
  - id: 2.60.120.260:FF:000115
    type: FUNFAM
    label: Beta-galactosidase
    appears_in_condition_sets:
    - 1
    protein_count: 7
    related_entries:
    - relationship: PREDICTS
      target_id: 3.20.20.80:FF:000017
      containment: 1.0
      jaccard_similarity: 0.7
      intersection_count: 7
      exclusive_count: 0
    - relationship: EQUIV
      target_id: 2.60.40.10:FF:001526
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: EQUIV
      target_id: 2.60.40.1500:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000059
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000049
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: EQUIV
      target_id: 3.30.379.10:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: EQUIV
      target_id: 3.40.720.10:FF:000012
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: PREDICTS
      target_id: GO:0030167
      containment: 0.429
      jaccard_similarity: 0.051
      intersection_count: 3
      exclusive_count: 4
  - id: 2.60.40.10:FF:001526
    type: FUNFAM
    label: Alpha-L-iduronidase
    appears_in_condition_sets:
    - 2
    protein_count: 3
    related_entries:
    - relationship: EQUIV
      target_id: 2.60.120.260:FF:000115
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000017
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 2.60.40.1500:FF:000001
      containment: 1.0
      jaccard_similarity: 1.0
      intersection_count: 3
      exclusive_count: 0
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000059
      containment: 1.0
      jaccard_similarity: 1.0
      intersection_count: 3
      exclusive_count: 0
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000049
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 3.30.379.10:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 3.40.720.10:FF:000012
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: PREDICTS
      target_id: GO:0030167
      containment: 1.0
      jaccard_similarity: 0.055
      intersection_count: 3
      exclusive_count: 0
  - id: 2.60.40.1500:FF:000001
    type: FUNFAM
    label: Alpha-L-iduronidase
    appears_in_condition_sets:
    - 2
    protein_count: 3
    related_entries:
    - relationship: EQUIV
      target_id: 2.60.120.260:FF:000115
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000017
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 2.60.40.10:FF:001526
      containment: 1.0
      jaccard_similarity: 1.0
      intersection_count: 3
      exclusive_count: 0
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000059
      containment: 1.0
      jaccard_similarity: 1.0
      intersection_count: 3
      exclusive_count: 0
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000049
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 3.30.379.10:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 3.40.720.10:FF:000012
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: PREDICTS
      target_id: GO:0030167
      containment: 1.0
      jaccard_similarity: 0.055
      intersection_count: 3
      exclusive_count: 0
  - id: 3.20.20.80:FF:000017
    type: FUNFAM
    label: Beta-galactosidase
    appears_in_condition_sets:
    - 1
    protein_count: 10
    related_entries:
    - relationship: PREDICTED_BY
      target_id: 2.60.120.260:FF:000115
      containment: 0.7
      jaccard_similarity: 0.7
      intersection_count: 7
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 2.60.40.10:FF:001526
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 10
    - relationship: EQUIV
      target_id: 2.60.40.1500:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 10
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000059
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 10
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000049
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 10
    - relationship: EQUIV
      target_id: 3.30.379.10:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 10
    - relationship: EQUIV
      target_id: 3.40.720.10:FF:000012
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 10
    - relationship: PREDICTS
      target_id: GO:0030167
      containment: 0.3
      jaccard_similarity: 0.048
      intersection_count: 3
      exclusive_count: 7
  - id: 3.20.20.80:FF:000049
    type: FUNFAM
    label: Beta-hexosaminidase A
    appears_in_condition_sets:
    - 3
    protein_count: 9
    related_entries:
    - relationship: EQUIV
      target_id: 2.60.120.260:FF:000115
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 9
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000017
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 9
    - relationship: EQUIV
      target_id: 2.60.40.10:FF:001526
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 9
    - relationship: EQUIV
      target_id: 2.60.40.1500:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 9
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000059
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 9
    - relationship: PREDICTED_BY
      target_id: 3.30.379.10:FF:000001
      containment: 0.778
      jaccard_similarity: 0.778
      intersection_count: 7
      exclusive_count: 2
    - relationship: EQUIV
      target_id: 3.40.720.10:FF:000012
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 9
    - relationship: PREDICTS
      target_id: GO:0030167
      containment: 0.778
      jaccard_similarity: 0.123
      intersection_count: 7
      exclusive_count: 2
  - id: 3.20.20.80:FF:000059
    type: FUNFAM
    label: Alpha-L-iduronidase
    appears_in_condition_sets:
    - 2
    protein_count: 3
    related_entries:
    - relationship: EQUIV
      target_id: 2.60.120.260:FF:000115
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000017
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 2.60.40.10:FF:001526
      containment: 1.0
      jaccard_similarity: 1.0
      intersection_count: 3
      exclusive_count: 0
    - relationship: EQUIV
      target_id: 2.60.40.1500:FF:000001
      containment: 1.0
      jaccard_similarity: 1.0
      intersection_count: 3
      exclusive_count: 0
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000049
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 3.30.379.10:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: EQUIV
      target_id: 3.40.720.10:FF:000012
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 3
    - relationship: PREDICTS
      target_id: GO:0030167
      containment: 1.0
      jaccard_similarity: 0.055
      intersection_count: 3
      exclusive_count: 0
  - id: 3.30.379.10:FF:000001
    type: FUNFAM
    label: Beta-hexosaminidase subunit beta
    appears_in_condition_sets:
    - 3
    protein_count: 7
    related_entries:
    - relationship: EQUIV
      target_id: 2.60.120.260:FF:000115
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000017
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: EQUIV
      target_id: 2.60.40.10:FF:001526
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: EQUIV
      target_id: 2.60.40.1500:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000059
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: PREDICTS
      target_id: 3.20.20.80:FF:000049
      containment: 1.0
      jaccard_similarity: 0.778
      intersection_count: 7
      exclusive_count: 0
    - relationship: EQUIV
      target_id: 3.40.720.10:FF:000012
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 7
    - relationship: PREDICTS
      target_id: GO:0030167
      containment: 0.857
      jaccard_similarity: 0.107
      intersection_count: 6
      exclusive_count: 1
  - id: 3.40.720.10:FF:000012
    type: FUNFAM
    label: N-acetylglucosamine-6-sulfatase
    appears_in_condition_sets:
    - 4
    protein_count: 4
    related_entries:
    - relationship: EQUIV
      target_id: 2.60.120.260:FF:000115
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 4
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000017
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 4
    - relationship: EQUIV
      target_id: 2.60.40.10:FF:001526
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 4
    - relationship: EQUIV
      target_id: 2.60.40.1500:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 4
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000059
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 4
    - relationship: EQUIV
      target_id: 3.20.20.80:FF:000049
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 4
    - relationship: EQUIV
      target_id: 3.30.379.10:FF:000001
      containment: 0.0
      jaccard_similarity: 0.0
      intersection_count: 0
      exclusive_count: 4
    - relationship: PREDICTS
      target_id: GO:0030167
      containment: 0.5
      jaccard_similarity: 0.035
      intersection_count: 2
      exclusive_count: 2
review_summary: 'This rule captures four lysosomal hydrolases (three glycosidases
  and one sulfatase) that participate in proteoglycan catabolism by degrading glycosaminoglycan
  (GAG) chains. Condition set 2 (alpha-L-iduronidase) and condition set 4 (N-acetylglucosamine-6-sulfatase)
  are well matched to the term: these enzymes are exclusively or primarily involved
  in degrading dermatan sulfate, heparan sulfate, and related GAG chains, and their
  deficiency causes well-characterized mucopolysaccharidosis phenotypes. Condition
  sets 1 (beta-galactosidase) and 3 (beta-hexosaminidase) are technically correct
  but biologically problematic: although these enzymes do participate in keratan
  sulfate degradation, their primary biological roles and clinical significance relate
  to ganglioside catabolism (GM1 and GM2, respectively). The taxonomic scope is inconsistent:
  condition set 1 uses a broad Eukaryota scope, condition set 2 has no taxonomic
  restriction, condition set 3 uses a narrow Mus restriction (creating false negatives),
  and condition set 4 uses an appropriate Metazoa scope. The rule shows a tension
  between biochemical accuracy (all four enzymes can cleave bonds in GAG chains) and
  biological interpretation (two are primarily ganglioside-degrading enzymes with
  secondary GAG functions). More specific annotations or a hierarchical annotation
  strategy would improve utility.'
action: MODIFY
action_rationale: 'The rule requires modification to improve biological accuracy and
  reduce potential misinterpretation. Three specific issues need addressing. (1) Condition
  sets 1 (beta-galactosidase) and 3 (beta-hexosaminidase) should either be removed
  from this rule or receive additional primary annotations reflecting their ganglioside-degrading
  functions (e.g., "ganglioside catabolic process" GO:0006689 for both GLB1 and hexosaminidase,
  or a more specific GM1 or GM2 ganglioside catabolic term where one is available).
  A hierarchical annotation strategy in which proteins receive both primary (ganglioside)
  and secondary (proteoglycan) annotations would be most accurate. (2) The Mus restriction
  in condition set 3 is unjustified and creates false negatives: beta-hexosaminidase
  function in keratan sulfate degradation is conserved at least across mammals and
  likely across vertebrates. This scope should be expanded to at least Mammalia or
  Vertebrata. (3) Consider splitting this rule: create a focused rule for core proteoglycan-degrading
  enzymes (IDUA, GNS, and other MPS-associated enzymes) separate from dual-function
  enzymes that also degrade gangliosides. This would improve both precision and biological
  interpretability while maintaining computational utility.'
suggested_modifications:
- Remove condition sets 1 and 3 from this rule and create separate rules for ganglioside-degrading
  enzymes with dual substrate specificity
- Alternatively, add hierarchical GO annotations where GLB1 receives both "GM1 ganglioside
  catabolic process" (primary) and "proteoglycan catabolic process" (secondary)
- Add hierarchical GO annotations where beta-hexosaminidase receives both "GM2 ganglioside
  catabolic process" or "ganglioside catabolic process" (primary) and "proteoglycan
  catabolic process" (secondary)
- Expand condition set 3 taxonomic scope from Mus (862507) to at minimum Mammalia
  (40674) or preferably Vertebrata (7742)
- Consider adding other core proteoglycan-degrading enzymes to strengthen the rule
  (iduronate-2-sulfatase/IDS for MPS II, N-acetylgalactosamine-6-sulfatase/GALNS for
  MPS IVA, and other MPS-associated enzymes)
- Add more specific child term annotations where appropriate (heparan sulfate catabolic
  process for GNS, dermatan sulfate catabolic process for IDUA)
- Document the distinction between primary proteoglycan-degrading enzymes and dual-function
  enzymes in rule metadata
parsimony:
  assessment: REDUNDANT
  notes: 'The rule uses four condition sets but lacks parsimony in two respects. First,
    it conflates enzymes with distinct primary biological functions—dedicated GAG-degrading
    enzymes (IDUA, GNS) versus dual-function enzymes with primary ganglioside-degrading
    roles (GLB1, Hex A/B). This creates a heterogeneous rule that obscures important
    functional distinctions. Second, condition set 3 combines two beta-hexosaminidase
    FunFams with an arbitrary Mus-only restriction, while condition set 1 uses broad
    Eukaryota scope for structurally similar beta-galactosidase—this inconsistency
    suggests the taxonomic constraints were not systematically designed. A more parsimonious
    approach would split the rule into: (a) a focused rule for enzymes primarily involved
    in proteoglycan catabolism (IDUA, GNS, potentially expanded to other MPS enzymes),
    and (b) a separate rule or hierarchical annotation strategy for dual-function
    enzymes. Within each condition set, the use of multiple FunFam signatures for
    the same enzyme is appropriate for capturing sequence diversity, but the overall
    rule structure lacks biological parsimony.'
  supported_by:
  - reference_id: web:https://www.ncbi.nlm.nih.gov/books/NBK164500/
    supporting_text: 'GLB1-related disorders comprise two phenotypically distinct
      lysosomal storage disorders: GM1 gangliosidosis and mucopolysaccharidosis type
      IVB. This dual disease phenotype reflects the enzyme''s dual substrate specificity.'
  - reference_id: web:https://www.ncbi.nlm.nih.gov/books/NBK564432/
    supporting_text: Beta-hexosaminidase A forms part of a complex that breaks down
      GM2 ganglioside. HEXA variants that cause Tay-Sachs disease eliminate beta-hexosaminidase
      A activity, preventing GM2 ganglioside breakdown.
literature_support:
  assessment: STRONG
  notes: 'All four condition sets have extensive experimental support. Alpha-L-iduronidase
    (condition set 2): more than 200 IDUA mutations documented, enzyme replacement therapy
    (Aldurazyme) approved, crystal structures solved, dermatan sulfate and heparan
    sulfate accumulation demonstrated in MPS I patients. N-acetylglucosamine-6-sulfatase
    (condition set 4): GNS mutations cause Sanfilippo syndrome type D (MPS IIID) with specific heparan sulfate
    accumulation, enzyme assays demonstrate substrate specificity, recombinant enzyme
    therapy under development. Beta-galactosidase (condition set 1): GLB1 mutations
    cause distinct phenotypes (GM1 gangliosidosis versus MPS IVB) depending on residual
    activity toward different substrates, crystal structures explain substrate binding,
    enzyme assays confirm dual specificity for GM1 ganglioside and keratan sulfate.
    Beta-hexosaminidase (condition set 3): HEXA and HEXB mutations cause Tay-Sachs
    and Sandhoff diseases respectively, crystal structures of Hex A and Hex B solved,
    substrate specificity for GM2 ganglioside and keratan sulfate experimentally validated.
    The literature strongly supports that all four enzymes can hydrolyze bonds in
    GAG chains, but also clearly demonstrates that beta-galactosidase and beta-hexosaminidase
    have primary roles in ganglioside rather than proteoglycan metabolism.'
  supported_by:
  - reference_id: web:https://www.ncbi.nlm.nih.gov/books/NBK532261/
    supporting_text: Hurler syndrome is caused by deficiency of alpha-L-iduronidase,
      resulting in accumulation of dermatan sulfate and heparan sulfate in multiple
      tissues with progressive deterioration.
  - reference_id: web:https://pubmed.ncbi.nlm.nih.gov/6450420/
    supporting_text: Sanfilippo disease type D is caused by deficiency of N-acetylglucosamine-6-sulfate
      sulfatase required for heparan sulfate degradation, with accumulation of excessive
      heparan sulfate.
  - reference_id: web:https://medlineplus.gov/genetics/gene/glb1/
    supporting_text: Beta-galactosidase is involved in metabolism of GM1 ganglioside
      and keratan sulfate. GLB1-related disorders include GM1 gangliosidosis and mucopolysaccharidosis
      type IVB with distinct phenotypes.
  - reference_id: web:https://pmc.ncbi.nlm.nih.gov/articles/PMC2910754/
    supporting_text: Crystal structure of human beta-hexosaminidase B explains substrate
      specificity. HEXB mutations cause Sandhoff disease characterized by GM2 ganglioside
      accumulation.
  - reference_id: web:https://www.ncbi.nlm.nih.gov/books/NBK579925/
    supporting_text: Degradation of GAG chains occurs via sulfatase-catalyzed removal
      of terminal sulfate groups and sequential action of exoglycosidases. At least
      14 lysosomal storage diseases affect GAG catabolism.
condition_overlap:
  assessment: NONE
  notes: 'The four condition sets target distinct enzyme families with no shared
    member proteins (every cross-set intersection count is 0). Three of the glycosidase
    FunFams (3.20.20.80:FF:000017, :000059, and :000049) do sit in the same CATH superfamily,
    so the sets are distinct at the FunFam level rather than structurally unrelated.
    Condition set 1 targets beta-galactosidase (GLB1, GH35 family), condition set
    2 targets alpha-L-iduronidase (IDUA, GH39 family), condition set 3 targets beta-hexosaminidase
    (HEXA/HEXB, GH20 family), and condition set 4 targets N-acetylglucosamine-6-sulfatase
    (GNS, sulfatase family). Mechanistically, they cleave different bonds: beta-galactosidase
    removes terminal beta-galactose, alpha-L-iduronidase removes alpha-L-iduronate,
    beta-hexosaminidase removes N-acetylhexosamine, and GNS removes 6-sulfate from
    N-acetylglucosamine. The enzymes act sequentially in lysosomal GAG catabolism
    pathways but do not overlap in substrate specificity. However, beta-galactosidase
    and beta-hexosaminidase both have primary roles in ganglioside catabolism, which
    creates an overlap in biological context rather than in molecular specificity.'
go_specificity:
  assessment: TOO_BROAD
  notes: GO:0030167 (proteoglycan catabolic process) shows variable appropriateness
    across the four condition sets. For condition set 2 (alpha-L-iduronidase) and
    condition set 4 (N-acetylglucosamine-6-sulfatase), GO:0030167 is highly appropriate
    and could potentially be made more specific with child terms like "dermatan sulfate
    catabolic process" (GO:1902556) or "heparan sulfate proteoglycan catabolic process"
    (GO:0030200). For condition sets 1 (beta-galactosidase) and 3 (beta-hexosaminidase),
    GO:0030167 is technically accurate but too broad and potentially misleading, because
    it fails to capture the primary biological roles of these enzymes in ganglioside
    metabolism. A more appropriate primary annotation would be "ganglioside catabolic
    process" (GO:0006689) for both GLB1 and hexosaminidase, or a more specific GM1
    or GM2 ganglioside catabolic term where one is available. Under the current annotation
    strategy, users searching for proteoglycan-degrading enzymes will retrieve GLB1
    and hexosaminidase, potentially without realizing that the primary functions of
    these enzymes lie elsewhere. A hierarchical annotation approach with both primary
    and secondary GO terms would better reflect the biological reality.
  supported_by:
  - reference_id: web:https://www.ncbi.nlm.nih.gov/books/NBK164500/
    supporting_text: MPS IVB is associated with GLB1 variants that impair catalytic
      degradation of keratan sulfate while GM1 gangliosidosis is associated with variants
      impairing ganglioside degradation, indicating the enzyme's dual substrate specificity.
  - reference_id: web:https://pmc.ncbi.nlm.nih.gov/articles/PMC2910754/
    supporting_text: Beta-hexosaminidase processes multiple substrate types including
      sphingolipids, oligosaccharides, and keratan sulfate, but primary clinical significance
      relates to GM2 ganglioside degradation defects causing Tay-Sachs and Sandhoff
      diseases.
taxonomic_scope:
  assessment: TOO_NARROW
  notes: The taxonomic scopes across condition sets are inconsistent and in one case
    appear arbitrary. Condition set 1 uses Eukaryota (NCBITaxon:2759), an extremely
    broad scope appropriate for the widespread distribution of beta-galactosidase
    across eukaryotes. Condition set 2 has no taxonomic restriction, which is appropriate
    given IDUA's conservation across animals and potentially other eukaryotes. Condition
    set 3 uses Mus (NCBITaxon:862507), restricting the rule to mice. This is
    unjustifiably narrow and creates false negatives for the same enzyme function
    in other mammals, vertebrates, and potentially all animals. Beta-hexosaminidase
    function in keratan sulfate degradation is conserved across mammals at minimum,
    and the Mus restriction appears arbitrary or erroneous. Condition set 4 uses Metazoa
    (NCBITaxon:33208), which is appropriate for N-acetylglucosamine-6-sulfatase function
    in animals. The inconsistency suggests the rule was not designed with systematic
    consideration of taxonomic distribution. Condition set 3 should be expanded to
    at least Mammalia (40674) or Vertebrata (7742), and ideally the entire rule should
    use consistent taxonomic scope logic.
  supported_by:
  - reference_id: web:https://medlineplus.gov/genetics/gene/glb1/
    supporting_text: Beta-galactosidase is a conserved lysosomal enzyme found across
      eukaryotes, supporting the Eukaryota scope in condition set 1.
  - reference_id: web:https://pmc.ncbi.nlm.nih.gov/articles/PMC2910754/
    supporting_text: Beta-hexosaminidase A and B are conserved across mammals with
      similar substrate specificity, contradicting the narrow Mus restriction in condition
      set 3.
  - reference_id: web:https://www.ncbi.nlm.nih.gov/books/NBK579925/
    supporting_text: At least 14 lysosomal storage diseases affect GAG catabolism
      across animals, supporting broad Metazoa scope for enzymes like GNS in condition
      set 4.
confidence: 0.6
references:
- id: file:rules/arba/ARBA00085883/ARBA00085883-deep-research-falcon.md
  title: Deep research analysis via Falcon (30 citations)
  findings:
  - statement: Beta-galactosidase (GLB1) removes terminal β-galactose from keratan
      sulfate and oligosaccharides in the lysosome, contributing to proteoglycan fragment
      trimming, but is multifunctional and primarily associated with GM1 gangliosidosis.
  - statement: Alpha-L-iduronidase (IDUA) is strongly diagnostic for HS/DS catabolism.
      It hydrolyzes terminal α-L-iduronic acid and has no prominent alternative functions.
      Deficiency causes MPS I.
  - statement: Beta-hexosaminidase A/B (HEXA/HEXB) is primarily involved in ganglioside
      catabolism. It has broad specificity for terminal HexNAc, but its role in proteoglycan
      degradation is indirect. Annotating it with this term risks overannotation without
      additional constraints.
  - statement: N-acetylglucosamine-6-sulfatase (GNS) is obligatory for HS proteoglycan
      catabolism and removes 6-O-sulfate from terminal GlcNAc. It is strongly tied
      to HS degradation with no alternative functions. Deficiency causes MPS IIID.
  - statement: GO:0030167 is appropriate for IDUA and GNS but may be too broad for
      HEXA/HEXB without additional evidence. More specific terms are recommended when
      possible (heparan sulfate catabolic process, dermatan sulfate catabolic process,
      keratan sulfate catabolic process).
- id: web:https://www.ncbi.nlm.nih.gov/books/NBK532261/
  title: Hurler Syndrome - StatPearls - NCBI Bookshelf
  findings:
  - statement: Hurler syndrome is an autosomal recessive disorder caused by defects
      in the IDUA gene on chromosome 4, which encodes alpha-L-iduronidase, resulting
      in accumulation of dermatan sulfate and heparan sulfate in multiple tissues.
- id: web:https://pubmed.ncbi.nlm.nih.gov/6450420/
  title: 'Sanfilippo disease type D: deficiency of N-acetylglucosamine-6-sulfate sulfatase'
  findings:
  - statement: Sanfilippo syndrome type D is caused by deficiency of N-acetylglucosamine-6-sulfate
      sulfatase, which is required for heparan sulfate degradation, with excessive
      heparan sulfate accumulation in patient fibroblasts.
- id: web:https://medlineplus.gov/genetics/gene/glb1/
  title: 'GLB1 gene: MedlinePlus Genetics'
  findings:
  - statement: Beta-galactosidase encoded by GLB1 is involved in metabolism of GM1
      ganglioside and keratan sulfate. Deficiency causes GM1 gangliosidosis or mucopolysaccharidosis
      type IVB depending on substrate specificity of variants.
- id: web:https://www.ncbi.nlm.nih.gov/books/NBK164500/
  title: GLB1-Related Disorders - GeneReviews
  findings:
  - statement: 'GLB1-related disorders comprise two phenotypically distinct diseases:
      GM1 gangliosidosis (impaired ganglioside degradation) and MPS IVB (impaired
      keratan sulfate degradation), reflecting dual substrate specificity.'
- id: web:https://www.ncbi.nlm.nih.gov/books/NBK564432/
  title: Tay-Sachs Disease - StatPearls
  findings:
  - statement: Beta-hexosaminidase A breaks down GM2 ganglioside in lysosomes. HEXA
      variants causing Tay-Sachs disease eliminate enzyme activity, causing GM2 accumulation
      and progressive neuronal damage.
- id: web:https://pmc.ncbi.nlm.nih.gov/articles/PMC2910754/
  title: Crystal Structure of Human β-Hexosaminidase B
  findings:
  - statement: Crystal structure explains substrate specificity of beta-hexosaminidase
      B (HEXB homodimer) for GM2 ganglioside, oligosaccharides, and keratan sulfate.
      Mutations cause Sandhoff disease.
- id: web:https://www.ncbi.nlm.nih.gov/books/NBK579925/
  title: Proteoglycans and Sulfated Glycosaminoglycans - Essentials of Glycobiology
  findings:
  - statement: GAG degradation occurs via sulfatase-catalyzed sulfate removal and
      sequential exoglycosidase action. At least 14 lysosomal storage diseases affect
      GAG catabolism, including mucopolysaccharidoses.
- id: web:https://www.ncbi.nlm.nih.gov/books/NBK544295/
  title: Biochemistry, Glycosaminoglycans - StatPearls
  findings:
  - statement: Glycosaminoglycans include heparan sulfate, dermatan sulfate, chondroitin
      sulfate, and keratan sulfate. Sequential degradation requires specific lysosomal
      enzymes; deficiencies cause mucopolysaccharidoses.
supported_by:
- reference_id: web:https://www.ncbi.nlm.nih.gov/books/NBK579925/
  supporting_text: All four enzyme families targeted by this rule participate in the
    sequential degradation of glycosaminoglycan chains that constitute proteoglycans,
    validating the biochemical accuracy of the GO:0030167 annotation.