Pfam2GO vs InterPro2GO — Precision-Gap Analysis
Auto-generated by `analyze_pfam_go_gaps.py`. Do not edit by hand; re-run the script to refresh. See parent project.
Provenance
- `pfam2go` version date: 2026/04/28 13:48:58
- `interpro2go` version date: 2026/04/28 13:48:58
- `interpro.xml.gz` InterPro release: 109.0 (11-JUN-26)
- `go-basic.obo`: current release at run time (current.geneontology.org).
- Specificity is judged over the GO `is_a` + `part_of` DAG (go-basic).
- Note on release skew: the GO mapping files come from the GO snapshot and the membership file from the EBI InterPro FTP; if their dates differ, a few Pfam families re-integrated in the newer InterPro release land in entries the older GO snapshot has not annotated yet. These surface as `DISJOINT_PARENT_NO_GO` and are skew, not signal.
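
To make "specificity over the `is_a` + `part_of` DAG" concrete, here is a minimal sketch of how ancestors could be collected from an OBO file. It is not taken from `analyze_pfam_go_gaps.py`; the function names `parse_obo_parents` and `ancestors` and the toy term ids (`GO:A`, `GO:B`, `GO:C`) are hypothetical, and the parser handles only the `[Term]` stanza fields needed here.

```python
from collections import defaultdict

def parse_obo_parents(lines):
    """Return {term_id: set(parent_ids)} using is_a and part_of edges only."""
    parents = defaultdict(set)
    term_id = None
    in_term = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = (line == "[Term]")
            term_id = None
        elif not in_term:
            continue
        elif line.startswith("id: "):
            term_id = line[4:].strip()
        elif line.startswith("is_a: ") and term_id:
            parents[term_id].add(line.split()[1])
        elif line.startswith("relationship: part_of ") and term_id:
            parents[term_id].add(line.split()[2])
    return parents

def ancestors(term, parents):
    """All terms reachable from `term` by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Toy ontology with made-up ids (assumption: not real GO terms).
TOY_OBO = """\
format-version: 1.2

[Term]
id: GO:A
name: root

[Term]
id: GO:B
is_a: GO:A ! root

[Term]
id: GO:C
relationship: part_of GO:B ! b
relationship: regulates GO:A ! root

[Typedef]
id: part_of
"""

toy_parents = parse_obo_parents(TOY_OBO.splitlines())
print(sorted(ancestors("GO:C", toy_parents)))
```

Under this reading, a term X is "more specific" than Y when Y appears in `ancestors(X, ...)`. Other relationship types, such as `regulates`, are deliberately not followed.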
Headline
Of 9,871 pfam2go assertions on integrated families, 9,844 are byte-identical to a GO id already on the parent InterPro entry and 0 are more specific. pfam2go is, as its own header states, generated from InterPro2GO — it provides no precision gain over InterPro2GO.
Coverage
- Pfam families with at least one `pfam2go` term: 5,224
- Of these, integrated into an InterPro entry: 5,221
- Not integrated into any InterPro entry (pure InterPro2GO gap): 3
- Pfam families mapped to an InterPro entry overall (membership file): 29,105
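
The coverage counts above could be derived roughly as follows. This is a sketch under assumptions: it expects the usual GO external2go line layout (`Pfam:<accession> <name> > GO:<term name> ; GO:<id>`, with `!` comment lines), the helper names `parse_external2go` and `coverage` are hypothetical, the `example_name` field is a placeholder, and the membership map is a hand-written toy rather than something parsed from `interpro.xml.gz`.

```python
from collections import defaultdict

def parse_external2go(lines, prefix="Pfam:"):
    """Return {accession: set(GO ids)} from external2go-style mapping lines."""
    terms = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blanks, '!' header comments, and lines from other databases.
        if not line or line.startswith("!") or not line.startswith(prefix):
            continue
        accession = line.split()[0][len(prefix):]
        go_id = line.rpartition(";")[2].strip()
        terms[accession].add(go_id)
    return terms

def coverage(pfam_terms, pfam_to_interpro):
    """Return (families with GO, integrated, unintegrated)."""
    with_go = set(pfam_terms)
    integrated = {p for p in with_go if p in pfam_to_interpro}
    return len(with_go), len(integrated), len(with_go - integrated)

# Toy input; the accessions and GO ids are taken from the tables in this report.
TOY_PFAM2GO = """\
! toy header comment
Pfam:PF08214 example_name > GO:histone acetyltransferase activity ; GO:0004402
Pfam:PF08214 example_name > GO:regulation of DNA-templated transcription ; GO:0006355
Pfam:PF04715 example_name > GO:biosynthetic process ; GO:0009058
"""

toy_terms = parse_external2go(TOY_PFAM2GO.splitlines())
toy_membership = {"PF08214": "IPR016849"}  # assumption: toy Pfam -> InterPro map
print(coverage(toy_terms, toy_membership))
```

In the toy data, PF04715 has a GO term but no InterPro parent, so it is counted as unintegrated, which is the same situation as the three families in the "pure InterPro2GO gap" bullet.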
Classification of pfam2go assertions for integrated families
Each pfam2go (Pfam, GO) assertion is compared to the GO terms of the Pfam family's parent InterPro entry:
| Category | Assertions | Meaning |
|---|---|---|
| SAME | 9,844 | identical GO id already on the InterPro entry |
| MORE_SPECIFIC | 0 | GO descendant of an InterPro-entry term (would be a precision gain) |
| MORE_GENERAL | 1 | GO ancestor of an InterPro-entry term (Pfam less specific) |
| DISJOINT_PARENT_HAS_GO | 1 | unrelated to the entry's terms, entry does have GO (genuine difference) |
| DISJOINT_PARENT_NO_GO | 25 | parent entry has no GO at all — InterPro release-skew artifact |
- Distinct integrated Pfam families with ≥1 MORE_SPECIFIC term: 0
- Distinct families with a genuine disjoint difference (parent has GO): 1
- Distinct families showing release-skew disjointness (parent has no GO): 13
- Unintegrated Pfam families contributing GO terms InterPro entries lack: 3 (3 assertions)
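
The five categories in the table can be expressed as a small decision function. The sketch below is one plausible reading, not the script's actual code: the name `classify`, the check order (SAME before MORE_SPECIFIC before MORE_GENERAL), and the toy ancestor table are assumptions. The single toy edge reflects the relationship reported in the MORE_GENERAL example, where GO:0010484 is a descendant of GO:0004402.

```python
def classify(pfam_go, entry_terms, ancestors_of):
    """Classify one (Pfam, GO) assertion against its parent entry's GO terms.

    ancestors_of(term) must return the set of is_a/part_of ancestors of term.
    """
    if not entry_terms:
        return "DISJOINT_PARENT_NO_GO"
    if pfam_go in entry_terms:
        return "SAME"
    # The Pfam term sits below some entry term: a precision gain.
    pfam_ancestors = ancestors_of(pfam_go)
    if any(term in pfam_ancestors for term in entry_terms):
        return "MORE_SPECIFIC"
    # The Pfam term sits above some entry term: InterPro is more precise.
    if any(pfam_go in ancestors_of(term) for term in entry_terms):
        return "MORE_GENERAL"
    return "DISJOINT_PARENT_HAS_GO"

# Toy ancestor table (assumption: only the one edge needed for the demo).
TOY_ANCESTORS = {"GO:0010484": {"GO:0004402"}}

def toy_ancestors_of(term):
    return TOY_ANCESTORS.get(term, set())

entry = {"GO:0010484"}  # toy GO set for the parent entry
for go_id in ("GO:0010484", "GO:0004402", "GO:0006355"):
    print(go_id, classify(go_id, entry, toy_ancestors_of))
print("GO:0006741", classify("GO:0006741", set(), toy_ancestors_of))
```

Checking for an empty parent first keeps release-skew cases out of the "genuine difference" bucket, which matches how the report separates the two DISJOINT categories.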
Examples — increased precision (MORE_SPECIFIC)
Pfam family maps to a GO term strictly more specific than anything on its parent InterPro entry. These would be the gap-filling candidates.
None found — there is no case where pfam2go is more specific than its parent InterPro entry.
Examples — Pfam less specific than InterPro (MORE_GENERAL)
The reverse of the hypothesis: InterPro2GO carries the more precise term.
| Pfam | InterPro | Pfam GO (general) | InterPro has descendant |
|---|---|---|---|
| PF08214 | IPR016849 | GO:0004402 histone acetyltransferase activity | GO:0010484 histone H3 acetyltransferase activity |
Examples — genuine disjoint difference (DISJOINT_PARENT_HAS_GO)
Parent InterPro entry has GO terms, but the pfam2go term is unrelated to them. The few cases here are typically a generic process term paired with a more specific InterPro molecular-function term for the same family.
| Pfam | InterPro | Pfam GO (unrelated to entry's terms) |
|---|---|---|
| PF08214 | IPR016849 | GO:0006355 regulation of DNA-templated transcription |
Examples — release-skew disjointness (DISJOINT_PARENT_NO_GO)
Parent InterPro entry has no GO at all (newly created entry the GO snapshot predates). pfam2go merely retains the older term; not a precision gain, but a small recall advantage until InterPro2GO catches up.
| Pfam | InterPro (no GO yet) | Pfam GO (retained) |
|---|---|---|
| PF01513 | IPR064509 | GO:0006741 NADP+ biosynthetic process |
| PF02346 | IPR063475 | GO:0019031 viral envelope |
| PF02346 | IPR063475 | GO:0019064 fusion of virus membrane with host plasma membrane |
| PF03588 | IPR063612 | GO:0008914 leucyl-tRNA--protein transferase activity |
| PF03588 | IPR063612 | GO:0030163 protein catabolic process |
| PF04258 | IPR064098 | GO:0016020 membrane |
| PF04258 | IPR064098 | GO:0042500 aspartic endopeptidase activity, intramembrane cleaving |
| PF04350 | IPR061922 | GO:0043107 type IV pilus-dependent motility |
| PF04350 | IPR061922 | GO:0043683 type IV pilus assembly |
| PF04612 | IPR061921 | GO:0015627 type II protein secretion system complex |
| PF04612 | IPR061921 | GO:0015628 protein secretion by the type II secretion system |
| PF06213 | IPR063680 | GO:0009236 cobalamin biosynthetic process |
| PF08551 | IPR061914 | GO:0006890 retrograde vesicle-mediated transport, Golgi to endoplasmic reticulum |
| PF08551 | IPR061914 | GO:0016020 membrane |
| PF10156 | IPR063477 | GO:0003712 transcription coregulator activity |
| PF10156 | IPR063477 | GO:0006357 regulation of transcription by RNA polymerase II |
| PF10156 | IPR063477 | GO:0016592 mediator complex |
| PF10272 | IPR063936 | GO:0016020 membrane |
| PF10272 | IPR063936 | GO:0016567 protein ubiquitination |
| PF10272 | IPR063936 | GO:0061630 ubiquitin protein ligase activity |
| PF10995 | IPR063588 | GO:0035438 cyclic-di-GMP binding |
| PF17659 | IPR061920 | GO:0003697 single-stranded DNA binding |
| PF17825 | IPR063559 | GO:0000712 resolution of meiotic recombination intermediates |
| PF17825 | IPR063559 | GO:0000794 condensed nuclear chromosome |
| PF17825 | IPR063559 | GO:0016887 ATP hydrolysis activity |
Examples — unintegrated Pfam families with GO terms
These Pfam families carry pfam2go terms but are not part of any InterPro entry, so InterPro2GO has no equivalent.
| Pfam | Pfam GO |
|---|---|
| PF04715 | GO:0009058 biosynthetic process |
| PF06009 | GO:0007155 cell adhesion |
| PF13929 | GO:0048255 mRNA stabilization |
Output files
- `pfam_go_precision_gaps.tsv`: the non-SAME classified assertions (the findings; the ~9,800 identical SAME rows are omitted)
- `unintegrated_pfam_with_go.tsv`: pfam2go terms for unintegrated families
- `data/pfam_go_all_assertions.tsv`: full dump incl. SAME rows (git-ignored, regenerable)