Headroom for New, More-Specific Pfam → GO Mappings

Auto-generated by headroom_analysis.py. Re-run to refresh. See parent project. This measures the opportunity to author new per-Pfam mappings; it does not invent any mapping.

Provenance

A. Coverage gap — Pfam families with no GO via InterPro

A new Pfam→GO mapping for these families adds annotation where InterPro2GO currently provides nothing.

These are the pure-coverage candidates (full list: data/pfam_no_go_coverage_gap.tsv, git-ignored). Examples with informative descriptions:

| Pfam | name | reason | description |
|---|---|---|---|
| PF00007 | Cys_knot | entry_has_no_go | Cystine-knot domain |
| PF00008 | EGF | entry_has_no_go | EGF-like domain |
| PF00011 | HSP20 | entry_has_no_go | Hsp20/alpha crystallin family |
| PF00017 | SH2 | entry_has_no_go | SH2 domain |
| PF00021 | UPAR_LY6 | entry_has_no_go | u-PAR/Ly-6 domain |
| PF00022 | Actin | entry_has_no_go | Actin |
| PF00024 | PAN_1 | entry_has_no_go | PAN domain |
| PF00026 | Asp | entry_has_no_go | Eukaryotic aspartyl protease |
| PF00027 | cNMP_binding | entry_has_no_go | Cyclic nucleotide-binding domain |
| PF00029 | Connexin | entry_has_no_go | Connexin |
| PF00030 | Crystall | entry_has_no_go | Beta/Gamma crystallin |
| PF00035 | dsrm | entry_has_no_go | Double-stranded RNA binding motif |
| PF00037 | Fer4 | entry_has_no_go | 4Fe-4S binding domain |
| PF00038 | Filament | entry_has_no_go | Intermediate filament protein |
| PF00040 | fn2 | entry_has_no_go | Fibronectin type II domain |
| PF00043 | GST_C | entry_has_no_go | Glutathione S-transferase, C-terminal domain |
| PF00045 | Hemopexin | entry_has_no_go | Hemopexin |
| PF00047 | ig | entry_has_no_go | Immunoglobulin domain |
| PF00051 | Kringle | entry_has_no_go | Kringle domain |
| PF00052 | Laminin_B | entry_has_no_go | Laminin B (Domain IV) |

(Note: a large share of uncovered families are domains of unknown function (DUFs); those are genuine knowledge gaps, not mapping gaps.)
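
The selection in this section can be sketched roughly as follows. This is a minimal illustration, not the code in headroom_analysis.py: the function names, the input dictionaries, the placeholder identifiers, and the `not_in_interpro` reason label are all hypothetical. Only the `entry_has_no_go` label comes from the table above. The DUF check assumes such families can be recognised by a `DUF` name prefix.

```python
from typing import Dict, List, Set, Tuple


def coverage_gap(
    all_pfam: Dict[str, str],
    pfam_to_entry: Dict[str, str],
    entry_to_go: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """Return (pfam_id, name, reason) for families with no GO term via InterPro."""
    gap = []
    for pfam, name in sorted(all_pfam.items()):
        entry = pfam_to_entry.get(pfam)
        if entry is None:
            # Hypothetical label: family is not integrated into any InterPro entry.
            gap.append((pfam, name, "not_in_interpro"))
        elif not entry_to_go.get(entry):
            # Label used in the table above: the entry exists but carries no GO term.
            gap.append((pfam, name, "entry_has_no_go"))
    return gap


def is_duf(name: str) -> bool:
    """Assumption: domains of unknown function have a 'DUF' name prefix."""
    return name.startswith("DUF")


# Toy inputs with placeholder identifiers (not real InterPro data).
all_pfam = {"PF_A": "Kinase_like", "PF_B": "Some_domain", "PF_C": "DUF0000"}
pfam_to_entry = {"PF_A": "IPR_1", "PF_B": "IPR_2"}
entry_to_go = {"IPR_1": {"GO_TERM_1"}, "IPR_2": set()}

gap = coverage_gap(all_pfam, pfam_to_entry, entry_to_go)
# Separate mapping gaps from knowledge gaps (DUFs), as the note above suggests.
mapping_gaps = [row for row in gap if not is_duf(row[1])]
print(mapping_gaps)
```

Filtering out DUFs after computing the gap, rather than before, keeps both counts available for reporting.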

B. Splitting headroom — lumped entries sharing one GO term

This section covers InterPro entries with ≥2 Pfam members and ≥1 GO term. Every member inherits the entry's GO term(s), so functionally distinct members are candidates for more specific (descendant) per-Pfam terms.

Entry-type breakdown of multi-Pfam GO-bearing entries:

| InterPro entry type | entries |
|---|---|
| Domain | 67 |
| Repeat | 4 |
| Family | 4 |
| Conserved_site | 1 |
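
As an illustration of how such a breakdown could be derived, the sketch below groups Pfam families by InterPro entry, keeps entries with at least two members and at least one GO term, and tallies them by entry type. The function name, input dictionaries, and identifiers are placeholders chosen for this example, not the script's actual data structures.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def lumped_entries(
    pfam_to_entry: Dict[str, str],
    entry_to_go: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each GO-bearing InterPro entry with >=2 Pfam members to its members."""
    members = defaultdict(list)
    for pfam, entry in pfam_to_entry.items():
        members[entry].append(pfam)
    return {
        entry: sorted(pfams)
        for entry, pfams in members.items()
        if len(pfams) >= 2 and entry_to_go.get(entry)
    }


# Toy inputs with placeholder identifiers (not real InterPro data).
pfam_to_entry = {
    "PF_A": "IPR_1", "PF_B": "IPR_1",   # two members, has GO  -> lumped
    "PF_C": "IPR_2",                    # single member        -> skipped
    "PF_D": "IPR_3", "PF_E": "IPR_3",   # two members, no GO   -> skipped
}
entry_to_go = {"IPR_1": {"GO_TERM_1"}, "IPR_2": {"GO_TERM_2"}, "IPR_3": set()}
entry_type = {"IPR_1": "Domain", "IPR_2": "Family", "IPR_3": "Family"}

lumped = lumped_entries(pfam_to_entry, entry_to_go)
type_counts = Counter(entry_type[entry] for entry in lumped)
print(lumped, dict(type_counts))
```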

Highest-headroom entries (most Pfam members, most general shared GO)

Family and Homologous-superfamily entries with many members and a general (low GO depth) shared term are the best candidates for per-Pfam refinement. The member descriptions show whether the lumped families are functionally heterogeneous.

| InterPro | type | members | GO depth | shared GO | example member families |
|---|---|---|---|---|---|
| IPR010392 | Family | 2 | 1 | structural molecule activity; viral capsid | TNV_CP; Potex_coat |
| IPR004031 | Family | 2 | 2 | membrane | PMP22_Claudin; Claudin_2 |
| IPR006628 | Family | 2 | 5 | RNA polymerase II transcription regulatory region sequence-specific DNA binding; purine-rich negative regulatory element binding | PurA; DUF3276 |
| IPR002494 | Family | 2 | 7 | keratin filament | Keratin_B2; Keratin_B2_2 |

Full ranked list: lumped_entries_headroom.tsv.
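
One way to produce a ranking of this kind is sketched below. It rests on several assumptions that are not confirmed by this report: GO depth is taken as the shortest is_a distance to a root term (root = 0), an entry with several GO terms is scored by its shallowest term, and entries are ordered by member count (descending) and then depth (ascending). The toy ontology and all names are hypothetical.

```python
from typing import Dict, List, Set


def go_depth(term: str, parents: Dict[str, Set[str]]) -> int:
    """Shortest is_a distance from `term` to a root (a term with no parents)."""
    depth, frontier, seen = 0, {term}, {term}
    while frontier:
        if any(not parents.get(t) for t in frontier):
            return depth
        # Move one level up the ontology, skipping terms already visited.
        frontier = {p for t in frontier for p in parents[t]} - seen
        seen |= frontier
        depth += 1
    raise ValueError(f"no root reachable from {term!r}")


def rank_by_headroom(
    entries: List[dict], parents: Dict[str, Set[str]]
) -> List[dict]:
    """Order entries by most members first, then by most general shared GO term."""
    scored = [
        {**e, "go_depth": min(go_depth(t, parents) for t in e["go_terms"])}
        for e in entries
    ]
    return sorted(scored, key=lambda e: (-e["members"], e["go_depth"]))


# Toy ontology: term -> set of is_a parents (placeholder terms).
parents = {"root": set(), "a": {"root"}, "b": {"a"}, "c": {"b", "root"}}
entries = [
    {"entry": "IPR_1", "members": 2, "go_terms": {"b"}},
    {"entry": "IPR_2", "members": 3, "go_terms": {"b"}},
    {"entry": "IPR_3", "members": 2, "go_terms": {"a", "c"}},
]

ranked = rank_by_headroom(entries, parents)
print([(e["entry"], e["members"], e["go_depth"]) for e in ranked])
```

Using the minimum depth across an entry's terms favours entries whose most general annotation leaves the most room for descendant terms. A maximum or mean would be equally defensible and would change the order.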

Interpretation