Human Proteostasis Network

SCOPING · BIOLOGY_DOMAIN · FLAGSHIP

Species: human

Human Proteostasis Network Project

Bottom line

We are using the Human Proteostasis Network (PN) Annotation 4.3.11 workbook
(3,123 genes, 4,000 role assignments across 9 branches) as a scaffold,
prioritization layer, and QA source for GO curation, not as an annotation set
to import. The workbook contains no GO IDs; it is a PN-native
Branch/Class/Group/Type/Subtype taxonomy that overlaps GO inconsistently.

So far the project has:

The deliverable is a PN→GO bridge contract (below): every PN row classified
as GO-actionable, explicitly non-actionable, or queued as an ontology/evidence
problem.

Highlights

PN works best as a prioritization layer: it points AIGR at proteostasis genes
whose existing GO annotations are stale, over-propagated, or mis-attributed. A
representative case is the selective-autophagy receptor CALCOCO2/NDP52, where PN
membership prompted a review that removed the legacy PML-body localization
(revised by later imaging) while keeping the well-supported xenophagy/mitophagy
receptor core. Other findings worth pulling out of the batch logs:

Wrong or over-propagated annotations removed/downgraded

Pseudoenzyme and adaptor-vs-catalyst corrections

Ontology gaps exposed — receptor functions absent from GOA, added as NEW
(QuickGO-verified): GO:0034517 ribophagy, GO:0160247 autophagy cargo adaptor
activity, GO:0035973 aggrephagy, GO:0010508 positive regulation of autophagy.

Conservative rejections of over-broad PN projections — a projection labelled
"more specific than GOA" is not automatically a better assertion: TOMM20 (PN
protein import broader than the existing route-specific term), HSPA8
(aggrephagy rejected in favor of its better-supported CMA biology), and RAB7A
(autophagosome-lysosome fusion rejected in favor of post-fusion maturation). More
in Using PN inside AIGR.

Citation QA caught by review — e.g. PMID:23264731 (a microtubule study)
mis-cited on both SERP1 (removed; wrong gene) and SRPRB (left UNDECIDED);
SIAH1's zinc ion binding citation (PMID:11863358) flagged
WRONG_IDENTIFIER.

Background

Proteostasis is used here in the broad systems sense: the cellular machinery that
supports protein synthesis, folding, trafficking, quality control, sequestration,
and degradation.

The source resource is the Human Proteostasis Network Annotation 4.3.11
workbook (mapping/browser artifacts use the 2026-04-17 release) plus three
Proteostasis Consortium survey manuscripts:

The Consortium workbook defines a curated proteostasis membership and role
taxonomy; it does not define a GO-ready annotation set. This project asks:
what biological claims do PN rows actually make; which are close to GO terms,
which imply ontology gaps, and which are systems metadata; and which entries are
especially useful, curious, or suspect for AIGR? Within the project the PN
resource provides systems architecture and candidate roles, unfolded protein
binding is treated as one mechanistic subdomain, and AIGR uses PN as a scaffold,
QA source, and prioritization layer.

Review progress

Genes are reviewed in branch-themed batches selected from the projected candidate
additions report. Per-gene metadata is tracked in review_batches.tsv; the full
selection rationale and notable calls for each batch live in the linked
selection notes.

| Batch | PN branch / theme | Genes | Selection notes |
| --- | --- | --- | --- |
| #1217 (merged 2026-06-02) | First PN pass (mixed) | 50 | |
| 2026-06-03 (in progress) | Projected candidate additions, batch 2 | 50 | |
| 2026-06-06 | Candidate additions batch 3 (V-ATPase, ER folding/QC, autophagy receptors) | 20 | batch3 |
| 2026-06-07 | Batch 4 (V-ATPase isoforms, mito/ER chaperones, collagen, CRL/UPS adaptors) | 30 | batch4 |
| 2026-06-07b | Chaperone & co-chaperone network (DNAJ/HSP40, small HSPs, HSP70/90 co-chaperones, FKBPs, ER PPIases) | 50 | batch5 |
| 2026-06-07c | Co-translational QC (RQC & ribosome rescue, UFMylation, NMD, N-terminal acetylation) | 51 | batch6 |
| 2026-06-11 | ER proteostasis (SRP/translocon, EMC & GET insertion, glycoprotein QC, ERAD machinery) | 50 | batch7 |
| 2026-06-13 | UPS Cullin-RING ligases (44 F-box receptors, CRL4 core, assembly regulators) | 50 | batch8 |
| 2026-06-14 | ALP selective-autophagy receptors (SQSTM1/NBR1/OPTN, TBK1 axis, TRIMs, tagging E3s) | 20 | batch9 |

Action-mix totals are recorded per batch where computed (counts sum to each
batch's annotation total): ER batch (1347) ran 783 ACCEPT / 476
KEEP_AS_NON_CORE / 75 MARK_AS_OVER_ANNOTATED / 8 REMOVE / 3 MODIFY / 1
NEW / 1 UNDECIDED; UPS batch (1590) ran 649 ACCEPT / 878
KEEP_AS_NON_CORE / 23 MARK_AS_OVER_ANNOTATED / 19 MODIFY / 11 NEW / 8
UNDECIDED / 2 REMOVE; ALP batch (1222) ran 505 ACCEPT / 690
KEEP_AS_NON_CORE / 19 MARK_AS_OVER_ANNOTATED / 5 NEW / 2 UNDECIDED / 1
REMOVE. The dominant pattern across UPS/ALP is elevating a specific
adaptor/receptor MF (substrate-adaptor activity; ubiquitin + LIR/Atg8
binding) over bare protein binding, which is uniformly kept non-core. Mega-hubs
(HSPA5/BiP, HSP90AA1/AB1, HSP90B1/GRP94, HUWE1, OGT, the UPR sensors
ERN1/EIF2AK3) are deferred to dedicated single-gene reviews.
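
The claim that the counts sum to each batch's annotation total can be checked mechanically. The sketch below transcribes the three action mixes from the paragraph above; the dictionary layout and function names are illustrative assumptions, not part of the project's tooling.

```python
# Action-mix counts transcribed from the paragraph above:
# batch -> (declared annotation total, {action: count}).
ACTION_MIX = {
    "ER": (1347, {
        "ACCEPT": 783, "KEEP_AS_NON_CORE": 476, "MARK_AS_OVER_ANNOTATED": 75,
        "REMOVE": 8, "MODIFY": 3, "NEW": 1, "UNDECIDED": 1,
    }),
    "UPS": (1590, {
        "ACCEPT": 649, "KEEP_AS_NON_CORE": 878, "MARK_AS_OVER_ANNOTATED": 23,
        "MODIFY": 19, "NEW": 11, "UNDECIDED": 8, "REMOVE": 2,
    }),
    "ALP": (1222, {
        "ACCEPT": 505, "KEEP_AS_NON_CORE": 690, "MARK_AS_OVER_ANNOTATED": 19,
        "NEW": 5, "UNDECIDED": 2, "REMOVE": 1,
    }),
}


def action_mix_mismatches(mix):
    """Return {batch: (declared, summed)} for batches whose counts do not add up."""
    mismatches = {}
    for batch, (declared, counts) in mix.items():
        summed = sum(counts.values())
        if summed != declared:
            mismatches[batch] = (declared, summed)
    return mismatches


def non_core_share(mix, batch):
    """Fraction of a batch's annotations recorded as KEEP_AS_NON_CORE."""
    declared, counts = mix[batch]
    return counts.get("KEEP_AS_NON_CORE", 0) / declared
```

An empty result from `action_mix_mismatches(ACTION_MIX)` means every batch is internally consistent; `non_core_share` gives a quick view of how much of each batch ended up non-core.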

What the PN resource actually contains

The workbook is a row-per-role annotation table, not a gene-centered review file.

The custom hierarchy has:

This is their own vocabulary:

Current Mapping Completion Status

The current curated mapping pass is complete for the Human Proteostasis Network Annotation 4.3.11 workbook release dated 2026-04-17.

Coverage after the completion pass:

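| Level | Total source codes | Pending review | Mapped | Context only | No mapping | Deferred | Missing from YAML |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Branch | 9 | 0 | 0 | 1 | 8 | 0 | 0 |
| Class | 42 | 0 | 9 | 16 | 17 | 0 | 0 |
| Group | 297 | 0 | 133 | 31 | 133 | 0 | 0 |
| Type | 800 | 0 | 233 | 26 | 541 | 0 | 0 |
| Subtype | 881 | 0 | 105 | 16 | 760 | 0 | 0 |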

Every 2026-04-17 PN source code now has exactly one subject_curations
record in a branch mapping YAML. missing_from_yaml is now a QA failure state,
not a normal curator bucket, and should remain zero.
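
A minimal sketch of the "exactly one record per source code" check follows. It assumes the PN source codes and the codes found in subject_curations records have already been loaded into plain lists; the function names and report keys other than missing_from_yaml are hypothetical.

```python
from collections import Counter


def audit_subject_curations(source_codes, curated_codes):
    """Compare release source codes against codes seen in curation records.

    source_codes: every PN source code in the release.
    curated_codes: one entry per subject_curations record (repeats allowed).
    """
    expected = set(source_codes)
    counts = Counter(curated_codes)
    return {
        # Source codes with no curation record: the QA failure state.
        "missing_from_yaml": sorted(expected - set(counts)),
        # Source codes curated more than once.
        "duplicated": sorted(code for code, n in counts.items() if n > 1),
        # Curated codes that are not part of this release.
        "not_in_release": sorted(set(counts) - expected),
    }


def passes_qa(report):
    """True only when every bucket in the audit report is empty."""
    return not any(report.values())
```

Running this as a release gate keeps missing_from_yaml at zero by construction rather than by convention.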

The current hierarchy has 2029 total source nodes and 1348 leaf nodes.
The YAML inventory contains:

| Curation status | Records | Meaning |
| --- | --- | --- |
| mapped | 480 | Reviewed and mapped to a GO term |
| context_only | 90 | GO relationship recorded, but unsafe for gene-level propagation |
| no_mapping | 1459 | Reviewed and concluded that no GO mapping should be made |

Mapping scopes are:

| Mapping scope | Records | Use |
| --- | --- | --- |
| exact | 3 | Direct semantic match |
| ok_for_propagation_to_go | 477 | May produce candidate gene-GO propagations |
| too_broad_to_propagate | 90 | Real contextual alignment, but excluded from propagation |

There are no remaining pending_review, deferred, or missing_from_yaml
records in the current mapping set. Most source codes now resolve to
no_mapping, which is an intentional curation outcome: the PN node is useful
for proteostasis taxonomy but should not become a GO assertion.
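
The coverage, inventory, and scope figures in this section can be cross-checked against one another. The sketch below transcribes them and tests the arithmetic. Pairing too_broad_to_propagate with context_only is an assumption based on the matching counts and descriptions, and all names in the code are illustrative.

```python
# level -> (total, pending, mapped, context_only, no_mapping, deferred, missing)
COVERAGE = {
    "Branch": (9, 0, 0, 1, 8, 0, 0),
    "Class": (42, 0, 9, 16, 17, 0, 0),
    "Group": (297, 0, 133, 31, 133, 0, 0),
    "Type": (800, 0, 233, 26, 541, 0, 0),
    "Subtype": (881, 0, 105, 16, 760, 0, 0),
}
INVENTORY = {"mapped": 480, "context_only": 90, "no_mapping": 1459}
SCOPES = {"exact": 3, "ok_for_propagation_to_go": 477, "too_broad_to_propagate": 90}
TOTAL_SOURCE_NODES = 2029


def coverage_problems():
    """Return a list of human-readable inconsistencies (empty when all agree)."""
    problems = []
    # Each level's status buckets must add up to its total.
    for level, (total, *buckets) in COVERAGE.items():
        if sum(buckets) != total:
            problems.append(f"{level}: buckets sum to {sum(buckets)}, not {total}")
    # Column sums must reproduce the YAML inventory.
    rows = COVERAGE.values()
    column_sums = {
        "mapped": sum(r[2] for r in rows),
        "context_only": sum(r[3] for r in rows),
        "no_mapping": sum(r[4] for r in rows),
    }
    if column_sums != INVENTORY:
        problems.append(f"inventory mismatch: {column_sums}")
    if sum(INVENTORY.values()) != TOTAL_SOURCE_NODES:
        problems.append("inventory does not cover all source nodes")
    # Assumed pairing: exact + ok_for_propagation_to_go == mapped,
    # too_broad_to_propagate == context_only.
    if SCOPES["exact"] + SCOPES["ok_for_propagation_to_go"] != INVENTORY["mapped"]:
        problems.append("propagatable scopes do not match mapped records")
    if SCOPES["too_broad_to_propagate"] != INVENTORY["context_only"]:
        problems.append("too_broad scope does not match context_only records")
    return problems
```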

Projection against the human GOA DuckDB at
~/repos/go-db/db/goa_human.ddb produced:

| Projection status | Unique gene-GO pairs |
| --- | --- |
| already in GOA exactly | 1928 |
| entailed by GOA closure | 512 |
| more specific than existing GOA | 305 |
| supported by GOA regulation | 35 |
| new to GOA | 753 |
| no local GOA available | 32 |

Only the 1093 candidate additions (more_specific_than_existing_goa +
supported_by_goa_regulation + new_to_goa) should enter manual AIGR
rereview queues. The no_local_goa class is mostly a data-availability state,
not biological evidence; with the DuckDB source it is now a small residual
category.
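
The queue rule above reduces to a small filter over projection statuses. In this sketch the three candidate status identifiers and no_local_goa come from the text; the identifiers for the first two rows of the table are assumed spellings, and the function name is hypothetical.

```python
# Unique gene-GO pair counts per projection status, from the table above.
PROJECTION_COUNTS = {
    "already_in_goa_exactly": 1928,   # assumed identifier spelling
    "entailed_by_goa_closure": 512,   # assumed identifier spelling
    "more_specific_than_existing_goa": 305,
    "supported_by_goa_regulation": 35,
    "new_to_goa": 753,
    "no_local_goa": 32,
}

# Only these statuses count as candidate additions.
CANDIDATE_STATUSES = frozenset({
    "more_specific_than_existing_goa",
    "supported_by_goa_regulation",
    "new_to_goa",
})


def enters_rereview_queue(status):
    """True when a projected gene-GO pair should go to manual AIGR rereview."""
    return status in CANDIDATE_STATUSES


candidate_total = sum(
    count for status, count in PROJECTION_COUNTS.items()
    if enters_rereview_queue(status)
)
```

With the counts above, `candidate_total` reproduces the 1093 candidate additions; no_local_goa pairs are deliberately excluded because they reflect data availability, not evidence.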

Extra-Scrutiny Findings

The mapping audit flags 430/519 GO-bearing curation records as requiring manual
gene-level review before they are used to change a gene review. These are not
necessarily wrong mappings; they are places where propagation can mislead if the
projected GO term is treated as an asserted gene function.

Main flagged patterns:

Representative cases that should stay in the manual review queue:

Evidence Shape By Branch

The current release is uneven by design.

| Branch set | What is present in the workbook | What it means for reuse |
| --- | --- | --- |
| ALP | Per-row notes for 1003/1003 rows and references for 1001/1003 rows | Best-supported branch for row-level reuse and audit |
| UPS | Principal domains for 1528/1528 rows and auxiliary domains for 1521/1528 rows | Strong domain/family scaffold, but weaker row-level functional justification in this release |
| Other 7 branches | No row-level notes, no row-level references, no domain columns | Useful as curated membership/context, but not ready to import into GO as-is |

What Each Level Means

The hierarchy is semantically mixed, which is the main reason it does not map cleanly to GO.

| PN level | What it usually encodes | Relation to GO |
| --- | --- | --- |
| Branch | Localization or top-level pathway membership | BP/CC hybrid, not a GO class |
| Class | Function in proteostasis, except ALP where it is a stage of autophagy | MF/BP hybrid |
| Group | System, complex, pathway module, or mechanistic bucket | Sometimes GO-like, often not |
| Type | More specific mechanistic role, family, or complex membership | May correspond to MF, BP, CC, or family metadata |
| Subtype | Structural family, domain class, or finer mechanistic subdivision | Often outside GO scope |

Examples by level:

Fully expanded examples:

These are informative for humans, but they are not all GO terms and they are not all the same kind of thing.

Relationship To GO

Explicit GO mapping

There is essentially none in the released workbook.

Implicit GO mapping

There is a lot of implicit overlap with GO, but it is inconsistent in kind.

Some PN rows look close to GO:

But many PN labels are not GO-ready assertions:

So the right reading is:

PN-GO Bridge Contract

The recommended collaboration is not "PN keeps doing PN, GO keeps doing GO, and
we occasionally sync." That is too loose. The better model is a GO-ready
companion layer for each PN release.

PN should keep its native Branch/Class/Group/Type/Subtype taxonomy because it
captures proteostasis systems information that GO should not flatten. But each
PN row should also carry a bridge decision saying exactly what GO can do with
that row.

Row-level bridge fields

Each PN row should be assigned:

This turns PN into a GO-compatible curation product without forcing PN to become
a GO annotation table.

Ontology-gap triage

The ontology_gap block lives inside the existing mapping row so it does not
duplicate the mapping schema. The parent row still answers: "what existing GO
term, if any, can represent this PN node today?" The nested block answers:
"does this mismatch justify ontology work?"

The block records status values such as covered_by_existing_go,
ntr_candidate, ntr_justified, needs_design_pattern, or
better_as_gocam_or_annotation_extension; a gap_type; candidate parent GO
terms; example genes; anti-scope notes; recommended action; and priority. For
example, the SPNS1/ALR efflux row is marked ntr_candidate, not yet
ntr_justified, because GO:0007041 lysosomal transport is usable today but
loses the autophagic-lysosome-reformation efflux semantics that PN captures.
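
One way to picture the nested block is as a small validated record. The sketch below is an assumption about shape only: the field names are illustrative, the status set contains just the values listed above (the real set may be larger), and fields whose values this section does not state are left unset. Treating ntr_justified as the only ticket-ready status is likewise an assumed reading.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Status values named in the text; the full controlled list may be longer.
GAP_STATUSES = frozenset({
    "covered_by_existing_go",
    "ntr_candidate",
    "ntr_justified",
    "needs_design_pattern",
    "better_as_gocam_or_annotation_extension",
})


@dataclass
class OntologyGap:
    """Hypothetical shape for the ontology_gap block nested in a mapping row."""
    status: str
    gap_type: Optional[str] = None
    candidate_parent_terms: List[str] = field(default_factory=list)
    example_genes: List[str] = field(default_factory=list)
    anti_scope_notes: str = ""
    recommended_action: str = ""
    priority: Optional[str] = None

    def __post_init__(self):
        if self.status not in GAP_STATUSES:
            raise ValueError(f"unknown ontology_gap status: {self.status!r}")

    def ready_for_new_term_request(self):
        """Assumed rule: only ntr_justified gaps should become ontology tickets."""
        return self.status == "ntr_justified"


# The SPNS1/ALR efflux example, filled only with facts stated above.
spns1_alr_efflux = OntologyGap(
    status="ntr_candidate",
    candidate_parent_terms=["GO:0007041"],  # lysosomal transport
    example_genes=["SPNS1"],
)
```

Keeping the block nested inside the mapping row means the parent row's GO term and the gap assessment travel together and cannot drift apart.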

Deliverables per PN release

For each PN workbook release, GO/AIGR should produce four artifacts:

  1. GO annotation candidate table: only rows with direct_go_annotation or
    go_annotation_after_gene_review.
  2. Exception table: cases such as HSPA12A/HSPA12B under HSP70, TTC28
    under HSP90 cochaperone, BAG6 under broad ER protein transport, and
    pseudoenzymes under enzyme-family buckets.
  3. Ontology-gap list: missing GO concepts exposed by PN, separated from
    ordinary annotation work.
  4. PN feedback table: workbook corrections, ambiguous placements, weak
    domain-only inclusions, and evidence upgrades needed from PN authors.

The sync point should be release-gated, not informal: PN release -> bridge
classification -> projection/audit -> gene-level review of action rows -> GO
annotations, ontology tickets, and PN feedback.
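
"Release-gated" can be read as a strict stage order in which no step runs until every earlier step is complete. The sketch below encodes that reading; the stage identifiers are shorthand invented for illustration, and the gate function is hypothetical.

```python
# Stage order from the sync-point description above (identifiers are shorthand).
RELEASE_STAGES = (
    "pn_release",
    "bridge_classification",
    "projection_audit",
    "gene_level_review",
    "outputs",  # GO annotations, ontology tickets, and PN feedback
)


def next_allowed_stage(completed):
    """Return the first stage not yet completed, or None when all are done.

    Raises ValueError if an unknown stage is given, or if a later stage is
    marked complete while an earlier one is still pending.
    """
    done = set(completed)
    unknown = done - set(RELEASE_STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    pending = None
    for stage in RELEASE_STAGES:
        if stage in done:
            if pending is not None:
                raise ValueError(f"{stage} completed before {pending}")
        elif pending is None:
            pending = stage
    return pending
```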

Decision policy

| PN row type | GO action |
| --- | --- |
| Direct complex/component/enzyme/process role with gene-level evidence | Curate or project to GO, then validate gene-by-gene if propagation is involved |
| Family/domain bucket with known divergent members | Keep PN-native; add gene exceptions before any GO projection |
| Regulatory or pathway-stage placement | Usually GO-review queue, not automatic annotation |
| Proteostasis systems context with no clean GO semantics | Keep as PN context only |
| Real biology lacking a GO term | Open an ontology ticket; do not force into a broad existing GO box |

This is stricter than a periodic mapping sync. It makes every PN row either
actionable for GO, explicitly non-actionable for GO, or queued as an ontology or
evidence problem.
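
The decision policy can be expressed as a lookup that refuses to leave a row unclassified. This is a sketch under assumptions: the enum member names are invented labels for the five row types in the table, and the action strings are copied from it.

```python
from enum import Enum


class PNRowType(Enum):
    """Illustrative labels for the five PN row types in the policy table."""
    DIRECT_ROLE = "direct role with gene-level evidence"
    FAMILY_BUCKET = "family/domain bucket with known divergent members"
    REGULATORY_PLACEMENT = "regulatory or pathway-stage placement"
    SYSTEMS_CONTEXT = "systems context with no clean GO semantics"
    ONTOLOGY_GAP = "real biology lacking a GO term"


GO_ACTION = {
    PNRowType.DIRECT_ROLE:
        "Curate or project to GO, then validate gene-by-gene if propagation is involved",
    PNRowType.FAMILY_BUCKET:
        "Keep PN-native; add gene exceptions before any GO projection",
    PNRowType.REGULATORY_PLACEMENT:
        "Usually GO-review queue, not automatic annotation",
    PNRowType.SYSTEMS_CONTEXT:
        "Keep as PN context only",
    PNRowType.ONTOLOGY_GAP:
        "Open an ontology ticket; do not force into a broad existing GO box",
}


def go_action(row_type):
    """Look up the policy action; unknown row types are an error, not a default."""
    if row_type not in GO_ACTION:
        raise ValueError(f"no bridge policy for row type: {row_type!r}")
    return GO_ACTION[row_type]


def unclassified(rows):
    """Given (row_id, row_type) pairs, list row ids that lack a policy decision."""
    return [row_id for row_id, row_type in rows if row_type not in GO_ACTION]
```

A release would count as bridge-complete only when `unclassified` returns an empty list for every PN row.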

Using PN inside AIGR: triage & QA

The PN resource is useful inside AIGR primarily as a triage and QA layer.

High-value positive controls

These are cases where PN and existing AIGR work largely agree and help validate the framework:

Context-specific but plausible PN rows

These are probably real, but should be treated as non-core or secondary until checked carefully:

Domain- or family-driven caution cases

These are especially useful for AIGR because the papers themselves signal uncertainty or inclusivity:

Existing-review rereview examples

On the 83-gene existing-review queue, the useful distinction was not just the
pipeline label but whether the PN-projected term was actually a better GO
assertion than the current AIGR review. In practice,
more_specific_than_existing_goa is a projection label, not a guarantee that
the PN term remains the most specific biologically defensible choice after
manual rereview.

Priority Follow-up Targets

See priority_genes.tsv.

These rows began as first-pass priorities, but local reviews now exist for all
listed genes. They are not yet represented in review_batches.tsv or the
phase-1 dossier set, so the next task is bookkeeping plus bridge-outcome
integration rather than initial fetch/review.

Recommended follow-up jobs:

  1. Add these reviewed boundary cases to PN review tracking or create a
    separate boundary-review batch so they appear in phase-1-style dossiers.
  2. Materialize their mapping outcomes in the bridge layer:
    HSPA12A/HSPA12B are HSP70-family exceptions; AARSD1 and TTC28 are
    HSP90-cochaperone exceptions; BAG6 supports specific GET/ERAD/holdase
    terms but not broad ER protein-transport propagation; BTF3 is a positive
    NAC-component case.
  3. Re-run the phase-1 dossier builder after updating the tracking sidecar.

PN-vs-UPB Comparison

The UPB project is the best place to reason about:

The PN project is broader:

Next Steps

Browse the data

Mapping sets (one per branch): ALP · Chaperone · ER · Extracellular · Mitochondrial · Nuclear · PN regulation · Translation · UPS

QA reports: mapping scrutiny · unusual propagations

Related projects: Unfolded Protein Binding (gene list) · Ribosome Quality Control · Integrated Stress Response · ER-phagy