Human Proteostasis Network Project
Background
Proteostasis is used here in the broad systems sense: the cellular machinery that
supports protein synthesis, folding, trafficking, quality control, sequestration,
and degradation.
The source resource analyzed in this project is the Human Proteostasis Network
Annotation 4.3.11 workbook together with three Proteostasis Consortium survey
manuscripts:
- MS1 introduces the overall PN framework and covers translation, folding,
  transport, and organelle-specific proteostasis systems.
- MS2 covers the autophagy-lysosome pathway (ALP) and provides the most
  detailed row-level notes and references in the current release.
- MS3 covers the ubiquitin-proteasome system (UPS) and explains the
  domain-heavy inclusion and classification logic used for that branch.
The mapping and browser artifacts now use the 2026-04-17 PN 4.3.11
workbook. Earlier 2024 release artifacts are kept only as provenance for
individual curation comments that were started against that release.
Overview
The Proteostasis Consortium workbook and papers define a curated human proteostasis
membership and role taxonomy. They do not define a GO-ready annotation set.
This project asks three main questions:
- What biological claims do the PN annotations actually make?
- Which PN statements are close to GO terms, which imply ontology gaps, and which
  are better treated as systems metadata?
- Which workbook entries look especially useful, curious, or suspect in the
  context of AIGR?
Within this project:
- the PN resource provides systems architecture and candidate roles
- unfolded protein binding is treated as one mechanistic subdomain within proteostasis
- AIGR uses PN as a scaffold, a QA source, and a prioritization layer
See:
- Priority genes
- PN tree browser
- ALP mapping set
- Chaperone mapping set
- ER proteostasis mapping set
- Extracellular proteostasis mapping set
- Mitochondrial proteostasis mapping set
- Nuclear proteostasis mapping set
- PN regulation mapping set
- Translation mapping set
- UPS mapping set
- Project-local tests
- Project-local reports
- Report-local PN tree browser
- Mapping scrutiny report
- Unusual propagation report
- Mapping export workbook
- UPB gene list
- Unfolded Protein Binding project
- Ribosome Quality Control project
- Integrated Stress Response project
- ER-phagy project
What The PN Resource Actually Contains
The workbook is a row-per-role annotation table, not a gene-centered review file.
- 3123 unique genes
- 4000 annotation rows
- 323 genes with cross-branch annotations
- 318 genes with multiple annotations within one branch
- 2482 genes with a single annotation
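The three gene categories above partition the gene set (323 + 318 + 2482 = 3123). The sketch below shows one way such counts could be derived from a row-per-role table. It assumes each row has been reduced to a `(gene, branch)` pair and that cross-branch status takes precedence over multiple annotations within one branch; the function name and the toy rows are hypothetical and are not workbook data.

```python
from collections import defaultdict


def summarize_rows(rows):
    """Count genes by annotation pattern from a list of (gene, branch) pairs."""
    branches_by_gene = defaultdict(list)
    for gene, branch in rows:
        branches_by_gene[gene].append(branch)

    summary = {
        "unique_genes": len(branches_by_gene),
        "annotation_rows": len(rows),
        "cross_branch": 0,
        "multi_within_one_branch": 0,
        "single": 0,
    }
    for branches in branches_by_gene.values():
        if len(set(branches)) > 1:
            # Annotated in more than one branch (assumed to take precedence).
            summary["cross_branch"] += 1
        elif len(branches) > 1:
            # Several rows, all in the same branch.
            summary["multi_within_one_branch"] += 1
        else:
            summary["single"] += 1
    return summary


# Toy rows for illustration only; these are not real workbook entries.
toy_rows = [
    ("GENE_A", "ALP"), ("GENE_A", "UPS"),
    ("GENE_B", "UPS"), ("GENE_B", "UPS"),
    ("GENE_C", "ALP"),
]
summary = summarize_rows(toy_rows)
```

Because the three categories are mutually exclusive under this assumption, their sum should always equal the unique gene count, which makes a cheap sanity check when the workbook is updated.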
The custom hierarchy has:
- 9 Branches
- 33 Classes
- 277 Groups
- 676 Types
- 536 Subtypes
This is their own vocabulary:
- the workbook contains no GO IDs or explicit GO term mappings
- the papers explicitly describe the PN hierarchy as a taxonomy that complements structured vocabularies such as GO
- the manuscripts say GO, Reactome, KEGG, UniProt, and InterPro were used as inputs for building preliminary lists, not as the target representation
Current Mapping Completion Status
The current curated mapping pass is complete for the Human Proteostasis Network
Annotation 4.3.11 workbook release dated 2026-04-17.
Coverage after the completion pass:
| Level | Total source codes | Pending review | Mapped | Context only | No mapping | Deferred | Missing from YAML |
|---|---|---|---|---|---|---|---|
| Branch | 9 | 8 | 0 | 1 | 0 | 0 | 0 |
| Class | 42 | 30 | 9 | 2 | 1 | 0 | 0 |
| Group | 297 | 229 | 60 | 2 | 1 | 5 | 0 |
| Type | 800 | 661 | 119 | 2 | 14 | 4 | 0 |
| Subtype | 881 | 795 | 64 | 1 | 14 | 7 | 0 |
Every 2026-04-17 PN source code now has exactly one `subject_curations`
record in a branch mapping YAML. `missing_from_yaml` is now a QA failure state,
not a normal curator bucket, and should remain zero.
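The "exactly one record per source code" rule can be expressed as a small QA check. The sketch below is a minimal illustration, assuming the PN source codes and the codes found in the branch mapping YAML files have already been loaded into plain lists; the function name, the result keys other than `missing_from_yaml`, and the example codes are hypothetical.

```python
from collections import Counter


def check_yaml_coverage(source_codes, curated_codes):
    """Compare PN source codes against the codes that have curation records."""
    counts = Counter(curated_codes)
    known = set(source_codes)
    return {
        # Source codes with no curation record: a QA failure state.
        "missing_from_yaml": sorted(c for c in known if counts[c] == 0),
        # Source codes with more than one curation record.
        "duplicated": sorted(c for c in known if counts[c] > 1),
        # Curation records whose code is not in the current release.
        "unknown": sorted(set(curated_codes) - known),
    }


# Hypothetical codes for illustration only.
source_codes = ["PN:A", "PN:B", "PN:C"]
curated_codes = ["PN:A", "PN:B", "PN:B", "PN:Z"]
report = check_yaml_coverage(source_codes, curated_codes)
coverage_ok = not any(report.values())
```

In a passing state all three lists are empty, so `coverage_ok` is `True`; in the toy input above it is `False` because one code is missing, one is duplicated, and one is unknown.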
The current hierarchy has 2029 total source nodes and 1348 leaf nodes.
The YAML inventory contains:
| Curation status | Records | Meaning |
|---|---|---|
| `pending_review` | 1723 | Accounted for in YAML, but not yet manually analyzed in depth |
| `mapped` | 252 | Reviewed and mapped to a GO term |
| `context_only` | 8 | GO relationship recorded, but unsafe for gene-level propagation |
| `no_mapping` | 30 | Reviewed and concluded that no GO mapping should be made |
| `deferred` | 16 | Reviewed, but blocked by evidence, taxonomy ambiguity, or a missing/better GO term |
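The per-level coverage counts and the status inventory should agree with each other and with the 2029-node total. The following sketch transcribes the coverage table reported above and recomputes the totals; the variable names are illustrative, and the figures are copied from this page rather than read from the YAML files.

```python
STATUSES = [
    "pending_review", "mapped", "context_only",
    "no_mapping", "deferred", "missing_from_yaml",
]

# level -> (total source codes, counts in STATUSES order), from the coverage table
coverage = {
    "Branch": (9, [8, 0, 1, 0, 0, 0]),
    "Class": (42, [30, 9, 2, 1, 0, 0]),
    "Group": (297, [229, 60, 2, 1, 5, 0]),
    "Type": (800, [661, 119, 2, 14, 4, 0]),
    "Subtype": (881, [795, 64, 1, 14, 7, 0]),
}

# Each row of the table must sum to its own total.
for level, (total, counts) in coverage.items():
    assert sum(counts) == total, level

# Column sums give the per-status record counts.
status_totals = {
    status: sum(counts[i] for _, counts in coverage.values())
    for i, status in enumerate(STATUSES)
}
total_nodes = sum(total for total, _ in coverage.values())

print(total_nodes)    # 2029
print(status_totals)  # pending_review 1723, mapped 252, context_only 8, ...
```

The column sums reproduce the inventory (1723, 252, 8, 30, 16) with `missing_from_yaml` at zero.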
Mapping scopes are:
| Mapping scope | Records | Use |
|---|---|---|
| `exact` | 4 | Direct semantic match |
| `ok_for_propagation_to_go` | 248 | May produce candidate gene-GO propagations |
| `too_broad_to_propagate` | 8 | Real contextual alignment, but excluded from propagation |
Most PN source codes are not yet final non-map calls. They are explicitly
tracked as pending review so curators can distinguish coverage bookkeeping from
completed curation decisions.
Projection against the human GOA DuckDB at
~/repos/go-db/db/goa_human.ddb produced:
| Projection status | Unique gene-GO pairs |
|---|---|
| already in GOA exactly | 703 |
| entailed by GOA closure | 403 |
| more specific than existing GOA | 238 |
| supported by GOA regulation | 77 |
| new to GOA | 760 |
| no local GOA available | 16 |
Only the 1075 candidate additions (more_specific_than_existing_goa +
supported_by_goa_regulation + new_to_goa) should enter manual AIGR
rereview queues. The no_local_goa class is mostly a data-availability state,
not biological evidence; with the DuckDB source it is now a small residual
category.
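To make the projection statuses concrete, here is a simplified sketch of how a projected gene-term pair might be classified against existing annotations. It is not the project's pipeline: the ontology is a toy parent map with hypothetical term IDs, the `supported_by_goa_regulation` status is listed in the candidate set but not computed because its rule is not described here, and all function and variable names are assumptions.

```python
def ancestors(term, parents):
    """Return all strict ancestors of a term, given a term -> parents map."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def classify_projection(gene, term, goa, parents):
    """Classify one projected (gene, term) pair against existing annotations."""
    existing = goa.get(gene)
    if not existing:
        return "no_local_goa"
    if term in existing:
        return "already_in_goa_exactly"
    # The projected term is an ancestor of something already annotated.
    if any(term in ancestors(t, parents) for t in existing):
        return "entailed_by_goa_closure"
    # The projected term is a descendant of something already annotated.
    if any(t in ancestors(term, parents) for t in existing):
        return "more_specific_than_existing_goa"
    return "new_to_goa"


# Only these statuses feed the manual rereview queue.
CANDIDATE_STATUSES = {
    "more_specific_than_existing_goa",
    "supported_by_goa_regulation",  # not computed in this sketch
    "new_to_goa",
}

# Toy ontology and annotations; the IDs are hypothetical, not real GO terms.
parents = {
    "T:import_mito": ["T:import"],
    "T:import": ["T:transport"],
    "T:fold": [],
}
goa = {"GENE_A": {"T:import"}}
projections = [
    ("GENE_A", "T:import"),
    ("GENE_A", "T:transport"),
    ("GENE_A", "T:import_mito"),
    ("GENE_A", "T:fold"),
    ("GENE_B", "T:fold"),
]
statuses = [classify_projection(g, t, goa, parents) for g, t in projections]
queue = [p for p, s in zip(projections, statuses) if s in CANDIDATE_STATUSES]
```

In the toy data only the more specific and the new pairs reach `queue`, which mirrors the 238 + 77 + 760 = 1075 arithmetic above. As the rereview examples on this page show, a `more_specific_than_existing_goa` label still needs manual confirmation.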
Extra-Scrutiny Findings
The mapping audit flags 180/260 GO-bearing curation records as requiring manual
gene-level review before they are used to change a gene review. These are not
necessarily wrong mappings; they are places where propagation can mislead if the
projected GO term is treated as an asserted gene function.
Main flagged patterns:
- 151 mappings have regulatory, recruitment, localization, sensing, or other
  contextual PN source labels.
- 73 mappings use broad or context-losing GO targets such as generic
  translation, protein transport, DNA repair, DNA binding, or stress-response
  terms.
- 44 mappings include domain, family, or subtype metadata in the source label.
- 12 mappings are at branch or class level.
- 8 mappings are explicitly categorized as `too_broad_to_propagate` and are
  excluded from propagation reports.
Representative cases that should stay in the manual review queue:
- `Translation -> GO:0006412 translation`: useful as high-level context but too
  broad for many PN rows; [EDF1](../../genes/human/EDF1/EDF1-ai-review.html) showed that only the RQC term survived manual
  rereview.
- `Mitochondrial proteostasis|Protein transport|Protein import -> GO:0017038 protein import`: can be broader than an existing route-specific
  mitochondrial import term, as seen for [TOMM20](../../genes/human/TOMM20/TOMM20-ai-review.html).
- `Autophagosome/endosome docking` and related ALP docking labels ->
  `GO:0061909 autophagosome-lysosome fusion`: plausible pathway-stage
  propagation, but [RAB7A](../../genes/human/RAB7A/RAB7A-ai-review.html) showed that fusion versus post-fusion maturation must
  be checked gene by gene.
- HSPA8-like CMA/aggrephagy boundary cases: PN aggregate-handling context does
  not automatically justify aggrephagy when direct CMA annotations are better
  supported.
- UPS ubiquitin/UBL-binding context buckets: many are useful triage labels, but
  the UPS branch is intentionally inclusive and domain-heavy, so these should
  not be imported into GO without independent evidence.
Evidence Shape By Branch
The current release is uneven by design.
| Branch set | What is present in the workbook | What it means for reuse |
|---|---|---|
| ALP | Per-row notes for 1003/1003 rows and references for 1001/1003 rows | Best-supported branch for row-level reuse and audit |
| UPS | Principal domains for 1528/1528 rows and auxiliary domains for 1521/1528 rows | Strong domain/family scaffold, but weaker row-level functional justification in this release |
| Other 7 branches | No row-level notes, no row-level references, no domain columns | Useful as curated membership/context, but not ready to import into GO as-is |
What Each Level Means
The hierarchy is semantically mixed, which is the main reason it does not map cleanly to GO.
| PN level | What it usually encodes | Relation to GO |
|---|---|---|
| Branch | Localization or top-level pathway membership | BP/CC hybrid, not a GO class |
| Class | Function in proteostasis, except ALP where it is a stage of autophagy | MF/BP hybrid |
| Group | System, complex, pathway module, or mechanistic bucket | Sometimes GO-like, often not |
| Type | More specific mechanistic role, family, or complex membership | May correspond to MF, BP, CC, or family metadata |
| Subtype | Structural family, domain class, or finer mechanistic subdivision | Often outside GO scope |
Examples by level:
- Branch: `Cytonuclear proteostasis`, `ER proteostasis`, `Autophagy-Lysosome Pathway`, `Ubiquitin Proteasome System`
- Class: `Chaperone`, `Protein transport`, `Autophagophore initiation and elongation`, `E3 ubiquitin and UBL ligases`
- Group: `HSP70 system`, `TRAP complex component`, `Class 3 PI3K complex 1, direct`, `CRL family`
- Type: `HSP70 nucleotide exchange factor`, `J-domain containing HSP70 cochaperone`, `Modulator of class 3 PI3K complex 1 activity`, `BTB`
- Subtype: `BAG domain family`, `KCTD type`, `WD40`
- many rows have no subtype
Fully expanded examples:
- [BAG1](../../genes/human/BAG1/BAG1-ai-review.html): `Cytonuclear proteostasis -> Chaperone -> HSP70 system -> HSP70 nucleotide exchange factor -> BAG domain family`
- [STAT1](../../genes/human/STAT1/STAT1-ai-review.html): `Autophagy-Lysosome Pathway -> Autophagy gene expression -> Transcriptional repressor`
- [KCTD11](../../genes/human/KCTD11/KCTD11-ai-review.html): `Ubiquitin Proteasome System -> E3 ubiquitin and UBL ligases -> CRL family -> BTB -> KCTD type`
These are informative for humans, but they are not all GO terms and they are not all the same kind of thing.
Relationship To GO
Explicit GO mapping
There is essentially none in the released workbook.
- I searched the workbook for `GO:` and `Gene Ontology` strings and found none.
- The row schema is PN-native: `Branch/Class/Group/Type/Subtype`, plus optional notes, references, and UPS domain fields.
- The papers cite GO as a source database used to build candidate lists, especially for ALP, not as a field embedded back into the final annotation table.
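A string search of this kind is easy to repeat on a new release. The sketch below assumes the workbook has already been read into a list of rows, each a list of cell values, by whatever spreadsheet reader is in use; the regular expression and function name are illustrative choices, not the exact search that was run.

```python
import re

# Matches GO identifiers such as "GO:0006412" or the literal phrase "Gene Ontology".
GO_PATTERN = re.compile(r"\bGO:\d{7}\b|Gene Ontology")


def find_go_mentions(rows):
    """Return (row_index, col_index, cell) for every text cell mentioning GO."""
    hits = []
    for row_index, row in enumerate(rows):
        for col_index, cell in enumerate(row):
            # Skip empty and numeric cells; only strings can carry GO mentions.
            if isinstance(cell, str) and GO_PATTERN.search(cell):
                hits.append((row_index, col_index, cell))
    return hits


# Toy rows for illustration; the second row is invented to show a positive hit.
toy_rows = [
    ["BAG1", "Cytonuclear proteostasis", "Chaperone", None],
    ["GENE_X", "see GO:0006412", 3, ""],
]
hits = find_go_mentions(toy_rows)
```

An empty result on the real workbook would correspond to the "found none" observation above.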
Implicit GO mapping
There is a lot of implicit overlap with GO, but it is inconsistent in kind.
Some PN rows look close to GO:
- `TRAP complex component`
- `Class 3 PI3K complex 1 component`
- `HSP70 nucleotide exchange factor`
- `Ribosome-associated QC`
But many PN labels are not GO-ready assertions:
- family labels like `BAG domain family`
- domain labels like `WD40` or `TPR domain containing`
- branch-local staging labels like `Autophagophore initiation and elongation`
- inclusive UPS buckets like `RING`, `BTB`, `DCAF`, `MEX3`
So the right reading is:
- the PN taxonomy is biologically meaningful
- some rows are close to GO MF/BP/CC concepts
- many rows are module, family, or pathway-context labels rather than GO-annotatable facts
Strategy For Bringing PN Into GO
This section is independent of AIGR workflow.
1. Separate membership from GO assertion
A PN row first tells us:
- this gene belongs in proteostasis
- the authors place it in a particular proteostasis module
It does not automatically tell us:
- the correct GO aspect
- the exact GO term
- whether the role is core vs contextual
- whether the evidence is strong enough for manual GO curation
2. Tag each row by provenance
At minimum, rows should be split into:
- `entity_based_literature_supported`
- `domain_based_family_inference`
- `ALP_note_backed`
- `UPS_domain_backed`
- `cross_branch_context_only`
This is the biggest determinant of whether a row is GO-ready, prediction-like, or just useful context.
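A first-pass tagger could assign the tags that follow directly from workbook columns and leave the rest to a curator. The sketch below is one possible heuristic, not an established rule set: the `PNRow` fields, the `is_secondary_branch` flag, and the precedence order are all assumptions. It returns `None` when the choice between `entity_based_literature_supported` and `domain_based_family_inference` needs evidence that is not in the row itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PNRow:
    """Hypothetical minimal view of one workbook row."""
    gene: str
    branch: str
    notes: Optional[str] = None
    references: Optional[str] = None
    principal_domain: Optional[str] = None


def provenance_tag(row: PNRow, is_secondary_branch: bool = False) -> Optional[str]:
    """Return a provenance tag when the row's own fields decide it, else None."""
    if is_secondary_branch:
        # The caller has judged this row to be a non-primary placement.
        return "cross_branch_context_only"
    if row.branch == "ALP" and row.notes and row.references:
        return "ALP_note_backed"
    if row.branch == "UPS" and row.principal_domain:
        return "UPS_domain_backed"
    # Other branches carry no row-level notes, references, or domains,
    # so the tag must come from manual review of the manuscripts.
    return None


# Toy rows for illustration; the field values are placeholders.
alp_row = PNRow("GENE_A", "ALP", notes="note text", references="ref text")
ups_row = PNRow("GENE_B", "UPS", principal_domain="RING")
er_row = PNRow("GENE_C", "ER proteostasis")
tags = [provenance_tag(alp_row), provenance_tag(ups_row), provenance_tag(er_row)]
```

Returning `None` rather than guessing keeps the automatic pass from overstating evidence for the seven branches that lack row-level support.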
3. Decompose each PN row into candidate GO semantics
Each row should be classified into one of four buckets:
- `exact_or_near_GO`
  Example: complex component, transport process, clear enzymatic role.
- `GO_with_context_loss`
  Example: a GO term exists but loses the proteostasis-system framing.
- `ontology_gap`
  Example: the biology is real but current GO lacks a clean term or term family.
- `non_GO_metadata`
  Example: family/domain/subtype labels that should stay as supporting metadata.
4. Curate by branch, not all at once
The branches are not equally reusable.
- Start with ALP rows that have notes and references.
- Use non-ALP/non-UPS branches to identify likely GO-compatible roles and obvious positive controls.
- Treat UPS as a mixed case: many rows are probably best handled as prediction candidates or review leads until branch-specific evidence is inspected.
5. Keep domain-based rows conservative
The papers explicitly flag domain-based inclusions and borderline cases.
Concrete examples:
- [HSPA12A](../../genes/human/HSPA12A/HSPA12A-ai-review.html) and [HSPA12B](../../genes/human/HSPA12B/HSPA12B-ai-review.html) are included as HSP70-family PN components even though MS1 says their proteostasis functions are not yet known.
- UPS authors explicitly say they aimed to be inclusive and that domain weight is debatable in some cases.
These should not be imported into GO as manual curation without independent evidence.
6. Expect ontology-gap work
The PN resource reinforces that GO has missing or awkward coverage around:
- holdase chaperoning
- sensor/adaptor roles in quality control
- co-chaperone mechanistic MF space
- proteostasis-system context that mixes process and molecular role
AIGR Triage Opportunities
The PN resource is useful inside AIGR primarily as a triage and QA layer.
High-value positive controls
These are cases where PN and existing AIGR work largely agree and help validate the framework:
- [CRYAA](../../genes/human/CRYAA/CRYAA-ai-review.html), [CRYAB](../../genes/human/CRYAB/CRYAB-ai-review.html): PN small-HSP placement agrees with the UPB/holdase interpretation
- [SSR1](../../genes/human/SSR1/SSR1-ai-review.html), [SSR2](../../genes/human/SSR2/SSR2-ai-review.html): PN `TRAP complex component` fits existing ER protein-targeting/translocation GO work
- [EDF1](../../genes/human/EDF1/EDF1-ai-review.html): PN ribosome-associated QC placement is consistent with the current AIGR review
- [KCTD11](../../genes/human/KCTD11/KCTD11-ai-review.html): PN CRL/BTB placement matches the current AIGR view of a Cul3 adaptor
Context-specific but plausible PN rows
These are probably real, but should be treated as non-core or secondary until checked carefully:
- [STAT1](../../genes/human/STAT1/STAT1-ai-review.html): ALP transcriptional repressor role based on ULK1 promoter regulation
- [BIRC6](../../genes/human/BIRC6/BIRC6-ai-review.html): ALP docking/fusion regulator role on top of its core UPS E2/E3 biology
- [CIAO1](../../genes/human/CIAO1/CIAO1-ai-review.html): UPS `DCAF` placement on top of its stronger Fe-S assembly/chaperone identity
Domain- or family-driven caution cases
These are especially useful for AIGR because the papers themselves signal uncertainty or inclusivity:
- [HSPA12A](../../genes/human/HSPA12A/HSPA12A-ai-review.html), [HSPA12B](../../genes/human/HSPA12B/HSPA12B-ai-review.html): included as HSP70 PN components despite admitted lack of clear proteostasis function
- [AARSD1](../../genes/human/AARSD1/AARSD1-ai-review.html): dual placement as `HSP90 cochaperone` and `tRNA synthetase`
- [BAG6](../../genes/human/BAG6/BAG6-ai-review.html): split across GET-pathway transport and UPS; MS1 explicitly distinguishes it from canonical BAG-domain NEFs
- [TTC28](../../genes/human/TTC28/TTC28-ai-review.html): PN places it as an HSP70-HSP90 joint cochaperone, but the current AIGR review treats it as a mitotic scaffold
- [MEX3B](../../genes/human/MEX3B/MEX3B-ai-review.html): RING-family UPS placement may be real but needs distinction between family membership and core proteostasis role
Existing-review rereview examples
On the 83-gene existing-review queue, the useful distinction was not just the
pipeline label but whether the PN-projected term was actually a better GO
assertion than the current AIGR review. In practice,
more_specific_than_existing_goa is a projection label, not a guarantee that
the PN term remains the most specific biologically defensible choice after
manual rereview.
- [BCAP31](../../genes/human/BCAP31/BCAP31-ai-review.html) (`more_specific_than_existing_goa`): accepted `GO:0036503 ERAD pathway`.
  This is a good positive-control case where the PN mapping is exact and the
  biology holds up. The current review already supported direct ERAD
  participation, and the rereview added the pathway term because BCAP31 helps
  handle retrotranslocation of ERAD substrates rather than only acting in a
  looser ER-stress or regulatory context.
- [EDF1](../../genes/human/EDF1/EDF1-ai-review.html) (`new_to_goa`): PN suggested `GO:0002181 cytoplasmic translation`,
  `GO:0006412 translation`, and `GO:0006515 protein quality control for misfolded or incompletely synthesized proteins`. Only `GO:0006515` survived
  conservative rereview. The broad translation terms were not added because the
  best-supported biology is collided-ribosome surveillance and
  ribosome-associated quality control rather than generic translation.
- [TOMM20](../../genes/human/TOMM20/TOMM20-ai-review.html) (`more_specific_than_existing_goa` in the queue): rejected. The PN
  mitochondrial mapping propagates the group-level `Protein import` bucket to
  `GO:0017038 protein import`, but the current AIGR review already uses the
  route-specific mitochondrial import term `GO:0030150 protein import into mitochondrial matrix`. At the gene-review level, the PN suggestion was
  broader rather than more specific.
- [HSPA8](../../genes/human/HSPA8/HSPA8-ai-review.html) (`more_specific_than_existing_goa` in the queue): rejected. The PN
  `GO:0035973 aggrephagy` projection comes from a selective-autophagy-receptor
  path, whereas the current review already captures HSPA8's direct and much
  better supported CMA biology with `GO:0061684 chaperone-mediated autophagy`
  and `GO:0061740 protein targeting to lysosome involved in chaperone-mediated autophagy`. HSPA8 clearly participates in proteostasis and aggregate handling,
  but that did not justify promoting it to aggrephagy here.
- [RAB7A](../../genes/human/RAB7A/RAB7A-ai-review.html) (`more_specific_than_existing_goa` in the queue): rejected on
  conservative rereview. PN projected `GO:0061909 autophagosome-lysosome fusion`, but the local evidence base is mixed, and mammalian knockout work
  supports a stronger role in post-fusion autolysosome maturation than in the
  fusion step itself. That was not strong enough to add the more specific term
  to the human review.
Priority Review Targets
See priority_genes.tsv.
Recommended first-pass jobs:
- [BTF3](../../genes/human/BTF3/BTF3-ai-review.html): fetch and review the true human [BTF3](../../genes/human/BTF3/BTF3-ai-review.html) gene (P20290) as the PN
  nascent-polypeptide-associated complex component.
- [HSPA12A](../../genes/human/HSPA12A/HSPA12A-ai-review.html) and [HSPA12B](../../genes/human/HSPA12B/HSPA12B-ai-review.html): fetch and review as explicit domain-based PN inclusions.
- [AARSD1](../../genes/human/AARSD1/AARSD1-ai-review.html): review the dual chaperone/translation placement.
- [BAG6](../../genes/human/BAG6/BAG6-ai-review.html): review as a multi-branch boundary case connecting transport, UBL biology, and proteostasis.
- [TTC28](../../genes/human/TTC28/TTC28-ai-review.html): explicitly test the PN cochaperone claim against the existing mitosis-focused review.
PN-vs-UPB Comparison
- all 33/33 human UPB genes are present in the PN workbook
- most of those genes are consistent with PN placement at a coarse level
- the best bridge cases are the small HSPs, HSP70/J-domain systems, and the QC boundary genes
The UPB project is the best place to reason about:
- direct unfolded-protein binding
- holdase vs foldase vs co-chaperone distinctions
- GO MF correction and ontology-gap pressure
The PN project is broader:
- define the proteostasis universe
- understand the authors' own annotation model
- identify GO-compatible vs non-GO-compatible PN statements
- use PN to drive AIGR QA and job selection
Next Steps
- Audit the highest-priority genes in priority_genes.tsv.
- Work through the 1075 projected candidate additions, using the unusual
  propagation report as a blocklist for automatic review edits.
- Promote only gene-level decisions that survive evidence review into AIGR
  YAML; leave broad PN context as project metadata.