Human Proteostasis Network

Species: human

Human Proteostasis Network Project

Background

Proteostasis is used here in the broad systems sense: the cellular machinery that
supports protein synthesis, folding, trafficking, quality control, sequestration,
and degradation.

The source resource analyzed in this project is the Human Proteostasis Network
Annotation 4.3.11 workbook together with three Proteostasis Consortium survey
manuscripts:

The mapping and browser artifacts now use the 2026-04-17 PN 4.3.11
workbook. Earlier 2024 release artifacts are kept only as provenance for
individual curation comments that were started against that release.

Overview

The Proteostasis Consortium workbook and papers define a curated human proteostasis
membership and role taxonomy. They do not define a GO-ready annotation set.

This project asks three main questions:

Within this project:

See:

What The PN Resource Actually Contains

The workbook is a row-per-role annotation table, not a gene-centered review file.
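
Because the workbook is row-per-role, a gene with several roles appears on several rows, and node counts such as the total and leaf numbers quoted later are properties of the role paths rather than of genes. A minimal Python sketch of both views follows. It assumes each row reduces to a gene symbol plus its Branch > Class > Group > Type > Subtype path; the "gene" and "path" keys and all labels are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical row-per-role records. Gene symbols and hierarchy labels are
# placeholders; "gene" and "path" are assumed keys, not workbook headers.
ROWS = [
    {"gene": "GENE1", "path": ("Branch1", "Class1", "Group1")},
    {"gene": "GENE1", "path": ("Branch2", "Class2")},
    {"gene": "GENE2", "path": ("Branch1", "Class1", "Group2", "Type1")},
]


def gene_centered(rows):
    """Pivot row-per-role records into a gene -> [role path, ...] view."""
    view = defaultdict(list)
    for row in rows:
        view[row["gene"]].append(row["path"])
    return dict(view)


def hierarchy_nodes(rows):
    """Collect every prefix of every role path as a hierarchy node."""
    nodes = set()
    for row in rows:
        path = row["path"]
        for depth in range(1, len(path) + 1):
            nodes.add(path[:depth])
    return nodes


def leaf_nodes(nodes):
    """A node is a leaf when no other node has it as its direct parent."""
    parents = {node[:-1] for node in nodes if len(node) > 1}
    return nodes - parents


nodes = hierarchy_nodes(ROWS)
print(len(nodes), len(leaf_nodes(nodes)))  # 7 3
print(gene_centered(ROWS)["GENE1"])
```

The gene-centered view is what a gene review consumes; the node view is what the mapping YAMLs are keyed on.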

The custom hierarchy has:

This is their own vocabulary:

Current Mapping Completion Status

The current curated mapping pass is complete for the Human Proteostasis Network Annotation 4.3.11 workbook release dated 2026-04-17.

Coverage after the completion pass:

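| Level | Total source codes | Pending review | Mapped | Context only | No mapping | Deferred | Missing from YAML |
|---|---|---|---|---|---|---|---|
| Branch | 9 | 8 | 0 | 1 | 0 | 0 | 0 |
| Class | 42 | 30 | 9 | 2 | 1 | 0 | 0 |
| Group | 297 | 229 | 60 | 2 | 1 | 5 | 0 |
| Type | 800 | 661 | 119 | 2 | 14 | 4 | 0 |
| Subtype | 881 | 795 | 64 | 1 | 14 | 7 | 0 |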

Every 2026-04-17 PN source code now has exactly one subject_curations
record in a branch mapping YAML. missing_from_yaml is now a QA failure state,
not a normal curator bucket, and should remain zero.
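
This invariant can be checked mechanically on each release. The sketch below assumes each curation record exposes its PN source code under a "subject" key, which is a hypothetical field name. It reports codes with no record (the missing_from_yaml case), codes with more than one record, and records pointing at codes absent from the release.

```python
from collections import Counter


def coverage_qa(source_codes, curation_records):
    """Check that every PN source code has exactly one curation record.

    Returns (missing, duplicated, unknown) as sets of source codes.
    The "subject" key is an assumed field name, not the project's schema.
    """
    expected = set(source_codes)
    counts = Counter(record["subject"] for record in curation_records)
    seen = set(counts)
    missing = expected - seen                                   # missing_from_yaml
    duplicated = {code for code, n in counts.items() if n > 1}  # more than one record
    unknown = seen - expected                                   # not in this release
    return missing, duplicated, unknown


# Toy codes: "C" has no record, "B" has two, "Z" is not in the release.
missing, duplicated, unknown = coverage_qa(
    ["A", "B", "C"],
    [{"subject": "A"}, {"subject": "B"}, {"subject": "B"}, {"subject": "Z"}],
)
print(missing, duplicated, unknown)
```

A release passes this check only when all three sets are empty.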

The current hierarchy has 2029 total source nodes and 1348 leaf nodes.
The YAML inventory contains:

| Curation status | Records | Meaning |
|---|---|---|
| pending_review | 1723 | Accounted for in YAML, but not yet manually analyzed in depth |
| mapped | 252 | Reviewed and mapped to a GO term |
| context_only | 8 | GO relationship recorded, but unsafe for gene-level propagation |
| no_mapping | 30 | Reviewed and concluded that no GO mapping should be made |
| deferred | 16 | Reviewed, but blocked by evidence, taxonomy ambiguity, or a missing/better GO term |

Mapping scopes are:

| Mapping scope | Records | Use |
|---|---|---|
| exact | 4 | Direct semantic match |
| ok_for_propagation_to_go | 248 | May produce candidate gene-GO propagations |
| too_broad_to_propagate | 8 | Real contextual alignment, but excluded from propagation |
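
The scope table implies a gate in front of any propagation step. The table only states that ok_for_propagation_to_go records may produce candidate gene-GO propagations, so this sketch uses that scope alone by default and leaves exact as an opt-in. The record keys (subject, go_term, mapping_scope) and the placeholder IDs are assumptions, not the YAML schema.

```python
from typing import Iterable, Iterator, Tuple

# Per the scope table, only ok_for_propagation_to_go is stated to produce
# candidate propagations; whether "exact" also propagates is left to the caller.
DEFAULT_SCOPES = frozenset({"ok_for_propagation_to_go"})


def propagation_candidates(records: Iterable[dict],
                           scopes: frozenset = DEFAULT_SCOPES) -> Iterator[Tuple[str, str]]:
    """Yield (pn_code, go_term) pairs whose mapping scope allows propagation.

    too_broad_to_propagate records, and records with no GO term
    (pending_review, no_mapping, deferred), are skipped.
    """
    for record in records:
        go_term = record.get("go_term")
        if go_term and record.get("mapping_scope") in scopes:
            yield record["subject"], go_term


records = [
    {"subject": "PN1", "go_term": "GO:placeholder1", "mapping_scope": "ok_for_propagation_to_go"},
    {"subject": "PN2", "go_term": "GO:placeholder2", "mapping_scope": "too_broad_to_propagate"},
    {"subject": "PN3", "go_term": "GO:placeholder3", "mapping_scope": "exact"},
    {"subject": "PN4", "curation_status": "pending_review"},
]
print(list(propagation_candidates(records)))
print(list(propagation_candidates(records, DEFAULT_SCOPES | {"exact"})))
```

Keeping the scope set as a parameter makes the exact-match policy an explicit, reviewable choice rather than a hidden default.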

Most PN source codes (1723 of 2029) are not yet final curation decisions of any
kind, including no-mapping calls. They are explicitly tracked as pending_review
so curators can distinguish coverage bookkeeping from completed curation
decisions.

Projection against the human GOA DuckDB at
~/repos/go-db/db/goa_human.ddb produced:

| Projection status | Unique gene-GO pairs |
|---|---|
| already in GOA exactly | 703 |
| entailed by GOA closure | 403 |
| more specific than existing GOA | 238 |
| supported by GOA regulation | 77 |
| new to GOA | 760 |
| no local GOA available | 16 |

Only the 1075 candidate additions (238 more_specific_than_existing_goa + 77
supported_by_goa_regulation + 760 new_to_goa) should enter manual AIGR
rereview queues. The no_local_goa class is mostly a data-availability state,
not biological evidence; with the DuckDB source it is now a small residual
category of 16 pairs.
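
The projection statuses can be read as a comparison between each projected gene-GO pair and the gene's existing GOA annotations under the ontology closure. The sketch below is one plausible reading, not the project's implementation. It uses a toy is_a graph with placeholder term IDs and a hypothetical regulates map. It also invents the identifiers already_in_goa and entailed_by_goa_closure for the two statuses whose identifiers the text does not give. A real closure would also traverse part_of and related edges.

```python
from collections import Counter

# Toy is_a graph (child -> parents). Term IDs are placeholders, not real GO IDs.
IS_A = {
    "T:root": set(),
    "T:mid": {"T:root"},
    "T:leaf": {"T:mid"},
    "T:other": {"T:root"},
}
# Hypothetical regulates edges: regulation term -> regulated term.
REGULATES = {"T:reg_mid": "T:mid"}

CANDIDATE_ADDITIONS = frozenset({
    "more_specific_than_existing_goa",
    "supported_by_goa_regulation",
    "new_to_goa",
})


def ancestors(term):
    """All is_a ancestors of term, excluding the term itself."""
    seen, stack = set(), list(IS_A.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, ()))
    return seen


def classify(term, existing):
    """Classify one projected pair; existing is None if the gene has no local GOA."""
    if existing is None:
        return "no_local_goa"
    if term in existing:
        return "already_in_goa"                   # invented identifier
    if any(term in ancestors(e) for e in existing):
        return "entailed_by_goa_closure"          # invented identifier
    if existing & ancestors(term):
        return "more_specific_than_existing_goa"
    if any(REGULATES.get(e) == term for e in existing):
        return "supported_by_goa_regulation"
    return "new_to_goa"


pairs = [
    ("T:mid", {"T:mid"}),       # already annotated
    ("T:leaf", {"T:mid"}),      # refines an existing annotation
    ("T:mid", {"T:reg_mid"}),   # backed by a regulation annotation
    ("T:other", {"T:leaf"}),    # unrelated to existing annotations
    ("T:mid", None),            # gene absent from local GOA
]
statuses = Counter(classify(term, existing) for term, existing in pairs)
print(sum(statuses[s] for s in CANDIDATE_ADDITIONS))  # 3
```

Applied to the projection table, the same candidate set gives 238 + 77 + 760 = 1075 pairs.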

Extra-Scrutiny Findings

The mapping audit flags 180 of the 260 GO-bearing curation records (252 mapped
plus 8 context_only) as requiring manual gene-level review before they are used
to change a gene review. These are not necessarily wrong mappings; they are
places where propagation can mislead if the projected GO term is treated as an
asserted gene function.

Main flagged patterns:

Representative cases that should stay in the manual review queue:

Evidence Shape By Branch

The current release is uneven by design.

| Branch set | What is present in the workbook | What it means for reuse |
|---|---|---|
| ALP | Per-row notes for 1003/1003 rows and references for 1001/1003 rows | Best-supported branch for row-level reuse and audit |
| UPS | Principal domains for 1528/1528 rows and auxiliary domains for 1521/1528 rows | Strong domain/family scaffold, but weaker row-level functional justification in this release |
| Other 7 branches | No row-level notes, no row-level references, no domain columns | Useful as curated membership/context, but not ready to import into GO as-is |
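
Per-branch counts like these are straightforward to recompute from the workbook rows. A minimal sketch, assuming each row is a dict with a branch key and evidence columns under the hypothetical names notes, references, principal_domain, and auxiliary_domain. A cell counts as filled when it is non-blank.

```python
from collections import defaultdict

# Assumed column names for the evidence fields discussed above.
EVIDENCE_FIELDS = ("notes", "references", "principal_domain", "auxiliary_domain")


def evidence_shape(rows):
    """Return {branch: {field: (filled_rows, total_rows)}} per evidence column."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for row in rows:
        branch = row["branch"]
        totals[branch] += 1
        for field in EVIDENCE_FIELDS:
            if str(row.get(field) or "").strip():  # non-blank cell counts as filled
                filled[branch][field] += 1
    return {
        branch: {field: (filled[branch][field], total) for field in EVIDENCE_FIELDS}
        for branch, total in totals.items()
    }


rows = [
    {"branch": "ALP", "notes": "row note", "references": "PMID:placeholder"},
    {"branch": "ALP", "notes": "row note", "references": ""},
    {"branch": "UPS", "principal_domain": "DomainX"},
]
for branch, fields in evidence_shape(rows).items():
    print(branch, fields)
```

Rerunning this on each release would show whether the evidence shape of a branch changes before its rows are reused.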

What Each Level Means

The hierarchy is semantically mixed, which is the main reason it does not map cleanly to GO.

| PN level | What it usually encodes | Relation to GO |
|---|---|---|
| Branch | Localization or top-level pathway membership | BP/CC hybrid, not a GO class |
| Class | Function in proteostasis, except ALP where it is a stage of autophagy | MF/BP hybrid |
| Group | System, complex, pathway module, or mechanistic bucket | Sometimes GO-like, often not |
| Type | More specific mechanistic role, family, or complex membership | May correspond to MF, BP, CC, or family metadata |
| Subtype | Structural family, domain class, or finer mechanistic subdivision | Often outside GO scope |

Examples by level:

Fully expanded examples:

These are informative for humans, but they are not all GO terms and they are not all the same kind of thing.

Relationship To GO

Explicit GO mapping

There is essentially none in the released workbook.

Implicit GO mapping

There is a lot of implicit overlap with GO, but it is inconsistent in kind.

Some PN rows look close to GO:

But many PN labels are not GO-ready assertions:

So the right reading is:

Strategy For Bringing PN Into GO

This section is independent of the AIGR workflow.

1. Separate membership from GO assertion

A PN row first tells us:

It does not automatically tell us:

2. Tag each row by provenance

At minimum, rows should be split into:

This is the biggest determinant of whether a row is GO-ready, prediction-like, or just useful context.
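
As an illustration only, the sketch below tags rows with three hypothetical provenance labels suggested by the branch evidence shapes: row-level references, domain-based inclusion, and bare membership. It maps them onto the GO-ready, prediction-like, and context readings named in the sentence above. The tag names, the readiness labels, and the column names are all assumptions.

```python
from enum import Enum


class Provenance(Enum):
    """Hypothetical provenance tags; the project's real buckets may differ."""
    REFERENCED = "referenced"            # row-level references are present
    DOMAIN_BASED = "domain_based"        # inclusion rests on domain/family annotation
    MEMBERSHIP_ONLY = "membership_only"  # curated membership, no row-level evidence


# Assumed reading of "GO-ready, prediction-like, or just useful context".
READINESS = {
    Provenance.REFERENCED: "go_ready_candidate",
    Provenance.DOMAIN_BASED: "prediction_like",
    Provenance.MEMBERSHIP_ONLY: "context",
}


def _filled(row, key):
    """True when the (assumed) column holds a non-blank value."""
    return bool(str(row.get(key) or "").strip())


def tag_provenance(row):
    """Tag a row, preferring literature references over domain evidence."""
    if _filled(row, "references"):
        return Provenance.REFERENCED
    if _filled(row, "principal_domain"):
        return Provenance.DOMAIN_BASED
    return Provenance.MEMBERSHIP_ONLY


for row in ({"references": "PMID:placeholder"}, {"principal_domain": "DomainX"}, {}):
    tag = tag_provenance(row)
    print(tag.value, READINESS[tag])
```

A referenced tag marks a row as a candidate for GO review, not as an accepted GO assertion.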

3. Decompose each PN row into candidate GO semantics

Each row should be classified into one of four buckets:

  1. exact_or_near_GO
    Example: complex component, transport process, clear enzymatic role.
  2. GO_with_context_loss
    Example: a GO term exists but loses the proteostasis-system framing.
  3. ontology_gap
    Example: the biology is real but current GO lacks a clean term or term family.
  4. non_GO_metadata
    Example: family/domain/subtype labels that should stay as supporting metadata.
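
The four buckets can be encoded as an enum plus a decision function over curator judgments. The argument names and their precedence are assumptions of this sketch. A family or domain label is treated as metadata before GO availability is considered, and the absence of any suitable GO term means an ontology gap.

```python
from enum import Enum
from typing import Optional


class Bucket(Enum):
    EXACT_OR_NEAR_GO = "exact_or_near_GO"
    GO_WITH_CONTEXT_LOSS = "GO_with_context_loss"
    ONTOLOGY_GAP = "ontology_gap"
    NON_GO_METADATA = "non_GO_metadata"


def assign_bucket(is_family_or_domain_label: bool,
                  closest_go_term: Optional[str],
                  loses_proteostasis_context: bool) -> Bucket:
    """Map curator judgments about one PN row onto the four buckets."""
    if is_family_or_domain_label:
        return Bucket.NON_GO_METADATA       # keep as supporting metadata
    if closest_go_term is None:
        return Bucket.ONTOLOGY_GAP          # real biology, no clean GO term
    if loses_proteostasis_context:
        return Bucket.GO_WITH_CONTEXT_LOSS  # term exists but drops the framing
    return Bucket.EXACT_OR_NEAR_GO


print(assign_bucket(False, "GO:placeholder", False).value)
print(assign_bucket(False, None, False).value)
```

The function only records the judgments; deciding whether a term loses context remains manual curation.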

4. Curate by branch, not all at once

The branches are not equally reusable.

5. Keep domain-based rows conservative

The papers explicitly flag domain-based inclusions and borderline cases.

Concrete examples:

These should not be imported into GO as manual curation without independent evidence.

6. Expect ontology-gap work

The PN resource reinforces that GO has missing or awkward coverage around:

AIGR Triage Opportunities

The PN resource is useful inside AIGR primarily as a triage and QA layer.

High-value positive controls

These are cases where PN and existing AIGR work largely agree and help validate the framework:

Context-specific but plausible PN rows

These are probably real, but should be treated as non-core or secondary until checked carefully:

Domain- or family-driven caution cases

These are especially useful for AIGR because the papers themselves signal uncertainty or inclusivity:

Existing-review rereview examples

On the 83-gene existing-review queue, the useful distinction was not just the
pipeline label but whether the PN-projected term was actually a better GO
assertion than the annotation in the current AIGR review. In practice,
more_specific_than_existing_goa is a projection label, not a guarantee that
the PN term remains the most specific biologically defensible choice after
manual rereview.

Priority Review Targets

See priority_genes.tsv.

Recommended first-pass jobs:

  1. [BTF3](../../genes/human/BTF3/BTF3-ai-review.html): fetch and review the true human BTF3 gene (P20290) as the PN
    nascent-polypeptide-associated complex component.
  2. [HSPA12A](../../genes/human/HSPA12A/HSPA12A-ai-review.html) and [HSPA12B](../../genes/human/HSPA12B/HSPA12B-ai-review.html): fetch and review as explicit domain-based PN inclusions.
  3. [AARSD1](../../genes/human/AARSD1/AARSD1-ai-review.html): review the dual chaperone/translation placement.
  4. [BAG6](../../genes/human/BAG6/BAG6-ai-review.html): review as a multi-branch boundary case connecting transport, UBL biology, and proteostasis.
  5. [TTC28](../../genes/human/TTC28/TTC28-ai-review.html): explicitly test the PN cochaperone claim against the existing mitosis-focused review.

PN-vs-UPB Comparison

The UPB project is the best place to reason about:

The PN project is broader:

Next Steps