AlphaFold Database Integration for Gene Annotation Review

IN_PROGRESS PIPELINE

Overview

The AlphaFold Database (AFDB) now includes proteome-scale quaternary structure predictions (protein complexes), not just monomers. This creates opportunities to use predicted structural information as evidence when reviewing GO annotations and ARBA rules.

Reference: Han, Tsenkov, Venanzi et al. "AlphaFold Database expands to proteome-scale quaternary structures" (NVIDIA Digital Biology / EBI)

Use Cases

1. Structural Validation of GO Annotations

When reviewing GO annotations (especially Molecular Function and Cellular Component), AFDB structures can corroborate or challenge the annotated claims.

2. Cross-Species Annotation Transfer Confidence

ARBA rules often transfer annotations across species based on sequence similarity (InterPro signatures). AFDB adds a structural dimension to that transfer.

3. Protein-Protein Interaction Evidence

The new quaternary structure predictions can serve as one line of evidence for protein-protein interactions and complex membership.

Worked example: the BGC project (BGC.md). Moriwaki et al. (bioRxiv 2025.10.26.684697)
ran a genome-scale AF3/MMseqs2 complex screen over 2,437 MIBiG BGCs and used predicted
heteromeric interfaces (ipTM ≥ 0.6, with ipSAE to disambiguate paralog look-alikes) to assign
function to "uncharacterized" cluster proteins. We use their predictions as one line of evidence
in the BGC exemplar reviews (PqsBC: ipTM 0.95, PDB 5DWZ; act KS-CLF: ipTM 0.96, PDB 1TQY;
EryCII-CIII: ipTM 0.92, PDB 2YJN). In these reviews, high-ipTM hits are treated as hypotheses
that corroborate experimentally established complexes; they do not drive the call. One caveat,
reinforced by the authors' own validation set: PDB-confirmed complexes can score low (false
negatives), so the absence of a high-ipTM prediction is not evidence against a complex.
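
The triage logic described in this worked example can be sketched as follows. This is a minimal illustration, not the method used by Moriwaki et al. or by the BGC reviews: the `ComplexPrediction` record, the `triage` function, and the three label strings are hypothetical names introduced here. Only the ipTM ≥ 0.6 cutoff and the example scores come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Screening cutoff quoted in the worked example (ipTM >= 0.6).
IPTM_THRESHOLD = 0.6


@dataclass
class ComplexPrediction:
    """Hypothetical record for one predicted heteromeric pair."""
    pair: str                      # e.g. "PqsBC"
    iptm: Optional[float]          # None when no prediction is available
    pdb_id: Optional[str] = None   # experimental structure, if one exists


def triage(pred: ComplexPrediction) -> str:
    """Assign an illustrative review label to a predicted complex."""
    if pred.iptm is not None and pred.iptm >= IPTM_THRESHOLD:
        # A high score supports a complex, but only as corroboration
        # or as a hypothesis; it never decides the call on its own.
        if pred.pdb_id:
            return "corroborates_experimental"
        return "hypothesis"
    # Low or missing scores are NOT negative evidence: confirmed
    # complexes can score low (false negatives).
    return "uninformative"


if __name__ == "__main__":
    print(triage(ComplexPrediction("PqsBC", 0.95, "5DWZ")))
```

The key design choice is that no branch returns a "contradicts" label, which mirrors the caveat that a missing high-ipTM prediction says nothing against a complex.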

4. Disorder / Intrinsically Disordered Region Context

Low AFDB per-residue confidence scores (pLDDT) often coincide with disordered regions, so they give useful context for annotations that fall within those regions.
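
As a rough sketch of how low-confidence stretches could be located, the function below scans a list of per-residue pLDDT values for contiguous runs under a cutoff. The default cutoff of 70 matches the threshold this document uses for disordered regions. The `min_len` filter and the function name are assumptions added for illustration. Low pLDDT is only a proxy for disorder, so the output should be read as candidate regions.

```python
from typing import List, Tuple


def low_confidence_segments(
    plddt: List[float],
    cutoff: float = 70.0,
    min_len: int = 5,  # assumption: ignore very short low-confidence runs
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where pLDDT < cutoff."""
    segments: List[Tuple[int, int]] = []
    start = None  # 1-based start of the current low-confidence run
    for i, score in enumerate(plddt, start=1):
        if score < cutoff:
            if start is None:
                start = i
        elif start is not None:
            # The run covers residues start..i-1, i.e. i - start residues.
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(plddt) - start + 1 >= min_len:
        segments.append((start, len(plddt)))
    return segments


if __name__ == "__main__":
    scores = [90.0] * 10 + [40.0] * 6 + [85.0] * 4 + [50.0] * 3
    print(low_confidence_segments(scores))
```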

5. Flagging Structurally Implausible ARBA Rules

Some ARBA rules assign GO terms that are structurally implausible for the target proteins.

These structural mismatches can serve as independent evidence for recommending rule deprecation or modification.

Practical Integration

API Access

AFDB entries can be fetched per UniProt accession:

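# Prediction metadata for the monomer model (JSON)
curl https://alphafold.ebi.ac.uk/api/prediction/{uniprot_id}

# Download the model in PDB format (swap .pdb for .cif to get mmCIF;
# the _v4 suffix is the model version and changes between AFDB releases)
curl -O https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_v4.pdb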

# Per-residue confidence (pLDDT) is in the B-factor column of the PDB file
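
A minimal Python sketch of reading pLDDT from the B-factor column is shown below. It assumes the fixed-column PDB format, where the B-factor occupies columns 61-66 of each ATOM record, and takes one value per residue from the C-alpha atom. The helper names (`afdb_pdb_url`, `fetch_pdb_text`, `plddt_by_residue`) are hypothetical, and the URL pattern and `v4` default are copied from the curl example above rather than from a guaranteed-stable API contract.

```python
import urllib.request
from typing import Dict


def afdb_pdb_url(uniprot_id: str, version: int = 4) -> str:
    """Build the file URL using the pattern from the curl example."""
    return (
        "https://alphafold.ebi.ac.uk/files/"
        f"AF-{uniprot_id}-F1-model_v{version}.pdb"
    )


def fetch_pdb_text(uniprot_id: str, version: int = 4) -> str:
    """Download the PDB file as text (requires network access)."""
    url = afdb_pdb_url(uniprot_id, version)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def plddt_by_residue(pdb_text: str) -> Dict[int, float]:
    """Map residue number -> pLDDT, read from C-alpha ATOM records."""
    scores: Dict[int, float] = {}
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16, residue number in 23-26,
        # and B-factor (pLDDT in AFDB models) in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores
```

A typical call would be `plddt_by_residue(fetch_pdb_text(accession))`. Keeping the parser separate from the download means it can also be run on a file already stored in the gene folder.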

Integration with Review Workflow

For each gene in an ARBA rule review:

  1. Fetch AFDB entry via UniProt accession (already available from the UniProt record in the gene folder)
  2. Extract structural features: fold classification, predicted binding sites, transmembrane regions, disordered regions (pLDDT < 70)
  3. Compare against GO terms being assigned by the rule
  4. Flag mismatches in the review output
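
Steps 3 and 4 above might look like the following sketch. Everything in it is illustrative: the `StructuralFeatures` record, the `EXPECTATIONS` table, and the `max_disordered` guard are hypothetical, and the single rule shown (a transmembrane transporter activity term is expected to come with a predicted transmembrane region) is an example check, not a curated mapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StructuralFeatures:
    """Hypothetical per-protein summary produced by step 2."""
    accession: str
    has_transmembrane_region: bool
    disordered_fraction: float  # fraction of residues with pLDDT < 70


# Illustrative expectations: GO term -> check on the structural features.
EXPECTATIONS: Dict[str, Callable[[StructuralFeatures], bool]] = {
    # GO:0022857 transmembrane transporter activity
    "GO:0022857": lambda f: f.has_transmembrane_region,
}


def flag_mismatches(
    features: StructuralFeatures,
    go_terms: List[str],
    max_disordered: float = 0.5,  # assumption: skip mostly low-confidence models
) -> List[str]:
    """Return human-readable flags for GO terms the structure does not support."""
    if features.disordered_fraction > max_disordered:
        return [f"{features.accession}: model mostly low-confidence, check skipped"]
    flags = []
    for term in go_terms:
        check = EXPECTATIONS.get(term)
        # Terms without a registered check are left alone, not flagged.
        if check is not None and not check(features):
            flags.append(
                f"{features.accession}: {term} not supported by predicted structure"
            )
    return flags
```

Terms with no registered expectation pass silently, so the sketch can only add flags for review; it cannot clear or reject an annotation on its own.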

Evidence Type

AFDB-derived evidence would be classified as:
- evidence_source: COMPUTATIONAL (predicted structure, not experimental)
- Could reference the AFDB accession as a dataset
- pLDDT confidence scores provide a built-in quality metric
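
One possible shape for such an evidence record is sketched below. Only `evidence_source: COMPUTATIONAL` comes from the list above. The remaining field names, the `AfdbEvidence` class, and the `make_evidence` helper are assumptions for illustration, and the real review schema may differ. The `AF-{accession}-F1` identifier follows the file naming shown under API Access.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AfdbEvidence:
    """Hypothetical evidence record for an AFDB-derived observation."""
    uniprot_accession: str
    afdb_entry: str                 # e.g. "AF-P12345-F1"
    mean_plddt: float               # built-in quality metric
    evidence_source: str = "COMPUTATIONAL"  # predicted, not experimental
    note: Optional[str] = None


def make_evidence(
    accession: str, plddt: List[float], note: Optional[str] = None
) -> AfdbEvidence:
    """Build a record, summarising per-residue pLDDT as a mean."""
    if not plddt:
        raise ValueError("at least one pLDDT value is required")
    mean = round(sum(plddt) / len(plddt), 1)
    return AfdbEvidence(accession, f"AF-{accession}-F1", mean, note=note)


if __name__ == "__main__":
    print(asdict(make_evidence("P12345", [90.0, 80.0, 70.0])))
```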

Relationship to Other Projects

Action Items