BIOINFORMATICS Case Studies

Purpose

This document records concrete cases where custom bioinformatics analysis was used to resolve annotation uncertainty in ai-gene-review.

Each case links:
- biological question
- reproducible analysis workflow
- key outputs
- impact on curation decisions

Hypothesis-Linked OpenScientist Workflow

Gene function assignments can be treated as focused hypotheses before they are
accepted, removed, or converted into suggested experiments. The current CLI path
is:

just gene-hypothesis-list <organism> <gene>
just gene-hypothesis-research <provider> <organism> <gene> --annotation-term-id <GO_ID> --as-function-hypothesis --dry-run

Hypothesis reports are written under:

genes/<organism>/<gene>/<gene>-hypotheses/<hypothesis-slug>/<provider>.md
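
As a minimal sketch, the path pattern above can be filled in programmatically, for example to check whether a report already exists before starting a run. The function name `hypothesis_report_path` is hypothetical and not part of the repository; the slug is passed in explicitly because this document does not describe how the CLI derives it.

```python
from pathlib import PurePosixPath


def hypothesis_report_path(organism: str, gene: str, slug: str, provider: str) -> PurePosixPath:
    """Fill in the documented report path pattern.

    Hypothetical helper: it only mirrors
    genes/<organism>/<gene>/<gene>-hypotheses/<hypothesis-slug>/<provider>.md
    and does not call any repository code.
    """
    return (
        PurePosixPath("genes")
        / organism
        / gene
        / f"{gene}-hypotheses"
        / slug
        / f"{provider}.md"
    )


if __name__ == "__main__":
    # Values taken from the BIO-002 example in this document.
    print(hypothesis_report_path(
        "SCHPO", "Epe1", "function-hypothesis-go-0032452", "openscientist"
    ))
```

`PurePosixPath` is used so the result keeps forward slashes on every platform, matching how paths are written in this document.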

When the source review cites a local bioinformatics result such as
file:<organism>/<gene>/<gene>-bioinformatics/RESULTS.md, the hypothesis prompt
withholds that local analysis from OpenScientist. This keeps the run blinded so
OpenScientist can independently explore the hypothesis that gene G has
function F. After the run, compare the OpenScientist report against the local
RESULTS.md to see whether the independent search converged with, refined, or
contradicted the older analysis.

Use this distinction:

Case Studies

| Case ID | Gene | Species | Question | Outcome |
| --- | --- | --- | --- | --- |
| BIO-001 | pmp20 | SCHPO (S. pombe) | Does pmp20 have thioredoxin-dependent peroxidase activity? | Evidence supports loss of canonical thioredoxin-dependent peroxidase function |
| BIO-002 | Epe1 | SCHPO (S. pombe) | Does the JmjC domain support histone demethylase/oxidoreductase activity? | Bioinformatics and literature support treating Epe1 as a non-catalytic chromatin regulator |

BIO-001: SCHPO/pmp20 (thioredoxin-dependent peroxidase activity)

Context

pmp20 belongs to the peroxiredoxin family. Automated, domain-based annotations suggested peroxidase activity, whereas the literature and curated UniProt comments indicated loss of thioredoxin-dependent activity.

Curation question

Should GO molecular function terms related to peroxidase/thioredoxin peroxidase activity be retained or removed for SCHPO/pmp20?

Analysis workflow

Methods implemented:
1. Collect the target and active controls from local UniProt TXT and UniProt REST.
2. Compare cysteine topology and catalytic-site correspondence at sequence level.
3. Map target residues against active-control resolving-cysteine positions by global alignment.
4. Fetch AlphaFold monomer models and compute CYS SG pair geometry summaries.
5. Validate that the same scripts run on an alternate target (tpx1).
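
Steps 2 and 4 above can be illustrated with a short standard-library sketch. It is not the project's actual script: the function names are hypothetical, and it assumes the AlphaFold model has been saved in fixed-column PDB format (an mmCIF download would need a different parser). It lists cysteine positions in a sequence and computes pairwise distances between cysteine SG atoms.

```python
import math
from itertools import combinations


def cysteine_positions(sequence: str) -> list[int]:
    """Return 1-based positions of every cysteine in a protein sequence."""
    return [i for i, aa in enumerate(sequence.upper(), start=1) if aa == "C"]


def cys_sg_atoms(pdb_text: str) -> dict[tuple[str, int], tuple[float, float, float]]:
    """Map (chain, residue number) to SG coordinates for each CYS residue.

    Assumes standard fixed-column PDB ATOM records.
    """
    atoms = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Columns 18-20 hold the residue name, columns 13-16 the atom name.
        if line[17:20] != "CYS" or line[12:16].strip() != "SG":
            continue
        key = (line[21], int(line[22:26]))
        atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def sg_pair_distances(pdb_text: str) -> dict[tuple, float]:
    """Return the SG-SG distance in angstroms for every pair of cysteines."""
    atoms = cys_sg_atoms(pdb_text)
    return {
        (a, b): math.dist(atoms[a], atoms[b])
        for a, b in combinations(sorted(atoms), 2)
    }
```

Comparing the resulting distance table between the target and an active control is one way to summarize whether a candidate peroxidatic/resolving cysteine pair is geometrically plausible. Any distance cutoff applied to these values would be an additional assumption not stated in this document.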

Key outputs

Main findings

Curation impact

The bioinformatics case study was incorporated into the main review as a file: reference:

Updated review file:

Validation status:

Caveats

BIO-002: SCHPO/Epe1 (JmjC demethylase activity)

Context

Epe1 contains a JmjC-family domain, which can trigger automated histone
demethylase, dioxygenase, oxidoreductase, and metal-binding annotations. The
review instead treats Epe1 as a non-catalytic anti-silencing factor that acts in
heterochromatin regulation through Swi6/HP1-associated chromatin mechanisms.

Curation question

Should JmjC-derived GO molecular-function annotations such as histone demethylase
activity be removed for SCHPO/Epe1?

Existing workflow

Hypothesis search candidates can be listed with:

just gene-hypothesis-list SCHPO Epe1

The histone-demethylase function assignment hypothesis can be staged with:

just gene-hypothesis-research openscientist SCHPO Epe1 \
  --annotation-term-id GO:0032452 \
  --as-function-hypothesis \
  --dry-run \
  -- --param use_hypotheses=true

Expected output path for a real run:

genes/SCHPO/Epe1/Epe1-hypotheses/function-hypothesis-go-0032452/openscientist.md

Current findings

Caveats

Template For Future Cases

For each new case, add:
1. Case ID, gene, species, and curation question.
2. Workflow path and exact run commands.
3. Key result files (especially RESULTS.md).
4. Clear statement of how outcomes changed curation decisions.
5. Validation status after integrating results into *-ai-review.yaml.
6. If applicable, the gene-hypothesis-research command and output path for the
OpenScientist hypothesis report.