Ortholog Conjecture Project

IN_PROGRESS PIPELINE

Overview

The ortholog conjecture (OC) posits that orthologs (genes separated by speciation) tend to retain more similar functions than paralogs (genes separated by duplication) at the same evolutionary distance. This assumption underpins many automated functional annotations and cross-species inference pipelines. The OC remains debated because different data types, bias controls, and evaluation metrics can yield opposite conclusions.

GO Context (How Orthology Appears in Annotations)

GO annotations explicitly encode how an inference was made, and orthology appears directly in several mechanisms:

Because of these mechanisms, the OC is not merely theoretical: whether it holds affects how annotations propagate, how reliable orthology-based evidence is, and how likely over-annotation becomes.

Research Summary (General)

Evidence Challenging the OC

Methodological Cautions

Evidence Supporting the OC

Scope Clarification (Science-First)

This project is about the science of orthology vs functional divergence and how to measure it without circularity or annotation bias. It is not primarily a curation rulebook for reviewers; those guidelines belong at the repo level.

We will still use GO context and evidence codes as inputs, but the emphasis here is on building a reproducible, unbiased analysis of functional conservation/divergence.

Project Goals (Science-Focused)

Initial Workplan

  1. Seed a case list from literature with concrete divergence signals (expression shifts, loss of replaceability, domain architecture changes) and tie each to explicit references.
  2. Build ortholog sets across a few well-studied clades (1:1 and 1:many) and annotate which GO evidence codes dominate (ISO/IEA vs IBA/IBD vs experimental).
  3. Implement open-world metrics: compute functional similarity using only experimental annotations, and separately include ISO/IBA to measure the impact of orthology-based transfer.
  4. Compare GO-based similarity with orthogonal signals (RNA-seq expression similarity, phenotype concordance where available).
  5. Summarize divergence rates and create a “bias checklist” so metrics can be interpreted correctly.
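Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the project's pipeline. It assumes annotations arrive as (gene, GO term, evidence code) tuples and uses an assumed default tier map, since the spec leaves tiers configurable. Plain Jaccard on unpropagated term sets stands in for whatever similarity measure the project adopts. Gene names (g1, g2) and term IDs (GO:A, GO:B, GO:C) are hypothetical placeholders.

```python
from collections import Counter

# Assumed default tier map; the spec says evidence tiers are configurable.
TIER_OF_CODE = {
    **{code: "experimental" for code in ("EXP", "IDA", "IPI", "IMP", "IGI", "IEP")},
    "ISO": "orthology",     # Inferred from Sequence Orthology
    "IBA": "phylogenetic",  # Inferred from Biological aspect of Ancestor
    "IBD": "phylogenetic",  # Inferred from Biological aspect of Descendant
    "IEA": "electronic",    # Inferred from Electronic Annotation
}
EXPERIMENTAL = {c for c, tier in TIER_OF_CODE.items() if tier == "experimental"}
WITH_TRANSFER = EXPERIMENTAL | {"ISO", "IBA"}


def tier_counts(annotations):
    """Step 2: count annotations per evidence tier; input is (gene, term, code) tuples."""
    return Counter(TIER_OF_CODE.get(code, "other") for _, _, code in annotations)


def terms(annotations, gene, codes):
    """GO terms for one gene, restricted to the given evidence codes."""
    return {term for g, term, code in annotations if g == gene and code in codes}


def jaccard(a, b):
    # None rather than 0.0 when either set is empty: no annotation is not evidence of divergence.
    if not a or not b:
        return None
    return len(a & b) / len(a | b)


def transfer_effect(annotations, gene_a, gene_b):
    """Step 3: similarity from experimental codes only vs. experimental + ISO/IBA."""
    exp_only = jaccard(terms(annotations, gene_a, EXPERIMENTAL),
                       terms(annotations, gene_b, EXPERIMENTAL))
    with_transfer = jaccard(terms(annotations, gene_a, WITH_TRANSFER),
                            terms(annotations, gene_b, WITH_TRANSFER))
    return exp_only, with_transfer


# Hypothetical ortholog pair g1/g2 with placeholder term IDs.
ann = [
    ("g1", "GO:A", "IDA"), ("g1", "GO:B", "IMP"),
    ("g2", "GO:A", "IDA"), ("g2", "GO:B", "ISO"), ("g2", "GO:C", "IBA"),
]
print(tier_counts(ann))
print(transfer_effect(ann, "g1", "g2"))  # experimental-only 0.5; with ISO/IBA 2/3
```

In this toy pair, the ISO annotation of GO:B on g2 could simply have been copied from g1's experimental annotation. Including it raises similarity from 0.5 to about 0.67, which is the circularity that step 3 is meant to expose.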

Open-World Metrics Spec (Draft)

Goal: avoid penalizing missing annotations (unknowns) while still capturing functional divergence signals.
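One way to make this goal concrete is to score each GO aspect (MF, BP, CC) separately. If either gene has no annotation in an aspect, that aspect is reported as unknown (None) and is not scored as 0. The sketch below does this. It assumes a toy is_a parent map with hypothetical term IDs in place of the real GO graph, and it uses Jaccard on ancestor-propagated sets, where the project may prefer an information-content-weighted measure.

```python
from typing import Dict, Optional, Set, Tuple

# Toy is_a parent map with hypothetical term IDs; a real pipeline would load the GO DAG.
PARENTS: Dict[str, Set[str]] = {
    "T_child1": {"T_parent"},
    "T_child2": {"T_parent"},
    "T_parent": {"T_root"},
    "T_root": set(),
}


def with_ancestors(terms: Set[str]) -> Set[str]:
    """Propagate annotations to all ancestors (true-path rule)."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        t = stack.pop()
        if t not in closed:
            closed.add(t)
            stack.extend(PARENTS.get(t, ()))
    return closed


def open_world_similarity(a: Dict[str, Set[str]],
                          b: Dict[str, Set[str]]) -> Dict[str, Optional[float]]:
    """Per-aspect Jaccard on propagated term sets; None means 'unknown', not 'different'."""
    result: Dict[str, Optional[float]] = {}
    for aspect in ("MF", "BP", "CC"):
        ta, tb = a.get(aspect, set()), b.get(aspect, set())
        if not ta or not tb:
            result[aspect] = None  # missing annotation is excluded rather than penalized
            continue
        pa, pb = with_ancestors(ta), with_ancestors(tb)
        result[aspect] = len(pa & pb) / len(pa | pb)
    return result


def summarize(per_aspect: Dict[str, Optional[float]]) -> Tuple[Optional[float], int]:
    """Mean over known aspects only, plus how many aspects were comparable."""
    known = [v for v in per_aspect.values() if v is not None]
    return (sum(known) / len(known) if known else None, len(known))


# Hypothetical genes: both annotated in BP, only gene_b annotated in MF.
gene_a = {"BP": {"T_child1"}}
gene_b = {"BP": {"T_child2"}, "MF": {"T_root"}}
scores = open_world_similarity(gene_a, gene_b)
print(scores, summarize(scores))
```

Reporting the number of comparable aspects alongside the mean keeps sparsely annotated pairs visible, so they are not silently treated as divergent.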

Inputs

Evidence Tiers (Configurable)

Similarity Metrics (Open-World)

Bias Controls

Reporting

Functional Divergence: Curated Cases (Evidence-Backed)

These are concrete ortholog divergence examples with gene-level evidence that can seed the "open world" curation set.

CMAH (CMP-Neu5Ac hydroxylase)

Case: Human CMAH is inactivated; nonhuman primates retain functional CMAH.
Type: Loss-of-function ortholog (human lineage-specific).
Evidence: Human CMAH is inactive due to a deletion that removes a 92-bp exon; loss of Neu5Gc synthesis in humans has been directly linked to CMAH inactivation. [PMID:9751737; PMID:11562455]

UOX (urate oxidase / uricase)

Case: UOX is inactivated in hominoids, but functional in many other mammals.
Type: Loss-of-function ortholog with independent inactivation events.
Evidence: Multiple nonsense mutations and splice defects in human and great ape UOX; evidence for independent inactivation in the gibbon lineage. [PMID:11961098]

GULO / GULOP (L-gulonolactone oxidase)

Case: Humans and other haplorhine primates lack functional GULO and therefore cannot synthesize vitamin C endogenously.
Type: Loss-of-function ortholog / pseudogenization.
Evidence: Human GULO is a pseudogene with multiple mutations; primate nonfunctionalization documented at the sequence level. [PMID:1962571; PMID:10572964]

CDC14 (Cdc14 phosphatase family)

Case: Budding yeast Cdc14 is essential for mitotic exit, but orthologs in fission yeast and vertebrates are not required for mitotic exit and show different cellular roles.
Type: Functional role shift across orthologs.
Evidence: Review of a conserved phosphatase family with non-conserved functions, covering the role of fission yeast Clp1 in cytokinesis control and the dispensability of vertebrate CDC14 for mitotic exit. [PMID:20720150]

Arabidopsis thaliana and A. lyrata co-orthologs (expressolog study)

Case: Ortholog groups with multiple A. lyrata copies diverge from A. thaliana in both expression and the ability to complement A. thaliana mutants.
Type: Neofunctionalization and nonfunctionalization after duplication within ortholog groups.
Evidence: Expressolog analysis across 286 A. lyrata duplicated gene groups and experimental complementation of 8 A. lyrata homologs in 4 Arabidopsis loss-of-function mutants; nonexpressologs fail to complement, indicating functional divergence. [PMID:27303025]

Large-Scale Divergence Datasets (To Mine)

Appendix: Lineage-Specific Loss (Not Used for OC Metrics)

These are boundary cases demonstrating that orthology does not guarantee functional retention. They are excluded from OC comparison metrics focused on subtle functional divergence.

Human loss-of-function orthologs (UOX, GULOP/GULO)

Case: Humans lack functional urate oxidase (UOX) and L-gulonolactone oxidase; the human locus is annotated as the pseudogene GULOP.
Type: Lineage-specific loss-of-function with retained but inactive orthologous loci.
Evidence: UOX inactivation in the human/great ape clade is due to nonsense mutations identified in primate comparative sequencing. [PMID:11961098] Human L-gulono-γ-lactone oxidase exists as a pseudogene with accumulated mutations, and the human GULOP locus is annotated as nonfunctional. [PMID:10572964; NCBI GTR Gene:2989]
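The exclusion rule stated for this appendix could be enforced by tagging each curated case with a kind and filtering before metrics are computed. The sketch below is a hypothetical schema: the DivergenceCase class, the kind labels, and the EXCLUDED_FROM_OC set are illustrative names, and the records mirror only a few entries from this page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DivergenceCase:
    gene: str
    kind: str               # hypothetical labels, e.g. "role_shift", "lineage_specific_loss"
    pmids: Tuple[str, ...]  # supporting references as PMIDs


CASES: List[DivergenceCase] = [
    DivergenceCase("CDC14", "role_shift", ("20720150",)),
    DivergenceCase("A. lyrata co-orthologs", "post_duplication_divergence", ("27303025",)),
    DivergenceCase("UOX", "lineage_specific_loss", ("11961098",)),
    DivergenceCase("GULOP", "lineage_specific_loss", ("10572964",)),
]

# Boundary cases are kept for reference but not used in OC comparison metrics.
EXCLUDED_FROM_OC = {"lineage_specific_loss"}


def oc_metric_cases(cases: List[DivergenceCase]) -> List[DivergenceCase]:
    """Return only cases eligible for OC comparison metrics."""
    return [c for c in cases if c.kind not in EXCLUDED_FROM_OC]


eligible = [c.gene for c in oc_metric_cases(CASES)]
print(eligible)
```

Keeping the exclusion as data (a set of kinds) rather than ad hoc code makes it easy to report metrics with and without boundary cases.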

Key References


STATUS

Completed

Pending

Last updated: 2026-02-09

NOTES

2026-02-09

Refocused project scope to investigate ortholog divergence science and unbiased metrics; added seed examples and open-world framing.

2026-02-09

Added a curated seed list of functional divergence cases with literature references.