SPKW Viral Clades Subproject
Parent project: SPKW.md
Overview
This subproject extends the Swiss-Prot Keyword (SPKW) review from single-organism and term-focused analyses to viruses across multiple viral clades. The goal is to separate three different phenomena that are conflated in raw SPKW annotations:
- Legitimate viral biology - terms such as virion component, viral DNA genome replication, viral envelope, host-cell entry, or latency-replication decision can be appropriate for the right viral protein.
- Taxon-dependent semantic mismatch - phage proteins should not be annotated with eukaryotic innate immune response terms when the host is a bacterium.
- Specificity loss - many SPKW-derived annotations are directionally correct but too broad, especially generic enzymatic functions, generic membrane/virion components, and broad host-defense suppression terms.
The bacteriophage T4 subproject, SPKW-BPT4.md, is the first focused viral case study. This page is the virus-wide equivalent: it summarizes all viral SPKW candidates, folds in additional viral gene reviews, and defines clade-specific follow-up batches.
Summary Statistics
Database: ~/repos/go-db/db/virus.ddb
Queried: 2026-05-21
| Metric | Count |
|---|---|
| Total virus GO annotations | 419,561 |
| Proteins with GO annotations | 75,488 |
| Viral taxa represented | 6,679 |
| SPKW annotations (GO_REF:0000043) | 180,680 |
| Proteins with SPKW annotations | 57,961 |
| Viral taxa with SPKW annotations | 6,629 |
| Naive SPKW-unique annotations | 135,117 |
| Proteins with naive SPKW-unique annotations | 54,131 |
| Closure-filtered SPKW-unique annotations | 80,218 |
| Proteins with closure-filtered SPKW-unique annotations | 42,507 |
| Closure filtering reduction | 41% |
Distribution by Aspect
Naive SPKW-unique annotations:
| Aspect | Annotations | Proteins | Terms |
|---|---|---|---|
| F (Molecular Function) | 79,511 | 39,050 | 73 |
| P (Biological Process) | 34,930 | 20,799 | 192 |
| C (Cellular Component) | 20,676 | 14,876 | 32 |
Closure-filtered SPKW-unique annotations:
| Aspect | Annotations | Proteins | Terms |
|---|---|---|---|
| F (Molecular Function) | 39,239 | 25,050 | 68 |
| P (Biological Process) | 26,809 | 16,873 | 183 |
| C (Cellular Component) | 14,170 | 10,717 | 31 |
Swiss-Prot vs TrEMBL
| Status | SPKW annotations | Proteins |
|---|---|---|
| TrEMBL | 156,933 | 52,452 |
| Swiss-Prot | 23,747 | 5,509 |
Only 13.1% of virus SPKW annotations are on Swiss-Prot entries. This differs from human and T4, where the reviewed Swiss-Prot fraction is very high. For viruses as a whole, errors may come from either the KW->GO mapping layer or automatic keyword assignment on TrEMBL entries.
ViralZone-Primary Keyword Stratification
ViralZone pages are integrated with UniProt keywords, but the safest machine-readable mapping is the page-header data-about="kw:KW-####" field, not the visible "Links / Uniprot Keyword" box. A ViralZone sitemap audit on 2026-05-21 found 147 pages with exactly one primary UniProt keyword; none of those primary keywords were reused as the primary keyword on another ViralZone page. The links box disagreed with the primary keyword on 4 pages, so this report uses the primary keyword as the ViralZone-backed mapping.
In virus.ddb, the SPKW rows carry the exact keyword in with_or_from, such as UniProtKB-KW:KW-1113. This lets us split SPKW annotations into ViralZone-primary KW annotations versus all other UniProt keyword annotations.
| SPKW set | ViralZone-primary KW annotations | Other KW annotations | ViralZone-primary share | Swiss-Prot share within ViralZone-primary | Swiss-Prot share within other KW |
|---|---|---|---|---|---|
| All SPKW annotations | 23,016 | 157,664 | 12.7% | 20.1% | 12.1% |
| Naive SPKW-unique annotations | 18,231 | 116,886 | 13.5% | 19.9% | 12.6% |
| Closure-filtered SPKW-unique annotations | 15,435 | 64,783 | 19.2% | 20.8% | 12.7% |
For ViralZone-primary keywords with UniProt GO cross-references, the observed virus.ddb SPKW rows matched the UniProt KW->GO mapping exactly: 0 mismatched GO IDs across ViralZone-primary SPKW rows. This supports treating the ViralZone-primary keyword set as a curated vocabulary layer rather than an accidental annotation source.
The important curation consequence is a different prior, not automatic acceptance. ViralZone-primary keyword rows are much less dominated by the broad high-risk terms driving the virus-wide SPKW burden. The broad terms most often flagged in this report are all from non-ViralZone-primary keywords:
| High-risk term family | ViralZone-primary rows in naive SPKW-unique set | Interpretation |
|---|---|---|
| GO:0044423 virion component | 0 | Broad component keyword, not a ViralZone-primary process term |
| GO:0016787 hydrolase activity; GO:0016740 transferase activity; GO:0016779 nucleotidyltransferase activity | 0 | Generic enzyme-class keyword layer |
| GO:0000166 nucleotide binding; GO:0046872 metal ion binding; GO:0003677 DNA binding | 0 | Generic binding keyword layer |
| GO:0031640 killing of cells of another organism; GO:0042742 defense response to bacterium | 0 | Non-VZ terms; high-risk phage perspective errors |
| GO:0006260 DNA replication | 0 | Non-VZ generic process; often should be viral genome replication |
| GO:0052170 symbiont-mediated suppression of host innate immune response | 0 | Non-VZ broad parent; ViralZone-primary rows are usually pathway-specific children such as type I interferon signaling suppression |
Top ViralZone-primary naive SPKW-unique terms show the contrast:
| ViralZone page | Keyword | GO term | Naive SPKW-unique annotations | Preliminary interpretation |
|---|---|---|---|---|
| VZ-895 Helicase | KW-0347 | GO:0004386 helicase activity | 3,792 | Often legitimate but still broad for multi-domain/polyprotein records |
| VZ-956 Viral attachment to host cell | KW-1161 | GO:0019062 virion attachment to host cell | 1,472 | Higher confidence on true attachment proteins; check protein assignment |
| VZ-3958 Viral tail protein | KW-1227 | GO:0098015 virus tail | 1,188 | Good phage structural specificity |
| VZ-3955 Viral tail assembly | KW-1245 | GO:0098003 viral tail assembly | 1,075 | Good phage assembly specificity |
| VZ-897 DNA-directed RNA polymerase | KW-0240 | GO:0000428 DNA-directed RNA polymerase complex | 920 | Usually plausible for polymerase-complex subunits; review subunit granularity |
| VZ-980 Viral genome integration | KW-1179 | GO:0044826 viral genome integration into host DNA; GO:0075713 establishment of integrated proviral latency | 821 each | Integration is often valid, but the latency fanout needs review for bacteriophages and episomal viruses |
| VZ-989 Viral penetration into host nucleus | KW-1163 | GO:0075732 viral penetration into host nucleus | 763 | Higher confidence in nuclear-replicating eukaryotic viruses; check clade biology |
| VZ-883 Inhibition of host interferon signaling pathway by virus | KW-1114 | GO:0039502 type I interferon signaling suppression | 254 | Better than broad innate immune suppression; still requires protein/mechanism review |
| VZ-3966 Restriction-modification system evasion by virus | KW-1258 | GO:0099018 symbiont-mediated evasion of host restriction-modification system | 125 | Good phage-specific replacement target for antirestriction proteins |
Working rule: mark ViralZone-primary KW rows as VZ-primary in review notes. This should raise the prior that the term is a deliberate viral biology term, especially for entry, tail, genome-ejection, immune-pathway-specific, and restriction-modification terms. It should not override gene-level review: polyprotein granularity, wrong host context, TrEMBL keyword assignment, and multi-GO fanout can still produce over-annotations.
Top Naive SPKW-Unique Terms
| Term | Label | Annotations | Proteins | Taxa | Initial interpretation |
|---|---|---|---|---|---|
| GO:0044423 | virion component | 11,664 | 11,664 | 4,137 | Often true but too broad; prefer capsid, envelope, tail, virion membrane, or nucleocapsid when known |
| GO:0016787 | hydrolase activity | 11,084 | 11,084 | 4,632 | Broad MF; usually non-core when a specific hydrolase term exists |
| GO:0016740 | transferase activity | 10,128 | 10,128 | 5,253 | Broad MF; often modify/remove in favor of specific transferase activity |
| GO:0000166 | nucleotide binding | 7,595 | 7,595 | 4,713 | Broad MF; usually keep non-core or modify to ATP/GTP/nucleic-acid-specific terms |
| GO:0046872 | metal ion binding | 7,017 | 7,017 | 3,399 | High-risk generic binding annotation |
| GO:0016779 | nucleotidyltransferase activity | 5,553 | 5,553 | 4,302 | Often too broad for polymerases |
| GO:0004519 | endonuclease activity | 4,256 | 4,256 | 2,395 | Often plausible, but specificity matters |
| GO:0046718 | symbiont entry into host cell | 4,212 | 4,212 | 2,662 | Often legitimate for entry proteins, but needs protein-level mechanism |
| GO:0004386 | helicase activity | 3,792 | 3,792 | 2,484 | Likely legitimate for viral helicases |
| GO:0003677 | DNA binding | 3,192 | 3,192 | 1,742 | Usually too broad for DNA enzymes and transcription regulators |
| GO:0031640 | killing of cells of another organism | 1,845 | 1,845 | 1,357 | High-risk for phage lysis proteins; consider viral release by cytolysis |
| GO:0006260 | DNA replication | 1,788 | 1,788 | 1,003 | Often too broad; use viral DNA genome replication where appropriate |
| GO:0052170 | symbiont-mediated suppression of host innate immune response | 1,553 | 1,553 | 973 | Clade-dependent: often wrong for phage, plausible but broad for eukaryotic viruses |
| GO:0042742 | defense response to bacterium | 1,489 | 1,489 | 1,284 | High-risk semantic inversion for bacteriophage proteins |
| GO:0019062 | virion attachment to host cell | 1,472 | 1,472 | 1,367 | Often legitimate but should be restricted to attachment proteins |
Top Viral Taxa by Naive SPKW-Unique Burden
Taxon labels and higher-level clades were resolved from NCBI Taxonomy.
| Taxon | Clade | Family | Annotations | Proteins | Terms |
|---|---|---|---|---|---|
| Yasminevirus sp. GU-2018 | Varidnaviria / Megaviricetes | Mimiviridae | 821 | 426 | 79 |
| Fadolivirus FV1/VV64 | Varidnaviria / Megaviricetes | Mimiviridae | 710 | 403 | 68 |
| Acanthamoeba polyphaga mimivirus | Varidnaviria / Megaviricetes | Mimiviridae | 556 | 259 | 61 |
| Tupanvirus soda lake | Varidnaviria / Megaviricetes | Mimiviridae | 552 | 282 | 71 |
| Bodo saltans virus | Varidnaviria / Megaviricetes | Mimiviridae | 492 | 241 | 58 |
| Megavirus chiliensis | Varidnaviria / Megaviricetes | Mimiviridae | 393 | 223 | 49 |
| Tequatrovirus T4 | Duplodnaviria / Caudoviricetes | Straboviridae | 353 | 124 | 66 |
| Vaccinia virus WR | Varidnaviria / Pokkesviricetes | Poxviridae | 342 | 125 | 65 |
| Vaccinia virus Copenhagen | Varidnaviria / Pokkesviricetes | Poxviridae | 337 | 120 | 65 |
| Acanthamoeba polyphaga moumouvirus | Varidnaviria / Megaviricetes | Mimiviridae | 336 | 181 | 57 |
| Monkeypox virus | Varidnaviria / Pokkesviricetes | Poxviridae | 325 | 117 | 64 |
| Variola virus human/India/Ind3/1967 | Varidnaviria / Pokkesviricetes | Poxviridae | 322 | 110 | 59 |
| Moumouvirus goulette | Varidnaviria / Megaviricetes | Mimiviridae | 308 | 164 | 52 |
| Cafeteria roenbergensis virus BV-PW1 | Varidnaviria / Megaviricetes | Mimiviridae | 292 | 129 | 53 |
| Fowlpox virus strain NVSL | Varidnaviria / Pokkesviricetes | Poxviridae | 253 | 102 | 53 |
| African swine fever virus BA71V | Varidnaviria / Pokkesviricetes | Asfarviridae | 247 | 94 | 63 |
| Orpheovirus IHUMI-LCC2 | Varidnaviria / Megaviricetes | Orpheoviridae | 237 | 132 | 41 |
| Yersinia phage phiYeO3-12 | Duplodnaviria / Caudoviricetes | Autotranscriptaviridae | 233 | 59 | 28 |
| Camelpox virus CMS | Varidnaviria / Pokkesviricetes | Poxviridae | 230 | 94 | 58 |
| Tetraselmis virus 1 | Varidnaviria / Megaviricetes | Allomimiviridae | 229 | 105 | 50 |
Clade-Specific Patterns
Bacteriophages / Caudoviricetes
Status: multiple reviews completed, including SPKW-BPT4.md, Tequatrovirus genes, anti-CRISPR proteins, phage quorum-sensing, and diversity-generating retroelement proteins.
Main pattern: GO immune and defense terms often encode a eukaryotic-host frame that does not apply to bacterial hosts. Bacterial hosts have restriction-modification, CRISPR-Cas, abortive infection, and related anti-phage systems, not "innate immune response" in the GO immune-system sense.
Representative decisions:
| Gene | Taxon | SPKW term | Action | Finding |
|---|---|---|---|---|
| DAM | Enterobacteria phage T4 | symbiont-mediated suppression of host innate immune response | REMOVE | Phage methyltransferase evades restriction-modification, not innate immunity |
| E | Enterobacteria phage T4 | defense response to bacterium | REMOVE | Phage lysozyme lyses host for viral release; not defense |
| AcrF8 | Pectobacterium phage ZF40 | symbiont-mediated suppression of host innate immune response | MODIFY | Anti-CRISPR activity should use CRISPR-Cas-specific suppression |
| darB | Tequatrovirus | methyltransferase activity | MODIFY | Use DNA-methyltransferase activity; biological role is antirestriction |
| g022 | Tequatrovirus | transferase/nucleotidyltransferase activity | REMOVE | Too broad for a DNA polymerase with viral replication role |
Recommended replacements where supported:
| Problem term | Better viral/phage term |
|---|---|
| GO:0052170 symbiont-mediated suppression of host innate immune response | GO:0099018 symbiont-mediated evasion of host restriction-modification system, or GO:0098672 symbiont-mediated suppression of host CRISPR-cas system |
| GO:0042742 defense response to bacterium | GO:0044659 viral release from host cell by cytolysis, when reviewing phage lysis proteins |
| GO:0006260 DNA replication | GO:0039693 viral DNA genome replication, if the protein directly participates in viral genome replication |
Poxviruses / Pokkesviricetes
Status: not yet reviewed as full gene-review batches in this project, but high-volume candidates are present.
Poxviruses are a different case from phages. "Suppression of host innate immune response" may be biologically real for many poxviral immunomodulators, but the broad parent term can hide whether the protein blocks interferon signaling, NF-kappaB signaling, apoptosis, chemokines, complement, antigen presentation, or host-cell entry.
High-priority terms:
| Term | Virus-wide candidates | Interpretation |
|---|---|---|
| GO:0052170 symbiont-mediated suppression of host innate immune response | 1,553 | Plausible in poxviruses, but should be made pathway-specific when mechanism is known |
| GO:0046718 symbiont entry into host cell | 4,212 | Needs restriction to entry/fusion/attachment proteins |
| GO:0044423 virion component | 11,664 | Usually true but too broad |
Priority batch: vaccinia and monkeypox immune evasion proteins. Do not blanket-remove the immune term; review mechanism and replace with specific host-pathway child terms when available.
Herpesviruses / Herviviricetes
Status: not yet reviewed as a focused SPKW batch.
Herpesviruses have real latency, host immune evasion, envelope, entry, and nuclear trafficking biology. The main expected error is not semantic impossibility, but over-generalization and missing specificity.
Priority patterns:
| Candidate term | Expected review action |
|---|---|
| GO:0052170 symbiont-mediated suppression of host innate immune response | KEEP/MODIFY depending on direct mechanism |
| GO:0075713 establishment of integrated proviral latency | Check carefully; herpesvirus latency is usually episomal, not integrated |
| GO:0044826 viral genome integration into host DNA | High-risk outside integrating viruses |
| GO:0019031 viral envelope | Usually acceptable but may be non-core for non-envelope proteins |
Giant dsDNA Viruses / Megaviricetes
Status: high candidate burden; no focused gene-review batch yet.
Mimiviridae and related giant-virus taxa dominate the top taxa table. Their large genomes produce many broad SPKW-derived enzymatic and structural annotations. Many will be directionally correct, but several are probably too general to retain as core function.
Expected patterns:
| Pattern | Example terms | Expected action |
|---|---|---|
| Broad enzyme class instead of specific MF | hydrolase activity, transferase activity, nucleotidyltransferase activity | MODIFY or KEEP_AS_NON_CORE |
| Broad component | virion component | MODIFY to capsid/envelope/virion membrane/tail-like component where supported |
| Generic DNA process | DNA replication, methylation | MODIFY to viral genome replication, DNA methylation, or antirestriction only when directly supported |
Priority batch: one Mimiviridae representative with high SPKW burden, preferably Acanthamoeba polyphaga mimivirus or Tupanvirus soda lake.
RNA Viruses
Status: small-proteome viruses are underrepresented in the top virus-wide burden table but have highly specific host-interaction annotations.
The local ZIKV.ddb contains 85 annotations on 3 proteins, including 36 SPKW annotations on 1 protein. Naive SPKW-unique terms include viral entry, viral envelope, host JAK-STAT suppression, host type I interferon suppression, host autophagy modulation, and capping/processing terms.
Key issue: many RNA virus UniProt records are polyproteins. A single polyprotein entry may carry terms that apply to different mature cleavage products. Reviews should be careful not to treat every SPKW-derived term as a function of every mature protein product.
Priority batch: Zika virus polyprotein or mature NS proteins, with special attention to mature-product granularity.
Reviewed Viral SPKW Annotations
Existing viral gene reviews in this repo include 11 genes and 31 SPKW-derived annotations:
| Action | Count |
|---|---|
| ACCEPT | 13 |
| KEEP_AS_NON_CORE | 1 |
| MARK_AS_OVER_ANNOTATED | 1 |
| MODIFY | 6 |
| REMOVE | 10 |
If MODIFY, REMOVE, and MARK_AS_OVER_ANNOTATED are counted as annotations requiring curation changes, the current viral pilot issue rate is 17/31 (55%).
Reviewed Genes
| Gene | Taxon | File | Main SPKW pattern |
|---|---|---|---|
| DAM | Enterobacteria phage T4 | genes/BPT4/DAM/DAM-ai-review.yaml |
Phage antirestriction miscast as host innate immune suppression |
| E | Enterobacteria phage T4 | genes/BPT4/E/E-ai-review.yaml |
Phage lysis miscast as defense response to bacterium |
| frd | Enterobacteria phage T4 | genes/BPT4/frd/frd-ai-review.yaml |
Drug target miscast as antibiotic/methotrexate response |
| darB | Tequatrovirus | genes/9CAUD/darB/darB-ai-review.yaml |
DNA methyltransferase/antirestriction specificity |
| g022 | Tequatrovirus | genes/9CAUD/g022/g022-ai-review.yaml |
DNA polymerase broad transferase terms removed |
| dfrP | Bacillus phage phiNIT1 | genes/9CAUD/dfrP/dfrP-ai-review.yaml |
DHFR function mostly accepted; generic oxidoreductase removed |
| AcrF8 | Pectobacterium phage ZF40 | genes/BPZF4/AcrF8/AcrF8-ai-review.yaml |
Anti-CRISPR should use CRISPR-Cas-specific host defense term |
| ACA2 | Pectobacterium phage ZF40 | genes/BPZF4/ACA2/ACA2-ai-review.yaml |
RNA binding accepted; generic metal ion binding removed |
| M2 | Influenza A virus | genes/9INFA/M2/M2-ai-review.yaml |
Ion-channel and virion-component terms modified to specific proton-channel/localization terms |
| brt | Bordetella phage BPP-1 | genes/BPBPP/brt/brt-ai-review.yaml |
DGR reverse transcriptase function accepted |
| AimP | Bacillus phage phi3T | genes/BPPHT/AimP/AimP-ai-review.yaml |
Latency-replication decision accepted |
Manual Spot-Check Round 1
The virus-wide table above is only a prioritization queue. Following the strategy used in the organism-specific SPKW projects, the next step was to inspect row-level examples by term and clade, then classify whether the SPKW-derived term is likely acceptable, too broad, semantically inverted, or dependent on a specific viral host context.
This first manual pass sampled high-volume terms from virus.ddb on 2026-05-21:
| Sample | Rows checked | Main examples observed | Preliminary curation call |
|---|---|---|---|
| Phage lysis/host-defense terms | 40 rows for GO:0042742 and 40 rows for GO:0031640 | E/P00720; gp5 (P16009); t (P06808); rI (P13304); y13J (P39503); y13K (P39504); PHI105_00120 (Q9ZXD7); R (P68920); CPL1 (P15057); gene 16 (Q8W5T9) | Systematic over-annotation or semantic inversion. Use viral release/cytolysis or entry-through-cell-wall-disruption terms, not host defense |
| Generic DNA replication | 50 rows for GO:0006260 | g022; DAM/P04392; OPG071 (P20509/P06856); OPG148 (P20995); UL30 (P04293); UL42 (P10226); UL44 (P16790); RIR1 (P08543); RIR2 (P10224); OPG180 (P20492/P16272); gp43 (P04415); gp61 (P04520) | Split by protein role. Direct replication proteins should usually be viral genome replication; nucleotide biosynthesis and methylation enzymes should not be generic DNA replication |
| Host innate immune suppression | 70 rows for GO:0052170 | DAM/P04392; AcrF8; darB; OPG041/K3 (P20639); OPG044/K7 (P68466); A52R (P21070); OPG193 (P21004); OPG204 (P21077); ICP34.5 (P36313); US11 (P04487); TRS1 (P09695); IRS1 (P09715); UL83 (P06725); Ba71V-155 (Q65213) | Clade-specific. Often real for pox/herpes/ASFV but too broad; wrong for phage antirestriction or anti-CRISPR proteins |
| Integration and latency | 60 rows across GO:0044826 and GO:0075713 | int (P08320); 41 (G1BPP5); 69 (G1FGD4); Q9G044; PHI105_00145 (Q9T200); P30 (I7I4C5); gag-pol (P03369) | Integration is often plausible for phage integrases; "integrated proviral latency" needs careful term review for bacteriophage lysogeny |
| Virion component | 70 rows for GO:0044423 | M2/F8IZX5; MCP (P06491/P16729); CVC1 (P10201); OPG083 (P20501); OPG086 (P68459); OPG113 (P20979); OPG102 (P21033); UL44 (P16790); ICP34.5 (P36313); gag-pol (P03369) | Often true but rarely core or sufficiently specific; modify to the structural subcomponent when known, otherwise keep as non-core |
| Broad molecular-function terms | 80 rows plus bucket counts for GO:0016740, GO:0016787, GO:0016779, GO:0000166, GO:0046872, GO:0003677 | DAM/P04392; darB; g022; OPG113 (P20979); OPG102 (P21033); gag-pol (P03369); rep (Q04561); int (P08320); ICP0 (P08393); gp21 (P06807); UL44 (P16790); A085R (Q84406) | Usually MODIFY to a specific MF or KEEP_AS_NON_CORE; broad binding/enzyme-class terms should not be the core function when a precise term is available |
Sample-Level Findings
| Term pattern | Representative genes/proteins from the spot check | Manual interpretation |
|---|---|---|
| GO:0042742 defense response to bacterium on phage lysis proteins | E/P00720; gp5 (P16009); PHI105_00120 (Q9ZXD7); R (P68920); CPL1 (P15057); e (A0A346FJK3); gene 16 (Q8W5T9); G0YQ82 | REMOVE or MODIFY. The phage is not defending against bacteria; it is lysing or penetrating its bacterial host. Use GO:0044659 viral release from host cell by cytolysis, GO:0001897 symbiont-mediated cytolysis of host cell, GO:0098932 symbiont entry into host cell via disruption of host cell wall peptidoglycan, or GO:0098994 symbiont entry into host cell via disruption of host cell envelope as appropriate |
| GO:0031640 killing of cells of another organism on phage lysis proteins | E/P00720; t (P06808); rI (P13304); y13J (P39503); y13K (P39504); gp5 (P16009); Rz (Q9XJJ6); Rz1 (Q9XJJ7); Q9ZXD8; gp25 (O48471) | MARK_AS_OVER_ANNOTATED or MODIFY. Host death is a consequence of the lysis/entry mechanism, not the most informative viral process annotation |
| GO:0006260 DNA replication on viral nucleotide-supply enzymes | RIR1 (P08543); RIR2 (P10224); UL39 (A0A7T1L7L9); UL40 (Q99BK2); gp1 (P04531); nrdJ (A0A0S0N7E1) | MODIFY away from DNA replication. The supported process is GO:0009263 deoxyribonucleotide biosynthetic process, which indirectly supports replication but is not replication machinery |
| GO:0006260 DNA replication on viral polymerases, helicases, primases, and processivity factors | g022; OPG071 (P20509); OPG148 (P20995); UL30 (P04293); UL42 (P10226); UL44 (P16790); gp43 (P04415); gp41 (P04530); gp45 (P04525); gp61 (P04520) | MODIFY to GO:0039693 viral DNA genome replication when the protein directly acts in genome replication; use GO:0039686 bidirectional double-stranded viral DNA replication where that level of specificity is supported |
| GO:0006260 DNA replication on DNA methyltransferases or some ligases | DAM/P04392; darB; OPG180 (P20492); OPG180 (P16272); gp30 (P00970); agt (P04519); bgt (P04547); rnh (P13319) | High-risk. Methyltransferases should be reviewed for DNA methylation/antirestriction rather than replication. Ligases need evidence for replication fork function versus repair/recombination |
| GO:0052170 symbiont-mediated suppression of host innate immune response in pox/herpes/ASFV | OPG041/K3 (P20639); OPG044/K7 (P68466); A52R (P21070); OPG193 (P21004); OPG204 (P21077); ICP34.5 (P36313); US11 (P04487); TRS1 (P09695); IRS1 (P09715); UL83 (P06725); Ba71V-155 (Q65213) | Do not blanket-remove. Often KEEP_AS_NON_CORE or MODIFY to mechanism-specific terms such as type I interferon signaling suppression, NF-kappaB pathway suppression, IRF3/TBK1 inhibition, PKR/eIF2alpha inhibition, or interferon-mediated signaling suppression |
| GO:0052170 on phage antirestriction or anti-CRISPR proteins | DAM/P04392; AcrF8; darB; arn (P39510); stp (P62765); agt (P04519); bgt (P04547); adfA (P39229) | REMOVE or MODIFY. Use GO:0099018 symbiont-mediated evasion of host restriction-modification system or GO:0098672 symbiont-mediated suppression of host CRISPR-cas system |
| GO:0052170 on structural or core replication proteins | UL44 (P16790); UL82 (P06726); UL83 (P06725); Ba71V-060 (Q65152); Ba71V-107 (Q89424); BGLF2 (P0CK53); OPG062 (P68454) | Priority manual review. Treat as likely over-annotation unless the cited evidence directly shows immune pathway suppression by that protein |
| GO:0044826 viral genome integration into host DNA | int (P08320); gp41 (G1BPP5); gp69 (G1FGD4); Q9G044; PHI105_00145 (Q9T200); P30 (I7I4C5); gag-pol (P03369) | Often ACCEPT or KEEP_AS_NON_CORE when the protein is a bona fide integrase |
| GO:0075713 establishment of integrated proviral latency | int (P08320); gp41 (G1BPP5); gp69 (G1FGD4); Q9G044; PHI105_00145 (Q9T200); P30 (I7I4C5); gag-pol (P03369) | MODIFY or UNDECIDED pending ontology review. For bacteriophages, GO:0019042 viral latency or GO:0098689 latency-replication decision may be more appropriate than a "proviral" framing, depending on the evidence |
| GO:0044423 virion component | M2/F8IZX5; MCP (P06491); CVC1 (P10201); MCP (P16729); OPG083 (P20501); OPG086 (P68459); OPG113 (P20979); OPG102 (P21033); UL44 (P16790); ICP34.5 (P36313); gag-pol (P03369) | MODIFY to specific CC terms when known: GO:0019028 viral capsid, GO:0019031 viral envelope, GO:0019033 viral tegument, GO:0055036 virion membrane, or another specific structural component. For packaged enzymes, keep virion-component status non-core and emphasize their enzymatic function |
| Broad MF terms on characterized enzymes | DAM/P04392; darB; g022; OPG113 (P20979); OPG102 (P21033); gag-pol (P03369); rep (Q04561); int (P08320); ICP0 (P08393); gp21 (P06807); UL44 (P16790); A085R (Q84406) | MODIFY to specific MF terms where available. Examples include GO:0003887 DNA-directed DNA polymerase activity, GO:0003899 DNA-directed RNA polymerase activity, GO:0009008 DNA-methyltransferase activity, GO:0009007 site-specific DNA-methyltransferase (adenine-specific) activity, GO:0015667 site-specific DNA-methyltransferase (cytosine-N4-specific) activity, GO:0004672 protein kinase activity, GO:0004674 protein serine/threonine kinase activity, GO:0004386 helicase activity, GO:0008233 peptidase activity, GO:0004518 nuclease activity, GO:0004519 endonuclease activity, or GO:0008907 integrase activity |
Term-Level Working Rules
These are not final global edits; they are rules for the next gene-level review batches.
| SPKW-derived term | Current manual assessment | Review rule |
|---|---|---|
| GO:0042742 defense response to bacterium | High-confidence systematic problem in phage lysis/entry proteins | Remove or replace unless there is direct evidence that the viral protein mediates defense against bacteria rather than host lysis or entry |
| GO:0031640 killing of cells of another organism | Usually over-broad for normal phage lysis-cycle proteins | Prefer viral release by cytolysis or entry/cell-wall-disruption terms; reserve killing for true toxin-like biology |
| GO:0052170 symbiont-mediated suppression of host innate immune response | Correctness depends on host and mechanism | Remove from phage antirestriction/anti-CRISPR cases; make pathway-specific in pox, herpes, ASFV, flavivirus, and influenza cases where evidence supports immune suppression |
| GO:0006260 DNA replication | Too generic for many viral proteins | Use viral genome replication for direct replication machinery; do not use for nucleotide biosynthesis, DNA methylation, or generic DNA-binding proteins |
| GO:0044423 virion component | Frequently true but too broad and often non-core | Keep as non-core or replace with a specific virion subcomponent term |
| GO:0044826 viral genome integration into host DNA | Often valid for integrases | Accept only when protein is an integrase/recombinase with host-genome integration evidence |
| GO:0075713 establishment of integrated proviral latency | Potentially misframed for bacteriophage lysogeny and wrong for episomal herpes latency | Review per clade; prefer viral latency or latency-replication decision when those terms match the biology better |
| Broad MF terms such as hydrolase activity, transferase activity, nucleotide binding, metal ion binding, DNA binding | High-volume specificity loss | Do not make core unless no better term exists; usually modify to the specific enzymatic or binding activity |
Manual Spot-Check Round 2
The second manual pass expanded coverage beyond the highest-risk terms. It used raw SPKW counts, per-term row samples, and clade-specific profiles for poxviruses, ASFV, herpesviruses, giant dsDNA viruses, RNA viruses, retroviruses, phages, adenoviruses, papillomaviruses, and cyanophage/algal-virus auxiliary metabolism.
Expanded Coverage
| Review slice | Terms checked by label | Representative SPKW rows reviewed | Preliminary curation call |
|---|---|---|---|
| Structural and entry terms | viral capsid; virion attachment; virus membrane fusion; endocytosis involved in entry; pilus attachment; procapsid maturation; phage tail assembly/fiber/baseplate; genome ejection | HSV-1 MCP/P06491, CVC1/P10201, gB/P10211, gC/P10228; HCMV MCP/P16729; pox OPG099/M1LBP0, OPG108/A0A7H0DN80, OPG086/P68459, OPG105/P20508; coronavirus S/P11223; influenza HA/F8IZX3; phage III/P03663, A/P07394, T4 10/P10928, 35/P03742, 18/P13332, 21/P06807 | Mostly legitimate SPKW contributions. Main risk is granularity: keep structural/process terms on the actual structural or assembly proteins, and avoid treating broad virion component as core |
| RNA-virus and retrovirus replication terms | RNA-directed RNA polymerase; RNA-directed DNA polymerase; 7-methylguanosine mRNA capping; proteolysis/peptidase; translational frameshifting; ion transport; viral budding | HIV-1 gag-pol/P03369; pox OPG102/P21033, OPG113/P20979, OPG124/P20980, OPG083/P20501; influenza M2/F8IZX5; PBCV-1 A250R/Q84568; Megavirus mchi_633/G5CQF3; vesicular stomatitis M/P08671; arenavirus Z/P18541; arterivirus rep/Q04561 | Often correct, but polyprotein granularity is a major issue. The function may belong to one mature product or domain rather than the whole precursor |
| Host-process modulation | type I interferon suppression; NF-kappaB suppression; IRF3 inhibition; JAK-STAT/STAT1 inhibition; PKR/eIF2alpha suppression; antigen presentation suppression; translation initiation suppression; host cell-cycle/apoptosis/autophagy/ubiquitin-like modification | Pox OPG029/P17362, OPG041/P20639, OPG044/P68466, A52R/P21070; HSV-1 ICP34.5/P36313, US11/P04487, ICP0/P08393, US12/P03170; HCMV US6/P14334, UL44/P16790; EBV BNLF2a/P0C739; ASFV Ba71V-060/Q65152; HPV E7/Q30BJ2, E6/Q30BJ3; arterivirus rep/Q04561 | Many are biologically plausible in eukaryotic viruses, but they should be reviewed for direct mechanism and mature-product assignment. Structural or core replication proteins with immune terms need priority review |
| Auxiliary metabolism and photosynthesis | photosynthesis; chlorophyll binding; one-carbon metabolism; pyridine nucleotide biosynthesis; aminoacyl-tRNA ligase activity; lipid metabolism; dioxygenase activity | Cyanophage/algal-virus psbA/A0A345AWS2, psbD/H6WG48, psaA/A0A0K0KVP2, psaB/A0A0K0KW31; T4 frd/P04382; phage 116/G1FGI1; Megavirus mchi_350/G5CSM6, mchi_737/G5CRA4, mchi_435/G5CT84; T6-like vs/A0A346FJK0; PBCV-1 A085R/Q84406 | Not automatically wrong for viruses. Photosystem proteins look mostly legitimate; DHFR/NAMPT/lipid/dioxygenase terms are often broad and should be replaced with specific MF or pathway terms where possible |
| Integration and latency | viral genome integration into host DNA; DNA integration; RNA-directed DNA polymerase; establishment of integrated proviral latency | HIV-1 gag-pol/P03369; phage lambda int/P08320; additional sampled phage integrase rows included 41/G1BPP5, 69/G1FGD4, Q9G044, PHI105_00145/Q9T200, and P30/I7I4C5 | Viral genome integration is often correct. "Integrated proviral latency" should be accepted for retroviruses but reviewed carefully for bacteriophage integrases |
Representative Rows Reviewed in Round 2
These rows are examples from the SPKW records inspected for this round. They are not proposed final gene reviews; they show the evidence trail behind the term-level calls.
| Slice | Taxon/clade | Gene / UniProt | Protein name | SPKW term labels checked | Manual note |
|---|---|---|---|---|---|
| Structural | HSV-1 | MCP/P06491 | Major capsid protein | viral capsid; T=16 icosahedral viral capsid; virion component | ACCEPT capsid; demote broad virion component if a specific capsid term is retained |
| Structural | HSV-1 | CVC1/P10201 | Capsid vertex component 1 | viral capsid; virion component | ACCEPT/MODIFY to specific capsid vertex component if available |
| Entry | HSV-1 | gB/P10211 | Envelope glycoprotein B | viral envelope; virion attachment; symbiont entry | Plausible entry protein; prefer fusion/entry mechanism terms when supported |
| Entry | HSV-1 | gC/P10228 | Envelope glycoprotein C | virion attachment; adhesion receptor-mediated attachment; complement suppression | Attachment term looks strong; host complement term needs mechanism-specific evidence |
| Structural | HCMV | MCP/P16729 | Major capsid protein | viral capsid; T=16 icosahedral viral capsid; virion component | ACCEPT capsid; broad virion component is non-core |
| Entry | Monkeypox | OPG099/M1LBP0 | Entry-fusion complex associated protein | viral envelope; virion attachment; symbiont entry | Plausible, but entry/fusion role should be protein-specific |
| Entry | Monkeypox | OPG108/A0A7H0DN80 | Envelope protein OPG108 | viral envelope; virion attachment; symbiont entry | Plausible attachment/envelope row |
| Entry | Vaccinia Copenhagen | OPG086/P68459 | Entry-fusion complex protein | fusion of virus membrane with host plasma membrane; membrane fusion involved in viral entry | Stronger than generic entry; likely ACCEPT |
| Entry | Vaccinia Copenhagen | OPG105/P20508 | Cell surface-binding protein | viral envelope; virion attachment; symbiont entry | Good candidate for attachment-specific annotation |
| Entry | Coronavirus | S/P11223 | Spike glycoprotein | virion attachment; endocytosis involved in entry; virus-endosome membrane fusion | Plausible, but exact entry route is virus/strain dependent |
| Entry | Influenza | HA/F8IZX3 | Hemagglutinin | virion attachment; endocytosis involved in entry; virus-endosome membrane fusion | Strong entry/fusion example; broad virion component is non-core |
| Phage entry | Filamentous phage | III/P03663 | Attachment protein G3P | pilus attachment; receptor-mediated attachment; virion component | ACCEPT pilus attachment; this is a good SPKW contribution |
| Phage entry | RNA phage | A/P07394 | Maturation protein A | pilus attachment; virion attachment; viral genome circularization | Pilus attachment is plausible; genome circularization needs separate evidence |
| Phage structure | T4 | 10/P10928 | Baseplate wedge protein gp10 | viral tail assembly; virus tail; tail baseplate; virion component | ACCEPT tail/baseplate terms; avoid generic virion component as core |
| Phage structure | T4 | 35/P03742 | Long-tail fiber protein gp35 | tail fiber; adhesion receptor-mediated attachment; carbohydrate binding | ACCEPT tail fiber/attachment; broad entry can be less informative |
| Phage entry | T4 | 18/P13332 | Tail sheath protein | tail sheath; genome ejection through host envelope; symbiont entry | Good phage-specific genome-ejection term |
| Phage assembly | T4 | 21/P06807 | Prohead core protein protease | viral procapsid maturation; proteolysis; peptidase activity | ACCEPT procapsid maturation; replace broad proteolysis with specific protease terms where possible |
| Retrovirus | HIV-1 | gag-pol/P03369 | Gag-Pol polyprotein | RNA-directed DNA polymerase; DNA integration; integrated proviral latency; proteolysis; frameshifting | Retroviral integration/latency terms are expected; polyprotein/domain granularity still matters |
| Capping | Vaccinia Copenhagen | OPG102/P21033 | Cap-specific mRNA methyltransferase | 7-methylguanosine mRNA capping; methyltransferase; methylation | Capping is plausible; broad methylation/transferase should be more specific |
| Capping | Vaccinia Copenhagen | OPG113/P20979 | mRNA-capping enzyme catalytic subunit | mRNA capping; GTP binding; methyltransferase; nucleotidyltransferase | Strong capping enzyme; broad MF terms should be modified to specific activities |
| Capping | Vaccinia Copenhagen | OPG124/P20980 | mRNA-capping enzyme regulatory subunit | mRNA capping; mRNA processing; transcription termination | Capping role plausible; regulatory subunit should not inherit every catalytic MF |
| Protease | Vaccinia Copenhagen | OPG083/P20501 | Core protease | proteolysis; peptidase; cysteine-type peptidase; virion component | Specific peptidase term is better than generic proteolysis |
| Ion channel | Influenza | M2/F8IZX5 | Matrix protein 2 | ion transport; channel activity; proton transmembrane transport; virion component | MODIFY generic ion transport to proton-channel/proton-transport terms |
| Ion channel | PBCV-1 | A250R/Q84568 | Potassium channel protein Kcv | ion transport; channel activity; ion transmembrane transport | ACCEPT channel biology; prefer potassium-specific terms |
| Ion transport | Megavirus | mchi_633/G5CQF3 | BTB domain-containing potassium-channel-like protein | ion transport; ion transmembrane transport | Plausible only with channel evidence; generic ion transport is non-core |
| Budding | Vesicular stomatitis virus | M/P08671 | Matrix protein | viral budding; ESCRT-mediated budding; structural constituent of virion | Budding plausible; keep as life-cycle role, not MF |
| Budding | Arenavirus | Z/P18541 | RING finger protein Z | viral budding; ESCRT-mediated budding; zinc/metal ion binding | Budding plausible; broad metal binding is not core |
| Immune evasion | Vaccinia WR | OPG029/P17362 | IFN signaling evasion protein | type I interferon suppression; TLR/TBK1 pathway suppression; innate immune suppression | Specific pathway terms are better than broad innate immune suppression |
| Immune evasion | Vaccinia Copenhagen | OPG041/P20639 | Protein K3 | type I interferon suppression; PKR/eIF2alpha suppression; innate immune suppression | Strong immune-evasion example; keep specific PKR/interferon terms |
| Immune evasion | Vaccinia WR | OPG044/P68466 | Protein K7 | NF-kappaB suppression; innate immune suppression | Plausible; prefer NF-kappaB-specific term |
| Immune evasion | Vaccinia Copenhagen | A52R/P21070 | Protein A52 | TRAF/NF-kappaB suppression; innate immune suppression | Plausible pathway-specific immune modulator |
| Immune evasion | HSV-1 | ICP34.5/P36313 | Neurovirulence factor ICP34.5 | type I interferon suppression; IRF3 inhibition; translation initiation suppression; autophagy suppression | Multiple host-process terms plausible but require mechanism-specific evidence |
| Immune evasion | HSV-1 | US11/P04487 | Accessory factor US11 | type I interferon suppression; RIG-I/MDA5/TBK1 suppression; PKR/eIF2alpha suppression | Stronger than broad innate immune suppression; track exact pathway |
| Host ubiquitin/cell cycle | HSV-1 | ICP0/P08393 | E3 ubiquitin-protein ligase ICP0 | ubiquitin-like modification; cell-cycle perturbation; viral latency; IRF3 inhibition | Plausible but broad; review per pathway and avoid treating every host consequence as core |
| Antigen presentation | HSV-1 | US12/P03170 | ICP47 protein | adaptive immune suppression; antigen processing/presentation suppression | Low-volume, strong mechanism-specific SPKW row |
| Antigen presentation | HCMV | US6/P14334 | Unique short US6 glycoprotein | antigen processing/presentation suppression | Low-volume, strong mechanism-specific SPKW row |
| Antigen presentation | EBV | BNLF2a/P0C739 | Protein BNLF2a | antigen processing/presentation suppression | Low-volume, strong mechanism-specific SPKW row |
| Priority review | HCMV | UL44/P16790 | DNA polymerase processivity factor | viral DNA genome replication; IRF3 inhibition; NF-kappaB suppression; innate immune suppression; virion component | Replication term is strong; immune terms need direct support and may be overannotations |
| Priority review | ASFV | Ba71V-060/Q65152 | Minor capsid protein M1249L | IRF3 inhibition; innate immune suppression; virion component | Structural protein with immune terms: high-priority manual review |
| Cell cycle | HPV | E7/Q30BJ2 | E7 protein | G1/S checkpoint perturbation; host cell-cycle progression; type I interferon suppression | Cell-cycle term plausible; immune term needs separate mechanism |
| Apoptosis | HPV | E6/Q30BJ3 | Protein E6 | host apoptosis perturbation; innate immune suppression | Apoptosis term plausible; broad immune term needs review |
| Polyprotein risk | Arterivirus | rep/Q04561 | Replicase polyprotein 1ab | RdRP; helicase; proteolysis; autophagy activation; JAK-STAT, PKR, NF-kappaB suppression; frameshifting | Good example of polyprotein overloading; assign to mature domains/products |
| Photosynthesis | Cyanophage/algal virus | psbA/A0A345AWS2 | Photosystem II D1 protein | photosynthesis; chlorophyll binding; metal ion binding | Photosynthesis/chlorophyll terms look legitimate; broad binding is non-core |
| Photosynthesis | Cyanophage/algal virus | psbD/H6WG48 | Photosystem II D2 protein | photosynthesis; chlorophyll binding; metal ion binding | Legitimate photosystem row |
| Photosynthesis | Algal virus | psaA/A0A0K0KVP2 | Photosystem I P700 apoprotein A1 | photosynthesis; chlorophyll binding; iron-sulfur binding | Legitimate photosystem row; broad oxidoreductase should be reviewed |
| Photosynthesis | Algal virus | psaB/A0A0K0KW31 | Photosystem I protein | photosynthesis; chlorophyll binding; iron-sulfur binding | Legitimate photosystem row; broad oxidoreductase should be reviewed |
| Auxiliary metabolism | T4 | frd/P04382 | Dihydrofolate reductase | one-carbon metabolism; oxidoreductase; methotrexate response; antibiotic response | DHFR activity is real; drug-response terms are wrong for the viral enzyme |
| Auxiliary metabolism | Phage | 116/G1FGI1 | Nicotinamide phosphoribosyltransferase | pyridine nucleotide biosynthesis; glycosyltransferase/transferase | Pathway plausible; MF should be more specific than broad transferase |
| Auxiliary metabolism | Megavirus | mchi_350/G5CSM6 | Isoleucine--tRNA ligase | aminoacyl-tRNA ligase; ATP/nucleotide binding | Likely true aaRS; broad binding/ligase terms are non-core |
| Auxiliary metabolism | Megavirus | mchi_737/G5CRA4 | Asparagine--tRNA ligase | aminoacyl-tRNA ligase; ATP/nucleotide binding | Likely true aaRS; broad binding/ligase terms are non-core |
| Modifier risk | T6-like phage | vs/A0A346FJK0 | Valyl-tRNA synthetase modifier | aminoacyl-tRNA ligase activity | Likely overannotation: a synthetase modifier is not automatically a ligase |
| Lipid metabolism | Megavirus | mchi_435/G5CT84 | PNPLA domain-containing protein | lipid metabolism; lipid catabolism; hydrolase activity | Plausible lipid enzyme, but broad lipid metabolism/hydrolase should be made specific |
| Dioxygenase | PBCV-1 | A085R/Q84406 | Fe2OG dioxygenase domain-containing protein | dioxygenase; oxidoreductase; metal ion binding | MF plausible but broad; needs specific dioxygenase activity if known |
| Phage cell wall | Bacillus phage phi105 | PHI105_00120/Q9ZXD7 | N-acetylmuramoyl-L-alanine amidase | cell wall organization; killing cells; defense response to bacterium; hydrolase | MODIFY to peptidoglycan catabolism or amidase activity; defense term is wrong perspective |
| Phage cell wall | Streptococcus phage | SmphiM12_144/S5M6U0 | Murein endopeptidase K | cell wall organization; proteolysis; metallopeptidase | MODIFY to specific peptidoglycan/murein cleavage activity |
| Integration | Phage lambda | int/P08320 | Integrase | DNA integration; viral genome integration into host DNA; integrated proviral latency | Integration likely ACCEPT; "proviral latency" is the questionable bacteriophage framing |
Manual Spot-Check Round 3: ViralZone-Primary Keywords
This pass focused on the ViralZone-primary subset. The goal was to test whether VZ-primary status behaves like a useful confidence flag and to identify failure modes that remain even when the keyword is ViralZone-backed.
| VZ / KW | GO term from KW mapping | Rows in virus.ddb |
Representative rows reviewed | What the samples showed | Preliminary curation call |
|---|---|---|---|---|---|
| VZ-3952 / KW-1243 Viral long flexible tail ejection system | GO:0099001 symbiont genome ejection through host cell envelope, long flexible tail mechanism | 36 annotations on 36 proteins; all Swiss-Prot | O21883; O21879/O21879; O21884/O21884; V/P03733; H/P03736; J/P03749 | O21883 is a Lactococcus phage SK1 distal tail protein. UniProt states it forms the distal tail, self-associates as rings, and has a central channel allowing DNA ejection. The sampled rows are portal, tail tube, tape-measure, tail-tip, or baseplate proteins from long-tailed phages | ACCEPT the specific ejection term on tail/ejection apparatus proteins. KEEP_AS_NON_CORE or MODIFY broad symbiont entry into host cell and virion component if more specific terms are present |
| VZ-3950 / KW-1242 Viral contractile tail ejection system | GO:0099000 symbiont genome ejection through host cell envelope, contractile tail mechanism | 363 annotations on 363 proteins | 18/P13332; 18/P13332 with virus tail sheath; related contractile-tail rows in T4-like phages | 18/P13332 is T4 tail sheath protein gp18. UniProt describes the contractile sheath and says contraction drives the central tube through the host outer membrane, creating a channel for DNA ejection | ACCEPT the contractile-tail ejection term on sheath/tube/baseplate proteins. This is a good example where VZ-primary is more precise than generic entry |
| VZ-3943 / KW-1233 Viral attachment to host adhesion receptor | GO:0098671 adhesion receptor-mediated virion attachment to host cell | 228 annotations on 228 proteins | III/P03663 plus related filamentous/RNA phage attachment proteins | III/P03663 is attachment protein G3P. UniProt says it mediates adsorption to the host F-pilus and subsequent entry-receptor interaction; the SPKW row also carries pilus attachment and entry-receptor attachment terms | ACCEPT specific attachment terms where receptor biology is known. Avoid making generic symbiont entry into host cell the main curation outcome when pilus/receptor attachment is available |
| VZ-3966 / KW-1258 Restriction-modification system evasion by virus | GO:0099018 symbiont-mediated evasion of host restriction-modification system | 128 annotations on 128 proteins | DAM/P04392; many TrEMBL cytosine or adenine methyltransferases | DAM/P04392 is a T4 DNA adenine methylase. UniProt says it methylates the GATC motif and may prevent degradation of viral DNA by host restriction-modification antiviral defense. The VZ-primary term captures the phage-specific mechanism better than the broad innate-immune parent |
ACCEPT/MODIFY to restriction-modification evasion for bona fide viral DNA methyltransferases. REMOVE the coexisting non-VZ GO:0052170 symbiont-mediated suppression of host innate immune response on phage proteins |
| VZ-883 / KW-1114 Inhibition of host interferon signaling pathway by virus | GO:0039502 symbiont-mediated suppression of host type I interferon-mediated signaling pathway | 492 annotations on 492 proteins | US11/P04487 and related herpes/pox/flavivirus immune modulators | US11/P04487 is a HSV-1 accessory factor with UniProt function text supporting PKR inhibition, RIG-I/MDA5 interaction, TBK1 destabilization, and inhibition of host immune response. However, the same entry carries many pathway-specific VZ-primary KWs plus broad DNA/RNA-binding and innate-immune terms | Usually ACCEPT pathway-specific VZ-primary immune terms when direct mechanism is present, but review term stacking. The broad non-VZ innate immune parent should not be treated as more informative than RIG-I/MDA5/PKR/TBK1/interferon-specific terms |
VZ-Primary Working Assessment
This round supports the stratification rule above. The VZ-primary rows sampled here were mostly stronger than generic SPKW rows because they encode a viral mechanism: long-tail ejection, contractile-tail ejection, receptor attachment, restriction-modification evasion, or interferon-pathway inhibition. The remaining risks are narrower:
| Risk | Example from this round | Review response |
|---|---|---|
| Broad parent co-annotation still present | O21883 has specific GO:0099001 plus generic entry/virion terms | Keep the specific term as core; demote or replace broad parent terms |
| Non-VZ broad keyword can coexist with good VZ keyword | DAM/P04392 has good GO:0099018 plus bad GO:0052170 | Use VZ-primary term as replacement target; remove phage innate-immune parent |
| Multiple VZ-primary immune terms can stack on one protein | US11/P04487 carries RIG-I, MDA5, PKR, TBK1, TLR, interferon, and autophagy suppression rows | Do not blanket-accept all children; check which mechanisms are directly supported |
| VZ keyword category may not match GO aspect | KW-1243 is a UniProt "Molecular function" keyword but maps to a GO biological-process ejection term | Judge the GO term, not the UniProt keyword category label |
Manual Spot-Check Round 4: More Granular ViralZone Terms
This pass deliberately sampled lower-level VZ-primary terms where the term name already describes a specific mechanism, substrate, host pathway component, or entry route. Counts are raw SPKW IEA rows in virus.ddb.
| VZ / KW | GO term from KW mapping | Rows in virus.ddb |
Representative rows reviewed | What the samples showed | Preliminary curation call |
|---|---|---|---|---|---|
| VZ-817 / KW-1107 Inhibition of host TAP by virus | GO:0039588 symbiont-mediated suppression of host antigen processing and presentation | 7 annotations on 7 proteins; all Swiss-Prot | US12/P03170; BNLF2a/P0C739; US6/P14334 | US12/P03170 binds host TAP and blocks peptide binding/translocation; BNLF2a/P0C739 prevents TAP-mediated peptide transport; US6/P14334 blocks TAP ATP binding and peptide translocation. These are direct host-factor inhibition proteins, not generic immune modulators | ACCEPT the TAP-specific term. The broad coannotation GO:0039504 symbiont-mediated suppression of host adaptive immune response is less informative and should usually be KEEP_AS_NON_CORE or replaced by antigen-processing terms |
| VZ-957 / KW-1165, VZ-976 / KW-1166, VZ-978 / KW-1167 entry-route endocytosis terms | GO:0075512 clathrin-mediated; GO:0075513 caveolin-mediated; GO:0019065 clathrin/caveolin-independent receptor-mediated endocytosis | 89, 10, and 21 annotations | HA/P03452; S/P17400; Cap/O56129 | HA/P03452 explicitly describes clathrin-dependent and clathrin/caveolin-independent internalization; S/P17400 says caveolin-mediated internalization; Cap/O56129 says clathrin-, caveolae-, and dynamin-independent entry. These rows are stronger than generic endocytosis when the route is named in the protein record | ACCEPT route-specific terms on entry proteins when the route is explicit for that virus/protein. Keep broad endocytosis involved in viral entry and symbiont entry into host cell as non-core or redundant. Do not transfer route-specific terms to structural bystanders without entry evidence |
| VZ-979 / KW-1172 Pore-mediated penetration of viral genome into host cell | GO:0044694 symbiont genome entry into host cell via pore formation in plasma membrane | 11 annotations on 11 proteins | 26/P35837; 9/P04331; V/P31340; D18-19/Q6QGE7 | 26/P35837 is a P22 tail needle/plug protein; UniProt says host membrane perforation allows viral DNA ejection and that the needle penetrates the host outer membrane. The sampled set is tail needle, tail knob, spike, tape-measure, and selected polyprotein rows | ACCEPT on the actual pore/needle/plug apparatus. Review polyprotein rows carefully: the term may be correct for a mature product but too specific for an uncleaved polyprotein record unless product-level evidence is clear |
| VZ-3940 / KW-1236, VZ-3939 / KW-1237, VZ-3896 / KW-1238 host-envelope substrate degradation during entry | GO:0098932 peptidoglycan disruption; GO:0098995 lipopolysaccharide disruption; GO:0098996 glycocalyx disruption | 97, 8, and 133 annotations | 5/P16009; 9/P12528; GP90/P49714; 37/D1L2X0 | 5/P16009 is T4 baseplate spike gp5 with lysozyme activity for local peptidoglycan hydrolysis; 9/P12528 is P22 tail spike with O-antigen LPS endorhamnosidase activity; GP90/P49714 is a K1 capsule depolymerase/endosialidase. The VZ terms preserve the substrate distinction that generic entry and generic hydrolase terms lose | ACCEPT the substrate-specific entry-disruption terms on enzymatic tail/spike/depolymerase proteins. MODIFY broad MF terms to specific lysozyme, endorhamnosidase, or endosialidase activities when available. REMOVE broad killing of cells and defense response to bacterium rows on these phage entry proteins |
| VZ-3964 / KW-1252 Latency-replication decision | GO:0098689 latency-replication decision | 6 annotations on 6 proteins | aimR/O64094; aimP/O64095; cI/P03034; cro/P03040 | aimR/O64094 and aimP/O64095 are the arbitrium switch regulator/peptide pair; cI/P03034 maintains lambda latency; cro/P03040 promotes the lytic transition. The rows are specific switch components rather than generic latency-associated proteins | ACCEPT on switch regulators, anti-repressors, and arbitrium peptides with direct switch biology. Do not use VZ-primary status to annotate all latency proteins with the decision term |
| VZ-3963 / KW-1256 DNA end degradation evasion by virus | GO:0099016 symbiont-mediated evasion of DNA end degradation by host | 6 annotations on 6 proteins; all Swiss-Prot | gam/P03702; 2/P15076; N/P08557; 206/Q853W0 | gam/P03702 inhibits host RecBCD and protects viral DNA; 2/P15076 binds T4 DNA ends and protects against RecBCD degradation. These are mechanistically sharper than the broad phage innate-immune row that coexists on both examples | ACCEPT DNA-end evasion on nuclease inhibitors and DNA-end protection proteins. REMOVE or MODIFY the broad non-VZ GO:0052170 symbiont-mediated suppression of host innate immune response on these phage proteins |
| VZ-3962 / KW-1257 CRISPR-cas system evasion by virus | GO:0098672 symbiont-mediated suppression of host CRISPR-cas system | 3 annotations on 3 proteins; all Swiss-Prot | orf30/Q6TM72; orf31/Q6TM71; agt/P04519 | orf30/Q6TM72 and orf31/Q6TM71 are named anti-CRISPR proteins; agt/P04519 modifies T4 DNA and UniProt states the modification protects against host CRISPR-Cas9 as well as restriction systems. This is a good example where a VZ-primary term can coexist with a second precise VZ-primary antirestriction term | ACCEPT for bona fide anti-CRISPR proteins and supported DNA-modification evasion proteins. Prefer the specific CRISPR and restriction-modification evasion terms over broad adaptive/innate immune parents |
| VZ-1536 / KW-1187 Viral budding via the host ESCRT complexes | GO:0039702 viral budding via host ESCRT complex | 101 annotations on 101 proteins | M/P08325; M/P03519; M/P16629; M/P08671 | M/P08325 is a VSV matrix protein; UniProt says it recruits cellular ESCRT partners for release of budding particles. The sampled rows are mainly matrix proteins from negative-strand RNA viruses and overlap with generic viral budding and virion-component rows | ACCEPT ESCRT budding on matrix/late-domain proteins with ESCRT recruitment. Keep generic viral budding as a parent/non-core term when the ESCRT child is present; do not treat virion component as functional evidence |
Round 4 Granularity Takeaways
| Pattern | Examples | Curation response |
|---|---|---|
| Very low-count VZ terms are often high signal | US12/P03170, gam/P03702, orf30/Q6TM72 | Good candidates for confident ACCEPT when the UniProt function text names the host factor or defense pathway |
| Granular entry-route terms are useful but clade-sensitive | HA/P03452, S/P17400, Cap/O56129 | Accept only when the route is explicit for that virus/protein; otherwise keep generic entry/endocytosis terms conservative |
| Substrate-specific phage entry terms fix broad-term noise | 5/P16009, 9/P12528, GP90/P49714 | Prefer peptidoglycan/LPS/glycocalyx disruption plus precise MF terms over generic hydrolase, killing, or defense-response terms |
| Precise VZ terms can coexist legitimately on one protein | agt/P04519 has CRISPR-cas evasion and restriction-modification evasion | Keep both when the mechanisms are separately supported, but demote broad adaptive/innate immune parents |
Additional Term-Level Findings
| Term pattern | Count in raw virus SPKW | Manual interpretation |
|---|---|---|
| GO:0019028 viral capsid | 5,401 | Mostly ACCEPT for named capsid, coat, hexon, triplex, and capsid vertex proteins. MODIFY only when a more specific capsid subcomponent is available or when a polyprotein record obscures the mature product |
| GO:0019062 virion attachment to host cell | 1,980 | Usually legitimate for pox cell-surface binding proteins, herpes glycoproteins, phage tail fibers, and coronavirus/influenza entry proteins. Prefer specific attachment/fusion/endocytosis terms when known |
| GO:0039663 membrane fusion involved in viral entry into host cell | 583 | Stronger than the generic entry term for pox entry-fusion complex proteins and enveloped-virus fusion proteins; generally ACCEPT/MODIFY from generic entry |
| GO:0039666 virion attachment to host cell pilus | 478 | Mostly ACCEPT for filamentous/RNA phage maturation or attachment proteins such as G3P and maturation protein A |
| GO:0098003 viral tail assembly, GO:0098024 virus tail fiber, GO:0098025 virus tail baseplate | 1,190 / 173 / 163 | Good phage-specific terms for T4-like and other tailed phage structural proteins. Do not collapse these to generic virion component |
| GO:0099000 and GO:0099002 genome ejection through host envelope | 363 / 533 | Mostly plausible for tail sheath, portal, internal virion, tail needle, and peptidoglycan-disrupting proteins. Prefer these over generic host-cell entry where mechanism is known |
| GO:0046797 viral procapsid maturation | 299 | Mostly ACCEPT for prohead proteases and maturation factors; keep distinct from generic proteolysis |
| GO:0003968 RNA-directed RNA polymerase activity | 2,526 | Mostly ACCEPT for RdRP/replicase proteins. 251 rows are on polyprotein records, so mature-product granularity must be tracked |
| GO:0003964 RNA-directed DNA polymerase activity | 188 | ACCEPT for retroviral/retroelement reverse transcriptase domains; do not confuse with RNA-directed RNA polymerase |
| GO:0006370 7-methylguanosine mRNA capping | 641 | Often ACCEPT for pox capping enzyme subunits and RNA-virus replicase/capping domains; review subunit/domain specificity |
| GO:0006508 proteolysis and GO:0008233 peptidase activity | 2,639 each | Often legitimate for viral proteases, but too broad when a specific peptidase class exists. Polyprotein records need mature protease assignment |
| GO:0006811 monoatomic ion transport and GO:0034220 monoatomic ion transmembrane transport | 176 / 167 | Split by protein. Influenza M2 and Kcv-like channels should use specific ion-channel terms; polyproteins, VP2, and agnoproteins need review before accepting generic ion transport |
| GO:0046755 viral budding | 101 | Often plausible for matrix/Z proteins and some pox envelope proteins, but should be non-core for proteins whose main role is structural or enzymatic |
| GO:0044071 host cell-cycle progression and GO:0039645 G1/S checkpoint perturbation | 445 / 316 | Often plausible for HPV E7 and some herpes immediate-early proteins. High-risk on mRNA export factors, tegument proteins, or DNA processivity factors unless direct cell-cycle evidence exists |
| GO:0052150 host apoptosis perturbation | 437 | Plausible for pox apoptosis regulators and viral oncogenes. Prefer mechanism-specific regulation terms where evidence supports them |
| GO:0039657 suppression of host gene expression | 295 | Strong for virion host-shutoff proteins and known transcription/translation antagonists. 80 rows are polyprotein records, which should be reviewed at mature-product level |
| GO:0039520 activation of host autophagy | 91 | High polyprotein burden: 77/91 rows are polyprotein or replicase-like records. Treat as a priority RNA-virus granularity issue |
| GO:0039648 perturbation of host ubiquitin-like protein modification | 251 | ACCEPT/MODIFY for viral E3 ligases and tegument deneddylases; review broad assignment to polyproteins or proteins with only domain-level evidence |
| GO:0039588 suppression of host antigen processing and presentation | 7 | Low volume and mostly strong examples: herpes ICP47, HCMV US6, and EBV BNLF2a-like proteins |
| GO:0015979 photosynthesis and GO:0016168 chlorophyll binding | 123 / 117 | Mostly legitimate for viral photosystem I/II proteins such as PsbA, PsbD, and Psa subunits. Watch rare non-photosystem enzyme cases |
| GO:0006730 one-carbon metabolic process | 141 | Mostly viral DHFRs. MODIFY from broad one-carbon metabolism to GO:0004146 dihydrofolate reductase activity and GO:0046654 tetrahydrofolate biosynthetic process where supported |
| GO:0019363 pyridine nucleotide biosynthetic process | 128 | Mostly nicotinamide phosphoribosyltransferases. Likely plausible, but should be replaced with a more specific NAD salvage/biosynthesis term if available |
| GO:0004812 aminoacyl-tRNA ligase activity | 134 | Split case. True giant-virus aminoacyl-tRNA synthetases can be accepted; "valyl-tRNA synthetase modifier" proteins are not automatically aminoacyl-tRNA ligases and need review |
| GO:0071555 cell wall organization | 173 | Mostly phage amidases/autolysins. Usually MODIFY to GO:0009253 peptidoglycan catabolic process or specific activities such as GO:0008745 N-acetylmuramoyl-L-alanine amidase activity, GO:0003796 lysozyme activity, GO:0008932 lytic endotransglycosylase activity, or GO:0008933 peptidoglycan lytic transglycosylase activity |
Clade Calls After Round 2
| Clade | Representative taxa checked | Main call |
|---|---|---|
| Poxviridae and Asfarviridae | Monkeypox, vaccinia Copenhagen/WR, variola, camelpox, ASFV BA71V | Immune-evasion terms are often real, especially type I interferon, NF-kappaB, PKR/eIF2alpha, apoptosis, and ubiquitin-like modification. Structural/capsid proteins with immune-suppression terms remain high-priority manual review |
| Herpesviruses | HSV-1/2, HCMV AD169/Merlin, EBV B95-8/AG876, HHV8, herpesvirus saimiri | Capsid/envelope terms are generally good. Host shutoff, antigen presentation, PKR, IRF3, and cell-cycle terms should be accepted only with protein-specific mechanism; latency terms should not be conflated with integration |
| Giant dsDNA viruses | Mimivirus, Megavirus, Moumouvirus, Tupanvirus, Fadolivirus, Yasminevirus, Bodo saltans virus, PBCV-1 | Main issue is specificity loss: broad hydrolase, transferase, kinase, ligase, lipid metabolism, dioxygenase, and virion component terms dominate. True auxiliary metabolic enzymes should be curated with precise MF/process terms |
| RNA viruses | Alphaviruses, coronaviruses, arteriviruses, influenza-like examples | RdRP, capping, spike/HA entry, budding, and M2-like ion-channel terms can be correct. Polyprotein records are the major risk because one UniProt entry can carry capsid, entry, protease, polymerase, channel, autophagy, and host-shutoff terms that belong to different mature products |
| Retroviruses and pararetroviruses | HIV-1/HIV-2 and related retroviral Pol/Gag-Pol samples | Reverse transcriptase, DNA integration, and integrated proviral latency are expected on Pol/Gag-Pol, but the same "proviral latency" term should not be automatically transferred to temperate phage integrases |
| Caudoviricetes and filamentous/RNA phage | T4-like phages, lambda-like integrases, filamentous pilus-attachment proteins | Phage-specific structural terms, tail assembly, genome ejection, pilus attachment, and restriction-modification evasion are good SPKW contributions. The recurring failures remain eukaryotic immune terms, defense-response inversion, and broad cell-wall/cell-killing processes |
| Cyanophages and algal viruses | Viral PsbA/PsbD/Psa proteins and PBCV-like algal-virus enzymes | Photosynthesis and chlorophyll binding are not automatically overannotations for viral genes. They look largely valid for photosystem proteins, but non-photosystem metabolic enzymes still need specificity review |
High-Priority Virus-Wide Term Batches
| Priority | Term | Candidates | Rationale |
|---|---|---|---|
| 1 | GO:0052170 symbiont-mediated suppression of host innate immune response | 1,553 | Critical clade split: wrong for many phages, plausible but broad for eukaryotic viruses |
| 2 | GO:0042742 defense response to bacterium | 1,489 | Likely systematic semantic inversion for phage proteins |
| 3 | GO:0031640 killing of cells of another organism | 1,845 | Distinguish true toxins/cytolysins from viral release and normal lysis-cycle proteins |
| 4 | GO:0006260 DNA replication | 1,788 | Often too broad; compare with GO:0039693 viral DNA genome replication |
| 5 | GO:0044423 virion component | 11,664 | High-volume broad CC term; prioritize well-studied viruses where specific components are known |
| 6 | GO:0044826 viral genome integration into host DNA / GO:0075713 establishment of integrated proviral latency | 821 each | Correct for integrating viruses; high-risk for episomal latency viruses |
| 7 | GO:0046677 response to antibiotic / GO:0031427 response to methotrexate | 5 / 1 | Low-volume but clear drug-target-vs-response error in phage DHFR |
| 8 | GO:0044071 / GO:0039645 host cell-cycle perturbation terms | 442 / 119 | Valid for some viral oncogenes and immediate-early proteins; risky on structural, mRNA-export, and replication accessory proteins |
| 9 | GO:0039520 symbiont-mediated activation of host autophagy | 89 | Small but high-risk: 77/91 raw rows are polyprotein or replicase-like records |
| 10 | GO:0071555 cell wall organization | 173 | Mostly phage cell-wall hydrolases; replace with peptidoglycan catabolism or specific enzymatic activity |
| 11 | GO:0004812 aminoacyl-tRNA ligase activity | 39 | Split true viral aminoacyl-tRNA synthetases from synthetase modifier proteins |
| 12 | GO:0006811 / GO:0034220 ion transport terms | 151 / 166 | Correct for M2/Kcv-like channels, questionable on polyproteins and non-channel structural proteins |
Query Used
-- Database: ~/repos/go-db/db/virus.ddb
-- Naive SPKW-unique viral annotations: no non-SPKW evidence for the same gene-term pair.
WITH spkw AS (
SELECT db_object_id, db_object_symbol, db_object_taxon, ontology_class_ref, aspect
FROM gaf_association
WHERE supporting_references = 'GO_REF:0000043'
),
other AS (
SELECT DISTINCT db_object_id, ontology_class_ref
FROM gaf_association
WHERE supporting_references != 'GO_REF:0000043'
),
naive AS (
SELECT s.*
FROM spkw s
WHERE NOT EXISTS (
SELECT 1
FROM other o
WHERE o.db_object_id = s.db_object_id
AND o.ontology_class_ref = s.ontology_class_ref
)
)
SELECT
n.ontology_class_ref,
tl.label,
n.aspect,
COUNT(*) AS annotations,
COUNT(DISTINCT n.db_object_id) AS proteins,
COUNT(DISTINCT n.db_object_taxon) AS taxa
FROM naive n
LEFT JOIN term_label tl ON n.ontology_class_ref = tl.id
GROUP BY n.ontology_class_ref, tl.label, n.aspect
ORDER BY annotations DESC;
Closure-filtered SPKW-unique annotations were also computed using isa_partof_closure, matching the method in SPKW-METHODOLOGY.md. For viruses, both views are useful:
- Naive gives the practical review queue, especially for poorly curated viral taxa.
- Closure-filtered reduces redundant broad annotations and is better for prioritizing well-studied viruses with substantial non-SPKW evidence.
Curation Recommendations for Viral SPKW
-
Always check host biology. Phage-bacterium interactions should not inherit eukaryotic immune terms. Use restriction-modification or CRISPR-Cas terms when that is the actual mechanism.
-
Do not blanket-remove immune evasion terms from eukaryotic viruses. Poxviruses, herpesviruses, flaviviruses, and influenza viruses genuinely suppress host innate immune pathways. The right action is often MODIFY to a more specific host-pathway term.
-
Treat broad enzymatic SPKW terms as non-core unless they are the best available term. Viral polymerases, methyltransferases, proteases, and nucleases should usually get specific MF terms.
-
Separate viral life-cycle terms from generic cellular terms. Prefer viral DNA genome replication over generic DNA replication when reviewing viral DNA polymerases and replication proteins.
-
Review component terms at the correct structural level. "Virion component" is often true but rarely sufficient. Use capsid, envelope, virion membrane, tail, nucleocapsid, or other specific CC terms where supported.
-
Handle polyproteins explicitly. For RNA virus polyproteins, record which mature product supports each function and avoid treating all terms as functions of the uncleaved precursor.
-
Use ViralZone-primary keyword status as a confidence flag, not evidence by itself. A
VZ-primarySPKW row is less likely to be one of the generic keyword artifacts, but it still needs protein-level review. Non-VZ broad parent terms should be prioritized for cleanup.
Next Review Batches
| Batch | Scope | Reason |
|---|---|---|
| Phage immune/defense cleanup | GO:0052170, GO:0042742, GO:0031640 in Caudoviricetes | Existing reviews show systematic semantic mismatch |
| Poxvirus immune-evasion specificity | Vaccinia, monkeypox, variola candidates | High volume; likely legitimate but too broad |
| Herpesvirus latency/integration audit | GO:0044826 and GO:0075713 | Check integration vs episomal latency |
| Giant-virus specificity pilot | Mimiviridae/Tupanvirus broad MF and CC terms | Largest candidate burden in virus.ddb |
| RNA-virus polyprotein pilot | ZIKV and influenza | Tests mature-product granularity and host-defense specificity |
Relationship to Existing SPKW Projects
| Existing project | Viral lesson carried forward |
|---|---|
| SPKW-BPT4.md | Phage host-defense terms need bacterial-host-specific replacements |
| SPKW-PSEPK.md | Reverse transcriptase/defense keywords require system-level context |
| SPKW-ECO57.md | Distinguish true toxins from effectors and normal host manipulation |
| SPKW-ARATH.md | Family/clade divergence can invalidate keyword-level annotations |
| SPKW-METHODOLOGY.md | Use both naive and closure-filtered SPKW-unique queries depending on curation depth |