chore: Backfill new annotation tables, cut over reads, and drop frozen old tables #775

Description

@bencap

Context

Depends on: #742 (annotation pipeline migrated to the Allele model), #747 (existing MappedVariant data backfilled to MappingRecord + Allele)

Epic #742 moved the annotation jobs onto the new Allele-based parallel-tables model behind an explicit "frozen old tables" invariant: new score sets write only to the new tables, while the old tables are still read by the serving layer for existing data but are never written for new data.

That invariant was always meant to be temporary. Once the new tables exist (all 5 steps of #742) and existing data has been migrated into them (#747), the old tables are dead weight: they confuse the data model, double the surface a reader has to reason about, and keep the serving layer reading from a representation we no longer write. This issue is the deliberate teardown.

Note: #747's Phase 1 describes "repoint the gnomad_variants/clinical_controls M2M associations to allele_id." The design diverged after #747 was written — the new model uses dedicated ValidTime link tables (gnomad_allele_links, clinvar_allele_links, vep_allele_consequences), not repointed M2M associations. The backfill must populate those link tables; this cleanup then drops the old associations. Worth reconciling #747's wording when it's picked up.

Goal

After backfill, retire the frozen old annotation tables/columns and cut the read path over to the new tables, so the new Allele-based representation is the single source of truth.

Scope

1. Backfill the new annotation link tables from old data

One-time, re-runnable migration populating the new tables from existing MappedVariant-era data (depends on #747 having created the Allele/MappingRecord rows to link to):

  • gnomad_allele_links ← gnomad_variants_mapped_variants
  • clinvar_allele_links ← mapped_variants_clinical_controls
  • vep_allele_consequences ← MappedVariant.vep_functional_consequence / vep_access_date
  • Verify Allele.hgvs_g/c/p and Allele.clingen_allele_id are populated for migrated alleles (these replace the retired HGVS and variant-translation jobs).
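The "one-time, re-runnable" requirement could be met with an insert that skips rows already present. Below is a minimal sketch using the standard-library `sqlite3` module, not the project's actual migration. Only `gnomad_variants_mapped_variants` and `gnomad_allele_links` are names from this issue; every column name and the `mapped_variant_allele_map` lookup table are hypothetical stand-ins for however #747 ties an old MappedVariant to its Allele.

```python
import sqlite3

# Assumed, simplified schema. Column names and mapped_variant_allele_map
# are hypothetical; only the two gnomad table names come from the issue.
DEMO_SCHEMA = """
CREATE TABLE gnomad_variants_mapped_variants (
    mapped_variant_id INTEGER NOT NULL,
    gnomad_variant_id INTEGER NOT NULL
);
CREATE TABLE mapped_variant_allele_map (
    mapped_variant_id INTEGER PRIMARY KEY,
    allele_id INTEGER NOT NULL
);
CREATE TABLE gnomad_allele_links (
    allele_id INTEGER NOT NULL,
    gnomad_variant_id INTEGER NOT NULL,
    UNIQUE (allele_id, gnomad_variant_id)
);
"""

# DISTINCT collapses old rows that resolve to the same allele;
# NOT EXISTS skips links written by an earlier run.
BACKFILL_SQL = """
INSERT INTO gnomad_allele_links (allele_id, gnomad_variant_id)
SELECT DISTINCT m.allele_id, src.gnomad_variant_id
FROM gnomad_variants_mapped_variants AS src
JOIN mapped_variant_allele_map AS m
  ON m.mapped_variant_id = src.mapped_variant_id
WHERE NOT EXISTS (
    SELECT 1 FROM gnomad_allele_links AS dst
    WHERE dst.allele_id = m.allele_id
      AND dst.gnomad_variant_id = src.gnomad_variant_id
)
"""


def count_links(conn: sqlite3.Connection) -> int:
    """Return the current number of rows in the new link table."""
    return conn.execute("SELECT COUNT(*) FROM gnomad_allele_links").fetchone()[0]


def backfill_gnomad_links(conn: sqlite3.Connection) -> int:
    """Run the backfill once and return how many new links were inserted."""
    before = count_links(conn)
    with conn:  # commits on success, rolls back on error
        conn.execute(BACKFILL_SQL)
    return count_links(conn) - before
```

A second call inserts nothing, which is the idempotency property the acceptance criteria ask for. Note that the inner join silently skips old rows with no migrated allele, so a real backfill would probably want to count and report those separately.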

2. Read-cutover

  • Update v_variant_annotations (api/src/mavedb/models/variant_annotation_view.py) to project from Allele + the new link tables instead of MappedVariant.hgvs_g/c/p and the old annotation columns/associations.
  • Audit any other serving queries / endpoints that read the old annotation tables and repoint them.

3. Drop the old tables/columns (after backfill + cutover verified in prod)

  • Table gnomad_variants_mapped_variants + GnomADVariant.mapped_variants relationship
  • Table mapped_variants_clinical_controls + MappedVariant.clinical_controls relationship
  • Columns MappedVariant.vep_functional_consequence, MappedVariant.vep_access_date
  • Table variant_translations + api/src/mavedb/lib/variant_translations.py (confirm no remaining readers; the RT allele-equivalence space replaces it; see #742 Step 5)
  • The original clinical_controls table (renamed to clinvar_controls for new code in #742 Step 3; the old name stays only for serving until cutover)
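Since the drop is gated on "backfill + cutover verified", one possible pre-drop check is to compare the links reachable through the old association with those in the new link table. This is a sketch under the same assumptions as any simplified model of these tables: the column names and the `mapped_variant_allele_map` lookup are hypothetical, and `sqlite3` stands in for the real database.

```python
import sqlite3

# Pairs reachable through the old association, resolved to alleles
# via a hypothetical mapped_variant -> allele lookup table.
OLD_PAIRS_SQL = """
SELECT m.allele_id, src.gnomad_variant_id
FROM gnomad_variants_mapped_variants AS src
JOIN mapped_variant_allele_map AS m
  ON m.mapped_variant_id = src.mapped_variant_id
"""

# Pairs stored in the new link table (column names assumed).
NEW_PAIRS_SQL = "SELECT allele_id, gnomad_variant_id FROM gnomad_allele_links"


def link_parity(conn: sqlite3.Connection) -> tuple[set, set]:
    """Return (missing_from_new, only_in_new) as sets of (allele_id, gnomad_variant_id)."""
    old_pairs = set(conn.execute(OLD_PAIRS_SQL))
    new_pairs = set(conn.execute(NEW_PAIRS_SQL))
    return old_pairs - new_pairs, new_pairs - old_pairs
```

An empty first set suggests nothing was lost in the backfill. A non-empty second set is not necessarily an error, because new score sets write only to the new tables.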

Acceptance Criteria

  • Backfill migration populates gnomad_allele_links, clinvar_allele_links, and vep_allele_consequences from old data with no data loss; idempotent on re-run.
  • v_variant_annotations and all serving reads resolve annotation data through the new tables; no serving path reads a dropped table.
  • The frozen old tables/columns listed above are dropped in a migration with a tested downgrade().
  • Pipeline tracking / validation (mavedb.scripts.pipeline_tracking) still reports correctly against the new representation.
  • No reference to a dropped table/column/relationship remains in src/ (grep-clean).
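The "grep-clean" criterion could be automated with a small scan of the source tree. The sketch below is one way to do it, assuming Python sources only. The name list is taken from the drop list in this issue, while the function name and its parameters are illustrative. Relationship attributes such as `GnomADVariant.mapped_variants` are too generic to match by name and would still need manual review.

```python
import re
from pathlib import Path

# Identifiers slated for removal in this issue (table and column names).
DROPPED = [
    "gnomad_variants_mapped_variants",
    "mapped_variants_clinical_controls",
    "vep_functional_consequence",
    "vep_access_date",
    "variant_translations",
    "clinical_controls",
]


def find_stale_references(root, names=DROPPED, suffixes=(".py",)):
    """Return (path, line_number, name) for each line mentioning a dropped name."""
    # Word boundaries keep clinical_controls from matching inside
    # mapped_variants_clinical_controls, since "_" is a word character.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = pattern.search(line)
            if match:
                hits.append((str(path), lineno, match.group(1)))
    return hits
```

Run against `src/`, an empty result would satisfy the criterion for the listed names; any hit points at a file and line still to be repointed.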

Explicitly out of scope

Labels: app: backend, app: worker, type: maintenance, workstream: clinical
