
fix(compute_worker): make submission lifecycle idempotent against bro… #2434

Draft
AybH26 wants to merge 1 commit into codalab:develop from AybH26:fix/compute-worker-redelivery-idempotency


AybH26 commented Jun 18, 2026


@ mention of reviewers

A brief description of the purpose of the changes contained in this PR

compute_worker uses task_acks_late = True, so if a worker dies (or loses its broker connection) before acknowledging the message, RabbitMQ redelivers the run payload. Previously, the worker that received the redelivered payload re-executed the full lifecycle, causing:

  1. scoring_worker_hostname was overwritten by the second worker.
  2. Submission status moved backwards from Finished and replayed Running → Scoring → Finished.
  3. upload_submission_scores inserted duplicate SubmissionScore rows, so calculate_scores() crashed with MultipleObjectsReturned.
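To make the failure mode concrete, here is a minimal in-memory reproduction. Submission, naive_run and get_score are hypothetical stand-ins, not CodaLab code; get_score only mimics the "exactly one row" contract of a Django .get() lookup, which is what raises MultipleObjectsReturned when duplicate rows exist.

```python
# Hypothetical in-memory reproduction of the redelivery bug; not CodaLab code.
from __future__ import annotations

from dataclasses import dataclass, field


class MultipleObjectsReturned(Exception):
    """Stand-in for Django's MultipleObjectsReturned."""


@dataclass
class Submission:
    status: str = "Submitted"
    scoring_worker_hostname: str | None = None
    status_history: list = field(default_factory=list)
    score_rows: list = field(default_factory=list)  # (column, value) tuples, like SubmissionScore rows


def naive_run(sub: Submission, hostname: str, score: float) -> None:
    """Replays the whole lifecycle on every delivery, with no idempotency checks."""
    sub.scoring_worker_hostname = hostname  # symptom 1: the last worker wins
    for status in ("Running", "Scoring", "Finished"):
        sub.status = status  # symptom 2: status walks backwards on replay
        sub.status_history.append(status)
    sub.score_rows.append(("accuracy", score))  # symptom 3: blind insert


def get_score(sub: Submission, column: str) -> float:
    """Mimics SubmissionScore.objects.get(...): exactly one matching row or an error."""
    matches = [value for col, value in sub.score_rows if col == column]
    if not matches:
        raise LookupError(f"no row for column {column!r}")
    if len(matches) > 1:
        raise MultipleObjectsReturned(f"{len(matches)} rows for column {column!r}")
    return matches[0]


sub = Submission()
naive_run(sub, "worker-a", 0.91)  # worker-a finishes the work but dies before acking
naive_run(sub, "worker-b", 0.91)  # the broker redelivers the same payload to worker-b
print(sub.scoring_worker_hostname)  # worker-b
print(sub.status_history)  # ['Running', 'Scoring', 'Finished', 'Running', 'Scoring', 'Finished']
try:
    get_score(sub, "accuracy")
except MultipleObjectsReturned as exc:
    print("calculate_scores() would crash:", exc)
```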

This PR makes the lifecycle idempotent on both sides: the worker short-circuits on a redelivered payload, and the server refuses to mutate submissions that are already in a terminal state. This is defense in depth, Stripe-style.
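A minimal sketch of the two guards, using a hypothetical in-memory model rather than the code in this PR. It assumes the terminal statuses are Finished, Failed and Cancelled, and that the worker-side check is keyed on "the submission is already terminal". worker_should_run and handle_score_upload are illustrative names, and handle_score_upload is a simplified stand-in for the upload_submission_scores endpoint.

```python
# Hypothetical sketch of the worker-side and server-side guards; not the PR's code.
from __future__ import annotations

from dataclasses import dataclass, field

TERMINAL_STATUSES = frozenset({"Finished", "Failed", "Cancelled"})  # assumed terminal set


@dataclass
class Submission:
    status: str = "Submitted"
    scoring_worker_hostname: str | None = None
    worker_attempt_count: int = 0
    scores: dict = field(default_factory=dict)  # column -> value, so at most one score per column


def worker_should_run(sub: Submission) -> bool:
    """Worker side: skip the lifecycle (and just ack) if an earlier delivery already finished it."""
    return sub.status not in TERMINAL_STATUSES


def handle_score_upload(sub: Submission, hostname: str, scores: dict) -> bool:
    """Server side: count every attempt, but never mutate a terminal submission.

    Returns True if the upload was applied and False if it was refused.
    """
    sub.worker_attempt_count += 1
    if sub.status in TERMINAL_STATUSES:
        return False  # hostname, scores and status are left untouched
    sub.scoring_worker_hostname = hostname
    sub.scores.update(scores)  # keyed by column: a replay overwrites, never duplicates
    sub.status = "Finished"
    return True


sub = Submission(status="Scoring")
print(handle_score_upload(sub, "worker-a", {"accuracy": 0.91}))  # True
print(worker_should_run(sub))  # False: a redelivered payload is skipped
print(handle_score_upload(sub, "worker-b", {"accuracy": 0.10}))  # False
print(sub.scoring_worker_hostname, sub.scores, sub.worker_attempt_count)  # worker-a {'accuracy': 0.91} 2
```

The check and the write are separate steps here, which is only safe because the sketch is single-threaded. With two live workers racing, the server-side check would need to be atomic, for example a single conditional UPDATE, or select_for_update() inside transaction.atomic() in Django.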

Issues this PR resolves

Closes #2433

A checklist for hand testing

  • Create a submission and let it finish scoring.
  • Manually call POST /api/submissions/<id>/upload_submission_scores/ with a new hostname and new scores.
  • Expected: the hostname and scores remain unchanged, worker_attempt_count increments, and the status stays terminal.
  • Verify that no MultipleObjectsReturned exception appears in the logs.
  • In the admin shell, check that Submission.objects.get(pk=<id>).worker_attempt_count >= 2.
  • Confirm there are no duplicate score rows for the same column.
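The checklist above can also be scripted. This sketch uses Python's built-in sqlite3 with a made-up two-table schema (submission, submission_score) instead of the real Django models, so every table and column name is an assumption, as is the set of terminal statuses. The server-side guard is a single conditional UPDATE backed by a UNIQUE constraint; the script replays the upload the way the manual POST would and then runs the checks from the list.

```python
# Hypothetical SQLite version of the hand-test checklist; schema is illustrative, not CodaLab's.
import sqlite3

TERMINAL = ("Finished", "Failed", "Cancelled")  # assumed terminal statuses

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE submission (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        scoring_worker_hostname TEXT,
        worker_attempt_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE submission_score (
        submission_id INTEGER NOT NULL REFERENCES submission (id),
        column_key TEXT NOT NULL,
        score REAL NOT NULL,
        UNIQUE (submission_id, column_key)  -- at most one row per column
    );
    """
)


def upload_scores(sub_id: int, hostname: str, scores: dict) -> bool:
    """Count the attempt, then apply the upload only if the submission is not terminal."""
    with conn:  # one transaction per upload; committed when the block exits
        conn.execute(
            "UPDATE submission SET worker_attempt_count = worker_attempt_count + 1 WHERE id = ?",
            (sub_id,),
        )
        # Check and write in one statement, so two concurrent uploads cannot both pass the guard.
        cur = conn.execute(
            "UPDATE submission SET scoring_worker_hostname = ?, status = 'Finished' "
            "WHERE id = ? AND status NOT IN (?, ?, ?)",
            (hostname, sub_id, *TERMINAL),
        )
        if cur.rowcount == 0:
            return False  # terminal (or unknown id): hostname, status and scores untouched
        # INSERT OR IGNORE plus the UNIQUE constraint is a DB-level backstop against duplicates.
        conn.executemany(
            "INSERT OR IGNORE INTO submission_score VALUES (?, ?, ?)",
            [(sub_id, column, value) for column, value in scores.items()],
        )
        return True


with conn:
    conn.execute("INSERT INTO submission (id, status) VALUES (1, 'Scoring')")

upload_scores(1, "worker-a", {"accuracy": 0.91})  # the normal scoring run
upload_scores(1, "worker-b", {"accuracy": 0.10})  # the manual POST, or a redelivery

row = conn.execute(
    "SELECT status, scoring_worker_hostname, worker_attempt_count FROM submission WHERE id = 1"
).fetchone()
dupes = conn.execute(
    "SELECT column_key, COUNT(*) FROM submission_score WHERE submission_id = 1 "
    "GROUP BY column_key HAVING COUNT(*) > 1"
).fetchall()
print(row)    # ('Finished', 'worker-a', 2)
print(dupes)  # []
```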

Any relevant files for testing

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCI tests are passing
  • Ready to merge

AybH26 force-pushed the fix/compute-worker-redelivery-idempotency branch from 89ed741 to 42d9950 on June 22, 2026 08:38
AybH26 marked this pull request as draft on June 22, 2026 11:30

Development

Successfully merging this pull request may close these issues.

compute_worker re-executes submissions on broker redelivery, causing duplicate scores, hostname overwrite, and status flipping
