Fix data integrity: ensure FINISHED status only after successful score upload#2424

Open
hanane-ca wants to merge 1 commit into codalab:develop from hanane-ca:fix/m8-finished-must-have-scores

Conversation

@hanane-ca hanane-ca commented Jun 17, 2026

Reviewers

@codalab/maintainers

Description

Fixes a data integrity bug where submissions were marked as FINISHED before their scores were successfully uploaded to the server, resulting in FINISHED submissions with no scores.

Problem: The compute worker set submission status to FINISHED before uploading scores to the Django server. If the score upload failed (network error, server timeout, etc.), the submission would be marked FINISHED but have no scores in the database.

Root cause:

# Old code (compute_worker.py)
run._update_status(SubmissionStatus.FINISHED)  # Mark FINISHED first
run.push_scores()  # Upload scores second - might fail!

Solution:

  1. Reorder operations: Upload scores BEFORE setting FINISHED status
  2. Add retry logic: Exponential backoff (3 retries) for transient failures
  3. Preserve ordering: The status update happens only after a successful upload, so a failed upload never leaves the submission marked FINISHED
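
The retry step could be sketched roughly as follows. This is an illustration only, not the code in this PR: `retry_with_backoff`, its parameters, the injected `sleep`, and the 1s/2s/4s delays are all assumed names and values. The point it shows is that the last failure is re-raised, so the caller cannot go on to set FINISHED after a failed upload.

```python
import time


def retry_with_backoff(operation, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on failure, retry up to `retries` more times.

    Hypothetical helper. Delays double after each failure
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception:
            # Assumption: a real worker would likely catch only
            # network-related errors here, not every exception.
            if attempt == retries:
                # Out of retries: re-raise so the caller never treats
                # the upload as successful.
                raise
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; a call such as `retry_with_backoff(run.push_scores)` would use the real `time.sleep`.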

Code changes:

  • compute_worker/compute_worker.py:
    • Moved push_scores() before _update_status(FINISHED) in run_wrapper()
    • Added retry logic with exponential backoff in push_scores()
    • Added 30s timeout for score POST requests
# New code
if run.is_scoring:
    run.push_scores()  # Upload scores first with retry logic
run.push_output()
if run.is_scoring:
    run._update_status(SubmissionStatus.FINISHED)  # Mark FINISHED only after success
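
To illustrate why this ordering matters, here is a small self-contained sketch. `FakeRun`, `finish`, and the `"SCORING"` placeholder status are hypothetical stand-ins, not the worker's real classes or status names; only the call order mirrors the new code above.

```python
class FakeRun:
    """Hypothetical stand-in for the worker's run object."""

    def __init__(self, upload_fails):
        self.is_scoring = True
        self.status = "SCORING"  # assumed placeholder for the pre-FINISHED state
        self.upload_fails = upload_fails

    def push_scores(self):
        # Simulate a score upload that fails after all retries.
        if self.upload_fails:
            raise ConnectionError("score upload failed")

    def push_output(self):
        pass

    def _update_status(self, status):
        self.status = status


def finish(run):
    # Same order as the new code: upload scores, push output, then mark FINISHED.
    if run.is_scoring:
        run.push_scores()
    run.push_output()
    if run.is_scoring:
        run._update_status("FINISHED")
```

With `FakeRun(upload_fails=True)`, `finish` raises before `_update_status` is reached, so the status stays at its previous value instead of becoming FINISHED without scores.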

Issues this PR resolves

Fixes #2423

Background

This bug was discovered during the EEG Foundation Challenge incident analysis (8,328 submissions, 51% failure rate). Analysis showed submissions stuck in FINISHED state with no scores in the database.

Checklist for hand testing

  • Create a competition with at least one scoring phase
  • Submit a valid submission
  • Verify submission reaches FINISHED status
  • Verify scores are present in the database and displayed on leaderboard
  • Test with network interruptions to verify retry logic

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCI tests are passing
  • Ready to merge

@Didayolo Didayolo self-requested a review June 18, 2026 09:28
@Didayolo Didayolo self-assigned this Jun 18, 2026
@hanane-ca hanane-ca force-pushed the fix/m8-finished-must-have-scores branch from 487fb00 to 24b3ce0 Compare June 19, 2026 08:08
…e upload

The compute_worker must upload scores before marking submission as FINISHED.
Previously, status was updated first, creating a race condition where the
leaderboard could read FINISHED submissions without scores.

Changes:
- Reorder run_wrapper: call push_scores() and push_output() before _update_status(FINISHED)
- Add retry logic in push_scores() with exponential backoff (3 attempts)
- Increase timeout to 30s for score uploads
@hanane-ca hanane-ca force-pushed the fix/m8-finished-must-have-scores branch from 24b3ce0 to 1eec178 Compare June 22, 2026 09:12
@hanane-ca hanane-ca changed the title Fix M8: ensure FINISHED status only after successful score upload Fix data integrity: ensure FINISHED status only after successful score upload Jun 22, 2026
Development

Successfully merging this pull request may close these issues.

Data integrity: score or leaderboard write fails after Finished status

3 participants