feat(job): expose locate job definition and add locate qualification examples#617

Merged
Karl-The-Man merged 4 commits into feat(job)/expose-locate-job-definition from feat(job)/locate-add-example-method
Jun 12, 2026

Conversation

@RapidPoseidon

Contributor

What

Exposes the locate rapid type as a first-class public API in the SDK and adds support for locate qualification examples on custom audiences.

feat(job) — expose locate job definition

  • Renames _create_locate_job_definition → create_locate_job_definition (now public), matching the existing public create_compare_job_definition.
  • Lists locate on the docs landing page and in the parameter reference.

feat(audience) — locate qualification examples

  • Adds add_locate_example to AudienceExampleHandler and RapidataAudience, mirroring the existing add_classification_example / add_compare_example methods.
  • Adds Box.to_example_model() and extracts the box-coverage sweep-line into a module-level calculate_boxes_coverage (shared by RapidsManager and the new audience example, so randomCorrectProbability is computed consistently with the locate validation rapid).

docs

  • New examples/locate_job.md (simple + advanced/custom-audience paths), nav + search-plugin wiring.
  • Switches the curated-audience snippets from find_audiences("alignment")[0] to get_audience_by_id(...) for reproducibility, and points aud_* example jobs at curated/custom audiences so they produce results.

Testing

  • Examples were run end-to-end by the author and produce results.
  • All changed modules py_compile clean; the coverage extraction is behaviour-preserving (existing RapidsManager callers unchanged).

Notes

  • This promotes locate to the public job-definition surface alongside compare; other types (classify, ranking, etc.) remain private — intentional for this change.
  • examples/locate_job.md references a real curated audience id (aud_MU1GZYoESyO) for a runnable example.

🔗 Session: https://session-eb4cb58b.poseidon.rapidata.internal/

RapidPoseidon and others added 3 commits June 11, 2026 14:08
…nces

Lets users train a custom audience for locate jobs, mirroring the existing
add_classification_example / add_compare_example. Locate truths are bounding
boxes, so the method takes a list[Box]; the generic example endpoint and the
LocateExamplePayload / LocateExampleTruth models already exist, so no backend
change is needed.

- Box.to_example_model() converts to ExampleBoxShape (0-100 scale).
- Extract calculate_boxes_coverage() into box.py (shared with RapidsManager)
  and use it as the example's randomCorrectProbability.
- Add a Simple/Advanced tab layout to the locate docs example.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: karl <karl@rapidata.ai>
@claude

claude Bot commented Jun 12, 2026

Code Review

Overview

This PR promotes create_locate_job_definition to the public API, adds add_locate_example to the audience surface, extracts calculate_boxes_coverage into a shared module-level function, and expands the documentation with Simple/Advanced tabbed examples for classify, compare, and locate jobs. The structure follows existing patterns cleanly.


Code Quality

box.py: calculate_boxes_coverage extraction

The refactoring is clean and behavior-preserving. The new version also drops the unused box variable that was being packed into the event tuple in the original ((x, type, id, box) → (x, type, id)), which is a nice cleanup.

One note on the sort ordering: events.sort(key=lambda e: (e[0], e[1] == "end")) processes "start" before "end" at the same x-coordinate (since False < True). This means adjacent boxes that share an edge are counted as covered at that boundary — a deliberate and correct choice, but worth a short comment since it's a subtle invariant.
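To make the sort-order point concrete, here is a minimal sweep-line sketch of a union-area computation. It is not the SDK's calculate_boxes_coverage: the Box dataclass, covered_length and union_area are hypothetical names, and coordinates are assumed to be fractions in [0, 1]. Only the event tuple shape and the sort key are taken from the review text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Hypothetical stand-in for the SDK's Box; coordinates assumed in [0, 1].
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def covered_length(intervals: list[tuple[float, float]]) -> float:
    """Total length of the union of 1-D intervals."""
    total = 0.0
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            # Gap found: close the previous merged interval, open a new one.
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            # Overlapping or touching: extend the merged interval.
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def union_area(boxes: list[Box]) -> float:
    """Area of the union of axis-aligned boxes via a left-to-right sweep."""
    events = sorted(
        [(b.x_min, "start", i) for i, b in enumerate(boxes)]
        + [(b.x_max, "end", i) for i, b in enumerate(boxes)],
        key=lambda e: (e[0], e[1] == "end"),  # starts before ends at equal x
    )
    active: set[int] = set()
    area = 0.0
    prev_x = 0.0
    for x, kind, idx in events:
        # Add the vertical strip between the previous event and this one.
        height = covered_length([(boxes[i].y_min, boxes[i].y_max) for i in active])
        area += (x - prev_x) * height
        prev_x = x
        if kind == "start":
            active.add(idx)
        else:
            active.discard(idx)
    return area
```

In this sketch the tie-break keeps the active set non-empty across a shared edge. The strip at that x has zero width, so the resulting area is the same either way; whether that also holds for the real function depends on how it accumulates area.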

Box.to_example_model() duplicates to_model()

Both methods perform the same * 100 coordinate scaling but return different types (ExampleBoxShape vs LocateBoxTruthModelBox). Minor duplication — consider a private helper if a third consumer ever appears.

def _as_percentages(self):
    return self.x_min * 100, self.y_min * 100, self.x_max * 100, self.y_max * 100

Not blocking, just noting it.
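A rough sketch of how both converters could share that helper. ValidationBoxModel and ExampleBoxModel are hypothetical stand-ins for LocateBoxTruthModelBox and ExampleBoxShape; the field names and constructor signatures of the real generated models are not shown in this PR and are assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationBoxModel:
    # Hypothetical stand-in for LocateBoxTruthModelBox (fields assumed).
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass(frozen=True)
class ExampleBoxModel:
    # Hypothetical stand-in for ExampleBoxShape (fields assumed).
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass(frozen=True)
class Box:
    # Coordinates as fractions of the image, 0.0 to 1.0.
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def _as_percentages(self) -> tuple[float, float, float, float]:
        # Single place where the 0-1 to 0-100 scaling lives.
        return (self.x_min * 100, self.y_min * 100, self.x_max * 100, self.y_max * 100)

    def to_model(self) -> ValidationBoxModel:
        return ValidationBoxModel(*self._as_percentages())

    def to_example_model(self) -> ExampleBoxModel:
        return ExampleBoxModel(*self._as_percentages())
```

With this shape, a change to the scaling (for example clamping to 0-100) would only need to be made once.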

Import convention violation in audience_example_handler.py

Per CLAUDE.md: "use from __future__ import annotations and TYPE_CHECKING to check the types."

The file already has from __future__ import annotations and a TYPE_CHECKING block. Box is only used in the type annotation truths: list[Box] — it is never used at runtime (no isinstance, no construction). It should be moved under TYPE_CHECKING:

# Move this:
from rapidata.rapidata_client.validation.rapids.box import (
    Box,
    calculate_boxes_coverage,
)

# To this:
from rapidata.rapidata_client.validation.rapids.box import calculate_boxes_coverage

if TYPE_CHECKING:
    from rapidata.rapidata_client.validation.rapids.box import Box

calculate_boxes_coverage stays as a runtime import because it is called at runtime.
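For readers unfamiliar with the convention, the following self-contained sketch shows why a type-only import is safe under TYPE_CHECKING. It uses fractions.Fraction from the standard library as a stand-in for Box, and the function double is made up for illustration.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; this branch never runs at runtime.
    from fractions import Fraction


def double(value: Fraction) -> Fraction:
    # With the __future__ import, annotations are stored as strings, so the
    # name Fraction does not need to exist when this function is defined.
    return value * 2
```

A name that is called or constructed at runtime, such as calculate_boxes_coverage, cannot be moved this way: it would raise NameError on first use.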

RapidsManager._calculate_boxes_coverage thin wrapper

The method now just delegates to the module-level function. It could be inlined at its two call sites (calculate_boxes_coverage(boxes)) to remove the wrapper entirely, but leaving it is not harmful.


Potential Issues

No unit tests for calculate_boxes_coverage

The function is algorithmically non-trivial (sweep-line area union). The PR notes end-to-end testing, but no unit tests were added. Consider adding a few cases:

  • Single box: [Box(0.0, 0.0, 0.5, 0.5)] → 0.25
  • Two non-overlapping boxes → sum of areas
  • Two fully overlapping boxes → area of the union
  • Two boxes sharing an edge → correct boundary handling
  • Empty list → 0.0
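The cases above could be written as pytest-style tests along these lines. To keep the block runnable on its own, it tests a stand-in coverage function that computes the union area by coordinate compression, a different technique from the sweep-line in box.py. In the repository the import of calculate_boxes_coverage would replace it. The Box argument order (x_min, y_min, x_max, y_max) is an assumption.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    # Hypothetical stand-in; argument order is assumed.
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def coverage(boxes: list[Box]) -> float:
    """Stand-in for calculate_boxes_coverage: union area by coordinate compression."""
    xs = sorted({v for b in boxes for v in (b.x_min, b.x_max)})
    ys = sorted({v for b in boxes for v in (b.y_min, b.y_max)})
    area = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            # A grid cell counts once if any box contains it entirely.
            if any(
                b.x_min <= x0 and x1 <= b.x_max and b.y_min <= y0 and y1 <= b.y_max
                for b in boxes
            ):
                area += (x1 - x0) * (y1 - y0)
    return area


def test_single_box():
    assert math.isclose(coverage([Box(0.0, 0.0, 0.5, 0.5)]), 0.25)


def test_disjoint_boxes_sum_their_areas():
    boxes = [Box(0.0, 0.0, 0.25, 0.25), Box(0.5, 0.5, 1.0, 1.0)]
    assert math.isclose(coverage(boxes), 0.0625 + 0.25)


def test_identical_boxes_count_once():
    box = Box(0.0, 0.0, 0.5, 0.5)
    assert math.isclose(coverage([box, box]), 0.25)


def test_boxes_sharing_an_edge():
    boxes = [Box(0.0, 0.0, 0.5, 1.0), Box(0.5, 0.0, 1.0, 1.0)]
    assert math.isclose(coverage(boxes), 1.0)


def test_empty_list():
    assert coverage([]) == 0.0
```

The expected values assume the function returns the covered fraction of the unit image, which matches the single-box and empty-list cases listed above.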

Hardcoded audience IDs in docs

aud_MU1GZYoESyO (Alignment) and aud_mr3NbeWa4Uo (Coherence) are hardcoded in quickstart, starting_page, classify_job, compare_job, and locate_job examples. This is acknowledged and pragmatic, but if either ID is retired or replaced, multiple doc pages silently break. A single source of truth (e.g., a docs variable or a note in the contributing guide listing these IDs) would make future maintenance easier.

locate_job.md Simple tab uses "global" audience ID

The locate example uses get_audience_by_id("global"), while the other examples use real UUID-style IDs. It would be worth confirming the backend treats "global" as a reserved/stable identifier, otherwise this will silently fail.


Documentation

The Simple/Advanced tab pattern is consistent and well-executed. The callout notes (especially the warning block about audience training time) are clear without being verbose. The parameter availability matrix update for Locate is accurate based on the implementation.

Minor nit: in locate_job.md the Simple tab describes the audience as "generally available" while compare/classify call it "curated." The terminology should be consistent across all three pages.


Summary

  • Import convention violation: Box should be under TYPE_CHECKING in audience_example_handler.py
  • Missing tests: calculate_boxes_coverage has no unit tests despite being algorithmically non-trivial
  • Minor duplication: to_model / to_example_model share identical coordinate logic
  • Doc terminology: "generally available" vs "curated" inconsistency across example pages
  • Hardcoded IDs: Multiple doc pages reference specific audience UUIDs with no central record

The core logic and API shape are solid. The import convention and test coverage are the two things worth addressing before merge.

@Karl-The-Man Karl-The-Man requested a review from LinoGiger June 12, 2026 08:36
@claude

claude Bot commented Jun 12, 2026

Code Review — feat(job): expose locate job definition and add locate qualification examples

Overview

This PR promotes create_locate_job_definition to the public API, adds add_locate_example to the audience qualification pipeline, and refactors the box-coverage sweep-line into a shared module-level function. Docs are expanded with Simple/Advanced tabs across locate, classify, and compare examples. The change is well-scoped and consistently follows existing patterns.


Strengths

  • Pattern consistency: add_locate_example in both AudienceExampleHandler and RapidataAudience mirrors the existing add_classification_example/add_compare_example methods — same tracing, same logging, same _try_start_recruiting call, same method chaining.
  • Refactoring is behaviour-preserving: calculate_boxes_coverage extracted to module-level is logically equivalent to the old _calculate_boxes_coverage. The new code is cleaner (drops the unused box from the event tuple, uses sorted() instead of build-then-sort).
  • TYPE_CHECKING convention honoured in rapidata_audience.py: Box placed under TYPE_CHECKING with from __future__ import annotations at the top, as CLAUDE.md requires.
  • Input validation — the if not truths: guard is good; an empty box list would silently produce randomCorrectProbability=0.0, making every example impossible to pass.
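The guard mentioned in the last bullet amounts to something like the sketch below. The function name check_truths, the exception type and the message are all hypothetical; the PR only shows that an `if not truths:` check exists.

```python
def check_truths(truths: list) -> list:
    """Reject an empty truth list before any coverage is computed."""
    if not truths:
        # Zero boxes would mean zero coverage, i.e. randomCorrectProbability
        # of 0.0 and an example that no labeler could answer correctly.
        raise ValueError("truths must contain at least one box")
    return truths
```

Failing early here surfaces the mistake at the call site instead of producing an example that can never be passed.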

Issues

1. box.py missing from __future__ import annotations (CLAUDE.md violation)

CLAUDE.md requires this import in every file that uses type annotations. box.py has no from __future__ import annotations despite the new calculate_boxes_coverage function having type annotations. This is the only file in the diff that violates the rule.

# box.py — add at top
from __future__ import annotations

2. Box imported at module level in audience_example_handler.py instead of under TYPE_CHECKING

Box appears at module level while the project convention puts types under TYPE_CHECKING. Since calculate_boxes_coverage is also imported from the same module and genuinely needed at runtime, having both at module level is understandable — but separating them would be more consistent:

from rapidata.rapidata_client.validation.rapids.box import calculate_boxes_coverage

if TYPE_CHECKING:
    from rapidata.rapidata_client.validation.rapids.box import Box

This mirrors how Box is treated in rapidata_audience.py.

3. Dead-code wrapper in RapidsManager._calculate_boxes_coverage

The private method now just delegates to the module-level function. The two internal call-sites (lines ~279 and ~342) could call calculate_boxes_coverage directly and the wrapper could be removed entirely. Not harmful to leave, but it is unnecessary indirection.

4. Magic string "global" in locate_job.md

audience = client.audience.get_audience_by_id("global")

The id "global" is a special sentinel that looks like an ordinary audience id to a reader. The callout note says "already has labelers ready to work" but does not explain why "global" is special. A brief note would prevent confusion — for example:

global is a reserved audience id that routes to all available labelers — unlike regular audience ids, it does not need to be looked up from the Dashboard.


Minor Nit

The create_locate_job_definition docstring has a stray literal \n embedded mid-sentence (copy-pasted from classify/compare methods). Not a blocker, but renders oddly in IDEs.


Summary

Core logic and structure are solid. Two actionable items before merging:

  1. Add from __future__ import annotations to box.py (CLAUDE.md requirement).
  2. Clarify the "global" audience id in the locate_job.md callout.

The Box import placement and the dead-code wrapper are style preferences worth addressing but not blocking.

@Karl-The-Man Karl-The-Man changed the base branch from main to feat(job)/expose-locate-job-definition June 12, 2026 10:09
@Karl-The-Man Karl-The-Man merged commit cac00ba into feat(job)/expose-locate-job-definition Jun 12, 2026
3 checks passed
@Karl-The-Man Karl-The-Man deleted the feat(job)/locate-add-example-method branch June 12, 2026 10:31

3 participants