feat(huggingFace): add image task family via ImageTaskCodegen by PG1204 · Pull Request #5320 · apache/texera

PG1204 · 2026-06-03T01:22:14Z

⚠️ This PR is stacked on #5278. Until that lands, the diff below also includes #5278's operator + codegen + spec changes. The new code in this PR is codegen/ImageTaskCodegen.scala, the image-related additions to codegen/PythonCodegenBase.scala, the new image fields on HuggingFaceInferenceOpDesc.scala, the frontend image-upload component, and the image-task tests in HuggingFaceInferenceOpDescSpec.scala. Once #5278 merges, this diff will auto-clean to ~856 lines.

What changes were proposed in this PR?

Adds the image task family — 9 HF pipeline tasks — as the second TaskCodegen plugged into the dispatcher established by #5278:

image-only: image-classification, object-detection, image-segmentation, image-to-text
image + prompt: visual-question-answering, document-question-answering, zero-shot-image-classification, image-text-to-text, image-to-image

codegen/ImageTaskCodegen.scala supplies the per-task payload + parse Python branches for all 9 tasks.
TaskCodegen trait gains a tasks: Set[String] default method (defaults to Set(task)) so a single codegen can register under multiple task strings; ImageTaskCodegen is the first multi-task codegen to use it.
CodegenContext extended with imageInput + inputImageColumn (EncodableString).
HuggingFaceInferenceOpDesc.scala gains 2 new @JsonProperty fields and registers ImageTaskCodegen via the new tasks flat-map.

PythonCodegenBase.scala grows to host the shared image infrastructure:

Task-family tuples (image_only_tasks, image_prompt_tasks, image_tasks) + image_headers in process_table.
Per-row image-bytes resolution from upload or column with _read_image_input / _read_binary_value / _compress_image_bytes.
_post_with_fallback extended with raw_binary_headers + use_raw_binary_body; adds image-text-to-text chat-completions and model-author vision branches.
_call_provider gains zai-org, Replicate predictions + polling, Fal-ai, Wavespeed submit+poll branches, and image embedding for OpenAI-compatible / unknown-provider fallbacks.
Image content-type response handling returns data:image/...;base64,... URLs.
Image helpers added: _read_image_input, _compress_image_bytes, _image_input_as_base64, _read_binary_value, _looks_like_html, _html_to_image_bytes, _extract_json_arg, _url_to_data_url.

Frontend integration (HF lines only — no agent / dataset noise):
HuggingFaceImageUploadComponent declared in app.module.ts, huggingface-image-upload formly type registered, image upload component .ts/.html/.scss + HuggingFace.png + sample-image.png assets.

User-input strings continue to flow through pyb"..." + EncodableString so they reach Python as self.decode_python_template('<base64>') rather than raw literals. PythonCodeRawInvalidTextSpec still passes
(117/117 descriptors py_compile cleanly).

Any related issues, documentation, or discussions?

Tracking issue: Add image task family (ImageTaskCodegen) to HuggingFace operator #5319
Closes: Add image task family (ImageTaskCodegen) to HuggingFace operator #5319
Stacked on: feat(huggingFace): refactor operator into per-task codegen + text-generation #5278 (operator + text-generation — issue Add HuggingFaceInferenceOpDesc with dispatcher + per-task codegen architecture (text-generation) #5277)
Parent issue: Add Hugging Face inference operator #5041
Closed sibling issue: Add HuggingFaceModelResource REST endpoints for HF operator UI #5134 (REST resource — landed via feat(huggingFace): add HuggingFaceModelResource for model browsing and media proxy #5124)

How was this PR tested?

sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile" clean.
sbt scalafmtCheck clean.
sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec" — 18/18 pass (PR 2's 13 spec tests + 5 new image-task tests: image-only routing, VQA / document-QA payload, image-text-to-text chat-completions, image-to-image data-URL parse, all-9-tasks dispatcher coverage).
sbt "WorkflowOperator/testOnly org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec" — 117/117 descriptors py_compile cleanly with the new operator code paths, no marker leaks.
Generated Python verified via python3 -m py_compile on sample image-task outputs.

Was this PR authored or co-authored using generative AI tooling?

Yes, co-authored with Claude Opus 4.7.

…d media proxy Introduces a new Jersey REST resource exposing endpoints used by the upcoming HuggingFace operator UI: - GET /api/huggingface/models — browse / search models per task - GET /api/huggingface/tasks — list HF pipeline tags with hosted inference - POST /api/huggingface/upload-audio — upload audio for HF audio tasks - GET /api/huggingface/audio-preview — stream uploaded audio (path-validated) - GET /api/huggingface/media-proxy — proxy remote media URLs to bypass CORS This is the first PR in a stacked series landing the HF operator end-to-end. No operator code yet; this resource is independently useful and lets the frontend integrate with HF before the operator class lands.

Addresses xuang7's review on PR apache#5124 — both endpoints previously buffered the full payload into a heap-resident byte[] with no upper bound, leaving the JVM open to OOM on a hostile or buggy upstream response (/media-proxy) or out-of-band write into the audio temp dir (/audio-preview). - /media-proxy: switch from Unirest.asBytes() to asObject(Function<RawResponse, T>), streaming the upstream body in 8 KiB chunks with a running byte counter. Aborts with 413 if the declared Content-Length exceeds the cap (pre-check) or if the body crosses the cap mid-read (defends against missing/lying Content-Length). New MAX_MEDIA_PROXY_BYTES = 50 MiB, sized for HF inference media (text-to-image ~5 MiB, text-to-video ~30 MiB) with headroom. - /audio-preview: add Files.size() defense-in-depth check before readAllBytes. /upload-audio already enforces MAX_AUDIO_BYTES on ingest; this catches the case where a bug or out-of-band write puts an oversized file in the temp dir. Adds a spec covering the audio-preview cap using a sparse-file fixture so the test stays fast (87/87 spec passes). The media-proxy cap path is exercised via the existing input-validation suite plus the new streamMediaWithCap helper - a follow-up can add a fake-RawResponse unit test if reviewers want explicit coverage of the chunked-read cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@RolesAllowed

Per review on apache#5124 (xuang7, Ma77Ball): mark the resource with @RolesAllowed(Array("REGULAR", "ADMIN")) to document that all five endpoints require an authenticated user. The annotation isn't enforced yet — that's coming with the auth-enforcement PR @Yicong-Huang and @Ma77Ball are working on — but adding it now means no follow-up change is needed when enforcement lands, and it matches the convention used by UserConfigResource / AdminSettingsResource. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@JsonProperty

…eration Splits the monolithic 1,278-line HuggingFaceInferenceOpDesc from the team's feature branch into a dispatcher + per-task codegen architecture and ships the first task family (text-generation) end-to-end. - TaskCodegen trait + CodegenContext model the per-task variation - PythonCodegenBase emits the shared provider-fallback / process_table / _parse_response infrastructure with two holes for the per-task payload and parse snippets - TextGenCodegen supplies text-generation's chat-completions payload and the body["choices"][0]["message"]["content"] parse branch - HuggingFaceInferenceOpDesc becomes a thin dispatcher (~180 lines) holding @JsonProperty fields and the registeredCodegens map User-input string fields are typed as EncodableString and emitted via the pyb"..." macro so values reach Python as self.decode_python_template('<base64>') rather than raw literals; class constants are assigned in open(self) so self is in scope for the decode call. Generated process_table runs a defensive _HF_MODEL_ID_PATTERN check at runtime before any HF URL is composed. PR 2 of a stacked 9-PR series. PR 1 (apache#5124) ships the supporting REST resource; PRs 3-5 will add image, audio + media-gen, and QA/ranking task families by registering new *Codegen objects in the dispatcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@JsonProperty

…degen specs Addresses Codecov's 66.85% patch coverage warning by exercising the defensive null-handling branches in HuggingFaceInferenceOpDesc.scala and the TextGenCodegen contract that previously had no spec hits. - null-tolerance: feed null into every @JsonProperty (token, model, prompt col, system prompt, result col, task, maxNewTokens, temperature) and assert generatePythonCode still emits a parseable ProcessTableOperator with sane defaults (TASK falls back to text-generation, MAX_NEW_TOKENS clamps to 256, TEMPERATURE to 0.7). Covers the `if (x == null) ... else x` branches that previously had no test that took the null side. - TextGenCodegen.task: trivial canonical-value check. - TextGenCodegen ctx-independence: pass an "irrelevant"-filled ctx and assert payloadPython / parsePython still reference self.MODEL_ID and body["choices"]…. Catches a future refactor that accidentally splices ctx fields into the static snippets. 13/13 in HuggingFaceInferenceOpDescSpec, 2/2 in PythonCodeRawInvalidTextSpec (117/117 descriptors still py_compile cleanly). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov-commenter · 2026-06-03T01:25:17Z

Codecov Report

❌ Patch coverage is 72.38095% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.98%. Comparing base (e987f13) to head (3975e0a).

Files with missing lines	Patch %	Lines
...mage-upload/hugging-face-image-upload.component.ts	50.00%	41 Missing and 1 partial ⚠️
...ge-upload/hugging-face-image-upload.component.html	38.88%	11 Missing ⚠️
...rator/huggingFace/HuggingFaceInferenceOpDesc.scala	94.00%	0 Missing and 3 partials ⚠️
...ber/operator/huggingFace/codegen/TaskCodegen.scala	84.61%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #5320      +/-   ##
============================================
+ Coverage     51.88%   51.98%   +0.10%     
- Complexity     2472     2500      +28     
============================================
  Files          1067     1074       +7     
  Lines         41258    41468     +210     
  Branches       4437     4460      +23     
============================================
+ Hits          21408    21559     +151     
- Misses        18591    18645      +54     
- Partials       1259     1264       +5

Flag	Coverage Δ		*Carryforward flag
access-control-service	`42.22% <ø> (ø)`
agent-service	`33.76% <ø> (ø)`		Carriedforward from 5f54275
amber	`53.15% <95.32%> (+0.27%)`	⬆️
computing-unit-managing-service	`1.65% <ø> (ø)`
config-service	`56.06% <ø> (ø)`
file-service	`38.32% <ø> (ø)`
frontend	`46.39% <48.54%> (+0.01%)`	⬆️
pyamber	`90.69% <ø> (ø)`		Carriedforward from 5f54275
python	`90.83% <ø> (ø)`		Carriedforward from 5f54275
workflow-compiling-service	`58.69% <ø> (ø)`

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

PG1204 · 2026-06-03T02:37:41Z

/request-review @Ma77Ball

Ma77Ball

Please look at the suggestions below.

…NAI_COMPATIBLE_PROVIDERS to class constants

@JsonProperty

Plugs the 9-task image family into the dispatcher pattern established in PR 2: image-only image-classification, object-detection, image-segmentation, image-to-text image + prompt visual-question-answering, document-question-answering, zero-shot-image-classification, image-text-to-text, image-to-image - ImageTaskCodegen supplies payload + parse Python for all 9 tasks - TaskCodegen trait gains a `tasks: Set[String]` default method so a single codegen can register under multiple task strings; the dispatcher map in HuggingFaceInferenceOpDesc is built from registeredCodegens.tasks.flatMap(...) - CodegenContext extended with imageInput + inputImageColumn (EncodableString) - HuggingFaceInferenceOpDesc gains 2 new @JsonProperty fields and registers ImageTaskCodegen PythonCodegenBase grows to host the shared image infrastructure: - image_only_tasks / image_prompt_tasks / image_tasks tuples and image_headers in process_table - per-row image bytes resolution from upload (self._read_image_input) or input column (self._read_binary_value + self._compress_image_bytes) - use_raw_binary_body / raw_binary_headers state threaded through _post_with_fallback (signature extended) - _post_with_fallback adds the image-text-to-text chat-completions branch and the model-author vision branch - _call_provider adds branches for zai-org's custom API, Replicate predictions + polling, Fal-ai, Wavespeed submit+poll, and image embedding in OpenAI-compatible / unknown-provider fallbacks - image-content-type response handling returns data:image URLs - image helpers added: _read_image_input, _compress_image_bytes, _image_input_as_base64, _read_binary_value, _looks_like_html, _html_to_image_bytes, _extract_json_arg, _url_to_data_url User-input strings continue to flow through pyb"..." + EncodableString so they reach Python as self.decode_python_template('<base64>') rather than raw literals. PythonCodeRawInvalidTextSpec still passes (117/117 descriptors py_compile cleanly). Frontend integration adds only the HF lines (no agent / dataset noise from the source branch): - HuggingFaceImageUploadComponent declared in app.module.ts - huggingface-image-upload formly type registered in formly-config.ts - Image upload component .ts/.html/.scss cherry-picked from huggingFace - HuggingFace.png + sample-image.png assets PR 3 of a stacked 9-PR series. Stacks on hf/02-operator-textgen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…oad template

…ap comment

PG1204 and others added 15 commits May 17, 2026 13:02

fix: address review feedback on HuggingFaceModelResource

935ccc1

Merge branch 'apache:main' into hf/01-backend-skeleton

089c3c4

Merge branch 'apache:main' into hf/01-backend-skeleton

2aa865c

Merge branch 'apache:main' into hf/01-backend-skeleton

0c30beb

chore: retrigger CI

6857e34

Merge branch 'apache:main' into hf/01-backend-skeleton

6f0f5fb

Merge branch 'main' into hf/01-backend-skeleton

fec6dfb

Merge branch 'apache:main' into hf/01-backend-skeleton

5e95bcd

fix: scala lint fixes

8350eb9

Merge branch 'apache:main' into hf/02-operator-textgen

2efa337

github-actions Bot assigned PG1204 Jun 3, 2026

github-actions Bot added frontend Changes related to the frontend GUI common labels Jun 3, 2026

PG1204 mentioned this pull request Jun 3, 2026

Add Hugging Face inference operator #5041

Open

Merge branch 'apache:main' into hf/02-operator-textgen

c44d7d0

Ma77Ball suggested changes Jun 4, 2026

View reviewed changes

PG1204 and others added 5 commits June 5, 2026 12:39

refactor(huggingFace): cap HTTP error detail + lift CHAT_ROUTES / OPE…

28fcab0

…NAI_COMPATIBLE_PROVIDERS to class constants

style: apply scalafmt and prettier to HF inference spec and image upl…

2b46a9c

…oad template

chore: add Apache license header to HF image upload template and styles

0815d14

test(frontend): cover HuggingFaceImageUploadComponent

76f606a

PG1204 force-pushed the hf/03-image-tasks branch from 8187ac1 to 76f606a Compare June 5, 2026 20:15

Merge branch 'apache:main' into hf/03-image-tasks

ea3ea63

PG1204 and others added 2 commits June 5, 2026 13:56

fix(huggingFace): zero-shot labels, polling progress logs, data-URL c…

ef59a1e

…ap comment

Merge branch 'apache:main' into hf/03-image-tasks

3975e0a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(huggingFace): add image task family via ImageTaskCodegen#5320

feat(huggingFace): add image task family via ImageTaskCodegen#5320
PG1204 wants to merge 24 commits into
apache:mainfrom
ELin2025:hf/03-image-tasks

PG1204 commented Jun 3, 2026

Uh oh!

codecov-commenter commented Jun 3, 2026 •

edited

Loading

Uh oh!

PG1204 commented Jun 3, 2026

Uh oh!

Ma77Ball left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

PG1204 commented Jun 3, 2026

What changes were proposed in this PR?

Any related issues, documentation, or discussions?

How was this PR tested?

Was this PR authored or co-authored using generative AI tooling?

Uh oh!

codecov-commenter commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

PG1204 commented Jun 3, 2026

Uh oh!

Ma77Ball left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

codecov-commenter commented Jun 3, 2026 •

edited

Loading

Ma77Ball left a comment •

edited

Loading