feat: PuLID-Flux identity-injection support by RapidMark · Pull Request #1595 · leejet/stable-diffusion.cpp

RapidMark · 2026-06-01T23:30:58Z

Adds PuLID-Flux identity injection to the Flux denoise loop (works on CUDA / Vulkan / HIP / Metal). Given a single source portrait, generated images preserve the source person's face across arbitrary scenes and prompts. Pure-ggml implementation (every op has a backend kernel), so it's cross-vendor by construction.

This PR has three stages, all folded in below: (1) the original feature (#1542), (2) the gguf id-embedding rework (this PR, supersedes #1542), and (3) a rebase onto current master after the #1615 src-layout refactor.

1. The feature (original submission, #1542)

Mirrors the reference ToTheBeginning/PuLID (encoders_transformer.py + flux/model.py) and the PuLID v0.9.1 hook schedule (every 2nd of the double blocks, every 4th of the single blocks).

What's included

src/model/adapter/pulid.hpp — PuLIDPerceiverAttentionCA, the cross-attention module (Q from image tokens, K/V from the ID embedding). Pure-ggml graph; runs on CPU / CUDA / Vulkan / Metal without backend-specific code.
src/model/diffusion/flux.hpp — adds the pulid_ca.<i> child blocks to Flux (constructed conditionally when PuLID weights are present), inserts the cross-attention between transformer blocks at the reference intervals (every 2nd double, every 4th single), and threads the identity embedding + weight through forward / forward_orig / compute / build_graph. skip_layers + PuLID is explicitly refused (would misalign the hook schedule).
src/stable-diffusion.cpp — loads the pulid_ca.* weights via model_loader under the existing model.diffusion_model. prefix so they bind to the new blocks naturally, and loads the id-embedding, wrapping it as a sd::Tensor<float> passed via DiffusionParams.
include/stable-diffusion.h — public API: sd_pulid_params_t (per-generation embedding path + weight), pulid_weights_path on sd_ctx_params_t, pulid_params on sd_img_gen_params_t.
examples/common/common.{cpp,h} — three CLI flags: --pulid-weights, --pulid-id-embedding, --pulid-id-weight.
src/model/diffusion/model.hpp — extends DiffusionParams to carry the embedding + weight; FluxModel::compute forwards both.
docs/pulid.md — usage, embedding format, supported PuLID versions (v0.9.0 / v0.9.1; v1.1 deferred), memory-budget notes, and the three-way SHA-256 falsification recipe.
scripts/pulid_extract_id.py — reference precompute tool that produces the id-embedding from a source portrait.
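
The hook schedule described above (every 2nd double block, every 4th single block) can be sketched in a few lines of Python. This is an illustration, not the code in flux.hpp. The block counts (19 double, 38 single, as in Flux.1), the rule that a hook fires when the block index is a multiple of the interval, and the name pulid_hook_schedule are all assumptions made for this sketch.

```python
# Sketch of the PuLID hook schedule. All constants below are assumptions
# for illustration; the authoritative values live in the PR's flux.hpp.
NUM_DOUBLE_BLOCKS = 19   # assumed Flux.1 double-stream block count
NUM_SINGLE_BLOCKS = 38   # assumed Flux.1 single-stream block count
DOUBLE_INTERVAL = 2      # "every 2nd of the double blocks"
SINGLE_INTERVAL = 4      # "every 4th of the single blocks"


def pulid_hook_schedule(num_double=NUM_DOUBLE_BLOCKS,
                        num_single=NUM_SINGLE_BLOCKS,
                        double_interval=DOUBLE_INTERVAL,
                        single_interval=SINGLE_INTERVAL):
    """Return (stage, block_index, ca_index) for every injection point.

    ca_index is the hypothetical index <i> of the pulid_ca.<i> block
    used at that point; it advances once per hook across both stages.
    """
    schedule = []
    ca_index = 0
    for i in range(num_double):
        if i % double_interval == 0:
            schedule.append(("double", i, ca_index))
            ca_index += 1
    for i in range(num_single):
        if i % single_interval == 0:
            schedule.append(("single", i, ca_index))
            ca_index += 1
    return schedule


if __name__ == "__main__":
    for stage, block, ca in pulid_hook_schedule():
        print(f"{stage:6s} block {block:2d} -> pulid_ca.{ca}")
```

Under these assumptions each hook consumes the next pulid_ca.<i> in order. A counter-based pairing like this is sensitive to blocks being dropped from the loop, which is consistent with the PR refusing skip_layers + PuLID.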

Why split extraction from injection

PuLID-Flux's identity extractor is a stack of three large PyTorch models (ArcFace + EVA-CLIP-L + IDFormer). Porting all three to C++/ggml would add thousands of lines of code that runs only once per source person. Because sd.cpp consumes a precomputed embedding instead, the C++ surface stays small (~600 lines), the heavy ML stack runs once on any PyTorch backend, and PuLID is decoupled from active development on insightface / EVA-CLIP / IDFormer.

Verification

The three-way SHA-256 falsification recipe in docs/pulid.md distinguishes "wired but inert" from "actively altering the trajectory":

Run	Expected hash relation
A: no `--pulid-*` flags	baseline
B: PuLID flags, `--pulid-id-weight 0.0`	byte-identical to A
C: PuLID flags, `--pulid-id-weight 1.0`	differs from A, preserves source identity

Verified on three GPU/backend configurations built from the same source:

Vulkan-AMD (RX 6700 XT, -DSD_VULKAN=ON): A == B byte-identical, A != C, C preserves identity.
Vulkan-NVIDIA (RTX 3060, same binary, --backend "diffusion=vulkan1"): A == B, A != C, C visually equivalent to the AMD output at the same seed (different bytes per the usual cross-backend nondeterminism).
CUDA-NVIDIA (RTX 3060, separate -DSD_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86 build, CUDA 13.2): A == B byte-identical, A != C, C preserves identity. The PerceiverAttentionCA pure-ggml graph runs unchanged across all three — no backend-specific conditionals.
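
The three-way comparison above can be scripted along these lines. This is a minimal sketch, not the recipe from docs/pulid.md (which is not reproduced here): the function names are hypothetical, and whether the hashed bytes are the output files or the decoded pixels is left to the caller.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def check_three_way(run_a: bytes, run_b: bytes, run_c: bytes) -> dict:
    """Compare the three runs of the falsification recipe.

    run_a: output without any --pulid-* flags (baseline)
    run_b: output with PuLID flags and --pulid-id-weight 0.0
    run_c: output with PuLID flags and --pulid-id-weight 1.0
    """
    hash_a = sha256_hex(run_a)
    hash_b = sha256_hex(run_b)
    hash_c = sha256_hex(run_c)
    return {
        # Weight 0.0 must leave the output untouched.
        "inert_at_zero": hash_a == hash_b,
        # Weight 1.0 must change the output.
        "active_at_one": hash_a != hash_c,
    }


if __name__ == "__main__":
    # Stand-in byte strings; in practice these would be read from the
    # three generated outputs, e.g. pathlib.Path(...).read_bytes().
    result = check_three_way(b"baseline", b"baseline", b"with-identity")
    print(result)
```

A hash can only show that run C differs from run A. Whether C actually preserves the source identity still has to be judged by looking at the image.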

Measurements

Per-image sampling at 512×512 / 4 steps / Flux Schnell Q4 + PuLID:

Backend	Sampling (s)	Notes
AMD 6700 XT (Vulkan)	22	12 GB consumer card
NVIDIA 3060 (Vulkan)	11	same binary as AMD
NVIDIA 3060 (CUDA)	9.6	separate `-DSD_CUDA=ON` build

batch_count=3 confirms long-lived-process amortization: per-image sampling drops from 19.6 s (cold) to ~11 s (warm) as the model stays resident across iterations. Tested with Flux Schnell Q4_K_S at 512²/4 steps and Flux Dev Q4_K_S at 768²/20 steps. At 1024² the VAE decode needs a large single compute buffer that can exceed a consumer card's max-allocation limit; use --vae-tiling (or route the VAE to a roomier backend) in that case. This is existing sd.cpp behavior, not PuLID-specific, but it is documented in docs/pulid.md since PuLID users hit it.

Not yet supported (in docs/pulid.md)

PuLID v1.1 (renamed key layout id_adapter_attn_layers.*). Follow-up.
Multiple ID images fused into one embedding (reference pipeline supports it; the precompute tool takes one portrait per run).
The --true-cfg negative branch: the reference injects PuLID only on the positive conditioning path, and this implementation matches that behavior.

Backward compatibility

Non-PuLID generations are unaffected: PuLID is only constructed when the loader sees a pulid_ca.* tensor. A regression run without --pulid-* flags is byte-identical to pre-patch.
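
The weight-based gating described above amounts to a prefix check over the loaded tensor names. The sketch below illustrates the idea in Python rather than the PR's C++. The full prefix is assembled from the model.diffusion_model. and pulid_ca. fragments mentioned in this description, and the example tensor names are hypothetical.

```python
# Prefix assumed from the description: pulid_ca.* weights are loaded
# under the existing model.diffusion_model. prefix.
PULID_PREFIX = "model.diffusion_model.pulid_ca."


def pulid_enabled(tensor_names) -> bool:
    """True if any loaded tensor belongs to a PuLID cross-attention block."""
    return any(name.startswith(PULID_PREFIX) for name in tensor_names)


if __name__ == "__main__":
    # Hypothetical tensor names, for illustration only.
    plain_flux = ["model.diffusion_model.double_blocks.0.img_attn.qkv.weight"]
    with_pulid = plain_flux + ["model.diffusion_model.pulid_ca.0.to_q.weight"]
    print(pulid_enabled(plain_flux), pulid_enabled(with_pulid))
```

With a check of this kind, a checkpoint without PuLID tensors never constructs the extra blocks, so the non-PuLID graph stays as it was before the patch.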

2. Update — gguf id-embedding (this PR, supersedes #1542)

In the original submission the id-embedding was a bespoke PULIDV01 32-byte-header binary with a hand-rolled parser. Per @Green-Sky's review on #1542, this PR reworks it into a standard GGUF container: a single fp16 tensor pulid_id of shape [2048, 32], loaded via gguf_init_from_file exactly like the pulid_ca.* weights. The custom header and parser are gone — one fewer on-disk format, and the embedding loads through the same path as every other tensor. scripts/pulid_extract_id.py writes the gguf; docs/pulid.md documents the gguf layout.
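
For a quick sanity check of an embedding file without loading it through sd.cpp, the fixed-size GGUF header can be read with the Python standard library. This is a sketch under stated assumptions: GGUF version 2 or later (64-bit counts), little-endian layout, and hypothetical helper names. It does not parse the metadata or tensor-info sections, so it cannot confirm the tensor name, type, or shape.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(blob: bytes) -> dict:
    """Parse the fixed GGUF header from the first bytes of a file."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("too short to be a GGUF file")
    if blob[:4] != GGUF_MAGIC:
        raise ValueError("missing GGUF magic")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", blob, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


def expected_payload_bytes(shape=(2048, 32), bytes_per_element=2) -> int:
    """Raw size of the pulid_id tensor data: fp16 is 2 bytes per element."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


if __name__ == "__main__":
    # Synthetic header standing in for the first 24 bytes of a real file.
    fake = struct.pack("<4sIQQ", GGUF_MAGIC, 3, 1, 0)
    print(read_gguf_header(fake))
    print(expected_payload_bytes())
```

An embedding file as described here should report a tensor count of 1, and its tensor data should occupy 2048 × 32 × 2 = 131072 bytes, plus header, metadata, and alignment padding.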

Opened fresh rather than force-pushing #1542 because that branch had drifted onto an old master and its history was tangled; a clean rebase was easier to review.

3. Follow-up — rebased onto the src/ layout refactor (#1615)

Master landed the #1615 "reorganize src model layout" refactor while this PR was open. This rebase re-homes PuLID onto the new tree with no functional change:

src/pulid.hpp → src/model/adapter/pulid.hpp (includes updated to core/ggml_extend.hpp + model/common/block.hpp).
The Flux injection moved with src/flux.hpp → src/model/diffusion/flux.hpp.
Detection follows the refactor's factory: FluxParams was renamed to FluxConfig, weight-based configuration now lives in FluxConfig::detect_from_weights, and PuLID auto-detection (pulid_ca. → pulid_enabled) moved into it.
DiffusionParams plumbing follows src/diffusion_model.hpp → src/model/diffusion/model.hpp; includes updated to the new src/{core,model}/... paths.

The feature, hook schedule, and CLI/API surface are unchanged from stages 1–2.

Validation (rebased branch)

Build: OK — clean Vulkan build (sd-cli + sd-server link).
Identity: OK on RDNA4 (Radeon RX 9070 XT, Vulkan): Flux Krea-Dev Q4 + PuLID v0.9.1 + a freshly extracted gguf id-embedding, 1024² / 20 steps / dpm++2mv2 / cfg-scale 2.0 / guidance 3.5 / id-weight 1.0 / --diffusion-fa / --vae-tiling. Embedding loaded through gguf_init_from_file (PuLID id-embedding: loaded [2048, 32] type=f16), identity injected across all steps, source identity preserved in a scene unrelated to the reference photo.

Usage

sd ... --diffusion-model flux1-dev.gguf --pulid-weights pulid_flux_v0.9.1.safetensors \
       --pulid-id-embedding face.pulidembd --pulid-id-weight 1.0

The .pulidembd is a gguf with one fp16 tensor pulid_id [2048,32] (produce it with scripts/pulid_extract_id.py).

Adds PuLID-Flux identity injection to the Flux denoise path: a pulid.hpp module, the id-embedding threaded through flux.hpp and stable-diffusion.cpp, CLI flags in examples/common, and scripts/pulid_extract_id.py to produce the embedding. The id-embedding is stored as a gguf container (a single fp16 tensor) and loaded through the same gguf_init_from_file path as the pulid_ca weights, so there's no bespoke binary header.

RapidMark mentioned this pull request Jun 1, 2026

feat: PuLID-Flux identity-injection support #1542

Closed

RapidMark force-pushed the cloudhands/pulid-flux-gguf branch from d70feb4 to b7249f0 Compare June 7, 2026 00:55

RapidMark wants to merge 1 commit into leejet:master from CloudhandsAI:cloudhands/pulid-flux-gguf
