
feat: PuLID-Flux identity-injection support #1595

Open
RapidMark wants to merge 1 commit into leejet:master from CloudhandsAI:cloudhands/pulid-flux-gguf

RapidMark commented Jun 1, 2026

Adds PuLID-Flux identity injection to the Flux denoise loop (works on CUDA / Vulkan / HIP / Metal). Given a single source portrait, generated images preserve the source person's face across arbitrary scenes and prompts. Pure-ggml implementation (every op has a backend kernel), so it's cross-vendor by construction.

This PR has three stages, all folded in below: (1) the original feature (#1542), (2) the gguf id-embedding rework (this PR, supersedes #1542), and (3) a rebase onto current master after the #1615 src-layout refactor.

1. The feature (original submission, #1542)

Mirrors the reference ToTheBeginning/PuLID (encoders_transformer.py + flux/model.py) and the PuLID v0.9.1 hook schedule (every 2nd of the double blocks, every 4th of the single blocks).

What's included

  • src/model/adapter/pulid.hpp — PuLIDPerceiverAttentionCA, the cross-attention module (Q from image tokens, K/V from the ID embedding). Pure-ggml graph; runs on CPU / CUDA / Vulkan / Metal without backend-specific code.
  • src/model/diffusion/flux.hpp — adds the pulid_ca.<i> child blocks to Flux (constructed conditionally when PuLID weights are present), inserts the cross-attention between transformer blocks at the reference intervals (every 2nd double, every 4th single), and threads the identity embedding + weight through forward / forward_orig / compute / build_graph. skip_layers + PuLID is explicitly refused (would misalign the hook schedule).
  • src/stable-diffusion.cpp — loads the pulid_ca.* weights via model_loader under the existing model.diffusion_model. prefix so they bind to the new blocks naturally, and loads the id-embedding, wrapping it as a sd::Tensor<float> passed via DiffusionParams.
  • include/stable-diffusion.h — public API: sd_pulid_params_t (per-generation embedding path + weight), pulid_weights_path on sd_ctx_params_t, pulid_params on sd_img_gen_params_t.
  • examples/common/common.{cpp,h} — three CLI flags: --pulid-weights, --pulid-id-embedding, --pulid-id-weight.
  • src/model/diffusion/model.hpp — extends DiffusionParams to carry the embedding + weight; FluxModel::compute forwards both.
  • docs/pulid.md — usage, embedding format, supported PuLID versions (v0.9.0 / v0.9.1; v1.1 deferred), memory-budget notes, and the three-way SHA-256 falsification recipe.
  • scripts/pulid_extract_id.py — reference precompute tool that produces the id-embedding from a source portrait.
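
The hook schedule described above can be sketched in a few lines of Python. This is an illustration only, not the ggml code: it assumes the standard Flux.1 layout of 19 double and 38 single blocks, assumes the hook fires after block i whenever i % interval == 0, and uses a hypothetical helper name (pulid_hook_indices).

```python
def pulid_hook_indices(num_blocks: int, interval: int) -> list[int]:
    """Indices of transformer blocks that are followed by a PuLID hook.

    Assumption: the hook runs after block i whenever i % interval == 0.
    """
    return [i for i in range(num_blocks) if i % interval == 0]


# Assumed Flux.1 layout: 19 double blocks, 38 single blocks.
double_hooks = pulid_hook_indices(19, 2)  # every 2nd double block
single_hooks = pulid_hook_indices(38, 4)  # every 4th single block

# Each hook consumes one pulid_ca.<i> module, in order.
num_ca_modules = len(double_hooks) + len(single_hooks)
print(double_hooks, single_hooks, num_ca_modules)
```

Under these assumptions the schedule consumes 20 pulid_ca modules (10 after double blocks, 10 after single blocks). Because modules are paired with blocks purely by position, skipping a layer would shift every later pairing, which is why skip_layers + PuLID is refused.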

Why split extraction from injection

PuLID-Flux's identity extractor is a stack of three large PyTorch models (ArcFace + EVA-CLIP-L + IDFormer). Porting all three to C++/ggml would add thousands of lines for code that runs once per source person. By making sd.cpp consume a precomputed embedding, the C++ surface stays small (~600 lines), the heavy ML stack runs once on any PyTorch backend, and PuLID is decoupled from active development on insightface / EVA-CLIP / IDFormer.

Verification

The three-way SHA-256 falsification recipe in docs/pulid.md distinguishes "wired but inert" from "actively altering the trajectory":

| Run | Expected hash relation |
|---|---|
| A: no --pulid-* flags | baseline |
| B: PuLID flags, --pulid-id-weight 0.0 | byte-identical to A |
| C: PuLID flags, --pulid-id-weight 1.0 | differs from A, preserves source identity |
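
The hash comparison itself is easy to script. The following is a minimal sketch using only Python's hashlib; the function names are hypothetical and it is not the recipe from docs/pulid.md verbatim. It takes the raw bytes of the three output images and reports the two hash relations. Identity preservation in run C still has to be judged by eye.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string (e.g. the contents of an output PNG)."""
    return hashlib.sha256(data).hexdigest()


def check_three_way(a: bytes, b: bytes, c: bytes) -> dict:
    """Compare runs A (baseline), B (id-weight 0.0) and C (id-weight 1.0)."""
    ha, hb, hc = sha256_hex(a), sha256_hex(b), sha256_hex(c)
    return {
        "inert_at_zero": ha == hb,  # B must be byte-identical to A
        "active_at_one": ha != hc,  # C must differ from A
    }


# Usage with files (paths are placeholders):
#   from pathlib import Path
#   result = check_three_way(*(Path(p).read_bytes() for p in ("a.png", "b.png", "c.png")))
```

Both flags must be True for the run to pass: "inert_at_zero" alone would also hold for a build where PuLID is wired but never alters the trajectory.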

Verified on three backends from the same source:

  • Vulkan-AMD (RX 6700 XT, -DSD_VULKAN=ON): A == B byte-identical, A != C, C preserves identity.
  • Vulkan-NVIDIA (RTX 3060, same binary, --backend "diffusion=vulkan1"): A == B, A != C, C visually equivalent to the AMD output at the same seed (different bytes per the usual cross-backend nondeterminism).
  • CUDA-NVIDIA (RTX 3060, separate -DSD_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86 build, CUDA 13.2): A == B byte-identical, A != C, C preserves identity. The PerceiverAttentionCA pure-ggml graph runs unchanged across all three — no backend-specific conditionals.

Measurements

Per-image sampling at 512×512 / 4 steps / Flux Schnell Q4 + PuLID:

| Backend | Sampling (s) | Notes |
|---|---|---|
| AMD 6700 XT (Vulkan) | 22 | 12 GB consumer card |
| NVIDIA 3060 (Vulkan) | 11 | same binary as AMD |
| NVIDIA 3060 (CUDA) | 9.6 | separate -DSD_CUDA=ON build |

batch_count=3 confirms long-lived-process amortization: per-image sampling drops from 19.6 s (cold) to ~11 s (warm) as the model stays resident across iterations. Tested with Flux Schnell Q4_K_S at 512² / 4 steps and Flux Dev Q4_K_S at 768² / 20 steps.

At 1024² the VAE decode needs a large single compute buffer that can exceed a consumer card's max-allocation limit; use --vae-tiling (or route the VAE to a roomier backend). This is existing sd.cpp behavior, not PuLID-specific, but it is documented in docs/pulid.md since PuLID users hit it.

Not yet supported (in docs/pulid.md)

  • PuLID v1.1 (renamed key layout id_adapter_attn_layers.*). Follow-up.
  • Multiple ID images fused into one embedding (reference pipeline supports it; the precompute tool takes one portrait per run).
  • The --true-cfg negative branch — PuLID only injects on the positive conditioning path in the reference; this matches.

Backward compatibility

Non-PuLID generations are unaffected: PuLID is only constructed when the loader sees a pulid_ca.* tensor. A regression run without --pulid-* flags is byte-identical to pre-patch.

2. Update — gguf id-embedding (this PR, supersedes #1542)

In the original submission the id-embedding was a bespoke PULIDV01 32-byte-header binary with a hand-rolled parser. Per @Green-Sky's review on #1542, this PR reworks it into a standard GGUF container: a single fp16 tensor pulid_id of shape [2048, 32], loaded via gguf_init_from_file exactly like the pulid_ca.* weights. The custom header and parser are gone — one fewer on-disk format, and the embedding loads through the same path as every other tensor. scripts/pulid_extract_id.py writes the gguf; docs/pulid.md documents the gguf layout.
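
For a quick sanity check of an embedding file without loading it, the sketch below tests the GGUF magic and computes the expected tensor payload. It is an illustration with hypothetical helper names, assuming a little-endian file; it does not parse the tensor table, which gguf_init_from_file handles in sd.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def looks_like_gguf(header: bytes) -> bool:
    """Cheap check on the first 8 bytes of a file.

    Assumption: a little-endian file, so a uint32 version follows the magic.
    """
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return False
    (version,) = struct.unpack("<I", header[4:8])
    return version >= 1


def fp16_payload_bytes(shape: tuple[int, ...]) -> int:
    """Raw size in bytes of an fp16 tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * 2  # fp16 is 2 bytes per element


print(fp16_payload_bytes((2048, 32)))  # 131072 bytes, i.e. 128 KiB
```

The pulid_id tensor therefore carries 128 KiB of data; the file on disk is slightly larger because of the GGUF header and tensor metadata.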

Opened fresh rather than force-pushing #1542 because that branch had drifted onto an old master and its history was tangled; a clean rebase was easier to review.

3. Follow-up — rebased onto the src/ layout refactor (#1615)

Master landed the #1615 "reorganize src model layout" refactor while this PR was open. This rebase re-homes PuLID onto the new tree with no functional change:

  • src/pulid.hpp → src/model/adapter/pulid.hpp (includes updated to core/ggml_extend.hpp + model/common/block.hpp).
  • The Flux injection moved with src/flux.hpp → src/model/diffusion/flux.hpp.
  • Detection follows the refactor's factory: FluxParams is renamed to FluxConfig, weight-based configuration now lives in FluxConfig::detect_from_weights, and PuLID auto-detection (pulid_ca. → pulid_enabled) moved into it.
  • DiffusionParams plumbing follows src/diffusion_model.hpp → src/model/diffusion/model.hpp; includes updated to the new src/{core,model}/... paths.

The feature, hook schedule, and CLI/API surface are unchanged from stages 1–2.

Validation (rebased branch)

  • Build: OK — clean Vulkan build (sd-cli + sd-server link).
  • Identity: OK on RDNA4 (Radeon RX 9070 XT, Vulkan): Flux Krea-Dev Q4 + PuLID v0.9.1 + a freshly extracted gguf id-embedding, 1024² / 20 steps / dpm++2mv2 / cfg-scale 2.0 / guidance 3.5 / id-weight 1.0 / --diffusion-fa / --vae-tiling. Embedding loaded through gguf_init_from_file (PuLID id-embedding: loaded [2048, 32] type=f16), identity injected across all steps, source identity preserved in a scene unrelated to the reference photo.

Usage

sd ... --diffusion-model flux1-dev.gguf --pulid-weights pulid_flux_v0.9.1.safetensors \
       --pulid-id-embedding face.pulidembd --pulid-id-weight 1.0

The .pulidembd is a gguf with one fp16 tensor pulid_id [2048,32] (produce it with scripts/pulid_extract_id.py).
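
When scripting the three verification runs (A/B/C), the PuLID flags can be assembled programmatically. This is a sketch with hypothetical helper names; only the three --pulid-* flags come from this PR, and the base command (binary, model, prompt, seed, output path) is left to the caller.

```python
def pulid_args(weights: str, embedding: str, id_weight: float) -> list[str]:
    """CLI arguments that enable PuLID for one generation."""
    return [
        "--pulid-weights", weights,
        "--pulid-id-embedding", embedding,
        "--pulid-id-weight", str(id_weight),
    ]


def three_way_runs(base: list[str], weights: str, embedding: str) -> dict[str, list[str]]:
    """Argument lists for runs A (no PuLID), B (weight 0.0) and C (weight 1.0)."""
    return {
        "A": list(base),
        "B": base + pulid_args(weights, embedding, 0.0),
        "C": base + pulid_args(weights, embedding, 1.0),
    }


# Usage: pass each list to subprocess.run(...), keeping seed and prompt
# identical across runs so only the PuLID flags differ.
```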

Adds PuLID-Flux identity injection to the Flux denoise path: a pulid.hpp
module, the id-embedding threaded through flux.hpp and stable-diffusion.cpp,
CLI flags in examples/common, and scripts/pulid_extract_id.py to produce
the embedding. The id-embedding is stored as a gguf container (a single
fp16 tensor) and loaded through the same gguf_init_from_file path as the
pulid_ca weights, so there's no bespoke binary header.

RapidMark force-pushed the cloudhands/pulid-flux-gguf branch from d70feb4 to b7249f0 on June 7, 2026 00:55