[feat] Speaker diarization: optionally return each speaker's voiceprint centroid vector (spk_embedding_center) by phoenixray2000 · Pull Request #2967 · modelscope/FunASR

phoenixray2000 · 2026-06-06T18:39:29Z

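Background / Motivation

When AutoModel.generate performs speaker diarization, postprocess() already computes a centroid vector for each speaker (the mean of the chunk embeddings in the same cluster), but discards it after use instead of returning it. Downstream code that wants speaker voiceprint or identity recognition has to run the speaker embedding model a second time to re-extract embeddings, which wastes a full pass of compute.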
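The centroid described above is a per-cluster mean. A minimal sketch of that computation, assuming cluster labels are contiguous integers 0..K-1 and every label occurs at least once; `speaker_centroids` is a hypothetical helper for illustration, not the function in funasr/models/campplus/utils.py:

```python
import numpy as np

def speaker_centroids(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Average the chunk embeddings of each cluster; row k is speaker k's centroid.

    Assumes labels are contiguous integers starting at 0 (illustrative only).
    """
    n_speakers = int(labels.max()) + 1
    return np.stack([embeddings[labels == k].mean(axis=0) for k in range(n_speakers)])

# Three toy chunk embeddings: the first two belong to speaker 0, the last to speaker 1.
chunk_embs = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
chunk_labels = np.array([0, 0, 1])
centers = speaker_centroids(chunk_embs, chunk_labels)  # shape (2, 2), one row per speaker
```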

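Changes

Adds an optional parameter return_spk_center (default False):

funasr/models/campplus/utils.py::postprocess: when return_spk_center=True, additionally returns spk_embs (one centroid per speaker, aligned with the speaker ids after correction by correct_labels).
funasr/auto/auto_model.py: when generate(..., return_spk_center=True) is called, adds spk_embedding_center to the result, with shape [num_speakers, embedding_dim]; its row indices correspond one-to-one to the spk values in sentence_info. The per-chunk spk_embedding is still deleted as before, so the output does not grow.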
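To illustrate the index alignment, here is a small sketch that looks up the centroid for each sentence by its spk value. The `result` dict is a hand-built mock that only mirrors the keys named in this description (sentence_info, spk, spk_embedding_center); the "text" field and all values are placeholders, not real model output:

```python
import numpy as np

# Mock of one entry of generate()'s output; values are placeholders.
result = {
    "sentence_info": [
        {"spk": 0, "text": "hello"},
        {"spk": 1, "text": "hi there"},
        {"spk": 0, "text": "how are you"},
    ],
    "spk_embedding_center": np.zeros((2, 192), dtype=np.float32),
}

centers = result["spk_embedding_center"]
# Row i of `centers` belongs to the speaker whose spk id is i.
per_sentence = [centers[s["spk"]] for s in result["sentence_info"]]
speakers_seen = sorted({s["spk"] for s in result["sentence_info"]})
```

Because the centroids are stored once per speaker rather than once per sentence, the lookup stays a plain row index.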

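Compatibility

Fully backward compatible: the option is off by default; unless return_spk_center=True is passed, the return structure of postprocess() is unchanged, so existing callers (auto_model, auto_frontend) are unaffected.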

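Verification

Tested locally with paraformer-zh + ERes2NetV2 (spk_mode=punc_segment) on a two-speaker audio clip: spk_embedding_center has shape (2, 192) and aligns with the 2 speakers in sentence_info; after L2 normalization, the cosine similarity between the two speakers' centroids is about 0.34 (distinguishable).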
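The similarity check can be reproduced along these lines. This is a sketch on random stand-in vectors of the same shape, so it will not reproduce the 0.34 figure; `cosine_similarity` is a helper defined here for illustration:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """L2-normalize both vectors, then take their dot product."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Random stand-ins for the two (192-dim) centroids; real values come from the model.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 192))
sim = cosine_similarity(centers[0], centers[1])
self_sim = cosine_similarity(centers[0], centers[0])  # a vector compared with itself
```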

Usage

res = model.generate(input=wav, return_spk_center=True)
centers = res[0]["spk_embedding_center"]  # np.ndarray, shape [num_speakers, embedding_dim]
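One possible downstream use is matching each centroid against a set of enrolled voiceprints. The sketch below is an assumption about how a caller might do that and is not part of this PR: `identify_speakers`, the `enrolled` dict, and the 0.5 threshold are all hypothetical, and the vectors are toy 3-dim values rather than real embeddings.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length, guarding against zero vectors."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def identify_speakers(centers, enrolled, threshold=0.5):
    """Return the best-matching enrolled name per centroid, or None below threshold."""
    names = list(enrolled)
    ref = l2_normalize(np.stack([enrolled[n] for n in names]))
    sims = l2_normalize(centers) @ ref.T  # cosine similarities, [n_speakers, n_enrolled]
    best = sims.argmax(axis=1)
    return [names[j] if sims[i, j] >= threshold else None for i, j in enumerate(best)]

# Hypothetical enrolled voiceprints and diarization centroids (toy values).
enrolled = {"alice": np.array([1.0, 0.0, 0.0]), "bob": np.array([0.0, 1.0, 0.0])}
centers = np.array([[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])
matches = identify_speakers(centers, enrolled)  # ["alice", None]
```

A suitable threshold depends on the embedding model and data, so it would need to be tuned rather than taken from this example.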

🤖 Generated with Claude Code

Add a return_spk_center option so AutoModel.generate surfaces the per-speaker centroid embeddings (mean of clustered chunk embeddings) that diarization already computes in postprocess() but currently discards. Lets downstream speaker voiceprint / identity reuse them without re-embedding. Backward compatible: default off; postprocess return shape is unchanged unless return_spk_center=True. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a new option return_spk_center to retrieve per-speaker ERes2NetV2 centroids (speaker embedding centers) during speaker diarization. When enabled, the postprocess function returns both the results and the speaker centroids, which are then saved in the output dictionary. Feedback on these changes includes: 1) converting the PyTorch tensor spk_embedding.cpu() to a NumPy array using .numpy() before passing it to postprocess to match its type hint; 2) updating the return type hint of postprocess to Union[list, tuple] to reflect the conditional return type; and 3) optimizing performance by lazily computing spk_embs only when return_spk_center is enabled.
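The second and third review points (a Union return hint and lazy centroid computation) could look roughly like the sketch below. `postprocess_sketch` is a simplified stand-in with an invented signature and a placeholder result format; it is not the actual postprocess in funasr/models/campplus/utils.py.

```python
from typing import Union

import numpy as np

def postprocess_sketch(
    labels: np.ndarray,
    embeddings: np.ndarray,
    return_spk_center: bool = False,
) -> Union[list, tuple]:
    """Return a list by default, or (list, centroids) when return_spk_center=True."""
    # Placeholder for the real diarization result: [chunk_index, speaker_id] pairs.
    results = [[i, int(k)] for i, k in enumerate(labels)]
    if not return_spk_center:
        return results  # default path: same return structure as before, no extra work
    # Centroids are computed only on request (lazy), in sorted label order.
    spk_embs = np.stack(
        [embeddings[labels == k].mean(axis=0) for k in np.unique(labels)]
    )
    return results, spk_embs

labels = np.array([0, 1, 0])
embs = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]])
default_out = postprocess_sketch(labels, embs)
results, spk_embs = postprocess_sketch(labels, embs, return_spk_center=True)
```

Callers that never pass the flag keep receiving a plain list, which is what keeps the change backward compatible.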


- pass np.ndarray (not torch.Tensor) to postprocess to match its type hint
- update postprocess return hint to Union[list, tuple]
- compute spk_embs lazily, only when return_spk_center=True

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

phoenixray2000 mentioned this pull request Jun 6, 2026

feat(spk): optionally return per-speaker embedding centroids phoenixray2000/FunASR#1

Closed

gemini-code-assist Bot reviewed Jun 6, 2026


Comment thread funasr/auto/auto_model.py Outdated

Comment thread funasr/models/campplus/utils.py Outdated

Comment thread funasr/models/campplus/utils.py

phoenixray2000 wants to merge 2 commits into modelscope:main from phoenixray2000:feat/spk-embedding-center
