feat(embedders)!: default to text-embedding-3-small #11742

Open
camgrimsec wants to merge 4 commits into deepset-ai:main from camgrimsec:feat/embedders-default-3-small

@camgrimsec

Contributor

Switches the default model for OpenAITextEmbedder, OpenAIDocumentEmbedder, AzureOpenAITextEmbedder, and AzureOpenAIDocumentEmbedder from text-embedding-ada-002 to text-embedding-3-small.

text-embedding-3-small is roughly 5x cheaper per token ($0.02 vs $0.10 per 1M tokens) and scores higher on MTEB. OpenAI marks ada-002 as legacy: https://platform.openai.com/docs/deprecations
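As a rough check of the "5x" figure, the arithmetic can be sketched as below. The prices are the ones quoted in this description; the 50M-token workload and the `embedding_cost` helper are hypothetical and only there for illustration.

```python
# Prices in USD per 1M tokens, as quoted in this PR description.
PRICE_PER_MILLION = {
    "text-embedding-ada-002": 0.10,
    "text-embedding-3-small": 0.02,
}


def embedding_cost(model: str, tokens: int) -> float:
    """Return the estimated cost in USD of embedding `tokens` tokens."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


# Hypothetical workload size, chosen only to make the numbers easy to read.
tokens = 50_000_000
old_cost = embedding_cost("text-embedding-ada-002", tokens)
new_cost = embedding_cost("text-embedding-3-small", tokens)
print(f"ada-002: ${old_cost:.2f}, 3-small: ${new_cost:.2f}, ratio: {old_cost / new_cost:.1f}x")
```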

Backward-compatible: users passing model= or azure_deployment= explicitly are unaffected. Release note added covering the embedding-incompatibility caveat and the Azure-deployment caveat.
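To illustrate the embedding-incompatibility caveat: vectors produced by different embedding models are not comparable, so an index built with one model should not be queried with another. A minimal sketch of a guard follows, assuming the application records which model built the index. `IndexMetadata` and `check_model_compatibility` are hypothetical names and not part of Haystack.

```python
from dataclasses import dataclass


@dataclass
class IndexMetadata:
    """Hypothetical record stored alongside a vector index."""

    embedding_model: str


def check_model_compatibility(index: IndexMetadata, query_model: str) -> None:
    """Raise if queries would be embedded with a different model than the index."""
    if index.embedding_model != query_model:
        raise ValueError(
            f"Index was built with {index.embedding_model!r} but queries use "
            f"{query_model!r}; re-embed the documents or pin the old model."
        )


# An index that was built before the default changed.
index = IndexMetadata(embedding_model="text-embedding-ada-002")

# Pinning the old model explicitly keeps queries compatible.
check_model_compatibility(index, "text-embedding-ada-002")

# Relying on the new default would be flagged.
try:
    check_model_compatibility(index, "text-embedding-3-small")
except ValueError as err:
    print(err)
```

With a check like this, an existing index either keeps the old model pinned via model= or azure_deployment=, or its documents are re-embedded with the new default.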

Only default-assertion tests were updated; tests that explicitly construct embedders with ada-002 are preserved (it is still a valid model name).

Related Issues

  • fixes #issue-number

Proposed Changes:

How did you test it?

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@camgrimsec camgrimsec requested a review from a team as a code owner June 23, 2026 17:59
@camgrimsec camgrimsec requested review from bogdankostic and removed request for a team June 23, 2026 17:59
@vercel

vercel Bot commented Jun 23, 2026

@camgrimsec is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions Bot added topic:tests type:documentation Improvements on the docs labels Jun 23, 2026
@github-actions

github-actions Bot commented Jun 23, 2026

Contributor

Coverage report

Click to see where and how coverage changed

File  Statements  Missing  Coverage  Coverage (new stmts)  Lines missing
  haystack/components/embedders
  openai_document_embedder.py
  openai_text_embedder.py
Project Total  

This report was generated by python-coverage-comment-action

@bogdankostic bogdankostic changed the title feat(embedders): default to text-embedding-3-small feat(embedders)!: default to text-embedding-3-small Jun 24, 2026

@bogdankostic bogdankostic left a comment

Contributor

Thanks for the PR @camgrimsec! I left two comments.

Also, can you point me to the place in the OpenAI docs stating that text-embedding-ada-002 is deprecated?

Comment on lines +70 to +74
The name of the model deployed on Azure. The default is `text-embedding-3-small`,
which is roughly 5x cheaper than the legacy `text-embedding-ada-002` and scores higher
on the MTEB benchmark. Note that this is a deployment name in your Azure resource,
so the deployment must exist there. See OpenAI's
[deprecation notice](https://platform.openai.com/docs/deprecations) for details.

Contributor

Let's keep this doc string short.

Suggested change
- The name of the model deployed on Azure. The default is `text-embedding-3-small`,
- which is roughly 5x cheaper than the legacy `text-embedding-ada-002` and scores higher
- on the MTEB benchmark. Note that this is a deployment name in your Azure resource,
- so the deployment must exist there. See OpenAI's
- [deprecation notice](https://platform.openai.com/docs/deprecations) for details.
+ The name of the model deployed on Azure. The default is `text-embedding-3-small`.

Contributor Author

Applied. I shortened the docstring to the one-line version you suggested and made the same change in the other three embedders (OpenAITextEmbedder, OpenAIDocumentEmbedder, AzureOpenAITextEmbedder) for consistency.

Contributor

We need double backticks for inline code in release notes.

Contributor Author

Fixed. All inline code in the release note now uses double backticks, and the deprecation-page reference was replaced with a link to OpenAI's new-embedding-models announcement.

The original "legacy/deprecated" framing was inaccurate: ada-002 is not on the formal deprecation schedule.
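For reference, the backtick fix can be sketched as a small script. This assumes the release notes are rendered as reStructuredText, where double backticks mark inline literals. The `to_double_backticks` helper is hypothetical and deliberately naive: it would also rewrite legitimate single-backtick constructs such as roles or hyperlink references, so its output needs a manual review.

```python
import re

# A single-backtick span: one backtick, some text without backticks or
# newlines, one backtick, with no further backtick on either side.
SINGLE_BACKTICK = re.compile(r"(?<!`)`([^`\n]+)`(?!`)")


def to_double_backticks(text: str) -> str:
    """Rewrite `code` spans as ``code``, leaving existing ``code`` spans alone."""
    return SINGLE_BACKTICK.sub(r"``\1``", text)


note = "The default is `text-embedding-3-small`; ``text-embedding-ada-002`` can still be pinned."
fixed = to_double_backticks(note)
print(fixed)
```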

…da-002)

Switches default model for OpenAITextEmbedder, OpenAIDocumentEmbedder,
AzureOpenAITextEmbedder, and AzureOpenAIDocumentEmbedder from
text-embedding-ada-002 to text-embedding-3-small.

text-embedding-3-small is the previous-generation ada-002's successor:
~5x cheaper per token and higher MTEB scores per OpenAI's announcement
(https://openai.com/index/new-embedding-models-and-api-updates/).

Users can pin the old model explicitly via model= or azure_deployment=.
@camgrimsec camgrimsec force-pushed the feat/embedders-default-3-small branch from 02c47c1 to 2b1391a Compare June 24, 2026 10:12
@camgrimsec

camgrimsec commented Jun 24, 2026

Contributor Author

Thanks for pushing on the citation.

You're right: text-embedding-ada-002 is not on the deprecation schedule in OpenAI's docs. The accurate framing is that OpenAI introduced text-embedding-3-small as the recommended successor.

In their announcement (https://openai.com/index/new-embedding-models-and-api-updates/) they call ada-002 the "previous generation" model and state "We are not deprecating text-embedding-ada-002".

The motivation for the default switch is the 5x price reduction and higher MTEB score, not a deprecation timeline.

I've updated the PR to reflect that:

  • Release note now links to the announcement instead of the deprecation page, and drops the "legacy" claim.
  • Commit message reworded.
  • Docstrings shortened per your inline suggestions (applied to all four embedders for consistency).
  • Release note now uses double backticks for inline code.

Apologies

2 participants