feat(embedders)!: default to text-embedding-3-small #11742
@camgrimsec is attempting to deploy a commit to the deepset Team on Vercel. A member of the Team first needs to authorize it.
bogdankostic
left a comment
Thanks for the PR @camgrimsec! I left two comments.
Also, can you point me to the place in the OpenAI docs stating that text-embedding-ada-002 is deprecated?
The name of the model deployed on Azure. The default is `text-embedding-3-small`,
which is roughly 5x cheaper than the legacy `text-embedding-ada-002` and scores higher
on the MTEB benchmark. Note that this is a deployment name in your Azure resource,
so the deployment must exist there. See OpenAI's
[deprecation notice](https://platform.openai.com/docs/deprecations) for details.
Let's keep this doc string short.
The name of the model deployed on Azure. The default is `text-embedding-3-small`.
Applied: shortened to the one-line version you suggested, and made the same change in the other three embedders (OpenAITextEmbedder, OpenAIDocumentEmbedder, AzureOpenAITextEmbedder) for consistency.
We need double backticks for inline code in release notes.
Fixed. All inline code in the release note now uses double backticks, and the deprecation-page reference was replaced with a link to OpenAI's new-embedding-models announcement. The original "legacy/deprecated" framing was inaccurate: ada-002 is not on the formal deprecation schedule.
…da-002) Switches default model for OpenAITextEmbedder, OpenAIDocumentEmbedder, AzureOpenAITextEmbedder, and AzureOpenAIDocumentEmbedder from text-embedding-ada-002 to text-embedding-3-small. text-embedding-3-small is the previous-generation ada-002's successor: ~5x cheaper per token and higher MTEB scores per OpenAI's announcement (https://openai.com/index/new-embedding-models-and-api-updates/). Users can pin the old model explicitly via model= or azure_deployment=.
02c47c1 to
2b1391a
Thanks for pushing on the citation. You're right. In their announcement (https://openai.com/index/new-embedding-models-and-api-updates/) they call ada-002 the "previous generation" model and state that they are not deprecating it. The motivation for the default switch is the 5x price reduction and higher MTEB score, not a deprecation timeline. I've updated the PR to reflect that.
Apologies
…d.yaml Co-authored-by: bogdankostic <bogdankostic@web.de>
Switches the default model for OpenAITextEmbedder, OpenAIDocumentEmbedder, AzureOpenAITextEmbedder, and AzureOpenAIDocumentEmbedder from text-embedding-ada-002 to text-embedding-3-small.
text-embedding-3-small is roughly 5x cheaper per token ($0.02 vs $0.10 per 1M tokens) and scores higher on MTEB. OpenAI marks ada-002 as legacy: https://platform.openai.com/docs/deprecations
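For reviewers who want to sanity-check the "roughly 5x" figure, here is a minimal sketch of the arithmetic. It uses only the two per-1M-token prices quoted above; the function name and the corpus size are hypothetical and not part of this PR.

```python
# Prices in USD per 1M tokens, as quoted in the PR description.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-ada-002": 0.10,
}


def embedding_cost(model: str, num_tokens: int) -> float:
    """Estimate the USD cost of embedding `num_tokens` tokens with `model`."""
    return PRICE_PER_MILLION_TOKENS[model] * num_tokens / 1_000_000


# Hypothetical corpus size, chosen only for illustration.
corpus_tokens = 50_000_000

old_cost = embedding_cost("text-embedding-ada-002", corpus_tokens)
new_cost = embedding_cost("text-embedding-3-small", corpus_tokens)
ratio = old_cost / new_cost

print(f"ada-002: ${old_cost:.2f}  3-small: ${new_cost:.2f}  ratio: {ratio:.1f}x")
```

The ratio depends only on the two list prices, so it holds for any corpus size.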
Backward-compatible: users passing model= or azure_deployment= explicitly are unaffected. Release note added covering the embedding-incompatibility caveat and the Azure-deployment caveat.
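To illustrate the embedding-incompatibility caveat: vectors produced by different embedding models are not comparable, so an index built with ada-002 should not be queried with text-embedding-3-small vectors. A rough sketch of a guard follows, assuming the application records the model name next to each stored vector. `StoredEmbedding` and `check_same_model` are hypothetical helpers for illustration, not Haystack APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredEmbedding:
    """Hypothetical record: a vector plus the name of the model that produced it."""
    vector: List[float]
    model: str


def check_same_model(stored: StoredEmbedding, query_model: str) -> None:
    """Raise if the query embedder's model differs from the one used at indexing time."""
    if stored.model != query_model:
        raise ValueError(
            f"Index was built with {stored.model!r} but the query uses "
            f"{query_model!r}; re-embed the documents or pin the old model."
        )


# An index entry created before the default changed (vector values are made up).
legacy_entry = StoredEmbedding(vector=[0.1, 0.2, 0.3], model="text-embedding-ada-002")

# Querying with the pinned old model passes the check.
check_same_model(legacy_entry, "text-embedding-ada-002")

# Querying with the new default is rejected.
try:
    check_same_model(legacy_entry, "text-embedding-3-small")
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

In practice this means either re-embedding existing documents after upgrading or passing `model=` (or `azure_deployment=`) explicitly to keep using the old model.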
Only default-assertion tests were updated; tests that explicitly construct embedders with ada-002 are preserved (it is still a valid model name).
Related Issues
Proposed Changes:
How did you test it?
Notes for the reviewer
Checklist
`fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:` and added `!` in case the PR includes breaking changes.