Smart. When they do come, will the embedding vectors be OpenAI compatible? I ass...

minimaxir · on Feb 8, 2024

Embeddings as an I/O schema are just text-in, a list of numbers out. There are very few embedding models which require enough preprocessing to warrant an abstraction. (A soft example is the new nomic-embed-text-v1, which requires adding prefix annotations: https://huggingface.co/nomic-ai/nomic-embed-text-v1 )
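A rough sketch of what that preprocessing might look like, assuming the `search_document: ` and `search_query: ` task prefixes described on the model card. The `add_prefix` helper and the `PREFIXES` table are hypothetical names for illustration, not part of any library:

```python
# Hypothetical helper: prepend the task prefix that nomic-embed-text-v1 expects.
# The two prefixes below are taken from the model card; check it for the full list.
PREFIXES = {
    "document": "search_document: ",
    "query": "search_query: ",
}

def add_prefix(text: str, kind: str) -> str:
    """Return `text` with the task prefix for `kind` ('document' or 'query')."""
    return PREFIXES[kind] + text

# Prefixed strings are what you would pass to the embedding call.
docs = [add_prefix(t, "document") for t in ["Embeddings map text to vectors."]]
query = add_prefix("what is an embedding?", "query")
```

Everything after this step is still plain text in, list of numbers out.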

osigurdson · on Feb 8, 2024

Yes, of course (syntactically it is just float[] getEmbeddings(text)), but are the numbers close to what OpenAI would produce? I assume no.

minimaxir · on Feb 8, 2024

This submission is only about the I/O schema: the embeddings themselves depend on the model, and since OpenAI's models are closed source, no one can reproduce them.

No direct embedding model can be cross-compatible. (Exception: contrastive learning models like CLIP.)

dragonwriter · on Feb 8, 2024

Probably not; embedding vectors aren't compatible across different embedding models, and other tools presenting OAI-compatible APIs don't use OAI-compatible embedding models (e.g., oobabooga lets you configure different embedding models, but none of them produce vectors compatible with the OAI ones).
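To make the incompatibility concrete, here is a toy sketch. It assumes each "model" is just a fixed random linear map over hand-made features, which is not how real embedding models work; `make_toy_model` and the feature vector are invented for illustration. The point is only that two independently built maps place the same input at unrelated coordinates, so comparing a vector from one against a vector from the other tells you nothing:

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def make_toy_model(seed, dim=8, n_features=4):
    """Toy stand-in for an embedding model: a fixed random linear map (assumption)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(dim)]

    def embed(features):
        return [sum(w * f for w, f in zip(row, features)) for row in weights]

    return embed

model_a = make_toy_model(seed=1)
model_b = make_toy_model(seed=2)

# Made-up features standing in for one piece of text.
text_features = [0.9, 0.1, 0.4, 0.7]

same_model = cosine(model_a(text_features), model_a(text_features))   # 1.0 up to rounding
cross_model = cosine(model_a(text_features), model_b(text_features))  # arbitrary value

print(f"same model:  {same_model:.3f}")
print(f"cross model: {cross_model:.3f}")
```

The same reasoning is why a vector store built with one model has to be re-embedded when you switch to another, even if both sit behind the same API schema.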