Instructor
The instructor-embeddings library is another option, especially when running on a machine with a CUDA-capable GPU. It is a good local alternative to OpenAI (see the Massive Text Embedding Benchmark rankings). The embedding function requires the InstructorEmbedding package. To install it, run `pip install InstructorEmbedding`.
There are three models available. The default is `hkunlp/instructor-base`; for better performance you can use `hkunlp/instructor-large` or `hkunlp/instructor-xl`. You can also specify whether to run on `cpu` (the default) or `cuda`. For example:
```python
# Uses the base model and CPU (the defaults)
import chromadb.utils.embedding_functions as embedding_functions

ef = embedding_functions.InstructorEmbeddingFunction()
```

or

```python
import chromadb.utils.embedding_functions as embedding_functions

ef = embedding_functions.InstructorEmbeddingFunction(
    model_name="hkunlp/instructor-xl", device="cuda")
```
Keep in mind that the large and xl models are 1.5GB and 5GB respectively, and are best suited to running on a GPU.
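Since the right `device` value depends on the machine, it can be convenient to detect GPU availability at runtime rather than hard-coding it. A minimal sketch, assuming PyTorch's standard `torch.cuda.is_available()` check (the embedding function itself is constructed exactly as in the snippets above):

```python
# Pick the device automatically: use CUDA when a GPU (and PyTorch)
# is available, otherwise fall back to CPU.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # PyTorch not installed; the CPU path still works.
    device = "cpu"

print(device)

# The detected device can then be passed straight through, e.g.:
# ef = embedding_functions.InstructorEmbeddingFunction(
#     model_name="hkunlp/instructor-large", device=device)
```

This keeps the same code path working on both a laptop and a GPU server, which matters given the size of the larger models.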