Embeddings (OpenAI) (Generative Models)
Synopsis
Calculates embeddings from a text column and stores them as a new column.
Description
This operator calculates an embedding (a vector in a high-dimensional space) for each value of a text column and stores the resulting vectors in a new column. These embeddings can be used as input to machine learning algorithms and as input to vector stores for similarity-based retrieval. The most common OpenAI models used for embeddings, text-embedding-3-small and text-embedding-ada-002, produce output vectors with 1536 dimensions. For the vector sizes of other models, please refer to the OpenAI documentation (https://platform.openai.com/docs/guides/embeddings). The version 3 models also accept the dimensions parameter, which requests a specific number of dimensions for the embeddings. Please also note that all embedding operators write the number of dimensions to the log.
Input
- data (Data table)
The data containing the text column for which the embedding should be added.
- connection (Connection)
A connection object to OpenAI.
Output
- data (Data table)
The resulting data set with the new embedding column.
- connection (Connection)
The input connection object to OpenAI.
Parameters
- model Identifies the model which should be used for calculating the embedding. Range:
- input The text column for which the embeddings should be calculated. Range:
- name The name of the column for storing the calculated embeddings. Range:
- dimensions The desired number of dimensions (optional). Only supported for the version 3 models. Please check the OpenAI documentation for details. Range:
- conda_environment The conda environment used for this downloading task. Additional packages may be installed into this environment. Please refer to the extension documentation for details on this and on the version requirements for Python and for the packages which have to be present in this environment. Range:
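As a rough illustration of how the model, input, and dimensions parameters could map onto a request, here is a minimal Python sketch that talks to OpenAI through the official openai package. It is not the operator's actual implementation. The helper names build_embedding_request and embed_texts are hypothetical, and the check that dimensions is only passed for models whose name starts with text-embedding-3 is an assumption derived from the parameter description above.

```python
from typing import Optional


def build_embedding_request(texts, model="text-embedding-3-small",
                            dimensions: Optional[int] = None) -> dict:
    """Assemble keyword arguments for an embeddings request (hypothetical helper)."""
    request = {"model": model, "input": list(texts)}
    if dimensions is not None:
        # Assumption: only the version 3 models accept `dimensions`.
        if not model.startswith("text-embedding-3"):
            raise ValueError("dimensions is only supported for version 3 models")
        if dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")
        request["dimensions"] = dimensions
    return request


def embed_texts(texts, **kwargs):
    """Call the OpenAI API; needs the `openai` package and OPENAI_API_KEY set."""
    from openai import OpenAI  # imported lazily so the helper above works offline

    client = OpenAI()
    response = client.embeddings.create(**build_embedding_request(texts, **kwargs))
    # One embedding (a list of floats) is returned per input text.
    return [item.embedding for item in response.data]


# Build (but do not send) a request for two texts with shortened vectors.
request = build_embedding_request(["first text", "second text"], dimensions=256)
print(request)
```

Calling embed_texts(["first text"], dimensions=256) would send the request and return one vector per text. Leaving dimensions unset keeps the model's default vector size.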
Tutorial Processes
Calculate embeddings with OpenAI
This process takes some texts as input and adds a new column containing their embeddings (vectors in a high-dimensional space).
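To give a feel for the similarity-based retrieval mentioned in the description, the following Python sketch ranks rows by cosine similarity between a query vector and the embedding column. The function names and the tiny 3-dimensional vectors are made up for illustration. Real embeddings would come from the operator and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, rows):
    """Return the text whose embedding is closest to the query vector."""
    best_row = max(rows, key=lambda row: cosine_similarity(query, row["embedding"]))
    return best_row["text"]


# Toy data: each row mimics a data table row with a text and an embedding column.
rows = [
    {"text": "cats and dogs", "embedding": [0.9, 0.1, 0.0]},
    {"text": "stock markets", "embedding": [0.0, 0.2, 0.9]},
]
best = most_similar([0.8, 0.2, 0.1], rows)
print(best)
```

A vector store performs the same kind of comparison, typically with an index that avoids scanning every row.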