Retrieve Documents (Milvus) (Generative Models)

Synopsis

Retrieves documents from a collection in the Milvus vector database and adds them as columns to the input data.

Description

Retrieves the most similar documents from a collection in the Milvus vector database and adds their payloads and similarity scores as new columns to the input data. You need to specify a collection name and a column that contains the embedding for each query document (as a comma-separated list of values). The size of these embedding vectors must match the vector size of the collection from which you want to retrieve the documents. That vector size was specified when the collection was created.
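The embedding format and the size requirement described above can be checked before running the operator. The following is a minimal sketch, not part of the operator itself: the function name `parse_embedding` and the sample values are hypothetical and only illustrate the comma-separated format and the size check.

```python
def parse_embedding(cell: str, vector_size: int) -> list[float]:
    """Parse a comma-separated embedding string and check its length."""
    # Split on commas and convert each non-empty part to a float.
    values = [float(part) for part in cell.split(",") if part.strip()]
    # The number of values must equal the vector size of the collection.
    if len(values) != vector_size:
        raise ValueError(
            f"embedding has {len(values)} values, collection expects {vector_size}"
        )
    return values


# Hypothetical cell value from an embeddings column, for a collection
# that was created with a vector size of 4.
print(parse_embedding("0.12, -0.53, 0.98, 0.04", vector_size=4))
```

A mismatch, for example three values against a collection of size 4, raises a `ValueError` in this sketch.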

Input

  • data (Data table)

    A data set with at least one embedding column. Each row represents one query document and holds its embedding. The results will be added as columns to this data set.

  • connection (Connection)

    A Dictionary Connection to a Milvus vector database.

Output

  • data (Data table)

    The input data set with the retrieved payloads and similarity scores added as new columns.

  • connection (Connection)

    The input connection, passed through as output.

Parameters

  • collection: The name of the collection to retrieve the documents from. The vector size of the collection must be the same as the size of the provided embeddings.
  • embeddings column: The column in your data containing the embedding for each query document. Each value in this column must be a comma-separated list of embedding values. The number of values must be the same as the vector size of the collection from which the documents are retrieved.
  • number of results: The number of results to retrieve. For each result, new columns are added to the input data: one for the score and one for each key of the payload that was originally added to the vector database.
  • conda environment: The conda environment used for this task. Please refer to the extension documentation for additional details on this and on version requirements for Python and all packages used in this environment.
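To illustrate how the number of results shapes the output, here is a minimal sketch that uses an in-memory list as a stand-in for a Milvus collection. Everything in it is an assumption made for illustration: the cosine similarity metric, the helper names, the sample documents, and the column names such as `score_1` and `text_1`. The operator's actual metric and column naming may differ.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (assumed metric, for illustration)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def add_result_columns(row: dict, query: list[float], documents: list[dict], k: int) -> dict:
    """Return a copy of row with one score column and one column per payload key for each result."""
    ranked = sorted(documents, key=lambda d: cosine(query, d["vector"]), reverse=True)
    out = dict(row)
    for rank, doc in enumerate(ranked[:k], start=1):
        # Hypothetical column names: <name>_<rank>.
        out[f"score_{rank}"] = cosine(query, doc["vector"])
        for key, value in doc["payload"].items():
            out[f"{key}_{rank}"] = value
    return out


# In-memory stand-in for a collection with vector size 2 and two payload keys.
documents = [
    {"vector": [1.0, 0.0], "payload": {"text": "about cats", "source": "a.txt"}},
    {"vector": [0.0, 1.0], "payload": {"text": "about dogs", "source": "b.txt"}},
    {"vector": [0.7, 0.7], "payload": {"text": "about pets", "source": "c.txt"}},
]
row = {"query": "cat food"}
result = add_result_columns(row, [1.0, 0.1], documents, k=2)
print(result)
```

With two results and two payload keys, the row gains six columns in this sketch: two score columns and four payload columns.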

Tutorial Processes

Retrieve similar documents from Milvus

This tutorial workflow shows how to retrieve the most similar documents from a Milvus collection. The input data must contain an embeddings column, which is used for the similarity search. For this tutorial to work, you need a running Milvus database and must supply the database connection as input. The connection must be a Dictionary Connection with the keys 'uri' and 'token'.
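The two connection keys named above can be checked outside the workflow. The sketch below assumes the `pymilvus` package and its `MilvusClient`. This documentation does not state which client the operator uses internally, so treat this as an illustration only. The helper names, the endpoint, and the token are placeholders, and `payload_fields` stands for whatever payload keys were stored in your collection.

```python
REQUIRED_KEYS = ("uri", "token")


def check_connection(connection: dict) -> dict:
    """Return only the 'uri' and 'token' entries, or raise if one is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in connection]
    if missing:
        raise KeyError(f"connection is missing keys: {missing}")
    return {key: connection[key] for key in REQUIRED_KEYS}


def search_similar(connection: dict, collection: str, query: list[float],
                   k: int, payload_fields: list[str]):
    """Search one query vector against a collection (requires a running Milvus)."""
    # Imported lazily so check_connection works without pymilvus installed.
    from pymilvus import MilvusClient

    client = MilvusClient(**check_connection(connection))
    # One query vector in, one list of hits out.
    hits = client.search(
        collection_name=collection,
        data=[query],
        limit=k,
        output_fields=payload_fields,
    )
    return hits[0]


# Placeholder values; replace with your own Milvus endpoint and credentials.
connection = {"uri": "http://localhost:19530", "token": "root:Milvus"}
print(check_connection(connection))
```

Only `check_connection` runs here. `search_similar` needs a reachable Milvus instance and an existing collection whose vector size matches the query.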