Support for Milvus as a Vector Database in LangStream

We are happy to announce that, as of release 0.0.22, LangStream supports Milvus as a vector database. This integration broadens LangStream’s support for vector databases, giving users more flexibility in their Gen AI applications.

Milvus is a popular open-source vector database. It is built for cloud-native environments and is highly scalable. It supports vector similarity search and provides a wide range of similarity search algorithms. A cloud version of Milvus, called Zilliz, is also available.

Here’s a detailed walkthrough of how you can leverage Milvus/Zilliz with LangStream.

Understanding Vector Databases in LangStream

Vector databases are an important component of many Gen AI applications in LangStream. They store vector representations (embeddings) of data such as text. Their built-in search tools run similarity searches over those vectors, letting users find semantically similar data in the database.
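
To make similarity search concrete, here is a minimal sketch in plain Python that ranks a few stored vectors by L2 (Euclidean) distance to a query vector. The records, the three-dimensional vectors, and the helper names l2_distance and top_k are made up for illustration. A real vector database uses indexes rather than the exhaustive scan shown here.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, records, k):
    """Return the k (distance, text) pairs closest to the query vector."""
    scored = [(l2_distance(query, vec), text) for text, vec in records]
    return sorted(scored)[:k]

# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
records = [
    ("cats purr", [0.9, 0.1, 0.0]),
    ("dogs bark", [0.8, 0.3, 0.1]),
    ("stock prices fell", [0.0, 0.2, 0.9]),
]
nearest = top_k([0.85, 0.15, 0.05], records, k=2)
print([text for _, text in nearest])
```

The two animal sentences sit close to the query vector, so they are returned ahead of the unrelated one.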

Vector databases are typically used in LangStream as part of Retrieval Augmented Generation (RAG) applications. A LangStream application retrieves documents or passages from a vector database based on their semantic relevance to the query and passes them to the LLM as context for generating responses. This makes the LLM’s responses more relevant to the user’s query and reduces hallucinations.
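
The "providing context to the LLM" step can be sketched as simple prompt assembly. The function name build_prompt, the prompt wording, and the sample passages are assumptions for illustration only. In a LangStream application this step is configured in the pipeline, not hand-written like this.

```python
def build_prompt(question, passages, max_passages=3):
    """Combine retrieved passages and the user's question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages[:max_passages])
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# Passages as they might come back from a similarity search.
prompt = build_prompt(
    "How do I configure Milvus?",
    [
        "Set the host and port for open-source Milvus.",
        "Set the URL and token for Zilliz.",
    ],
)
print(prompt)
```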

LangStream has native support for several vector databases, including Pinecone, Apache Cassandra, and DataStax Astra DB. Now, with support for Milvus, users have a broader range of options for vector storage and similarity search.

Configuring Milvus as a Vector Database in LangStream

Suppose we want to use Milvus as a vector database in a LangStream application. Here is a simplified step-by-step process to configure it for use in LangStream.

Create or update a configuration.yaml file to specify a resource of type vector-database with a service name of milvus.

For open-source Milvus, set the host and port. For the Zilliz cloud service, set the URL and token.

You can write records to the vector database using either an upsert or a delete-insert action. Upsert is preferred, but as of this writing it is not supported by the Zilliz cloud service.

    
 resources:
   - type: "vector-database"
     name: "MilvusDatasource"
     configuration:
       service: "milvus"
       # Open-source Milvus: set host and port
       username: "{{{ secrets.milvus.username }}}"
       password: "{{{ secrets.milvus.password }}}"
       host: "<host name>"
       port: "<port>"
       write-mode: "upsert"
       # Zilliz cloud service: use these settings instead of the ones above
       # url: "https://milvus-cloud-xxxxx.milvuscloud.com"
       # token: "{{{ secrets.milvus.token }}}"
       # write-mode: "delete-insert"
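
The difference between the two write modes can be illustrated with a toy in-memory store. This is only a conceptual model under the assumption that records are keyed by primary key. It does not show how Milvus or LangStream implement either mode, and the function names are hypothetical.

```python
store = {}  # primary key -> record; stands in for a collection

def upsert(store, key, record):
    """Insert the record, or overwrite the one that already has this key."""
    store[key] = record

def delete_insert(store, key, record):
    """Remove any record with this key, then insert the new one."""
    store.pop(key, None)  # step 1: delete (no-op if the key is absent)
    store[key] = record   # step 2: insert

upsert(store, "doc.pdf0", {"text": "first version"})
delete_insert(store, "doc.pdf0", {"text": "second version"})
print(store)
```

Either way the store ends up with one record per key. Delete-insert simply reaches that state in two operations instead of one.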
    

Update the secrets.yaml file to include the credentials for the service.

 - name: milvus
   id: milvus
   data:
     # Milvus
     username: "<username>"
     password: "<password>"
     # Zilliz
     token: "<token>"

Optionally, configure LangStream to automatically create Collections and Indexes in Milvus using an asset configuration inside the application file.

 assets:
   - name: "documents-table"
     asset-type: "milvus-collection"
     creation-mode: create-if-not-exists
     deletion-mode: delete
     config:
       collection-name: "docs"
       database-name: "default"
       datasource: "MilvusDatasource"
       create-statements:
         - |
           {
               "command": "create-collection",
               "collection-name": "docs",
               "database-name": "default",
               "field-types": [
                 {
                     "name": "filename_and_chunkid",
                     "primary-key": true,
                     "data-type": "Varchar",
                     "max-length": 1024
                 },                
                 {
                     "name": "text",
                     "data-type": "Varchar",
                     "max-length": 65535
                 },
                 {
                     "name": "language",
                     "data-type": "Varchar",
                     "max-length": 3
                 },
                 {
                     "name": "vector",
                     "data-type": "FloatVector",
                     "dimension": 1536
                 }
               ]
           }
         - |
           {
             "command": "create-index",
             "collection-name": "docs",
             "database-name": "default",
             "field-name": "vector",
             "index-name": "vector_index",
             "index-type": "AUTOINDEX",
             "metric-type": "L2"
           }
         - |
           {
             "command": "load-collection"
           }
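
Before writing to a collection like this, it can help to check that each row fits the declared fields. The sketch below copies the field definitions from the create-collection statement above into a Python dict and validates a sample row. The helper check_row is hypothetical and is not part of LangStream or Milvus. It counts characters for max-length, which may differ from how Milvus measures string length.

```python
# Field definitions copied from the create-collection statement above.
FIELDS = {
    "filename_and_chunkid": {"type": "Varchar", "max-length": 1024},
    "text": {"type": "Varchar", "max-length": 65535},
    "language": {"type": "Varchar", "max-length": 3},
    "vector": {"type": "FloatVector", "dimension": 1536},
}

def check_row(row):
    """Return a list of problems found in a row; an empty list means it fits."""
    problems = []
    for name in row:
        if name not in FIELDS:
            problems.append(f"unknown field: {name}")
    for name, spec in FIELDS.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        elif spec["type"] == "Varchar" and len(row[name]) > spec["max-length"]:
            problems.append(f"{name} exceeds max-length {spec['max-length']}")
        elif spec["type"] == "FloatVector" and len(row[name]) != spec["dimension"]:
            problems.append(f"{name} must have {spec['dimension']} dimensions")
    return problems

row = {
    "filename_and_chunkid": "doc.pdf0",
    "text": "hello",
    "language": "english",  # too long for a 3-character field
    "vector": [0.0] * 1536,
}
print(check_row(row))
```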

Example: Using Milvus for Writing and Querying Vector Data

With the vector database configured for Milvus, you’re ready to write vector embeddings using the vector-db-sink agent and run semantic similarity queries against them using the query-vector-db agent.

Here is how you can set up a pipeline for writing vectors, assuming the configuration above:

  - name: "Write to Milvus"
    type: "vector-db-sink"
    input: chunks-topic
    configuration:
      datasource: "MilvusDatasource"
      collection-name: "docs"
      fields:
        - name: "filename_and_chunkid"
          expression: "fn:concat(value.filename, value.chunk_id)"
        - name: "vector"
          expression: "fn:toListOfFloat(value.embeddings_vector)"
        - name: "language"
          expression: "value.language"
        - name: "text"
          expression: "value.text"
        # Note: num_tokens is not among the fields created by the collection asset above
        - name: "num_tokens"
          expression: "value.chunk_num_tokens"
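
As a rough Python analogue of what the field expressions above compute, the sketch below builds one row from one record's value. It assumes fn:concat joins its arguments as strings and fn:toListOfFloat yields a list of floats. LangStream evaluates the expressions itself; the function to_row and the sample record are purely illustrative.

```python
def to_row(value):
    """Build the row the sink would write from one record's value."""
    return {
        # fn:concat(value.filename, value.chunk_id)
        "filename_and_chunkid": f"{value['filename']}{value['chunk_id']}",
        # fn:toListOfFloat(value.embeddings_vector)
        "vector": [float(x) for x in value["embeddings_vector"]],
        "language": value["language"],
        "text": value["text"],
        "num_tokens": value["chunk_num_tokens"],
    }

# A made-up record as it might arrive on chunks-topic.
value = {
    "filename": "guide.pdf",
    "chunk_id": 7,
    "embeddings_vector": [1, 0.5],
    "language": "en",
    "text": "Milvus is a vector database.",
    "chunk_num_tokens": 6,
}
row = to_row(value)
print(row["filename_and_chunkid"])
```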

And here is how to set up a pipeline for vector querying:

  - name: "lookup-related-documents-in-llm"
    type: "query-vector-db"
    configuration:
      datasource: "MilvusDatasource"
      query: |
        {
          "collection-name": "docs",
          "vectors": ?,
          "top-k": 10,
          "output-fields": ["text"]
        }
      fields:
        - "value.question_embeddings"
      output-field: "value.related_documents"
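
The query above contains a ? placeholder, and the fields list names the value to bind to it. The sketch below illustrates that kind of positional binding in plain Python. It is not LangStream's implementation: the helper bind_query is hypothetical, and it naively assumes that ? never appears inside a string in the template.

```python
import json

def bind_query(template, record, fields):
    """Replace each ? in the template with the JSON form of the next field."""
    parts = template.split("?")
    if len(parts) - 1 != len(fields):
        raise ValueError("number of ? placeholders must match number of fields")
    out = parts[0]
    for field, rest in zip(fields, parts[1:]):
        # "value.question_embeddings" -> record["value"]["question_embeddings"]
        current = record
        for key in field.split("."):
            current = current[key]
        out += json.dumps(current) + rest
    return json.loads(out)

template = """
{
  "collection-name": "docs",
  "vectors": ?,
  "top-k": 10,
  "output-fields": ["text"]
}
"""
record = {"value": {"question_embeddings": [0.1, 0.2, 0.3]}}
query = bind_query(template, record, ["value.question_embeddings"])
print(query["vectors"])
```

With one placeholder and one field, the question's embedding vector becomes the search vector, and the query returns the text of the ten closest records.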

And that’s how to use Milvus as a vector database in LangStream. Stay tuned for more integrations that add flexibility for building and running Gen AI applications.

Please send us feedback on this new integration, or on LangStream in general, on Slack or Linen. If you find a bug, please open a GitHub issue.

Chris Bartholomew