Qdrant#

You can use Qdrant as a document store for DocumentArray. It is well suited to fast Document retrieval on embeddings, i.e. nearest-neighbour search with .match() and .find().

Tip

This feature requires qdrant-client. You can install it with pip install "docarray[qdrant]".

Usage#

Start Qdrant service#

To use Qdrant as the storage backend, you need a running Qdrant server. You can create docker-compose.yml to use the Qdrant Docker image:

---
version: '3.4'
services:
  qdrant:
    image: qdrant/qdrant:v0.10.1
    ports:
      - "6333:6333"
      - "6334:6334"
    ulimits: # Only required for tests, as there are a lot of collections created
      nofile:
        soft: 65535
        hard: 65535
...

Then start the service:

docker-compose up
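
Before creating a DocumentArray, it can help to confirm that something is listening on the Qdrant port. The following is a minimal sketch using only the Python standard library; the helper name `qdrant_port_open` is hypothetical, and the check only shows that the TCP port accepts connections, not that Qdrant itself is healthy.

```python
import socket


def qdrant_port_open(host: str = "localhost", port: int = 6333, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Defaults match the port mapping in the docker-compose.yml above.
    print("Qdrant port reachable:", qdrant_port_open())
```

If this prints `False` shortly after `docker-compose up`, the container may still be starting, so retrying after a few seconds is reasonable.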

Create DocumentArray with Qdrant backend#

Assuming you start the service with the default configuration (i.e. server address is http://localhost:6333), you can instantiate a DocumentArray with Qdrant storage like so:

from docarray import DocumentArray

da = DocumentArray(storage='qdrant', config={'n_dim': 10})

The usage is the same as an ordinary DocumentArray.

To access a formerly-persisted DocumentArray, you can specify the collection_name, host and port:

from docarray import DocumentArray

da = DocumentArray(
    storage='qdrant',
    config={
        'collection_name': 'persisted',
        'host': 'localhost',
        'port': 6333,
        'n_dim': 10,
    },
)

da.summary()

Note that you must specify n_dim before using Qdrant as a backend for DocumentArray.

Other functions behave the same as an in-memory DocumentArray.

Configuration#

Name	Description	Default
`n_dim`	Number of dimensions of embeddings to be stored and retrieved	This is always required
`collection_name`	Qdrant collection name	Random collection name generated
`distance`	Distance metric to use during search. Can be ‘cosine’ (similarity), ‘dot’ or ‘euclidean’	`'cosine'`
`host`	Hostname of the Qdrant server	`'localhost'`
`port`	Port of the Qdrant server	`6333`
`grpc_port`	Port of the Qdrant gRPC interface	`6334`
`prefer_grpc`	Set to `True` to use the gRPC interface whenever possible in custom methods	`False`
`api_key`	API key for authentication in Qdrant Cloud	`None`
`https`	Set to `True` to use the HTTPS (SSL) protocol	`None`
`serialize_config`	Serialization config of each Document	`None`
`scroll_batch_size`	Batch size used when scrolling over the storage	`64`
`ef_construct`	Number of neighbours to consider during index building. Larger = more accurate search, more time to build the index	`None`, defaults to the default value in Qdrant*
`full_scan_threshold`	Minimum size (in kilobytes) of vectors for additional payload-based indexing	`None`, defaults to the default value in Qdrant*
`m`	Number of edges per node in the index graph. Larger = more accurate search, more space required	`None`, defaults to the default value in Qdrant*
`columns`	Other fields to store in Document	`None`
`list_like`	Controls whether the ordering of Documents is persisted in the database. Disabling this breaks list-like features, but can improve performance.	`True`
`root_id`	Boolean flag indicating whether to store `root_id` in the tags of chunk-level Documents	`True`

*You can read more about the HNSW parameters and their default values in the Qdrant documentation.
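
To make the table concrete, here is a minimal sketch of assembling a config dictionary in plain Python. The helper `build_qdrant_config` and the `TABLE_DEFAULTS` mapping are hypothetical names introduced for illustration; the default values are copied from the table above, and options whose default is `None` are left out so that Qdrant's own defaults apply.

```python
from typing import Any, Dict

# Defaults copied from the configuration table; None-valued options are omitted.
TABLE_DEFAULTS: Dict[str, Any] = {
    "distance": "cosine",
    "host": "localhost",
    "port": 6333,
    "grpc_port": 6334,
    "prefer_grpc": False,
    "scroll_batch_size": 64,
    "list_like": True,
    "root_id": True,
}
ALLOWED_DISTANCES = {"cosine", "dot", "euclidean"}


def build_qdrant_config(n_dim: int, **overrides: Any) -> Dict[str, Any]:
    """Merge overrides onto the table defaults; n_dim is always required."""
    if not isinstance(n_dim, int) or n_dim <= 0:
        raise ValueError("n_dim must be a positive integer")
    config = {**TABLE_DEFAULTS, **overrides, "n_dim": n_dim}
    if config["distance"] not in ALLOWED_DISTANCES:
        raise ValueError(f"distance must be one of {sorted(ALLOWED_DISTANCES)}")
    return config


config = build_qdrant_config(128, collection_name="persisted", distance="euclidean")
print(config)

# With docarray installed and a Qdrant server running, the dictionary could be
# passed on like this (not executed here):
# da = DocumentArray(storage='qdrant', config=config)
```

Validating `distance` up front is a design choice of this sketch, so that a typo fails before any connection is attempted.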

Minimum example#

Create docker-compose.yml:

---
version: '3.4'
services:
  qdrant:
    image: qdrant/qdrant:v0.10.1
    ports:
      - "6333:6333"
      - "6334:6334"
    ulimits: # Only required for tests, as there are a lot of collections created
      nofile:
        soft: 65535
        hard: 65535
...

pip install -U "docarray[qdrant]"
docker-compose up

import numpy as np

from docarray import DocumentArray

N, D = 100, 128

da = DocumentArray.empty(
    N, storage='qdrant', config={'n_dim': D, 'distance': 'cosine'}
)  # init

da.embeddings = np.random.random([N, D])

print(da.find(np.random.random(D), limit=10))

<DocumentArray (length=10) at 4917906896>

Vector search with filter#

Search with .find can be restricted by user-defined filters. The supported tag types for filters are 'int', 'float', 'bool', 'str', 'text' and 'geo', as in Qdrant. Filters can be constructed following the guidelines in Qdrant’s documentation.

Example of `.find` with filter#

Let’s create Documents with embeddings [0,0,0] up to [9,9,9], where each Document (which has an embedding [i,i,i]) has a tag price with value i:

from docarray import Document, DocumentArray
import numpy as np

n_dim = 3
distance = 'euclidean'

da = DocumentArray(
    storage='qdrant',
    config={'n_dim': n_dim, 'columns': {'price': 'float'}, 'distance': distance},
)

print(f'\nDocumentArray distance: {distance}')

with da:
    da.extend(
        [
            Document(id=f'r{i}', embedding=i * np.ones(n_dim), tags={'price': i})
            for i in range(10)
        ]
    )

print('\nIndexed Prices:\n')
for embedding, price in zip(da.embeddings, da[:, 'tags__price']):
    print(f'\tembedding={embedding},\t price={price}')

We want the nearest vectors to the embedding [8. 8. 8.], with the restriction that prices must follow a filter. For example, retrieved Documents must have price value lower than or equal to max_price. You can encode this information in Qdrant using filter = {'must': [{'key': 'price', 'range': {'lte': max_price}}]}. You can also pass additional search_params following Qdrant’s Search API.

You can then search with this filter:

max_price = 7
n_limit = 4

np_query = np.ones(n_dim) * 8
print(f'\nQuery vector: \t{np_query}')

filter = {'must': [{'key': 'price', 'range': {'lte': max_price}}]}
results = da.find(np_query, filter=filter, limit=n_limit, search_params={"hnsw_ef": 64})

print('\nEmbeddings Nearest Neighbours with "price" at most 7:\n')
for embedding, price in zip(results.embeddings, results[:, 'tags__price']):
    print(f'\tembedding={embedding},\t price={price}')

This prints:

Query vector: 	[8. 8. 8.]

Embeddings Nearest Neighbours with "price" at most 7:

	embedding=[7. 7. 7.],	 price=7
	embedding=[6. 6. 6.],	 price=6
	embedding=[5. 5. 5.],	 price=5
	embedding=[4. 4. 4.],	 price=4
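
The printed result can be cross-checked without a Qdrant server. The sketch below is a brute-force stand-in written in plain Python, not how Qdrant or DocArray evaluates the query: the helpers `range_filter` and `matches` are hypothetical and only understand `must` clauses with `range` conditions (`lt`, `lte`, `gt`, `gte`).

```python
import math
import operator
from typing import Any, Dict, List

# Range operators, using the same names as the filter dictionary above.
_OPS = {"lt": operator.lt, "lte": operator.le, "gt": operator.gt, "gte": operator.ge}


def range_filter(key: str, **bounds: float) -> Dict[str, Any]:
    """Build a Qdrant-style filter dict with a single range condition."""
    return {"must": [{"key": key, "range": bounds}]}


def matches(tags: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate the 'must' + 'range' subset of a filter against a tag dict."""
    for cond in flt.get("must", []):
        value = tags.get(cond["key"])
        if value is None:
            return False
        for name, bound in cond.get("range", {}).items():
            if not _OPS[name](value, bound):
                return False
    return True


# Same data as the example: embeddings [i, i, i] with tag price = i.
docs: List[Dict[str, Any]] = [
    {"id": f"r{i}", "embedding": [float(i)] * 3, "tags": {"price": i}}
    for i in range(10)
]
query = [8.0, 8.0, 8.0]
flt = range_filter("price", lte=7)

# Filter first, then rank the survivors by Euclidean distance to the query.
candidates = [d for d in docs if matches(d["tags"], flt)]
nearest = sorted(candidates, key=lambda d: math.dist(query, d["embedding"]))[:4]
prices = [d["tags"]["price"] for d in nearest]
print(prices)  # [7, 6, 5, 4]
```

The Documents with prices 8 and 9 are closer to the query, but the filter excludes them, which is why the nearest remaining neighbour is the one with price 7.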

Note

For Qdrant, the distance scores can be accessed in the Document’s .scores dictionary by the key f'{distance_metric}_similarity'. For example, for distance = 'euclidean' the key would be 'euclidean_similarity'.
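
As a small illustration of the key format described in this note, the hypothetical helper `score_key` below builds the dictionary key from the configured distance metric. The commented lines assume that each entry in `.scores` exposes its number through a `value` attribute; that attribute name is an assumption and is not stated on this page.

```python
def score_key(distance: str) -> str:
    """Return the .scores key for a given distance metric, per the note above."""
    return f"{distance}_similarity"


key = score_key("euclidean")
print(key)  # euclidean_similarity

# With the `results` from the .find example, the scores could then be read as
# follows (assumes a `value` attribute on each score entry; not executed here):
# for doc in results:
#     print(doc.id, doc.scores[key].value)
```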

Example of `.filter` with a filter#

The following example shows how to use DocArray with the Qdrant document store to filter Documents by tag. Let’s create Documents with the tag price with a value of i:

from docarray import Document, DocumentArray
import numpy as np

n_dim = 3

da = DocumentArray(
    storage='qdrant',
    config={'n_dim': n_dim, 'columns': {'price': 'float'}},
)

with da:
    da.extend(
        [
            Document(id=f'r{i}', embedding=i * np.ones(n_dim), tags={'price': i})
            for i in range(10)
        ]
    )

print('\nIndexed Prices:\n')
for embedding, price in zip(da.embeddings, da[:, 'tags__price']):
    print(f'\tembedding={embedding},\t price={price}')

If you want to filter only for results with a price less than or equal to max_price, you can encode this information using filter = {'must': [{'key': 'price', 'range': {'lte': max_price}}]}.

You can then search with this filter:

max_price = 7
n_limit = 4

filter = {'must': [{'key': 'price', 'range': {'lte': max_price}}]}
results = da.filter(filter=filter, limit=n_limit)

print('\nPoints with "price" at most 7:\n')
for embedding, price in zip(results.embeddings, results[:, 'tags__price']):
    print(f'\tembedding={embedding},\t price={price}')

This prints:

Points with "price" at most 7:

	embedding=[6. 6. 6.],	 price=6
	embedding=[7. 7. 7.],	 price=7
	embedding=[1. 1. 1.],	 price=1
	embedding=[2. 2. 2.],	 price=2
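
The output above is not sorted by price or by embedding, which suggests that `.filter` returns matching Documents in whatever order the storage yields them rather than ranking them. Assuming that is the case, it is safer to check such results by condition than by position. A minimal sketch with a hypothetical helper `check_filter_results`:

```python
from typing import Iterable


def check_filter_results(prices: Iterable[float], max_price: float, limit: int) -> bool:
    """True if there are at most `limit` results and each satisfies price <= max_price."""
    prices = list(prices)
    return len(prices) <= limit and all(p <= max_price for p in prices)


# Prices copied from the printed output above.
printed_prices = [6, 7, 1, 2]
print(check_filter_results(printed_prices, max_price=7, limit=4))  # True
```

With the real `results` object, the same check could be applied to `results[:, 'tags__price']`.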
