Redis#

You can use Redis as a document store for DocumentArray. It’s suitable for faster Document retrieval on embeddings, i.e. .match(), .find().

Tip

This feature requires redis. You can install it via pip install "docarray[redis]".

Usage#

Start Redis service#

To use Redis as the storage backend, it is required to have the Redis service started. Create docker-compose.yml as follows:

version: "3.3"
services:
  redis:
    image: redislabs/redisearch:2.6.0
    ports:
      - "6379:6379"

Then

pip install -U docarray[redis]
docker-compose up

Create DocumentArray with Redis backend#

Assuming the service is started using the default configuration (i.e. server address is localhost:6379), you can instantiate a DocumentArray with Redis storage as such:

from docarray import DocumentArray

da = DocumentArray(
    storage='redis', config={'host': 'localhost', 'port': 6379, 'n_dim': 128}
)

The usage will be the same as the ordinary DocumentArray, but the dimension of an embedding for a Document must be provided at creation time.

To access a previously stored DocumentArray, you can specify index_name and set host and port to match with the previuosly stored DocumentArray.

The following example builds a DocumentArray from previously stored data on localhost:6379:

from docarray import DocumentArray, Document

with DocumentArray(
    storage='redis',
    config={
        'n_dim': 128,
        'index_name': 'idx',
    },
) as da:
    da.extend([Document() for _ in range(1000)])

da2 = DocumentArray(
    storage='redis',
    config={
        'n_dim': 128,
        'index_name': 'idx',
    },
)

da2.summary()

Other functions behave the same as in-memory DocumentArray.

Configuration#

The following configs can be set:

Name	Description	Default
`host`	Host address of the Redis server	`'localhost'`
`port`	Port of the Redis Server	`6379`
`redis_config`	Other Redis configs in a Dict and pass to `Redis` client constructor, e.g. `socket_timeout`, `ssl`	`{}`
`index_name`	Redis index name; the name of RedisSearch index to set this DocumentArray	`None`
`n_dim`	Dimensionality of the embeddings	`None`
`update_schema`	Boolean flag indicating whether to update Redis Search schema	`True`
`distance`	Similarity distance metric in Redis, one of {`'L2'`, `'IP'`, `'COSINE'`}	`'COSINE'`
`batch_size`	Batch size used to handle storage updates	`64`
`method`	Vector similarity index algorithm in Redis, either `FLAT` or `HNSW`	`'HNSW'`
`index_text`	Boolean flag indicating whether to index `.text`. `True` will enable full text search on `.text`	`None`
`tag_indices`	List of tags to index as text field	`[]`
`language`	Optional parameter for Redis text search. Refer to the list of supported languages	`None`
`ef_construction`	Optional parameter for Redis HNSW algorithm	`200`
`m`	Optional parameter for Redis HNSW algorithm	`16`
`ef_runtime`	Optional parameter for Redis HNSW algorithm	`10`
`block_size`	Optional parameter for Redis FLAT algorithm	`1048576`
`initial_cap`	Optional parameter for Redis HNSW and FLAT algorithm	`None`, defaults to the default value in Redis
`columns`	Other fields to store in Document and build schema	`None`
`list_like`	Controls if ordering of Documents is persisted in the Database. Disabling this breaks list-like features, but can improve performance.	`True`
`root_id`	Boolean flag indicating whether to store `root_id` in the tags of chunk level Documents	`True`

You can check the default values in the docarray source code. For vector search configurations, default values are those of the database backend, which you can find in the Redis documentation.

Note

The benchmark test is on the way.

Vector search with filter query#

You can perform Vector Similarity Search based on FLAT or HNSW algorithm and pre-filter results using Redis’ Search Query Syntax.

Consider Documents with embeddings [0, 0, 0] up to [9, 9, 9] where the Document with embedding [i, i, i] has tag price with a number value, tag color with a string value and tag stock with a boolean value. You can create such example with the following code:

import numpy as np
from docarray import Document, DocumentArray

n_dim = 3

da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'columns': {'price': 'int', 'color': 'str', 'stock': 'int'},
        'distance': 'L2',
    },
)

with da:
    da.extend(
        [
            Document(
                id=f'{i}',
                embedding=i * np.ones(n_dim),
                tags={'price': i, 'color': 'blue', 'stock': int(i % 2 == 0)},
            )
            for i in range(10)
        ]
    )
    da.extend(
        [
            Document(
                id=f'{i+10}',
                embedding=i * np.ones(n_dim),
                tags={'price': i, 'color': 'red', 'stock': int(i % 2 == 0)},
            )
            for i in range(10)
        ]
    )

print('\nIndexed price, color and stock:\n')
for doc in da:
    print(
        f"\tembedding={doc.embedding},\t color={doc.tags['color']},\t stock={doc.tags['stock']}"
    )

Consider the case where you want the nearest vectors to the embedding [8., 8., 8.], with the restriction that prices, colors and stock must pass a filter. For example, let’s consider that retrieved Documents must have a price value lower than or equal to max_price, have color equal to blue and have stock equal to True. We can encode this information in Redis using

@price:[-inf {max_price}] @color:{color} @stock:[1 1]

Then the search with the proposed filter can be used as follows.

Note

For Redis, the distance scores can be accessed in the Document’s .scores dictionary under the key 'score'.

max_price = 7
color = "blue"
n_limit = 5

np_query = np.ones(n_dim) * 8
print(f'\nQuery vector: \t{np_query}')

filter = f'@price:[-inf {max_price}] @color:{color} @stock:[1 1]'

results = da.find(np_query, filter=filter, limit=n_limit)

print(
    '\nEmbeddings Approximate Nearest Neighbours with "price" at most 7, "color" blue and "stock" False:\n'
)
for doc in results:
    print(
        f" score={doc.scores['score'].value},\t embedding={doc.embedding},\t price={doc.tags['price']},\t color={doc.tags['color']},\t stock={doc.tags['stock']}"
    )

This prints:

Embeddings Approximate Nearest Neighbours with "price" at most 7, "color" blue and "stock" True:

 score=12,	 embedding=[6. 6. 6.],	 price=6,	 color=blue,	 stock=1
 score=48,	 embedding=[4. 4. 4.],	 price=4,	 color=blue,	 stock=1
 score=108,	 embedding=[2. 2. 2.],	 price=2,	 color=blue,	 stock=1
 score=192,	 embedding=[0. 0. 0.],	 price=0,	 color=blue,	 stock=1

Note

Note that Redis does not support Boolean types in attributes. Therefore, you need to configure your boolean field as integer in columns configuration ('field': 'int') and use a filter query that treats the field as an integer (@field: [1 1]).

Search by filter query#

You can search with user-defined query filters using the .find method. Such queries follow the Redis Search Query Syntax.

Consider a case where you store Documents with a tag of price into Redis and you want to retrieve all Documents with price less than or equal to some max_price value.

You can index such Documents as follows:

from docarray import Document, DocumentArray

n_dim = 3
da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'columns': {'price': 'float'},
    },
)

with da:
    da.extend([Document(id=f'r{i}', tags={'price': i}) for i in range(10)])

print('\nIndexed Prices:\n')
for price in da[:, 'tags__price']:
    print(f'\t price={price}')

Then you can retrieve all documents whose price is less than or equal to max_price by applying the following filter:

max_price = 3
n_limit = 4

filter = f'@price:[-inf {max_price}] '
results = da.find(filter=filter)

print('\n Returned examples that verify filter "price at most 3":\n')
for price in results[:, 'tags__price']:
    print(f'\t price={price}')

This would print

 Returned examples that satisfy condition "price at most 3":

  price=0
  price=1
  price=2
  price=3

With Redis as storage backend, you can also do geospatial searches. You can index Documents with a tag of geo type and retrieve all Documents that are within some max_distance from one earth coordinates as follows :

from docarray import Document, DocumentArray

n_dim = 3
da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'columns': {'location': 'geo'},
    },
)

with da:
    da.extend(
        [
            Document(id=f'r{i}', tags={'location': f"{-98.17+i},{38.71+i}"})
            for i in range(10)
        ]
    )

max_distance = 1000
filter = f'@location:[-98.71 38.71 {max_distance} km] '
results = da.find(filter=filter, limit=n_limit)

Update Vector Search Indexing Schema#

Redis vector similarity supports two indexing methods:

FLAT: Brute-force search.
HNSW: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs.

Both methods have some mandatory parameters and optional parameters.

Tip

Read more about HNSW or FLAT parameters and their default values here.

You can update the search indexing schema on an existing DocumentArray by setting update_schema to True and changing your configuratoin parameters.

Consider you store Documents with default indexing method 'HNSW' and distance 'L2', and want to find the nearest vectors to the embedding [8. 8. 8.]:

import numpy as np
from docarray import Document, DocumentArray

n_dim = 3

da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'index_name': 'idx',
        'distance': 'L2',
    },
)

with da:
    da.extend([Document(id=f'{i}', embedding=i * np.ones(n_dim)) for i in range(10)])

np_query = np.ones(n_dim) * 8
n_limit = 5

results = da.find(np_query, limit=n_limit)

print('\nEmbeddings Approximate Nearest Neighbours:\n')
for doc in results:
    print(f" embedding={doc.embedding},\t score={doc.scores['score'].value}")

This prints:

Embeddings Approximate Nearest Neighbours:

 embedding=[8. 8. 8.],   score=0
 embedding=[7. 7. 7.],   score=3
 embedding=[9. 9. 9.],   score=3
 embedding=[6. 6. 6.],   score=12
 embedding=[5. 5. 5.],   score=27

Then you can use a different search indexing schema on the current DocumentArray as follows:

da2 = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'index_name': 'idx',
        'update_schema': True,
        'distance': 'COSINE',
    },
)

results = da.find(np_query, limit=n_limit)

print('\nEmbeddings Approximate Nearest Neighbours:\n')
for doc in results:
    print(f" embedding={doc.embedding},\t score={doc.scores['score'].value}")

This prints:

Embeddings Approximate Nearest Neighbours:

 embedding=[3. 3. 3.],	 score=0
 embedding=[6. 6. 6.],	 score=0
 embedding=[4. 4. 4.],	 score=5.96046447754e-08
 embedding=[1. 1. 1.],	 score=5.96046447754e-08
 embedding=[8. 8. 8.],	 score=5.96046447754e-08

Search by `.text` field#

You can perform full-text search in a DocumentArray with storage='redis'. To do this, text needs to be indexed using the boolean flag 'index_text' which is set when the DocumentArray is created with config={'index_text': True, ...}. The following example builds a DocumentArray with several documents containing text and searches for those that have token1 in their text description.

from docarray import Document, DocumentArray

da = DocumentArray(storage='redis', config={'n_dim': 2, 'index_text': True})
with da:
    da.extend(
        [
            Document(id='1', text='token1 token2 token3'),
            Document(id='2', text='token1 token2'),
            Document(id='3', text='token2 token3 token4'),
        ]
    )

results = da.find('token1')
print(results[:, 'text'])

This prints:

['token1 token2 token3', 'token1 token2']

The default similarity ranking algorithm is BM25. Besides, TFIDF, TFIDF.DOCNORM, DISMAX, DOCSCORE and HAMMING are also supported by RediSearch. You can change it by specifying scorer in function find:

results = da.find('token1 token3', scorer='TFIDF.DOCNORM')
print('scorer=TFIDF.DOCNORM:')
print(results[:, 'text'])

results = da.find('token1 token3')
print('scorer=BM25:')
print(results[:, 'text'])

This prints:

scorer=TFIDF.DOCNORM:
['token1 token2', 'token1 token2 token3', 'token2 token3 token4']
scorer=BM25:
['token1 token2 token3', 'token1 token2', 'token2 token3 token4']

Search by `.tags` field#

Text can also be indexed when it is part of tags. This is mostly useful in applications where text data can be split into groups and applications might require retrieving items based on a text search in an specific tag.

For example:

from docarray import Document, DocumentArray

da = DocumentArray(
    storage='redis',
    config={'n_dim': 32, 'tag_indices': ['food_type', 'price']},
)
with da:
    da.extend(
        [
            Document(
                tags={
                    'food_type': 'Italian and Spanish food',
                    'price': 'cheap but not that cheap',
                },
            ),
            Document(
                tags={
                    'food_type': 'French and Italian food',
                    'price': 'on the expensive side',
                },
            ),
            Document(
                tags={
                    'food_type': 'chinese noddles',
                    'price': 'quite cheap for what you get!',
                },
            ),
        ]
    )

results_cheap = da.find('cheap', index='price')
print('searching "cheap" in <price>:\n\t', results_cheap[:, 'tags__price'])

results_italian = da.find('italian', index='food_type')
print('searching "italian" in <food_type>:\n\t', results_italian[:, 'tags__food_type'])

This prints:

searching "cheap" in <price>:
	 ['cheap but not that cheap', 'quite cheap for what you get!']
searching "italian" in <food_type>:
	 ['Italian and Spanish food', 'French and Italian food']

Note

By default, if you don’t specify the parameter index in the find method, the Document attribute text will be used for search. If you want to use a specific tags field, make sure to specify it with parameter index:

results = da.find('cheap', index='price')

Redis#

Usage#

Start Redis service#

Create DocumentArray with Redis backend#

Configuration#

Vector search with filter query#

Search by filter query#

Update Vector Search Indexing Schema#

Search by .text field#

Search by .tags field#

Search by `.text` field#

Search by `.tags` field#