Redis
You can use Redis as a document store for DocumentArray. It is suitable for fast Document retrieval on embeddings, e.g. with .match() and .find().
Tip
This feature requires redis. You can install it via pip install "docarray[redis]".
Usage
Start Redis service
To use Redis as the storage backend, the Redis service must be running. Create a docker-compose.yml as follows:
version: "3.3"
services:
  redis:
    image: redislabs/redisearch:2.6.0
    ports:
      - "6379:6379"
Then install DocArray with Redis support and start the service:
pip install -U "docarray[redis]"
docker-compose up
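Before you create a DocumentArray, it can be worth checking that the container accepts connections. The snippet below is a minimal sketch that uses only the Python standard library. redis_is_up is a hypothetical helper and is not part of DocArray or redis-py. It sends an inline PING command and expects the simple-string reply +PONG, so it reports a server that requires authentication as unavailable.

```python
import socket


def redis_is_up(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if a Redis server at host:port answers PING with PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Inline command form of the Redis protocol: a plain line ending in CRLF.
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Covers connection refused, timeouts and unknown hosts.
        return False
    # A healthy server without authentication replies with "+PONG\r\n".
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable:", redis_is_up())
```

If the redis package is already installed (it ships with docarray[redis]), redis.Redis(host='localhost', port=6379).ping() performs the same check.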
Create DocumentArray with Redis backend
Assuming the service was started with the default configuration (i.e. the server address is localhost:6379), you can instantiate a DocumentArray with Redis storage as follows:
from docarray import DocumentArray

da = DocumentArray(
    storage='redis', config={'host': 'localhost', 'port': 6379, 'n_dim': 128}
)
Usage is the same as with an ordinary DocumentArray, except that the dimensionality of the embeddings (n_dim) must be provided at creation time.
To access a previously stored DocumentArray, specify its index_name and set host and port to match those of the previously stored DocumentArray.
The following example builds a DocumentArray from previously stored data on localhost:6379:
from docarray import DocumentArray, Document

# Store 1000 Documents under the index 'idx'
with DocumentArray(
    storage='redis',
    config={
        'n_dim': 128,
        'index_name': 'idx',
    },
) as da:
    da.extend([Document() for _ in range(1000)])

# Reconnect to the same index from a new DocumentArray
da2 = DocumentArray(
    storage='redis',
    config={
        'n_dim': 128,
        'index_name': 'idx',
    },
)

da2.summary()
Output
╭────────────── Documents Summary ──────────────╮
│ │
│ Type DocumentArrayRedis │
│ Length 1000 │
│ Homogenous Documents True │
│ Common Attributes ('id',) │
│ Multimodal dataclass False │
│ │
╰───────────────────────────────────────────────╯
╭───────────────────── Attributes Summary ─────────────────────╮
│ │
│ Attribute Data type #Unique values Has empty value │
│ ────────────────────────────────────────────────────────── │
│ id ('str',) 1000 False │
│ │
╰──────────────────────────────────────────────────────────────╯
╭─── DocumentArrayRedis Config ───╮
│ │
│ n_dim 128 │
│ host localhost │
│ port 6379 │
│ index_name idx │
│ update_schema True │
│ distance COSINE │
│ redis_config {} │
│ index_text False │
│ tag_indices [] │
│ batch_size 64 │
│ method HNSW │
│ ef_construction 200 │
│ m 16 │
│ ef_runtime 10 │
│ block_size 1048576 │
│ initial_cap None │
│ columns {} │
│ │
╰─────────────────────────────────╯
Other functions behave the same as in-memory DocumentArray.
Configuration
The following configs can be set:
Name | Description | Default |
---|---|---|
host | Host address of the Redis server | localhost |
port | Port of the Redis server | 6379 |
redis_config | Other Redis configs in a Dict, passed to the Redis client constructor | {} |
index_name | Redis index name; the name of the RediSearch index backing this DocumentArray | None |
n_dim | Dimensionality of the embeddings | required |
update_schema | Boolean flag indicating whether to update the Redis Search schema | True |
distance | Similarity distance metric in Redis, one of {'L2', 'IP', 'COSINE'} | COSINE |
batch_size | Batch size used to handle storage updates | 64 |
method | Vector similarity index algorithm in Redis, either FLAT or HNSW | HNSW |
index_text | Boolean flag indicating whether to index .text | False |
tag_indices | List of tags to index as text fields | [] |
language | Optional parameter for Redis text search. Refer to the list of supported languages in the Redis documentation | None |
ef_construction | Optional parameter for the Redis HNSW algorithm | Redis default |
m | Optional parameter for the Redis HNSW algorithm | Redis default |
ef_runtime | Optional parameter for the Redis HNSW algorithm | Redis default |
block_size | Optional parameter for the Redis FLAT algorithm | Redis default |
initial_cap | Optional parameter for the Redis HNSW and FLAT algorithms | Redis default |
columns | Other fields to store in the Document and use to build the schema | {} |
list_like | Controls whether the ordering of Documents is persisted in the database. Disabling this breaks list-like features, but can improve performance. | True |
root_id | Boolean flag indicating whether to store root_id in the tags of chunk-level Documents | True |
You can check the default values in the docarray source code. For vector search configurations, default values are those of the database backend, which you can find in the Redis documentation.
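To show how these options fit together, here is a sketch of two config dicts, one for HNSW and one for FLAT, built only from keys in the table. The dict names, the index names and the check_config helper are hypothetical. The helper flags algorithm-specific keys that do not match the chosen method. Following the table, it treats ef_construction, m and ef_runtime as HNSW-only and block_size as FLAT-only.

```python
# Keys that, per the configuration table, apply to only one algorithm.
HNSW_ONLY = {'ef_construction', 'm', 'ef_runtime'}
FLAT_ONLY = {'block_size'}

hnsw_config = {
    'n_dim': 128,
    'index_name': 'idx_hnsw',  # hypothetical index name
    'method': 'HNSW',
    'distance': 'COSINE',
    'ef_construction': 200,  # candidate list size while building the graph
    'm': 16,  # max outgoing edges per node in each graph layer
    'ef_runtime': 10,  # candidate list size at query time
}

flat_config = {
    'n_dim': 128,
    'index_name': 'idx_flat',  # hypothetical index name
    'method': 'FLAT',
    'distance': 'L2',
    'block_size': 1048576,  # vectors per contiguous memory block
}


def check_config(config: dict) -> list:
    """Hypothetical helper: sorted algorithm-specific keys that don't match config['method']."""
    method = config.get('method', 'HNSW')  # HNSW is the default method
    foreign = FLAT_ONLY if method == 'HNSW' else HNSW_ONLY
    return sorted(foreign.intersection(config))
```

Either dict could then be passed as DocumentArray(storage='redis', config=flat_config).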
Note
The benchmark test is on the way.
Vector search with filter query
You can perform Vector Similarity Search based on the FLAT or HNSW algorithm and pre-filter results using Redis' Search Query Syntax.
Consider Documents with embeddings [0, 0, 0] up to [9, 9, 9], where the Document with embedding [i, i, i] has a tag price with a number value, a tag color with a string value and a tag stock with a boolean value stored as the integer 0 or 1. You can create such an example, with ten blue and ten red Documents, using the following code:
import numpy as np

from docarray import Document, DocumentArray

n_dim = 3

da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        # Redis has no boolean type, so 'stock' is stored as an int (0 or 1)
        'columns': {'price': 'int', 'color': 'str', 'stock': 'int'},
        'distance': 'L2',
    },
)

with da:
    # ten blue Documents with ids '0'..'9'
    da.extend(
        [
            Document(
                id=f'{i}',
                embedding=i * np.ones(n_dim),
                tags={'price': i, 'color': 'blue', 'stock': int(i % 2 == 0)},
            )
            for i in range(10)
        ]
    )
    # ten red Documents with ids '10'..'19' and the same embeddings
    da.extend(
        [
            Document(
                id=f'{i+10}',
                embedding=i * np.ones(n_dim),
                tags={'price': i, 'color': 'red', 'stock': int(i % 2 == 0)},
            )
            for i in range(10)
        ]
    )

print('\nIndexed price, color and stock:\n')
for doc in da:
    print(
        f"\tembedding={doc.embedding},\t price={doc.tags['price']},\t color={doc.tags['color']},\t stock={doc.tags['stock']}"
    )
Consider the case where you want the nearest vectors to the embedding [8., 8., 8.], with the restriction that price, color and stock must pass a filter. For example, retrieved Documents must have a price value lower than or equal to max_price, have color equal to blue and have stock equal to True (stored as 1). In Redis Search Query Syntax, this filter can be written as the following template, where max_price and color are substituted with their values:
@price:[-inf {max_price}] @color:{color} @stock:[1 1]
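If you build filters like this in several places, a small helper keeps the syntax in one place. The sketch below assumes the same three fields as the example: price, color and an integer-encoded stock. build_filter is a hypothetical name. The color value is inserted verbatim, so the helper assumes values without spaces or Redis query special characters.

```python
def build_filter(max_price: float, color: str, in_stock: bool) -> str:
    """Hypothetical helper: the pre-filter used in this example as a Redis query string."""
    stock = int(in_stock)  # booleans are stored as 0/1 integers
    return f'@price:[-inf {max_price}] @color:{color} @stock:[{stock} {stock}]'


print(build_filter(7, 'blue', True))
```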
You can then run the search with this filter as follows.
Note
For Redis, the distance scores can be accessed in the Document's .scores dictionary under the key 'score'.
max_price = 7
color = "blue"
n_limit = 5

np_query = np.ones(n_dim) * 8
print(f'\nQuery vector: \t{np_query}')

filter = f'@price:[-inf {max_price}] @color:{color} @stock:[1 1]'
results = da.find(np_query, filter=filter, limit=n_limit)

print(
    '\nEmbeddings Approximate Nearest Neighbours with "price" at most 7, "color" blue and "stock" True:\n'
)
for doc in results:
    print(
        f" score={doc.scores['score'].value},\t embedding={doc.embedding},\t price={doc.tags['price']},\t color={doc.tags['color']},\t stock={doc.tags['stock']}"
    )
This prints:
Embeddings Approximate Nearest Neighbours with "price" at most 7, "color" blue and "stock" True:
score=12, embedding=[6. 6. 6.], price=6, color=blue, stock=1
score=48, embedding=[4. 4. 4.], price=4, color=blue, stock=1
score=108, embedding=[2. 2. 2.], price=2, color=blue, stock=1
score=192, embedding=[0. 0. 0.], price=0, color=blue, stock=1
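The scores above are consistent with squared Euclidean distance. For [6. 6. 6.] against [8. 8. 8.], for example, the score is 3 * (8 - 6)^2 = 12. The pure-Python sketch below illustrates the pre-filter-then-rank idea: it applies the same three conditions and then ranks the remaining Documents exactly, as a brute-force (FLAT) index would. It only illustrates the idea and does not show how Redis executes the query internally. HNSW results are approximate in general. The plain dicts stand in for Documents, and squared_l2 and filtered_knn are hypothetical helpers.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_knn(docs, query, max_price, color, stock, limit):
    """Pre-filter docs on their tags, then rank the survivors exactly by squared L2."""
    candidates = [
        d for d in docs
        if d['price'] <= max_price and d['color'] == color and d['stock'] == stock
    ]
    scored = [(squared_l2(d['embedding'], query), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:limit]


# Plain-dict stand-ins for the 20 example Documents (10 blue, 10 red).
docs = [
    {'embedding': [float(i)] * 3, 'price': i, 'color': color, 'stock': int(i % 2 == 0)}
    for color in ('blue', 'red')
    for i in range(10)
]

for score, d in filtered_knn(docs, [8.0, 8.0, 8.0], max_price=7, color='blue', stock=1, limit=5):
    print(f"score={score:g}, embedding={d['embedding']}, price={d['price']}")
```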
Note
Redis does not support Boolean types in attributes. Therefore, you need to configure your boolean field as an integer in the columns configuration ('field': 'int') and use a filter query that treats the field as an integer (@field:[1 1]).
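One way to apply this rule consistently is to normalize tags before indexing. The sketch below uses two hypothetical helpers. to_redis_tags converts bool tag values to 0/1 integers, and bool_filter builds the matching numeric range clause.

```python
def to_redis_tags(tags: dict) -> dict:
    """Hypothetical helper: copy of tags with bool values converted to 0/1 ints."""
    # bool is a subclass of int in Python, so check for bool explicitly.
    return {k: int(v) if isinstance(v, bool) else v for k, v in tags.items()}


def bool_filter(field: str, value: bool) -> str:
    """Hypothetical helper: numeric range clause matching an integer-encoded bool."""
    n = int(value)
    return f'@{field}:[{n} {n}]'


print(to_redis_tags({'price': 3, 'color': 'blue', 'stock': True}))
print(bool_filter('stock', True))
```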
Search by filter query
You can search with user-defined query filters using the .find method. Such queries follow the Redis Search Query Syntax.
Consider a case where you store Documents with a price tag in Redis and want to retrieve all Documents whose price is less than or equal to some max_price value.
You can index such Documents as follows:
from docarray import Document, DocumentArray

n_dim = 3

da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'columns': {'price': 'float'},
    },
)

with da:
    da.extend([Document(id=f'r{i}', tags={'price': i}) for i in range(10)])

print('\nIndexed Prices:\n')
for price in da[:, 'tags__price']:
    print(f'\t price={price}')
Then you can retrieve all Documents whose price is less than or equal to max_price by applying the following filter:
max_price = 3
n_limit = 4

filter = f'@price:[-inf {max_price}]'
results = da.find(filter=filter, limit=n_limit)

print('\nReturned examples that satisfy condition "price at most 3":\n')
for price in results[:, 'tags__price']:
    print(f'\t price={price}')
This prints:
Returned examples that satisfy condition "price at most 3":
price=0
price=1
price=2
price=3
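Numeric filters are not limited to an upper bound. In Redis Search Query Syntax, -inf and +inf denote open ends, and a ( before a number makes that bound exclusive. The hypothetical numeric_range helper below sketches how such clauses can be assembled.

```python
def numeric_range(field, low=None, high=None, low_exclusive=False, high_exclusive=False):
    """Hypothetical helper: build a Redis Search numeric range clause.

    None means unbounded; '(' marks an exclusive bound in Redis query syntax.
    """
    lo = '-inf' if low is None else f"{'(' if low_exclusive else ''}{low}"
    hi = '+inf' if high is None else f"{'(' if high_exclusive else ''}{high}"
    return f'@{field}:[{lo} {hi}]'


print(numeric_range('price', high=3))  # the filter used above
print(numeric_range('price', low=3, low_exclusive=True))  # strictly greater than 3
```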
With Redis as the storage backend, you can also perform geospatial searches. You can index Documents with a tag of geo type and retrieve all Documents that lie within some max_distance of a given point on Earth, as follows:
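from docarray import Document, DocumentArray

n_dim = 3

da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'columns': {'location': 'geo'},
    },
)

with da:
    # locations are stored as "longitude,latitude" strings
    da.extend(
        [
            Document(id=f'r{i}', tags={'location': f"{-98.17+i},{38.71+i}"})
            for i in range(10)
        ]
    )

max_distance = 1000
n_limit = 10

# Redis geo filter syntax: @field:[longitude latitude radius unit]
filter = f'@location:[-98.71 38.71 {max_distance} km]'
results = da.find(filter=filter, limit=n_limit)

for doc in results:
    print(f"\t location={doc.tags['location']}")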
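To reason about which Documents a geo filter should return, you can approximate great-circle distances locally. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. haversine_km and within_radius are hypothetical helpers. Redis uses its own Earth radius constant, so points very close to the boundary may be classified differently. As in the stored tags and the filter, longitude comes before latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; Redis uses a slightly different constant


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing a slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_radius(locations, lon, lat, radius_km):
    """Keep the 'lon,lat' strings that lie within radius_km of (lon, lat)."""
    hits = []
    for loc in locations:
        p_lon, p_lat = (float(x) for x in loc.split(','))
        if haversine_km(lon, lat, p_lon, p_lat) <= radius_km:
            hits.append(loc)
    return hits


locations = [f"{-98.17 + i},{38.71 + i}" for i in range(10)]
print(within_radius(locations, -98.71, 38.71, 1000))
```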
Update Vector Search Indexing Schema
Redis vector similarity supports two indexing methods:
FLAT: Brute-force search.
HNSW: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs.
Both methods have some mandatory parameters and optional parameters.
Tip
Read more about HNSW and FLAT parameters and their default values in the Redis vector similarity documentation.
You can update the search indexing schema of an existing DocumentArray by setting update_schema to True and changing your configuration parameters.
Suppose you store Documents with the default indexing method 'HNSW' and distance 'L2', and want to find the nearest vectors to the embedding [8. 8. 8.]:
import numpy as np

from docarray import Document, DocumentArray

n_dim = 3

da = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'index_name': 'idx',
        'distance': 'L2',
    },
)

with da:
    da.extend([Document(id=f'{i}', embedding=i * np.ones(n_dim)) for i in range(10)])

np_query = np.ones(n_dim) * 8
n_limit = 5

results = da.find(np_query, limit=n_limit)

print('\nEmbeddings Approximate Nearest Neighbours:\n')
for doc in results:
    print(f" embedding={doc.embedding},\t score={doc.scores['score'].value}")
This prints:
Embeddings Approximate Nearest Neighbours:
embedding=[8. 8. 8.], score=0
embedding=[7. 7. 7.], score=3
embedding=[9. 9. 9.], score=3
embedding=[6. 6. 6.], score=12
embedding=[5. 5. 5.], score=27
Then you can switch the same index to a different search indexing schema as follows:
da2 = DocumentArray(
    storage='redis',
    config={
        'n_dim': n_dim,
        'index_name': 'idx',
        'update_schema': True,
        'distance': 'COSINE',
    },
)

results = da2.find(np_query, limit=n_limit)

print('\nEmbeddings Approximate Nearest Neighbours:\n')
for doc in results:
    print(f" embedding={doc.embedding},\t score={doc.scores['score'].value}")
This prints:
Embeddings Approximate Nearest Neighbours:
embedding=[3. 3. 3.], score=0
embedding=[6. 6. 6.], score=0
embedding=[4. 4. 4.], score=5.96046447754e-08
embedding=[1. 1. 1.], score=5.96046447754e-08
embedding=[8. 8. 8.], score=5.96046447754e-08
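All the scores are at or near zero. Every indexed embedding [i, i, i] with i > 0 points in the same direction as the query, so its cosine distance (1 minus cosine similarity) is exactly 0 in exact arithmetic. The small non-zero value 5.96046447754e-08 matches 2^-24 to the printed precision, which is consistent with single-precision rounding. With so many ties, the order of the results is effectively arbitrary. The sketch below computes cosine distance in double precision for illustration. cosine_distance is a hypothetical helper, and it rejects zero vectors, for which cosine similarity is undefined.

```python
import math


def cosine_distance(a, b):
    """Hypothetical helper: 1 - cosine similarity; raises for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('cosine distance is undefined for a zero vector')
    return 1.0 - dot / (norm_a * norm_b)


query = [8.0, 8.0, 8.0]
for i in range(1, 10):
    # every [i, i, i] is parallel to the query, so each distance is ~0
    print(i, cosine_distance([float(i)] * 3, query))
```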
Search by .text field
You can perform full-text search in a DocumentArray with storage='redis'.
To do this, the text must be indexed using the boolean flag 'index_text', which is set when the DocumentArray is created with config={'index_text': True, ...}.
The following example builds a DocumentArray with several Documents containing text and searches for those that have token1 in their text.
from docarray import Document, DocumentArray

da = DocumentArray(storage='redis', config={'n_dim': 2, 'index_text': True})

with da:
    da.extend(
        [
            Document(id='1', text='token1 token2 token3'),
            Document(id='2', text='token1 token2'),
            Document(id='3', text='token2 token3 token4'),
        ]
    )

results = da.find('token1')
print(results[:, 'text'])
This prints:
['token1 token2 token3', 'token1 token2']
The default similarity ranking algorithm is BM25. RediSearch also supports TFIDF, TFIDF.DOCNORM, DISMAX, DOCSCORE and HAMMING. You can change the scorer by passing scorer to find:
results = da.find('token1 token3', scorer='TFIDF.DOCNORM')
print('scorer=TFIDF.DOCNORM:')
print(results[:, 'text'])
results = da.find('token1 token3')
print('scorer=BM25:')
print(results[:, 'text'])
This prints:
scorer=TFIDF.DOCNORM:
['token1 token2', 'token1 token2 token3', 'token2 token3 token4']
scorer=BM25:
['token1 token2 token3', 'token1 token2', 'token2 token3 token4']
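To build intuition for what BM25 rewards (matches on rarer terms, repeated terms and shorter documents), here is a pure-Python sketch of a common Okapi BM25 formulation with the usual defaults k1 = 1.2 and b = 0.75. It is not RediSearch's implementation, whose IDF and length normalization may differ, so its scores will not match Redis. bm25_scores is a hypothetical helper. For the query 'token1 token3', it ranks the three texts in the same order as the BM25 output above.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Hypothetical helper: Okapi BM25 score of each whitespace-tokenised document."""
    tokenised = [d.split() for d in docs]
    n_docs = len(tokenised)
    avgdl = sum(len(t) for t in tokenised) / n_docs
    # document frequency: in how many documents each term appears
    df = Counter(term for t in tokenised for term in set(t))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        norm = 1 - b + b * len(tokens) / avgdl  # length normalisation
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


texts = ['token1 token2 token3', 'token1 token2', 'token2 token3 token4']
scores = bm25_scores(texts, 'token1 token3')
ranked = [t for _, t in sorted(zip(scores, texts), reverse=True)]
print(ranked)
```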