Query by Conditions
You can use find() to select Documents from a DocumentArray based on the conditions specified in a query object. Calling da.find(query) can either filter Documents or retrieve their nearest neighbors from da:
- To filter Documents, the query object is a Python dictionary that defines the filtering conditions using a MongoDB-like query language.
- To find nearest neighbors, the query object needs to be an NdArray-like object, a Document, or a DocumentArray that defines an embedding. You can also use the .match() function for this purpose; there is a minor interface difference between the two functions, which is described in the next chapter.
filter query syntax
The filter query syntax depends on which document store you use. Some may have their own query language.
Let’s see some examples in action. First, let’s prepare a DocumentArray:
from jina import Document, DocumentArray

da = DocumentArray(
    [
        Document(
            text='journal',
            weight=25,
            tags={'h': 14, 'w': 21, 'uom': 'cm'},
            modality='A',
        ),
        Document(
            text='notebook',
            weight=50,
            tags={'h': 8.5, 'w': 11, 'uom': 'in'},
            modality='A',
        ),
        Document(
            text='paper',
            weight=100,
            tags={'h': 8.5, 'w': 11, 'uom': 'in'},
            modality='D',
        ),
        Document(
            text='planner',
            weight=75,
            tags={'h': 22.85, 'w': 30, 'uom': 'cm'},
            modality='D',
        ),
        Document(
            text='postcard',
            weight=45,
            tags={'h': 10, 'w': 15.25, 'uom': 'cm'},
            modality='A',
        ),
    ]
)

da.summary()
Documents Summary

Length                 5
Homogenous Documents   True
Common Attributes      ('id', 'text', 'tags', 'weight', 'modality')

Attributes Summary

Attribute   Data type   #Unique values   Has empty value
──────────────────────────────────────────────────────────
id          ('str',)    5                False
weight      ('int',)    5                False
modality    ('str',)    2                False
tags        ('dict',)   5                False
text        ('str',)    5                False
Filter with query operators
A query filter document uses query operators to specify conditions:
{ <field1>: { <operator1>: <value1> }, ... }
Here, field1 is any field name of a Document object. To access nested fields, use the dunder (double underscore) expression. For example, tags__timestamp accesses the doc.tags['timestamp'] field.
value1 can be either a user-given Python object or a substitution field written in curly brackets, {field}.
Finally, operator1 can be one of the following:
Query Operator | Description |
---|---|
$eq | Equal to (number, string) |
$ne | Not equal to (number, string) |
$gt | Greater than (number) |
$gte | Greater than or equal to (number) |
$lt | Less than (number) |
$lte | Less than or equal to (number) |
$in | Is in an array |
$nin | Not in an array |
$regex | Matches the specified regular expression |
$size | Matches an array/dict field that has the specified size |
$exists | Matches documents that have the specified field; predefined fields having a default value (for example empty string, or 0) are considered as not existing; if the expression specifies a field |
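To make the field/operator/value structure concrete, here is a minimal pure-Python sketch of how such a query could be evaluated. It is an illustration only, not DocArray's implementation: plain dicts stand in for Documents, the helper names (get_field, matches, find, OPERATORS) are hypothetical, the operator names are assumed to be the MongoDB-style ones used in the examples on this page, the use of re.search for the regular-expression operator is an assumption, and the existence check is left out.

```python
import operator
import re

# Plain dicts stand in for Documents; this is NOT DocArray's implementation.
docs = [
    {'text': 'journal', 'weight': 25, 'modality': 'A', 'tags': {'h': 14, 'w': 21, 'uom': 'cm'}},
    {'text': 'notebook', 'weight': 50, 'modality': 'A', 'tags': {'h': 8.5, 'w': 11, 'uom': 'in'}},
    {'text': 'paper', 'weight': 100, 'modality': 'D', 'tags': {'h': 8.5, 'w': 11, 'uom': 'in'}},
    {'text': 'planner', 'weight': 75, 'modality': 'D', 'tags': {'h': 22.85, 'w': 30, 'uom': 'cm'}},
    {'text': 'postcard', 'weight': 45, 'modality': 'A', 'tags': {'h': 10, 'w': 15.25, 'uom': 'cm'}},
]

# Hypothetical operator table: each entry takes (field_value, query_value).
OPERATORS = {
    '$eq': operator.eq,
    '$ne': operator.ne,
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$lte': operator.le,
    '$in': lambda field, values: field in values,
    '$nin': lambda field, values: field not in values,
    '$regex': lambda field, pattern: re.search(pattern, field) is not None,
    '$size': lambda field, size: len(field) == size,
}


def get_field(doc, dunder_name):
    """Resolve 'tags__h' to doc['tags']['h'] by splitting on double underscores."""
    value = doc
    for key in dunder_name.split('__'):
        value = value[key]
    return value


def matches(doc, query):
    """Return True if every {field: {operator: value}} condition holds.

    Assumption of this sketch: several conditions in one dict are ANDed.
    """
    return all(
        OPERATORS[op](get_field(doc, field), expected)
        for field, conditions in query.items()
        for op, expected in conditions.items()
    )


def find(docs, query):
    """Return the text of every document that satisfies the query."""
    return [d['text'] for d in docs if matches(d, query)]


print(find(docs, {'modality': {'$eq': 'D'}}))      # ['paper', 'planner']
print(find(docs, {'tags__h': {'$gt': 10}}))        # ['journal', 'planner']
print(find(docs, {'text': {'$regex': '^p'}}))      # ['paper', 'planner', 'postcard']
print(find(docs, {'tags__uom': {'$in': ['in']}}))  # ['notebook', 'paper']
```

The sketch keeps field lookup (get_field) separate from comparison (OPERATORS), which mirrors how the query syntax separates the field name from the operator.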
To select all Documents with modality='D':
r = da.find({'modality': {'$eq': 'D'}})
pprint(r.to_dict(exclude_none=True)) # just for pretty print
[
  {
    "id": "92aee5d665d0c4dd34db10d83642aded",
    "modality": "D",
    "tags": {
      "h": 8.5,
      "uom": "in",
      "w": 11.0
    },
    "text": "paper",
    "weight": 100.0
  },
  {
    "id": "1a9d2139b02bc1c7842ecda94b347889",
    "modality": "D",
    "tags": {
      "h": 22.85,
      "uom": "cm",
      "w": 30.0
    },
    "text": "planner",
    "weight": 75.0
  }
]
To select all Documents whose .tags['h'] > 10:
r = da.find({'tags__h': {'$gt': 10}})
[
  {
    "id": "4045a9659875fd1299e482d710753de3",
    "modality": "A",
    "tags": {
      "h": 14.0,
      "uom": "cm",
      "w": 21.0
    },
    "text": "journal",
    "weight": 25.0
  },
  {
    "id": "cf7691c445220b94b88ff116911bad24",
    "modality": "D",
    "tags": {
      "h": 22.85,
      "uom": "cm",
      "w": 30.0
    },
    "text": "planner",
    "weight": 75.0
  }
]
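Besides using a predefined value, you can also use a substitution with {field}. Notice the curly braces. For example (the output shown is for a journal whose tags['w'] is 10, not the 21 used in the DocumentArray defined above):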
r = da.find({'tags__h': {'$gt': '{tags__w}'}})
[
  {
    "id": "44c6a4b18eaa005c6dbe15a28a32ebce",
    "modality": "A",
    "tags": {
      "h": 14.0,
      "uom": "cm",
      "w": 10.0
    },
    "text": "journal",
    "weight": 25.0
  }
]
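The substitution can be pictured as a per-Document lookup: before comparing, a value of the form '{field}' is replaced by that field's value on the same Document. The following is a rough pure-Python sketch under that assumption. It is not DocArray's implementation; the sample data (two plain dicts) and the helper names get_field, resolve, and matches are hypothetical.

```python
import operator

# Hypothetical sample data as plain dicts (not real Document objects).
docs = [
    {'text': 'journal', 'tags': {'h': 14, 'w': 10}},
    {'text': 'notebook', 'tags': {'h': 8.5, 'w': 11}},
]

COMPARE = {'$gt': operator.gt, '$lt': operator.lt, '$eq': operator.eq}


def get_field(doc, dunder_name):
    """Resolve 'tags__h' to doc['tags']['h']."""
    value = doc
    for key in dunder_name.split('__'):
        value = value[key]
    return value


def resolve(doc, value):
    """Replace a '{field}' placeholder with that field's value on the same doc."""
    if isinstance(value, str) and value.startswith('{') and value.endswith('}'):
        return get_field(doc, value[1:-1])
    return value  # a literal value is used unchanged


def matches(doc, field, op, value):
    """Compare one field of doc against a literal or a substituted value."""
    return COMPARE[op](get_field(doc, field), resolve(doc, value))


# Documents that are taller than they are wide.
taller_than_wide = [d['text'] for d in docs if matches(d, 'tags__h', '$gt', '{tags__w}')]
print(taller_than_wide)  # ['journal']
```

Because the placeholder is resolved separately for each Document, the same query compares a different pair of numbers every time, which is something a fixed value cannot express.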
Combine multiple conditions
You can combine multiple conditions using the following operators:
Boolean Operator | Description |
---|---|
$and | Join query clauses with a logical AND |
$or | Join query clauses with a logical OR |
$not | Inverts the effect of a query expression |
For example, to select all Documents whose weight is 45 or whose modality is 'D':
r = da.find({'$or': [{'weight': {'$eq': 45}}, {'modality': {'$eq': 'D'}}]})
[
  {
    "id": "22985b71b6d483c31cbe507ed4d02bd1",
    "modality": "D",
    "tags": {
      "h": 8.5,
      "uom": "in",
      "w": 11.0
    },
    "text": "paper",
    "weight": 100.0
  },
  {
    "id": "a071faf19feac5809642e3afcd3a5878",
    "modality": "D",
    "tags": {
      "h": 22.85,
      "uom": "cm",
      "w": 30.0
    },
    "text": "planner",
    "weight": 75.0
  },
  {
    "id": "411ecc70a71a3f00fc3259bf08c239d1",
    "modality": "A",
    "tags": {
      "h": 10.0,
      "uom": "cm",
      "w": 15.25
    },
    "text": "postcard",
    "weight": 45.0
  }
]
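Boolean operators nest, so a combined query is naturally evaluated recursively. The sketch below shows one way this could work in plain Python. It is an illustration rather than DocArray's implementation: documents are plain dicts, the evaluate and find helpers are hypothetical, and it assumes that the AND and OR operators take a list of sub-queries while the NOT operator takes a single sub-query.

```python
import operator

# Plain dicts stand in for Documents; this is NOT DocArray's implementation.
docs = [
    {'text': 'journal', 'weight': 25, 'modality': 'A'},
    {'text': 'notebook', 'weight': 50, 'modality': 'A'},
    {'text': 'paper', 'weight': 100, 'modality': 'D'},
    {'text': 'planner', 'weight': 75, 'modality': 'D'},
    {'text': 'postcard', 'weight': 45, 'modality': 'A'},
]

COMPARE = {'$eq': operator.eq, '$gt': operator.gt, '$lt': operator.lt}


def evaluate(doc, query):
    """Recursively evaluate a query dict against a plain-dict document."""
    for key, value in query.items():
        if key == '$and':
            ok = all(evaluate(doc, sub) for sub in value)
        elif key == '$or':
            ok = any(evaluate(doc, sub) for sub in value)
        elif key == '$not':
            ok = not evaluate(doc, value)
        else:
            # Leaf condition: key is a field name, value is {operator: operand}.
            ok = all(COMPARE[op](doc[key], operand) for op, operand in value.items())
        if not ok:
            return False
    return True


def find(docs, query):
    """Return the text of every document that satisfies the query."""
    return [d['text'] for d in docs if evaluate(d, query)]


either = find(docs, {'$or': [{'weight': {'$eq': 45}}, {'modality': {'$eq': 'D'}}]})
both = find(docs, {'$and': [{'weight': {'$gt': 40}}, {'modality': {'$eq': 'A'}}]})
negated = find(docs, {'$not': {'modality': {'$eq': 'D'}}})

print(either)   # ['paper', 'planner', 'postcard']
print(both)     # ['notebook', 'postcard']
print(negated)  # ['journal', 'notebook', 'postcard']
```

The first query reproduces the result of the example above; the other two show how the remaining boolean operators would narrow or invert a selection under the same assumptions.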