Query by Conditions

You can use find() to select Documents from a DocumentArray based on the conditions specified in a query object. Depending on the type of query, da.find(query) either filters Documents or retrieves nearest neighbors from da:

  • To filter Documents, the query object is a Python dictionary object that defines the filtering conditions using a MongoDB-like query language.

  • To find nearest neighbors, the query object needs to be an ndarray-like object, a Document, or a DocumentArray that defines the embedding. You can also use the .match() function for this purpose; the two functions differ slightly in their interface, as described in the next chapter.

filter query syntax

The filter query syntax depends on which document store you use. Some document stores may have their own query language.

Let’s see some examples in action. First, let’s prepare a DocumentArray:

from jina import Document, DocumentArray

da = DocumentArray(
    [
        Document(
            text='journal',
            weight=25,
            tags={'h': 14, 'w': 21, 'uom': 'cm'},
            modality='A',
        ),
        Document(
            text='notebook',
            weight=50,
            tags={'h': 8.5, 'w': 11, 'uom': 'in'},
            modality='A',
        ),
        Document(
            text='paper',
            weight=100,
            tags={'h': 8.5, 'w': 11, 'uom': 'in'},
            modality='D',
        ),
        Document(
            text='planner',
            weight=75,
            tags={'h': 22.85, 'w': 30, 'uom': 'cm'},
            modality='D',
        ),
        Document(
            text='postcard',
            weight=45,
            tags={'h': 10, 'w': 15.25, 'uom': 'cm'},
            modality='A',
        ),
    ]
)

da.summary()
                            Documents Summary                            
                                                                         
  Length                 5                                               
  Homogenous Documents   True                                            
  Common Attributes      ('id', 'text', 'tags', 'weight', 'modality')  
                                                                         
                     Attributes Summary                     
                                                            
  Attribute   Data type   #Unique values   Has empty value  
 ────────────────────────────────────────────────────────── 
  id          ('str',)    5                False            
  weight      ('int',)    5                False            
  modality    ('str',)    2                False            
  tags        ('dict',)   5                False            
  text        ('str',)    5                False            

Filter with query operators

A query filter document uses query operators to specify conditions:

{ <field1>: { <operator1>: <value1> }, ... }

Here field1 is any field name of a Document object. To access nested fields, use a dunder (double-underscore) expression. For example, tags__timestamp accesses the doc.tags['timestamp'] field.
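To make the dunder lookup concrete, here is a minimal pure-Python sketch that uses plain dictionaries in place of Documents. The helper resolve_field is hypothetical and only illustrates the idea; it is not the library's implementation.

```python
# Illustrative only: `resolve_field` is a hypothetical helper, not a library function.
def resolve_field(doc: dict, path: str):
    """Follow a dunder-separated path such as 'tags__h' through nested dicts."""
    value = doc
    for key in path.split('__'):
        value = value[key]  # raises KeyError if a level is missing
    return value


# A plain dict standing in for a Document
doc = {'text': 'journal', 'tags': {'h': 14, 'w': 21, 'uom': 'cm'}}

print(resolve_field(doc, 'text'))     # top-level field
print(resolve_field(doc, 'tags__h'))  # nested field, same as doc['tags']['h']
```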

value1 can be either a user-given Python object or a substitution field written in curly brackets, {field}.

Finally, operator1 can be one of the following:

  Query Operator   Description
 ──────────────────────────────────────────────────────────
  $eq              Equal to (number, string)
  $ne              Not equal to (number, string)
  $gt              Greater than (number)
  $gte             Greater than or equal to (number)
  $lt              Less than (number)
  $lte             Less than or equal to (number)
  $in              Is in an array
  $nin             Not in an array
  $regex           Match the specified regular expression
  $size            Match array/dict fields that have the specified size. $size does not accept ranges of values.
  $exists          Match Documents that have the specified field. Predefined fields having a default value (for example an empty string, or 0) are considered as not existing. If the expression specifies a field x in tags (tags__x), the operator tests that x is not None.
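One way to picture these operators is as a lookup table from operator name to a Python predicate. The sketch below works on plain dictionaries; the OPERATORS table and the matches helper are hypothetical names, $exists is left out, and the use of re.search for $regex is an assumption rather than the library's documented behavior.

```python
import operator
import re

# Hypothetical operator table for illustration; the library's own lookup may differ.
OPERATORS = {
    '$eq': operator.eq,
    '$ne': operator.ne,
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$lte': operator.le,
    '$in': lambda value, options: value in options,
    '$nin': lambda value, options: value not in options,
    # Assumption: a match anywhere in the string counts (re.search)
    '$regex': lambda value, pattern: re.search(pattern, value) is not None,
    '$size': lambda value, size: len(value) == size,
}


def matches(doc: dict, field: str, condition: dict) -> bool:
    """Return True if doc[field] satisfies every {operator: value} pair."""
    return all(
        OPERATORS[op](doc[field], expected) for op, expected in condition.items()
    )


# Plain dicts standing in for Documents
docs = [
    {'text': 'journal', 'weight': 25, 'modality': 'A'},
    {'text': 'paper', 'weight': 100, 'modality': 'D'},
    {'text': 'planner', 'weight': 75, 'modality': 'D'},
]

heavy = [d['text'] for d in docs if matches(d, 'weight', {'$gte': 75})]
p_words = [d['text'] for d in docs if matches(d, 'text', {'$regex': '^p'})]
print(heavy)    # ['paper', 'planner']
print(p_words)  # ['paper', 'planner']
```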

To select all modality='D' Documents:

from pprint import pprint

r = da.find({'modality': {'$eq': 'D'}})

pprint(r.to_dict(exclude_none=True))  # just for pretty print
[
  {
    "id": "92aee5d665d0c4dd34db10d83642aded",
    "modality": "D",
    "tags": {
      "h": 8.5,
      "uom": "in",
      "w": 11.0
    },
    "text": "paper",
    "weight": 100.0
  },
  {
    "id": "1a9d2139b02bc1c7842ecda94b347889",
    "modality": "D",
    "tags": {
      "h": 22.85,
      "uom": "cm",
      "w": 30.0
    },
    "text": "planner",
    "weight": 75.0
  }
]

To select all Documents whose .tags['h'] is greater than 10:

r = da.find({'tags__h': {'$gt': 10}})
[
  {
    "id": "4045a9659875fd1299e482d710753de3",
    "modality": "A",
    "tags": {
      "h": 14.0,
      "uom": "cm",
      "w": 21.0
    },
    "text": "journal",
    "weight": 25.0
  },
  {
    "id": "cf7691c445220b94b88ff116911bad24",
    "modality": "D",
    "tags": {
      "h": 22.85,
      "uom": "cm",
      "w": 30.0
    },
    "text": "planner",
    "weight": 75.0
  }
]

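Besides using a predefined value, you can also use a substitution with {field}; notice the curly braces. For example, to select Documents whose tags['h'] is greater than their own tags['w'] (the output below corresponds to a journal Document whose tags['w'] is 10, not the 21 defined earlier):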

r = da.find({'tags__h': {'$gt': '{tags__w}'}})
[
  {
    "id": "44c6a4b18eaa005c6dbe15a28a32ebce",
    "modality": "A",
    "tags": {
      "h": 14.0,
      "uom": "cm",
      "w": 10.0
    },
    "text": "journal",
    "weight": 25.0
  }
]
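The substitution can be thought of as a per-Document lookup that happens before the comparison. Here is a minimal sketch on plain dictionaries; resolve_field and substitute are hypothetical helpers, the two sample records are made up for illustration, and the library may resolve {field} differently internally.

```python
# Illustrative only: both helpers are hypothetical, not library functions.
def resolve_field(doc: dict, path: str):
    """Follow a dunder-separated path such as 'tags__w' through nested dicts."""
    value = doc
    for key in path.split('__'):
        value = value[key]
    return value


def substitute(value, doc: dict):
    """Replace a '{field}' placeholder with that field's value in `doc`."""
    if isinstance(value, str) and value.startswith('{') and value.endswith('}'):
        return resolve_field(doc, value[1:-1])
    return value  # ordinary values pass through unchanged


# Made-up records: one taller than wide, one wider than tall
docs = [
    {'text': 'tall', 'tags': {'h': 14, 'w': 10}},
    {'text': 'wide', 'tags': {'h': 10, 'w': 15.25}},
]

# Mimics {'tags__h': {'$gt': '{tags__w}'}}: compare each record with itself
selected = [
    d['text']
    for d in docs
    if resolve_field(d, 'tags__h') > substitute('{tags__w}', d)
]
print(selected)  # ['tall']
```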

Combine multiple conditions

You can combine multiple conditions using the following operators:

  Boolean Operator   Description
 ──────────────────────────────────────────────────────────
  $and               Join query clauses with a logical AND
  $or                Join query clauses with a logical OR
  $not               Inverts the effect of a query expression

To select all Documents whose weight is 45 or whose modality is 'D':

r = da.find({'$or': [{'weight': {'$eq': 45}}, {'modality': {'$eq': 'D'}}]})
[
  {
    "id": "22985b71b6d483c31cbe507ed4d02bd1",
    "modality": "D",
    "tags": {
      "h": 8.5,
      "uom": "in",
      "w": 11.0
    },
    "text": "paper",
    "weight": 100.0
  },
  {
    "id": "a071faf19feac5809642e3afcd3a5878",
    "modality": "D",
    "tags": {
      "h": 22.85,
      "uom": "cm",
      "w": 30.0
    },
    "text": "planner",
    "weight": 75.0
  },
  {
    "id": "411ecc70a71a3f00fc3259bf08c239d1",
    "modality": "A",
    "tags": {
      "h": 10.0,
      "uom": "cm",
      "w": 15.25
    },
    "text": "postcard",
    "weight": 45.0
  }
]
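Nested boolean queries can be read as a small recursive evaluation. The sketch below illustrates this on plain dictionaries with only a few leaf operators; evaluate and LEAF_OPS are hypothetical names, and the shapes assumed here (a list of clauses for $and and $or, a single clause for $not) are assumptions, not a statement of the library's internals.

```python
import operator

# Hypothetical subset of leaf operators, enough for the examples below
LEAF_OPS = {'$eq': operator.eq, '$gt': operator.gt, '$lt': operator.lt}


def evaluate(query: dict, doc: dict) -> bool:
    """Recursively evaluate a query; all top-level keys must hold."""
    for key, value in query.items():
        if key == '$and':
            ok = all(evaluate(clause, doc) for clause in value)
        elif key == '$or':
            ok = any(evaluate(clause, doc) for clause in value)
        elif key == '$not':
            ok = not evaluate(value, doc)
        else:  # leaf condition: {field: {operator: expected}}
            ok = all(
                LEAF_OPS[op](doc[key], expected) for op, expected in value.items()
            )
        if not ok:
            return False
    return True


# Plain dicts mirroring the weight and modality of the five Documents above
docs = [
    {'text': 'journal', 'weight': 25, 'modality': 'A'},
    {'text': 'notebook', 'weight': 50, 'modality': 'A'},
    {'text': 'paper', 'weight': 100, 'modality': 'D'},
    {'text': 'planner', 'weight': 75, 'modality': 'D'},
    {'text': 'postcard', 'weight': 45, 'modality': 'A'},
]


def select(query: dict) -> list:
    """Return the text of every record that satisfies the query."""
    return [d['text'] for d in docs if evaluate(query, d)]


either = select({'$or': [{'weight': {'$eq': 45}}, {'modality': {'$eq': 'D'}}]})
both = select({'$and': [{'modality': {'$eq': 'A'}}, {'weight': {'$gt': 30}}]})
negated = select({'$not': {'modality': {'$eq': 'D'}}})

print(either)   # ['paper', 'planner', 'postcard']
print(both)     # ['notebook', 'postcard']
print(negated)  # ['journal', 'notebook', 'postcard']
```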