Multi-modal#

This example walks you through how to use DocArray to process multiple data modalities in tandem. To do this comfortably and cleanly, you can use DocArray's dataclass feature.

See also

This example works with image and text data. If you are not yet familiar with how to process these modalities individually, you may want to check out the respective examples first: Image and Text.

Model your data#

If you work with multiple modalities at the same time, they are most likely related to each other. DocArray's dataclass feature lets you model your data and these relationships in the language of your domain.

Suppose you want to model a page of a newspaper that contains a main text, an image, and an image description. You can model this example in the following way:

from docarray import dataclass
from docarray.typing import Image, Text


@dataclass
class Page:
    main_text: Text
    image: Image
    description: Text

Instantiate a Document#

After defining the data model through dataclasses, you can instantiate a dataclass with your actual data and cast it to a Document:

from docarray import Document

page = Page(
    main_text='Hello world',
    image='apple.png',
    description='This is the image of an apple',
)

doc = Document(page)

Finally, you can see the nested Document structure that was created automatically:

doc.summary()
Output
📄 Document: 7f03e397da8725aa8a2aed4a0d64f263
└── 💠 Chunks
    ├── 📄 Document: 627c3b052b86e908b10475a4649ce49b
    │   ╭──────────────────────┬────────────────────────────────────────────────
    │   │ Attribute            │ Value
    │   ├──────────────────────┼────────────────────────────────────────────────
    │   │ parent_id            │ 7f03e397da8725aa8a2aed4a0d64f263
    │   │ granularity          │ 1
    │   │ text                 │ Hello world
    │   │ modality             │ text
    │   ╰──────────────────────┴────────────────────────────────────────────────
    ├── 📄 Document: 79e75c074aa444341baac18549930450
    │   ╭──────────────┬────────────────────────────────────────────────────────
    │   │ Attribute    │ Value
    │   ├──────────────┼────────────────────────────────────────────────────────
    │   │ parent_id    │ 7f03e397da8725aa8a2aed4a0d64f263
    │   │ granularity  │ 1
    │   │ tensor       │ <class 'numpy.ndarray'> in shape (618, 641, 3), dtype:
    │   │ mime_type    │ image/png
    │   │ uri          │ apple.png
    │   │ modality     │ image
    │   ╰──────────────┴────────────────────────────────────────────────────────
    └── 📄 Document: 6861a1e3d77c3560a630dee34ba5ac7f
        ╭──────────────────────┬────────────────────────────────────────────────
        │ Attribute            │ Value
        ├──────────────────────┼────────────────────────────────────────────────
        │ parent_id            │ 7f03e397da8725aa8a2aed4a0d64f263
        │ granularity          │ 1
        │ text                 │ This is the image of an apple
        │ modality             │ text
        ╰──────────────────────┴────────────────────────────────────────────────

Nested dataclasses and list types#

If your domain requires a more complex model, you can use advanced features to represent that accurately.

For this example, we look at a journal that consists of a cover page and multiple other pages, as well as some metadata. Each page contains a main text and can optionally contain an image and an image description.

from docarray import dataclass
from docarray.typing import Image, Text, JSON
from typing import List


@dataclass
class Page:
    main_text: Text
    image: Image = None
    description: Text = None


@dataclass
class Journal:
    cover: Page
    pages: List[Page]
    metadata: JSON

You can instantiate this complex Document in the same way as before:

from docarray import Document

pages = [
    Page(
        main_text='Hello world',
        image='apple.png',
        description='This is the image of an apple',
    ),
    Page(main_text='Second page'),
]

journal = Journal(
    cover=Page(main_text='DocArray Daily', image='apple.png'),
    pages=pages,
    metadata={'author': 'Jina AI', 'issue': '1'},
)

doc = Document(journal)
doc.summary()
Output
📄 Document: cab4e047bc84ffb6b8b0597ff4ee0e9f
└── 💠 Chunks
    ├── 📄 Document: ea686d21029e4687df83a6ee31af98b2
    │   ╭──────────────────────┬────────────────────────────────────────────────
    │   │ Attribute            │ Value
    │   ├──────────────────────┼────────────────────────────────────────────────
    │   │ parent_id            │ cab4e047bc84ffb6b8b0597ff4ee0e9f
    │   │ granularity          │ 1
    │   ╰──────────────────────┴────────────────────────────────────────────────
    │   └── 💠 Chunks
    │       ├── 📄 Document: 139a5f16ab176b5c9d5088b1f2792973
    │       │   ╭──────────────────────┬────────────────────────────────────────
    │       │   │ Attribute            │ Value
    │       │   ├──────────────────────┼────────────────────────────────────────
    │       │   │ parent_id            │ ea686d21029e4687df83a6ee31af98b2
    │       │   │ granularity          │ 1
    │       │   │ text                 │ DocArray Daily
    │       │   │ modality             │ text
    │       │   ╰──────────────────────┴────────────────────────────────────────
    │       └── 📄 Document: f1e7527757c7dc6006fa8fa36e7b788f
    │           ╭──────────────┬────────────────────────────────────────────────
    │           │ Attribute    │ Value
    │           ├──────────────┼────────────────────────────────────────────────
    │           │ parent_id    │ ea686d21029e4687df83a6ee31af98b2
    │           │ granularity  │ 1
    │           │ tensor       │ <class 'numpy.ndarray'> in shape (618, 641, 3),
    │           │ mime_type    │ image/png
    │           │ uri          │ apple.png
    │           │ modality     │ image
    │           ╰──────────────┴────────────────────────────────────────────────
    ├── 📄 Document: 2a13aee3a2ac8eadc07f43bc2dd83583
    │   ╭──────────────────────┬────────────────────────────────────────────────
    │   │ Attribute            │ Value
    │   ├──────────────────────┼────────────────────────────────────────────────
    │   │ parent_id            │ cab4e047bc84ffb6b8b0597ff4ee0e9f
    │   │ granularity          │ 1
    │   ╰──────────────────────┴────────────────────────────────────────────────
    │   └── 💠 Chunks
    │       ├── 📄 Document: b6bcfa7000a25bd84ddcd35813c99b4c
    │       │   ╭──────────────────────┬────────────────────────────────────────
    │       │   │ Attribute            │ Value
    │       │   ├──────────────────────┼────────────────────────────────────────
    │       │   │ parent_id            │ 2a13aee3a2ac8eadc07f43bc2dd83583
    │       │   │ granularity          │ 1
    │       │   ╰──────────────────────┴────────────────────────────────────────
    │       │   └── 💠 Chunks
    │       │       ├── 📄 Document: 71018fd73c13187309590e82b5255416
    │       │       │   ╭──────────────────────┬────────────────────────────────
    │       │       │   │ Attribute            │ Value
    │       │       │   ├──────────────────────┼────────────────────────────────
    │       │       │   │ parent_id            │ b6bcfa7000a25bd84ddcd35813c99b4
    │       │       │   │ granularity          │ 1
    │       │       │   │ text                 │ Hello world
    │       │       │   │ modality             │ text
    │       │       │   ╰──────────────────────┴────────────────────────────────
    │       │       ├── 📄 Document: b335f748006204dd27bb5fa9a99a572f
    │       │       │   ╭──────────────┬────────────────────────────────────────
    │       │       │   │ Attribute    │ Value
    │       │       │   ├──────────────┼────────────────────────────────────────
    │       │       │   │ parent_id    │ b6bcfa7000a25bd84ddcd35813c99b4c
    │       │       │   │ granularity  │ 1
    │       │       │   │ tensor       │ <class 'numpy.ndarray'> in shape (618,
    │       │       │   │ mime_type    │ image/png
    │       │       │   │ uri          │ apple.png
    │       │       │   │ modality     │ image
    │       │       │   ╰──────────────┴────────────────────────────────────────
    │       │       └── 📄 Document: 7769657ae7c25227920b5ae35a2a3c31
    │       │           ╭──────────────────────┬────────────────────────────────
    │       │           │ Attribute            │ Value
    │       │           ├──────────────────────┼────────────────────────────────
    │       │           │ parent_id            │ b6bcfa7000a25bd84ddcd35813c99b4
    │       │           │ granularity          │ 1
    │       │           │ text                 │ This is the image of an apple
    │       │           │ modality             │ text
    │       │           ╰──────────────────────┴────────────────────────────────
    │       └── 📄 Document: 29f1835bac77e435f00976c5cf4e97cb
    │           ╭──────────────────────┬────────────────────────────────────────
    │           │ Attribute            │ Value
    │           ├──────────────────────┼────────────────────────────────────────
    │           │ parent_id            │ 2a13aee3a2ac8eadc07f43bc2dd83583
    │           │ granularity          │ 1
    │           ╰──────────────────────┴────────────────────────────────────────
    │           └── 💠 Chunks
    │               └── 📄 Document: bc8adb52bad51ccff3d6e7834a4b536a
    │                   ╭──────────────────────┬────────────────────────────────
    │                   │ Attribute            │ Value
    │                   ├──────────────────────┼────────────────────────────────
    │                   │ parent_id            │ 29f1835bac77e435f00976c5cf4e97c
    │                   │ granularity          │ 1
    │                   │ text                 │ Second page
    │                   │ modality             │ text
    │                   ╰──────────────────────┴────────────────────────────────
    └── 📄 Document: c602af33ed3f2d693a5633e53b87e19c
        ╭─────────────────────┬─────────────────────────────────────────────────
        │ Attribute           │ Value
        ├─────────────────────┼─────────────────────────────────────────────────
        │ parent_id           │ cab4e047bc84ffb6b8b0597ff4ee0e9f
        │ granularity         │ 1
        │ tags                │ {'author': 'Jina AI', 'issue': '1'}
        │ modality            │ json
        ╰─────────────────────┴─────────────────────────────────────────────────
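
To make the shape of this summary easier to follow, here is a minimal sketch that mimics the layout using only the Python standard library. It is an illustration, not DocArray's implementation: the `to_tree` helper and the dictionary layout are hypothetical, the field types are plain `str` and `dict` stand-ins, and `dataclass` here is the standard-library decorator rather than the one from `docarray`. It assumes, as the summary suggests, that unset optional fields produce no chunk and that a list field becomes one node whose chunks are the list items.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List, Optional


@dataclass
class Page:
    main_text: str
    image: Optional[str] = None
    description: Optional[str] = None


@dataclass
class Journal:
    cover: Page
    pages: List[Page]
    metadata: dict


def to_tree(value):
    """Return a nested dict that mirrors the chunk layout of the summary."""
    if is_dataclass(value):
        # One chunk per field that is actually set; None fields are skipped.
        return {
            'chunks': {
                f.name: to_tree(getattr(value, f.name))
                for f in fields(value)
                if getattr(value, f.name) is not None
            }
        }
    if isinstance(value, list):
        # A list field becomes one node whose chunks are the list items.
        return {'chunks': [to_tree(item) for item in value]}
    # Leaf: text, an image URI, or JSON-like metadata.
    return {'value': value}


journal = Journal(
    cover=Page(main_text='DocArray Daily', image='apple.png'),
    pages=[
        Page(
            main_text='Hello world',
            image='apple.png',
            description='This is the image of an apple',
        ),
        Page(main_text='Second page'),
    ],
    metadata={'author': 'Jina AI', 'issue': '1'},
)

tree = to_tree(journal)
print(list(tree['chunks']))
print([len(page['chunks']) for page in tree['chunks']['pages']['chunks']])
```

The first page yields three chunks and the second only one, which matches the three leaf Documents and the single leaf Document under the two pages in the summary.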

Access the data#

After instantiation, each modality can be accessed directly from the Document:

from docarray import dataclass, Document
from docarray.typing import Image, Text


@dataclass
class Page:
    main_text: Text
    image: Image
    description: Text


page = Page(
    main_text='Hello world',
    image='apple.png',
    description='This is the image of an apple',
)

doc = Document(page)

print(doc.main_text)
print(doc.main_text.text)
print(doc.image)
print(doc.image.tensor)
<Document ('id', 'parent_id', 'granularity', 'text', 'modality') at 1ee83d2c391f078736732bb34a021587>
Hello world
<Document ('id', 'parent_id', 'granularity', 'tensor', 'mime_type', 'uri', '_metadata', 'modality') at c8fe3b8fd101bea6a4820a53d2993bdf>
[[[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]]
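
One way to picture this kind of access is a parent object that remembers which chunk belongs to which field name. The following is a toy sketch under that assumption, not DocArray's code: the `Node` class, its `add_field` method, and the `_positions` mapping are hypothetical names introduced only for illustration.

```python
class Node:
    """Toy stand-in for a Document: holds chunks plus a field-name lookup."""

    def __init__(self, text=None):
        self.text = text
        self.chunks = []
        self._positions = {}  # field name -> index into self.chunks

    def add_field(self, name, chunk):
        # Remember where the chunk for this field lives, then store it.
        self._positions[name] = len(self.chunks)
        self.chunks.append(chunk)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        positions = self.__dict__.get('_positions', {})
        if name in positions:
            return self.chunks[positions[name]]
        raise AttributeError(name)


doc = Node()
doc.add_field('main_text', Node(text='Hello world'))
doc.add_field('description', Node(text='This is the image of an apple'))

print(doc.main_text.text)
print(len(doc.chunks))
```

Each field access returns the nested object itself, so its payload (here `text`) is one more attribute away, just as `doc.main_text.text` is in the example above.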

Generate embeddings#

Common use cases, such as neural search, involve generating embeddings for your data.

There are two ways of doing this, each of which has its use cases: generating an embedding for each modality individually, and generating an overall embedding for the entire Document.

Embed each modality#

If you want to create an embedding for each modality of each page, you can simply access the corresponding Document, and add an embedding vector.

This can be useful, for example, when you want to compare different Documents based on a specific modality that they store.

from torchvision.models import resnet50

img_model = resnet50(pretrained=True)

# embed textual data
doc.main_text.embed_feature_hashing()
doc.description.embed_feature_hashing()
# embed image data
doc.image.set_image_tensor_shape(shape=(224, 224)).set_image_tensor_channel_axis(
    original_channel_axis=-1, new_channel_axis=0
).set_image_tensor_normalization(channel_axis=0).embed(img_model)

print(doc.main_text.embedding.shape)
print(doc.description.embedding.shape)
print(doc.image.embedding.shape)
(256,)
(256,)
torch.Size([1000])
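
The text embeddings above have a fixed size of 256 regardless of how long the text is. The general idea behind feature hashing can be sketched as follows. This is a simplified standard-library illustration and not DocArray's actual implementation: the function name `feature_hashing`, the whitespace tokenization, the SHA-256 hash, and the signed counts are all assumptions made for the example.

```python
import hashlib


def feature_hashing(text, n_dim=256):
    """Map whitespace-separated tokens into a fixed-size signed count vector."""
    vec = [0.0] * n_dim
    for token in text.lower().split():
        # Use a stable hash; the built-in hash() is salted per process.
        digest = hashlib.sha256(token.encode('utf-8')).digest()
        h = int.from_bytes(digest[:8], 'big')
        index = h % n_dim  # bucket chosen from the low bits
        # The top bit picks the sign, which reduces the bias from collisions.
        sign = 1.0 if (h >> 63) & 1 == 0 else -1.0
        vec[index] += sign
    return vec


vec = feature_hashing('Hello world')
print(len(vec))
```

Because the vector length depends only on `n_dim`, texts of any length end up in the same space and can be compared directly.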

If you have a DocumentArray of multi-modal Documents, you can embed the modalities of each Document in the following way:

from docarray import DocumentArray, Document

da = DocumentArray(
    [
        Document(
            Page(
                main_text='First page',
                image='apple.png',
                description='This is the image of an apple',
            )
        ),
        Document(
            Page(
                main_text='Second page',
                image='apple.png',
                description='Still the same image of the same apple',
            )
        ),
    ]
)

from torchvision.models import resnet50

img_model = resnet50(pretrained=True)

# embed textual data
da['@.[description, main_text]'].apply(lambda d: d.embed_feature_hashing())
# embed image data
da['@.[image]'].apply(
    lambda d: d.set_image_tensor_shape(shape=(224, 224))
    .set_image_tensor_channel_axis(original_channel_axis=-1, new_channel_axis=0)
    .set_image_tensor_normalization(channel_axis=0)
)
da['@.[image]'].embed(img_model)

print(da['@.[description, main_text]'].embeddings.shape)
print(da['@.[image]'].embeddings.shape)
(4, 256)
torch.Size([2, 1000])
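
Once every modality has an embedding, comparing two Documents on a single modality comes down to comparing two vectors. A minimal sketch using cosine similarity and only the standard library is shown below; the helper `cosine_similarity` and the short toy vectors are illustrative assumptions, standing in for real `description` embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for the `description` embeddings of two pages.
desc_page_1 = [1.0, 0.0, 2.0, 0.0]
desc_page_2 = [1.0, 0.0, 2.0, 1.0]

print(cosine_similarity(desc_page_1, desc_page_1))
print(cosine_similarity(desc_page_1, desc_page_2))
```

A value close to 1 means the two descriptions point in nearly the same direction in embedding space; the same comparison could be run on the image embeddings instead.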

Embed parent Document#

From the individual embeddings you can create a combined embedding for the entire Document. This can be useful, for example, when you want to compare different Documents based on all the modalities that they store.

import numpy as np


def combine_embeddings(d):
    # any (more sophisticated) function could go here
    d.embedding = np.concatenate(
        [d.image.embedding, d.main_text.embedding, d.description.embedding]
    )
    return d


da.apply(combine_embeddings)
print(da.embeddings.shape)
(2, 1512)