Support New Modality

Each type in docarray.typing corresponds to one modality. Supporting a new modality means adding a new type, and specifying how it is translated to/from Document.

Whether you're adding a new type or changing the behavior of an existing type, you can leverage the field() function.

Create a new type

Say you want to define a new type, MyImage, that accepts an image as a URI. Instead of loading the image into the .tensor of the sub-Document, as the built-in Image type does, you want to load it into .blob.

All you need to do is:

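from typing import TypeVar

from docarray import Document

# MyImage is a marker type whose values are plain strings (URIs).
MyImage = TypeVar('MyImage', bound=str)


def my_setter(value) -> 'Document':
    # Keep the URI and load the file content into .blob instead of .tensor.
    return Document(uri=value).load_uri_to_blob()


def my_getter(doc: 'Document'):
    # Recover the original value from the URI kept by the setter.
    return doc.uri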

Now you can use the MyImage type in a dataclass:

from docarray import dataclass, field, Document


@dataclass
class MMDoc:
    banner: MyImage = field(setter=my_setter, getter=my_getter, default='test-1.jpeg')


Document(MMDoc()).summary()
📄 Document: bde1ab74306c2f63188069879e3945ac
└── 💠 Chunks
    └── 📄 Document: cd594a6870a8921d7a9c6b0ec764251d
        ╭─────────────┬────────────────────────────────────────────────────────────────╮
        │ Attribute   │ Value                                                          │
        ├─────────────┼────────────────────────────────────────────────────────────────┤
        │ parent_id   │ bde1ab74306c2f63188069879e3945ac                               │
        │ granularity │ 1                                                              │
        │ blob        │ b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x0… │
        │             │ (length: 56810)                                                │
        │ mime_type   │ image/jpeg                                                     │
        │ uri         │ test-1.jpeg                                                    │
        ╰─────────────┴────────────────────────────────────────────────────────────────╯

Specifically, setter defines how the value is stored in the sub-Document. Usually you process the value and store the result in one of the attributes defined by the Document schema. You may also want to keep the original value so that getter can recover it later. setter is invoked when calling Document() on an instance of the dataclass.

getter defines how the original value is recovered from the sub-Document. getter is invoked when the dataclass constructor is called with a Document object.
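
The advice about keeping the original value can be illustrated without DocArray. The following is a minimal sketch, not library code: StubDocument, fake_load, keeping_setter, and lossy_setter are hypothetical names introduced here, and StubDocument only mimics the .uri and .blob attributes used above. It shows that the round trip getter(setter(value)) returns the original value only when the setter stores it somewhere the getter can read.

```python
from dataclasses import dataclass


@dataclass
class StubDocument:
    """Hypothetical stand-in for docarray's Document, for illustration only."""
    uri: str = ''
    blob: bytes = b''


def fake_load(uri: str) -> bytes:
    # Placeholder for reading the file; the real example uses load_uri_to_blob().
    return f'<contents of {uri}>'.encode()


def keeping_setter(value: str) -> StubDocument:
    # Stores the processed data in .blob and keeps the original value in .uri.
    return StubDocument(uri=value, blob=fake_load(value))


def lossy_setter(value: str) -> StubDocument:
    # Stores only the processed data; the original URI is discarded.
    return StubDocument(blob=fake_load(value))


def my_getter(doc: StubDocument) -> str:
    # Recovers the original value from the attribute where the setter kept it.
    return doc.uri


original = 'test-1.jpeg'
round_trip_ok = my_getter(keeping_setter(original)) == original
round_trip_lossy = my_getter(lossy_setter(original)) == original
print(round_trip_ok, round_trip_lossy)
```

With the lossy setter, the getter has nothing to return, so a dataclass rebuilt from the Document would no longer equal the one it came from.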

Override existing types

To override the getter and setter behaviors of existing types, define a map from each type to a function that returns a field(), and pass it as the type_var_map argument of the dataclass() decorator:

from docarray import dataclass, field, Document
from docarray.typing import Image


def my_setter(value) -> 'Document':
    print('im setting .uri only not loading it!')
    return Document(uri=value)


def my_getter(doc: 'Document'):
    print('im returning .uri!')
    return doc.uri


@dataclass(
    type_var_map={
        Image: lambda x: field(setter=my_setter, getter=my_getter, _source_field=x)
    }
)
class MMDoc:
    banner: Image = field(setter=my_setter, getter=my_getter, default='test-1.jpeg')


m1 = MMDoc()
m2 = MMDoc(Document(m1))

assert m1 == m2
im setting .uri only not loading it!
im returning .uri!
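
One way to picture a per-type override map is as a dictionary lookup keyed by the annotated type, with a fallback to default behavior. The sketch below is an assumption-laden illustration in plain Python, not how DocArray implements type_var_map: Image and Text are local stand-ins for the library's types, and Handlers, DEFAULTS, and resolve are hypothetical names. Documents are reduced to (attribute, value) tuples.

```python
from typing import Callable, NamedTuple, TypeVar

# Local stand-ins for the library's modality types (assumption).
Image = TypeVar('Image', bound=str)
Text = TypeVar('Text', bound=str)


class Handlers(NamedTuple):
    """Hypothetical pair of conversion functions for one modality."""
    setter: Callable
    getter: Callable


# Assumed default behavior, reduced to tagged tuples for illustration.
DEFAULTS = {
    Image: Handlers(lambda v: ('tensor', v), lambda d: d[1]),
    Text: Handlers(lambda v: ('text', v), lambda d: d[1]),
}


def resolve(annotation, overrides):
    # An override for the annotated type wins; otherwise use the default.
    return overrides.get(annotation, DEFAULTS[annotation])


# Override only Image: keep the URI instead of producing a tensor.
overrides = {Image: Handlers(lambda v: ('uri', v), lambda d: d[1])}

image_doc = resolve(Image, overrides).setter('test-1.jpeg')
text_doc = resolve(Text, overrides).setter('hello')
print(image_doc, text_doc)
```

In this sketch the override applies to every attribute annotated with Image, while Text keeps its default handlers.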