Support New Modality
Each type in docarray.typing corresponds to one modality. Supporting a new modality means adding a new type, and specifying how it is translated to/from Document.
Whether you're adding a new type or changing the behavior of an existing type, you can leverage the field() function.
Create a new type
Say you want to define a new type, MyImage, that accepts an image as a URI. Instead of loading the image into the .tensor of the sub-document, you want to load it into .blob, which differs from the behavior of the built-in Image type.
All you need to do is:
from docarray import Document
from typing import TypeVar

MyImage = TypeVar('MyImage', bound=str)

def my_setter(value) -> 'Document':
    return Document(uri=value).load_uri_to_blob()

def my_getter(doc: 'Document'):
    return doc.uri
Now you can use the MyImage type in a dataclass:
from docarray import dataclass, field, Document

@dataclass
class MMDoc:
    banner: MyImage = field(setter=my_setter, getter=my_getter, default='test-1.jpeg')

Document(MMDoc()).summary()
📄 Document: bde1ab74306c2f63188069879e3945ac
└── 💠 Chunks
    └── 📄 Document: cd594a6870a8921d7a9c6b0ec764251d
        ╭─────────────┬────────────────────────────────────────────────────────────────╮
        │ Attribute   │ Value                                                          │
        ├─────────────┼────────────────────────────────────────────────────────────────┤
        │ parent_id   │ bde1ab74306c2f63188069879e3945ac                               │
        │ granularity │ 1                                                              │
        │ blob        │ b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x0… │
        │             │ (length: 56810)                                                │
        │ mime_type   │ image/jpeg                                                     │
        │ uri         │ test-1.jpeg                                                    │
        ╰─────────────┴────────────────────────────────────────────────────────────────╯
Specifically, setter defines how you want to store the value in the sub-Document. Usually you need to process it and store the value in one of the attributes defined by the Document schema. You may also want to keep the original value so that you can recover it in getter later. setter is invoked when calling Document() on this dataclass.
getter defines how you want to recover the original value from the sub-Document. getter is invoked when calling the dataclass constructor given a Document object.
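The setter/getter contract described above can be illustrated without docarray. The following is only a minimal sketch: SubDoc is a hypothetical stand-in for a sub-Document (not the real docarray class), and encoding the URI string stands in for actually loading the file with load_uri_to_blob(). It shows that the getter can recover the original value only because the setter kept it in .uri.

```python
from dataclasses import dataclass


@dataclass
class SubDoc:
    """Hypothetical stand-in for a docarray sub-Document (assumption, not the real class)."""
    uri: str = ''
    blob: bytes = b''


def my_setter(value: str) -> SubDoc:
    # Store a processed payload in .blob and keep the original value in .uri.
    # Encoding the string is a placeholder for really loading the file.
    return SubDoc(uri=value, blob=value.encode('utf-8'))


def my_getter(doc: SubDoc) -> str:
    # Recover the original value that the setter preserved.
    return doc.uri


original = 'test-1.jpeg'
sub_doc = my_setter(original)    # dataclass value -> sub-document
recovered = my_getter(sub_doc)   # sub-document -> dataclass value
print(recovered == original)
```

If the setter discarded the original value (for example, by storing only .blob), the getter would have nothing to return and the round trip would not reproduce the input.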
Override existing types
To override the getter and setter behavior of an existing type, define a map from that type to a function that returns a field(), and pass it as the type_var_map argument of dataclass():
from docarray import dataclass, field, Document
from docarray.typing import Image

def my_setter(value) -> 'Document':
    print('im setting .uri only not loading it!')
    return Document(uri=value)

def my_getter(doc: 'Document'):
    print('im returning .uri!')
    return doc.uri

@dataclass(
    type_var_map={
        Image: lambda x: field(setter=my_setter, getter=my_getter, _source_field=x)
    }
)
class MMDoc:
    banner: Image = field(setter=my_setter, getter=my_getter, default='test-1.jpeg')

m1 = MMDoc()
m2 = MMDoc(Document(m1))
assert m1 == m2
im setting .uri only not loading it!
im returning .uri!