Support New Modality#
Each type in docarray.typing corresponds to one modality. Supporting a new modality means adding a new type, and specifying how it is translated to/from Document.
Whether you're adding a new type or changing the behavior of an existing type, you can leverage the field() function.
Create a new type#
Say you want to define a new type MyImage that accepts an image as a URI. However, instead of loading the image into the .tensor of the sub-Document, you want to load it into .blob. This differs from the behavior of the built-in Image type.
All you need to do is:
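from docarray import Document
from typing import TypeVar

MyImage = TypeVar('MyImage', bound=str)


def my_setter(value) -> 'Document':
    # Build the sub-Document from the URI and load its content into .blob.
    return Document(uri=value).load_uri_to_blob()


def my_getter(doc: 'Document'):
    # Recover the original value (the URI) from the sub-Document.
    return doc.uri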
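A note on the TypeVar line: a TypeVar is an ordinary runtime object, so an annotation that uses it can be told apart from a plain str annotation by identity. The following standard-library sketch shows only that general Python behavior; the Example class is hypothetical, and nothing here describes how docarray itself inspects annotations.

```python
from typing import TypeVar, get_type_hints

MyImage = TypeVar('MyImage', bound=str)


class Example:
    # Hypothetical class, used only to carry two annotations.
    banner: MyImage = 'test-1.jpeg'
    title: str = 'hello'


# get_type_hints returns the annotation objects themselves,
# so the TypeVar can be recognized with an identity check.
hints = get_type_hints(Example)
is_custom = {name: tp is MyImage for name, tp in hints.items()}
```

Here is_custom marks banner as using the custom type and title as a plain string, even though both default values are strings.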
Now you can use the MyImage type in a dataclass:
from docarray import dataclass, field, Document


@dataclass
class MMDoc:
    banner: MyImage = field(setter=my_setter, getter=my_getter, default='test-1.jpeg')


Document(MMDoc()).summary()
📄 Document: bde1ab74306c2f63188069879e3945ac
└── 💠 Chunks
    └── 📄 Document: cd594a6870a8921d7a9c6b0ec764251d
        ╭─────────────┬────────────────────────────────────────────────────────────────╮
        │ Attribute   │ Value                                                          │
        ├─────────────┼────────────────────────────────────────────────────────────────┤
        │ parent_id   │ bde1ab74306c2f63188069879e3945ac                               │
        │ granularity │ 1                                                              │
        │ blob        │ b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x0… │
        │             │ (length: 56810)                                                │
        │ mime_type   │ image/jpeg                                                     │
        │ uri         │ test-1.jpeg                                                    │
        ╰─────────────┴────────────────────────────────────────────────────────────────╯
Specifically, setter defines how to store the value in the sub-Document. Usually you need to process the value and store it in one of the attributes defined by the Document schema. You may also want to keep the original value so that you can recover it later in getter. setter is invoked when calling Document() on an instance of this dataclass.
getter defines how to recover the original value from the sub-Document. getter is invoked when calling the dataclass constructor with a Document object.
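The contract between the two callbacks can be sketched without docarray at all. The snippet below is a library-free illustration, not docarray's implementation: SimpleDoc, to_doc, and from_doc are hypothetical stand-ins for Document, the Document() conversion, and the dataclass constructor. It shows a setter that stores a processed value while keeping the original, so that the getter can recover it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a sub-Document (not the docarray class)."""
    text: str = ''
    tags: Dict[str, Any] = field(default_factory=dict)


def shout_setter(value: str) -> SimpleDoc:
    # Store a processed form in .text and keep the original in .tags.
    return SimpleDoc(text=value.upper(), tags={'original': value})


def shout_getter(doc: SimpleDoc) -> str:
    # Recover the original value that the setter kept.
    return doc.tags['original']


def to_doc(value: str, setter: Callable[[str], SimpleDoc]) -> SimpleDoc:
    # Plays the role of calling Document() on the dataclass object.
    return setter(value)


def from_doc(doc: SimpleDoc, getter: Callable[[SimpleDoc], str]) -> str:
    # Plays the role of calling the dataclass constructor with a Document.
    return getter(doc)


sub_doc = to_doc('hello', shout_setter)
restored = from_doc(sub_doc, shout_getter)
```

If the setter discarded the original value, the getter could return only the processed form, and the round trip would no longer reproduce the input.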
Override existing types#
To override the getter and setter behaviors of an existing type, define a map and pass it as the type_var_map argument of the dataclass() function:
from docarray import dataclass, field, Document
from docarray.typing import Image


def my_setter(value) -> 'Document':
    print('im setting .uri only not loading it!')
    return Document(uri=value)


def my_getter(doc: 'Document'):
    print('im returning .uri!')
    return doc.uri


@dataclass(
    type_var_map={
        Image: lambda x: field(setter=my_setter, getter=my_getter, _source_field=x)
    }
)
class MMDoc:
    banner: Image = field(setter=my_setter, getter=my_getter, default='test-1.jpeg')


m1 = MMDoc()
m2 = MMDoc(Document(m1))

assert m1 == m2
im setting .uri only not loading it!
im returning .uri!
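The general idea of a type-keyed override map can be illustrated with the standard library alone. This is only a sketch of the lookup pattern and makes no claim about how docarray resolves type_var_map internally; Picture, DEFAULT_HANDLERS, resolve_handlers, and Page are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, TypeVar

Picture = TypeVar('Picture', bound=str)  # hypothetical modality marker

# Hypothetical built-in behavior: pretend the content gets loaded.
DEFAULT_HANDLERS: Dict[object, Callable[[str], dict]] = {
    Picture: lambda uri: {'uri': uri, 'loaded': True},
}


def resolve_handlers(cls, overrides=None):
    """Pick one handler per annotated attribute, preferring overrides."""
    handlers = {**DEFAULT_HANDLERS, **(overrides or {})}
    return {name: handlers[tp] for name, tp in cls.__annotations__.items()}


class Page:
    banner: Picture = 'test-1.jpeg'


# Without overrides, the default handler for Picture is used.
default = resolve_handlers(Page)['banner'](Page.banner)

# With an override keyed by the same type, the custom handler wins.
custom = resolve_handlers(
    Page, overrides={Picture: lambda uri: {'uri': uri, 'loaded': False}}
)['banner'](Page.banner)
```

Keying the map by type rather than by attribute name means a single entry changes the behavior of every attribute annotated with that type.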