# Protocol Documentation

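## Table of Contents

- docarray.proto
    - DenseNdArrayProto
    - DocumentArrayProto
    - DocumentProto
    - DocumentProto.EvaluationsEntry
    - DocumentProto.ScoresEntry
    - NamedScoreProto
    - NdArrayProto
    - SparseNdArrayProto
- Scalar Value Types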


## docarray.proto

### DenseNdArrayProto

Represents a (quantized) dense n-dim array

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| buffer | bytes |  | the actual array data, in bytes |
| shape | uint32 | repeated | the shape (dimensions) of the array |
| dtype | string |  | the data type of the array |
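To make the three fields concrete, here is a minimal, stdlib-only sketch of how a flat array could be packed into `buffer`, `shape` and `dtype` and read back. The `DenseNdArray` dataclass, the `encode`/`decode` helpers and the NumPy-style dtype strings such as `<f4` are assumptions for illustration; this is not docarray's own serializer.

```python
import struct
from dataclasses import dataclass
from typing import List, Sequence

# Assumed mapping from NumPy-style dtype strings to little-endian struct codes.
_STRUCT_CODES = {"<f4": "f", "<f8": "d", "<i4": "i", "<i8": "q"}


@dataclass
class DenseNdArray:
    """Hypothetical stand-in for DenseNdArrayProto (not the generated class)."""
    buffer: bytes     # raw element bytes in row-major (C) order
    shape: List[int]  # size of each dimension
    dtype: str        # element type, e.g. "<f4" for little-endian float32


def _num_elements(shape: Sequence[int]) -> int:
    count = 1
    for dim in shape:
        count *= dim
    return count


def encode(flat: Sequence[float], shape: Sequence[int], dtype: str = "<f4") -> DenseNdArray:
    """Pack a flat, row-major list of values into buffer/shape/dtype."""
    if _num_elements(shape) != len(flat):
        raise ValueError(f"shape {list(shape)} needs {_num_elements(shape)} values, got {len(flat)}")
    buffer = struct.pack("<" + _STRUCT_CODES[dtype] * len(flat), *flat)
    return DenseNdArray(buffer=buffer, shape=list(shape), dtype=dtype)


def decode(arr: DenseNdArray) -> List[float]:
    """Unpack the buffer into a flat, row-major list; shape says how to regroup it."""
    fmt = "<" + _STRUCT_CODES[arr.dtype] * _num_elements(arr.shape)
    return list(struct.unpack(fmt, arr.buffer))


msg = encode([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
print(len(msg.buffer), msg.shape, msg.dtype)  # 24 [2, 3] <f4
print(decode(msg))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

The buffer alone is just bytes: without `dtype` a reader cannot know the element width, and without `shape` it cannot regroup the flat values into rows.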

### DocumentArrayProto

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| docs | DocumentProto | repeated | a list of Documents |

### DocumentProto

Represents a Document

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| id | string |  | A hexdigest that represents a unique document ID |
| blob | bytes |  | the raw binary content of this document, which often represents the original document when it comes into Jina |
| tensor | NdArrayProto |  | the ndarray of the image/audio/video document |
| text | string |  | a text document |
| granularity | uint32 |  | the depth of the recursive chunk structure |
| adjacency | uint32 |  | the width of the recursive match structure |
| parent_id | string |  | the parent id from the previous granularity |
| weight | float |  | The weight of this document |
| uri | string |  | the URI of the document; it can be a local file path, a remote URL starting with http or https, or a data URI |
| modality | string |  | an identifier for the modality this document belongs to, used in multi-modal and cross-modal search |
| mime_type | string |  | the MIME type of this document; required for binary (blob) content, and can be guessed for other content |
| offset | float |  | the offset of the doc |
| location | float | repeated | the position of the doc; it could be the start and end index of a string, the x, y (top, left) coordinate of an image crop, or the timestamp of an audio clip |
| chunks | DocumentProto | repeated | list of the sub-documents of this document (recursive structure) |
| matches | DocumentProto | repeated | the matched documents on the same level (recursive structure) |
| embedding | NdArrayProto |  | the embedding of this document |
| tags | google.protobuf.Struct |  | a structured data value, consisting of fields that map to dynamically typed values |
| scores | DocumentProto.ScoresEntry | repeated | scores computed on the document, each element corresponding to a metric |
| evaluations | DocumentProto.EvaluationsEntry | repeated | evaluations performed on the document, each element corresponding to a metric |
| _metadata | google.protobuf.Struct |  | system-defined meta attributes represented in a structured data value |
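The `granularity`, `adjacency`, `parent_id`, `chunks` and `matches` fields together describe a recursive tree. The sketch below is a hypothetical, trimmed-down mirror of `DocumentProto` that keeps only those fields. The rule it applies (a chunk is one level deeper and keeps its parent's adjacency; a match stays on the same granularity but is one step wider) is an assumption read off the field descriptions above, not docarray's actual bookkeeping code.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for DocumentProto, limited to the recursive fields."""
    id: str
    granularity: int = 0
    adjacency: int = 0
    parent_id: Optional[str] = None
    chunks: List["Doc"] = field(default_factory=list)
    matches: List["Doc"] = field(default_factory=list)


def add_chunk(parent: Doc, chunk: Doc) -> None:
    """A chunk sits one level deeper (granularity + 1) and points back to its parent."""
    chunk.granularity = parent.granularity + 1
    chunk.adjacency = parent.adjacency
    chunk.parent_id = parent.id
    parent.chunks.append(chunk)


def add_match(query: Doc, match: Doc) -> None:
    """A match stays on the same level but one step wider (adjacency + 1)."""
    match.granularity = query.granularity
    match.adjacency = query.adjacency + 1
    query.matches.append(match)


def walk(doc: Doc) -> Iterator[Tuple[str, int, int]]:
    """Depth-first traversal yielding (id, granularity, adjacency)."""
    yield doc.id, doc.granularity, doc.adjacency
    for child in doc.chunks + doc.matches:
        yield from walk(child)


root = Doc(id="root")
sentence = Doc(id="sentence-0")
add_chunk(root, sentence)
add_match(sentence, Doc(id="hit-0"))
for row in walk(root):
    print(row)
# ('root', 0, 0)
# ('sentence-0', 1, 0)
# ('hit-0', 1, 1)
```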

### DocumentProto.EvaluationsEntry

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | NamedScoreProto |  |  |

### DocumentProto.ScoresEntry

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | NamedScoreProto |  |  |
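The `...Entry` naming, together with the `repeated` label on `scores` and `evaluations`, is how protobuf represents map fields, so both are most likely declared as `map<string, NamedScoreProto>` in `docarray.proto`. On the wire a map is a repeated list of key/value entries, and when a parser sees the same key twice the last entry wins. The sketch below models that behaviour with plain Python; `NamedScore` is a hypothetical stand-in that keeps only the `value` field.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class NamedScore:
    """Hypothetical stand-in for NamedScoreProto (value only)."""
    value: float


def entries_to_map(entries: Iterable[Tuple[str, NamedScore]]) -> Dict[str, NamedScore]:
    """Collapse repeated (key, value) entries into a map; a later duplicate key overwrites an earlier one."""
    result: Dict[str, NamedScore] = {}
    for key, value in entries:
        result[key] = value
    return result


def map_to_entries(scores: Dict[str, NamedScore]) -> List[Tuple[str, NamedScore]]:
    """Expand a map back into entries; protobuf does not guarantee map order, so sort by key."""
    return sorted(scores.items(), key=lambda kv: kv[0])


wire = [("cosine", NamedScore(0.8)), ("euclidean", NamedScore(1.2)), ("cosine", NamedScore(0.9))]
scores = entries_to_map(wire)
print(scores["cosine"].value)  # 0.9
print(len(scores))             # 2
```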

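### NamedScoreProto

Represents a relevance score with respect to ref_id

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| value | float |  | the value of the score |
| op_name | string |  | the name of the operator/score function |
| description | string |  | text description of the score |
| ref_id | string |  | the ID of the reference document; the score is computed between this document's id and ref_id |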
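As an illustration of how the four fields fit together, the sketch below scores a match against a query with cosine similarity and records the query's ID as `ref_id`. The choice of cosine similarity, the `"cosine"` key and `op_name`, and the `score_match` helper are all hypothetical; any metric could fill these fields.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class NamedScore:
    """Hypothetical stand-in for NamedScoreProto."""
    value: float = 0.0
    op_name: str = ""
    description: str = ""
    ref_id: str = ""


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_match(query_id: str, query_emb: Sequence[float], match_emb: Sequence[float]) -> Dict[str, NamedScore]:
    """Build the scores map for one match, keyed by metric name."""
    return {
        "cosine": NamedScore(
            value=cosine(query_emb, match_emb),
            op_name="cosine",
            description="cosine similarity to the query embedding",
            ref_id=query_id,  # the score relates this match to the query document
        )
    }


s = score_match("query-1", [1.0, 0.0], [1.0, 1.0])["cosine"]
print(s.ref_id, round(s.value, 4))  # query-1 0.7071
```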

### NdArrayProto

Represents a general n-dim array, which can be either dense or sparse

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| dense | DenseNdArrayProto |  | dense representation of the ndarray |
| sparse | SparseNdArrayProto |  | sparse representation of the ndarray |
| cls_name | string |  | the name of the ndarray class |
| parameters | google.protobuf.Struct |  |  |

### SparseNdArrayProto

Represents a sparse ndarray

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| indices | DenseNdArrayProto |  | A 2-D int64 tensor of shape `[N, ndims]`, where N is the number of nonzero elements, which specifies the indices of the elements in the sparse tensor that contain nonzero values (elements are zero-indexed) |
| values | DenseNdArrayProto |  | A 1-D tensor of any type and shape `[N]`, which supplies the value for each element in indices |
| shape | uint32 | repeated | The shape of the sparse tensor, given as a list of ndims dimension sizes |
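The three fields follow the coordinate (COO) layout. The 2-D, pure-Python sketch below shows how a dense matrix maps onto `indices`, `values` and `shape` and back; the `to_coo`/`from_coo` helpers are hypothetical and only handle the two-dimensional case, whereas the message itself allows any `ndims`.

```python
from typing import List, Sequence, Tuple

Coo = Tuple[List[List[int]], List[float], List[int]]


def to_coo(matrix: List[List[float]]) -> Coo:
    """Convert a dense 2-D list into (indices, values, shape) in COO layout."""
    indices: List[List[int]] = []
    values: List[float] = []
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v != 0:
                indices.append([i, j])  # one [row, col] pair per nonzero element
                values.append(v)
    shape = [len(matrix), len(matrix[0]) if matrix else 0]
    return indices, values, shape


def from_coo(indices: Sequence[Sequence[int]], values: Sequence[float], shape: Sequence[int]) -> List[List[float]]:
    """Rebuild the dense 2-D list; positions absent from indices are zero."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for (i, j), v in zip(indices, values):
        dense[i][j] = v
    return dense


m = [[0.0, 2.0, 0.0], [0.0, 0.0, 3.5]]
idx, vals, shape = to_coo(m)
print(idx)    # [[0, 1], [1, 2]]
print(vals)   # [2.0, 3.5]
print(shape)  # [2, 3]
```

Here N = 2 and ndims = 2, so `indices` has shape `[2, 2]`, `values` has shape `[2]`, and `shape` holds the dense dimensions.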

## Scalar Value Types

| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| ----------- | ----- | --- | ---- | ------ | -- | -- | --- | ---- |
| double |  | double | double | float | float64 | double | float | Float |
| float |  | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool |  | bool | boolean | bool | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | `[]byte` | ByteString | string | String (ASCII-8BIT) |
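The notes on int32, sint32 and fixed32 follow from how varints work: each byte carries 7 payload bits, a negative int32 is sign-extended to 64 bits before encoding, and sint32 first applies a ZigZag mapping so small negative numbers stay small. The sketch below reimplements both steps in plain Python to show the byte counts; it is an illustration of the encoding rules, not the protobuf runtime itself.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on every byte except the last."""
    if n < 0:
        # int32/int64 negatives are encoded as their 64-bit two's complement.
        n &= (1 << 64) - 1
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag32(n: int) -> int:
    """ZigZag mapping used by sint32: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


print(len(encode_varint(-1)))            # 10: -1 as int32
print(len(encode_varint(zigzag32(-1))))  # 1: -1 as sint32
print(len(encode_varint(2**28 - 1)))     # 4
print(len(encode_varint(2**28)))         # 5, while fixed32 always uses 4
```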

````{tab} 😔 Don't
```{code-block} python
---
emphasize-lines: 11, 12
---
from docarray import Document, DocumentArray

animals = ['cat', 'dog', 'fish']

da = DocumentArray([Document(id=i) for i in range(3)])

da.save_binary('aux.bin', protocol='protobuf')
da_loaded = DocumentArray.load_binary('aux.bin', protocol='protobuf')

for doc in da_loaded:
    index = da.tags['id']
    print(animals[index])
```
````

## Rebuild Protobuf

To rebuild `docarray.proto`, run:

```bash
cd docarray
docker run -v $(pwd)/proto:/jina/proto jinaai/protogen
```