# Protocol Documentation
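## Table of Contents

- docarray.proto
  - DenseNdArrayProto
  - DocumentArrayProto
  - DocumentProto
  - DocumentProto.EvaluationsEntry
  - DocumentProto.ScoresEntry
  - NamedScoreProto
  - NdArrayProto
  - SparseNdArrayProto
- Scalar Value Types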
## docarray.proto
### DenseNdArrayProto

Represents a (quantized) dense n-dim array.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| buffer | bytes |  | the actual array data, in bytes |
| shape |  | repeated | the shape (dimensions) of the array |
| dtype | string |  | the data type of the array |
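A `DenseNdArrayProto` carries everything needed to rebuild an array: `buffer` holds the raw element bytes, `shape` says how to lay them out, and `dtype` says how to decode each element. The following is a minimal sketch using only Python's `struct` module. It assumes row-major (C) order, which is what NumPy's `ndarray.tobytes()` produces by default, and a NumPy-style dtype string such as `'<f4'` (little-endian float32). `DenseNdArray`, `pack_float32`, `unpack_float32` and `flat_index` are hypothetical helpers for illustration, not docarray APIs.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class DenseNdArray:
    """Hypothetical plain-Python stand-in for DenseNdArrayProto."""
    buffer: bytes = b""
    shape: List[int] = field(default_factory=list)
    dtype: str = ""


def pack_float32(values: Sequence[float], shape: Sequence[int]) -> DenseNdArray:
    """Pack a flat, row-major list of floats as little-endian float32."""
    n = 1
    for dim in shape:
        n *= dim
    if n != len(values):
        raise ValueError(f"shape {list(shape)} needs {n} values, got {len(values)}")
    # '<' = little-endian, 'f' = 4-byte IEEE float; the '<f4' dtype string is an assumption
    return DenseNdArray(struct.pack(f"<{n}f", *values), list(shape), "<f4")


def unpack_float32(arr: DenseNdArray) -> List[float]:
    """Decode the buffer back into a flat list (this sketch only handles '<f4')."""
    if arr.dtype != "<f4":
        raise ValueError(f"unsupported dtype {arr.dtype!r}")
    return list(struct.unpack(f"<{len(arr.buffer) // 4}f", arr.buffer))


def flat_index(shape: Sequence[int], index: Sequence[int]) -> int:
    """Position of a multi-dimensional index inside the row-major buffer."""
    offset = 0
    for dim, i in zip(shape, index):
        offset = offset * dim + i
    return offset


arr = pack_float32([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
print(len(arr.buffer), arr.shape, arr.dtype)  # 24 [2, 3] <f4
print(unpack_float32(arr)[flat_index(arr.shape, (1, 2))])  # 6.0
```

The `shape` field is what turns the flat 24-byte buffer back into a 2 x 3 array; without it, the same bytes could just as well be read as 3 x 2 or 6 x 1.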
### DocumentArrayProto

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| docs | DocumentProto | repeated | a list of Documents |
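Because `docs` is a repeated message field, the protobuf wire format writes one length-delimited record per Document, each prefixed with the same field tag. The sketch below encodes a list of already-serialized messages that way, following the varint and wire-type rules of the protobuf encoding specification. The field number is not listed in this table, so the example assumes `docs` is field 1; the helper names and the placeholder payloads `b"doc-a"` and `b"doc-b"` are hypothetical stand-ins for real serialized `DocumentProto` bytes.

```python
from typing import List

LEN_DELIMITED = 2  # wire type shared by strings, bytes and embedded messages


def encode_varint(n: int) -> bytes:
    """Base-128 varint for a non-negative integer: 7 bits per byte, high bit = more."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_repeated_messages(field_number: int, messages: List[bytes]) -> bytes:
    """Emit tag + length + payload once per element of a repeated message field."""
    tag = encode_varint((field_number << 3) | LEN_DELIMITED)
    return b"".join(tag + encode_varint(len(m)) + m for m in messages)


# assumption: docs is field 1; the payloads stand in for serialized DocumentProtos
wire = encode_repeated_messages(1, [b"doc-a", b"doc-b"])
print(wire)  # b'\n\x05doc-a\n\x05doc-b'
```

One consequence of this layout: concatenating the serialized bytes of two `DocumentArrayProto` messages yields bytes that parse as a single message whose `docs` list is the two lists appended, because protobuf merges repeated fields by concatenation.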
### DocumentProto

Represents a Document.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| id | string |  | a hexdigest that represents a unique document ID |
| blob | bytes |  | the raw binary content of this document, which often represents the original document when it comes into Jina |
| tensor | NdArrayProto |  | the ndarray of the image/audio/video document |
| text | string |  | a text document |
| granularity |  |  | the depth of the recursive chunk structure |
| adjacency |  |  | the width of the recursive match structure |
| parent_id | string |  | the parent ID from the previous granularity |
| weight |  |  | the weight of this document |
| uri | string |  | a URI of the document, which can be a local file path, a remote URL starting with http or https, or a data URI |
| modality | string |  | an identifier of the modality this document belongs to, used in multi-modal or cross-modal search |
| mime_type | string |  | MIME type of this document; required for buffer content, while for other content it can be guessed |
| offset |  |  | the offset of the doc |
| location |  | repeated | the position of the doc, which can be the start and end index of a string, the x, y (top, left) coordinate of an image crop, or the timestamp of an audio clip |
| chunks | DocumentProto | repeated | list of the sub-documents of this document (recursive structure) |
| matches | DocumentProto | repeated | the matched documents on the same level (recursive structure) |
| embedding | NdArrayProto |  | the embedding of this document |
| tags | google.protobuf.Struct |  | a structured data value, consisting of fields that map to dynamically typed values |
| scores | DocumentProto.ScoresEntry | repeated | scores computed on the document; each element corresponds to a metric |
| evaluations | DocumentProto.EvaluationsEntry | repeated | evaluations performed on the document; each element corresponds to a metric |
| _metadata | google.protobuf.Struct |  | system-defined meta attributes represented in a structured data value |
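`chunks` and `matches` make `DocumentProto` recursive in two directions: chunks go one level deeper (`granularity`), while matches stay on the same level but one step wider (`adjacency`). A minimal sketch of that bookkeeping follows, using a hypothetical `Doc` dataclass that mirrors only a few of the fields above. The rule "chunk granularity = parent granularity + 1, match adjacency = query adjacency + 1" is an assumption drawn from the field descriptions, not a quote of docarray's implementation, and `add_chunk`/`add_match` are hypothetical helpers.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a few DocumentProto fields."""
    text: str = ""
    # 32-character hexdigest, matching the description of `id`
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    granularity: int = 0
    adjacency: int = 0
    parent_id: str = ""
    chunks: List["Doc"] = field(default_factory=list)
    matches: List["Doc"] = field(default_factory=list)


def add_chunk(parent: Doc, chunk: Doc) -> None:
    """A chunk sits one level deeper and records which document it came from."""
    chunk.granularity = parent.granularity + 1
    chunk.parent_id = parent.id
    parent.chunks.append(chunk)


def add_match(query: Doc, match: Doc) -> None:
    """A match stays on the query's level but is one step further out."""
    match.granularity = query.granularity
    match.adjacency = query.adjacency + 1
    query.matches.append(match)


doc = Doc(text="hello world. goodbye world.")
for sentence in ("hello world.", "goodbye world."):
    add_chunk(doc, Doc(text=sentence))
add_match(doc.chunks[0], Doc(text="hi world."))

print([c.granularity for c in doc.chunks])  # [1, 1]
m = doc.chunks[0].matches[0]
print(m.granularity, m.adjacency)  # 1 1
```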
### DocumentProto.EvaluationsEntry

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string |  |  |
| value | NamedScoreProto |  |  |
### DocumentProto.ScoresEntry

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| key | string |  |  |
| value | NamedScoreProto |  |  |
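The two `*Entry` messages are how protobuf represents map fields: a field such as `scores` is sent as a repeated list of `{key, value}` entries. Here is a minimal plain-Python sketch of that round trip, assuming string keys (typically a metric name) and `NamedScoreProto` values; `NamedScore`, `to_entries` and `from_entries` are hypothetical names. It reflects two rules from the protobuf language guide: the order of map entries is not guaranteed, and when a key appears more than once while parsing, the last entry wins.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class NamedScore:
    """Hypothetical stand-in for NamedScoreProto."""
    value: float = 0.0
    op_name: str = ""
    description: str = ""
    ref_id: str = ""


Entry = Tuple[str, NamedScore]  # one ScoresEntry / EvaluationsEntry: (key, value)


def to_entries(scores: Dict[str, NamedScore]) -> List[Entry]:
    """Flatten a map into entries; sorted only to make the output predictable."""
    return sorted(scores.items(), key=lambda kv: kv[0])


def from_entries(entries: Iterable[Entry]) -> Dict[str, NamedScore]:
    """Rebuild the map; a repeated key overwrites the earlier value (last wins)."""
    result: Dict[str, NamedScore] = {}
    for key, value in entries:
        result[key] = value
    return result


# illustrative values only
scores = {
    "euclidean": NamedScore(value=1.3, op_name="euclidean", ref_id="abc"),
    "cosine": NamedScore(value=0.87, op_name="cosine", ref_id="abc"),
}
entries = to_entries(scores)
print([key for key, _ in entries])  # ['cosine', 'euclidean']

duplicated = entries + [("cosine", NamedScore(value=0.5, op_name="cosine"))]
print(from_entries(duplicated)["cosine"].value)  # 0.5
```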
### NamedScoreProto

Represents the relevance of a document with respect to `ref_id`.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| value |  |  | the score value |
| op_name | string |  | the name of the operator/score function |
| description | string |  | text description of the score |
| ref_id | string |  | the score is computed between this document and the one identified by `ref_id` |
### NdArrayProto

Represents a general n-dim array, which can be either dense or sparse.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| dense | DenseNdArrayProto |  | dense representation of the ndarray |
| sparse | SparseNdArrayProto |  | sparse representation of the ndarray |
| cls_name | string |  | the name of the ndarray class |
| parameters |  |  |  |
### SparseNdArrayProto

Represents a sparse ndarray.

| Field | Type | Label | Description |
| --- | --- | --- | --- |
| indices |  |  | A 2-D int64 tensor of shape [N, ndims], which specifies the indices of the elements in the sparse tensor that contain nonzero values (elements are zero-indexed). |
| values |  |  | A 1-D tensor of any type and shape [N], which supplies the values for each element in indices. |
| shape |  | repeated | A 1-D int64 tensor of shape [ndims], which specifies the shape of the sparse tensor. |
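These three fields follow the coordinate (COO) layout: row *k* of `indices` gives the zero-based position of the *k*-th nonzero element, `values[k]` gives its value, and `shape` gives the full dense size. Below is a minimal sketch for the 2-D case, using plain Python lists where the message would store arrays; `dense_to_coo` and `coo_to_dense` are hypothetical helpers, not docarray APIs.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_to_coo(matrix: Matrix) -> Tuple[List[List[int]], List[float], List[int]]:
    """Return (indices [N, 2], values [N], shape [2]) for a 2-D matrix."""
    indices: List[List[int]] = []
    values: List[float] = []
    for i, row in enumerate(matrix):
        for j, x in enumerate(row):
            if x != 0:
                indices.append([i, j])  # zero-based (row, column) of a nonzero
                values.append(x)
    shape = [len(matrix), len(matrix[0]) if matrix else 0]
    return indices, values, shape


def coo_to_dense(indices: List[List[int]], values: List[float], shape: List[int]) -> Matrix:
    """Scatter the nonzero values back into a zero-filled matrix."""
    rows, cols = shape
    out = [[0.0] * cols for _ in range(rows)]
    for (i, j), x in zip(indices, values):
        out[i][j] = x
    return out


m = [[0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0],
     [5.0, 0.0, 7.0]]
indices, values, shape = dense_to_coo(m)
print(indices)  # [[0, 1], [2, 0], [2, 2]]
print(values)   # [2.0, 5.0, 7.0]
print(shape)    # [3, 3]
print(coo_to_dense(indices, values, shape) == m)  # True
```

The `shape` field is needed because trailing all-zero rows or columns leave no trace in `indices`; without it the dense size could not be recovered.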
## Scalar Value Types

| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| double |  | double | double | float | float64 | double | float | Float |
| float |  | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool |  | bool | boolean | bool | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | bytes | []byte | ByteString | string | String (ASCII-8BIT) |
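The notes about variable-length encoding and signed types can be checked directly. Below is a minimal sketch of protobuf's base-128 varint and ZigZag encodings, written from the protobuf encoding specification; `encode_varint` and `zigzag64` are illustrative names, not part of any protobuf runtime. It shows why a negative `int64` costs 10 bytes while the same value as `sint64` costs 1, and why `fixed32` beats `uint32` once values reach 2^28.

```python
import struct

MASK64 = (1 << 64) - 1


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set while more follow."""
    if n < 0:
        # negative int32/int64 values are sign-extended to 64 bits first
        n &= MASK64
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def zigzag64(n: int) -> int:
    """sint64 mapping 0, -1, 1, -2, ... -> 0, 1, 2, 3, ... (n in int64 range)."""
    return ((n << 1) ^ (n >> 63)) & MASK64


print(len(encode_varint(-1)))            # 10: int64 field holding -1
print(len(encode_varint(zigzag64(-1))))  # 1: sint64 field holding -1
print(len(encode_varint(2**28 - 1)))     # 4: largest value that fits in 4 varint bytes
print(len(encode_varint(2**28)))         # 5: here a fixed32 (always 4 bytes) is smaller
print(len(struct.pack("<I", 2**28)))     # 4: fixed32 is a plain little-endian uint32
```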
````{tab} 😔 Don't
```{code-block} python
---
emphasize-lines: 9, 10
---
from docarray import Document, DocumentArray

animals = ['cat', 'dog', 'fish']
da = DocumentArray([Document(tags={'id': i}) for i in range(3)])

da.save_binary('aux.bin', protocol='protobuf')
da_loaded = DocumentArray.load_binary('aux.bin', protocol='protobuf')
for doc in da_loaded:
    index = doc.tags['id']
    print(animals[index])
```
````
## Rebuild Protobuf

To rebuild `docarray.proto`, run:

```bash
cd docarray
docker run -v $(pwd)/proto:/jina/proto jinaai/protogen
```