docarray.document.mixins.sugar module

class docarray.document.mixins.sugar.SingletonSugarMixin

Bases: object

Provide sugary syntax for Document by inheriting methods from DocumentArray.

match(darray: DocumentArray, metric: Union[str, Callable[[ArrayType, ArrayType], np.ndarray]] = 'cosine', limit: Optional[Union[int, float]] = 20, normalization: Optional[Tuple[float, float]] = None, metric_name: Optional[str] = None, batch_size: Optional[int] = None, exclude_self: bool = False, only_id: bool = False, use_scipy: bool = False, num_worker: Optional[int] = 1) → T

Match the current Document against a set of Documents.

Parameters:

darray – the other DocumentArray to match against
metric – the distance metric
limit – the maximum number of matches. Defaults to 20 when not given.
normalization – a tuple [a, b] used for min-max normalization: the minimum distance is rescaled to a, the maximum distance is rescaled to b, and all other values are rescaled into the range [a, b].
metric_name – if provided, the match results are marked with this string.
batch_size – if provided, darray is loaded in batches of at most batch_size elements each. When darray is big, this can significantly speed up the computation.
exclude_self – if set, Documents in darray with the same id as the left-hand values are not considered as matches.
only_id – if set, the returned matches contain only the id.
use_scipy – if set, use scipy as the computation backend. Note that scipy does not support distances on sparse matrices.
num_worker – the number of parallel workers. If not given, then the number of CPUs in the system will be used.

Note

This argument is only effective when batch_size is set.

Return type:

T

Returns:

itself after modification
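The interplay of metric, limit, and normalization can be sketched in plain Python. This is only an illustration of the parameter descriptions above, not docarray's implementation: the helper names (cosine_distance, min_max_rescale, match_sketch) are hypothetical, embeddings are assumed to be plain lists of floats, and applying the rescaling to the retained matches only is an assumption of this sketch.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def min_max_rescale(values: Sequence[float], new_range: Tuple[float, float]) -> List[float]:
    """Rescale values so that the minimum maps to a and the maximum maps to b."""
    a, b = new_range
    lo, hi = min(values), max(values)
    if hi == lo:  # all distances equal: map everything to a
        return [a for _ in values]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def match_sketch(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    limit: int = 20,
    normalization: Optional[Tuple[float, float]] = None,
) -> List[Tuple[int, float]]:
    """Return up to `limit` (candidate index, score) pairs, nearest first."""
    distances = [(i, cosine_distance(query, c)) for i, c in enumerate(candidates)]
    # Rank by raw distance, then keep at most `limit` matches.
    nearest = sorted(distances, key=lambda pair: pair[1])[:limit]
    if normalization is not None and nearest:
        # Assumption of this sketch: rescale only the retained matches.
        scores = min_max_rescale([d for _, d in nearest], normalization)
        nearest = [(i, s) for (i, _), s in zip(nearest, scores)]
    return nearest


if __name__ == "__main__":
    candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(match_sketch([1.0, 0.0], candidates, limit=2, normalization=(0.0, 1.0)))
```

Ranking on the raw distances before rescaling keeps the ordering stable even when a is larger than b, in which case the scores are simply reversed within the range.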

embed(embed_model: AnyDNN, device: str = 'cpu', batch_size: int = 256) → T

Fill the embedding of Documents in place by using embed_model.

Parameters:

embed_model – the embedding model written in Keras/PyTorch/Paddle
device – the computational device for embed_model, can be either cpu or cuda.
batch_size – number of Documents in a batch for embedding

Return type:

T
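The role of batch_size can be illustrated with a small, framework-free sketch. It is not docarray's code: batched, embed_in_batches, and toy_model are hypothetical names, and the toy model stands in for a real Keras/PyTorch/Paddle network by mapping each text to a two-element vector.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    items: Sequence[str],
    embed_model: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 256,
) -> List[Vector]:
    """Call embed_model once per batch and collect the vectors in input order."""
    embeddings: List[Vector] = []
    for batch in batched(items, batch_size):
        embeddings.extend(embed_model(batch))
    return embeddings


def toy_model(batch: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in model: (text length, number of 'a' characters)."""
    return [[float(len(text)), float(text.count("a"))] for text in batch]


if __name__ == "__main__":
    print(embed_in_batches(["a", "banana", "kiwi"], toy_model, batch_size=2))
```

A smaller batch_size lowers peak memory per model call at the cost of more calls; the final batch may hold fewer than batch_size items.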

post(*args, **kwargs)

Post itself to a remote Flow/Sandbox and get the modified DocumentArray back.

Parameters:

host – a host string. Can be one of the following:
- grpc://192.168.0.123:8080/endpoint
- ws://192.168.0.123:8080/endpoint
- http://192.168.0.123:8080/endpoint
- jinahub://Hello/endpoint
- jinahub+docker://Hello/endpoint
- jinahub+docker://Hello/v0.0.1/endpoint
- jinahub+docker://Hello/latest/endpoint
- jinahub+sandbox://Hello/endpoint
show_progress – whether to show a progress bar
batch_size – number of Documents in each request
parameters – parameters to send in the request
parameters – parameters to send in the request

Return type:

Returns:

the new DocumentArray returned from remote
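The host strings listed above all follow a scheme://target/path shape. The sketch below uses the standard library's urllib.parse.urlsplit to pull those parts apart for illustration; split_host is a hypothetical helper, and this is not necessarily how docarray itself interprets the host argument.

```python
from typing import Dict
from urllib.parse import urlsplit


def split_host(host: str) -> Dict[str, str]:
    """Split a host string into its scheme, network location, and path."""
    parts = urlsplit(host)
    return {
        "scheme": parts.scheme,  # e.g. "grpc" or "jinahub+docker"
        "netloc": parts.netloc,  # e.g. "192.168.0.123:8080" or "Hello"
        "path": parts.path,      # e.g. "/endpoint" or "/v0.0.1/endpoint"
    }


if __name__ == "__main__":
    for host in (
        "grpc://192.168.0.123:8080/endpoint",
        "jinahub+docker://Hello/v0.0.1/endpoint",
    ):
        print(split_host(host))
```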