docarray.document.mixins.sugar module

class docarray.document.mixins.sugar.SingletonSugarMixin

Bases: object

Provide sugary syntax for Document by inheriting methods from DocumentArray

match(darray: DocumentArray, metric: Union[str, Callable[[ArrayType, ArrayType], np.ndarray]] = 'cosine', limit: Optional[Union[int, float]] = 20, normalization: Optional[Tuple[float, float]] = None, metric_name: Optional[str] = None, batch_size: Optional[int] = None, exclude_self: bool = False, only_id: bool = False, use_scipy: bool = False, num_worker: Optional[int] = 1) → T

Match the current Document against a set of Documents.

Parameters:
  • darray – the other DocumentArray to match against

  • metric – the distance metric

  • limit – the maximum number of matches; defaults to 20 when not given.

  • normalization – a tuple [a, b] used for min-max normalization: the minimum distance is rescaled to a and the maximum distance to b, so all values fall in the range [a, b].

  • metric_name – if provided, then match result will be marked with this string.

  • batch_size – if provided, darray is loaded in batches of at most batch_size elements each. When darray is big, this can significantly speed up the computation.

  • exclude_self – if set, Documents in darray with the same id as the left-hand values will not be considered as matches.

  • only_id – if set, the returned matches will contain only the id.

  • use_scipy – if set, use scipy as the computation backend. Note that scipy does not support distance computation on sparse matrices.

  • num_worker

    the number of parallel workers, 1 by default. If None, the number of CPUs in the system will be used.

    Note

    This argument is only effective when batch_size is set.

Return type:

T

Returns:

itself after modification
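
The interaction of metric, limit and normalization can be illustrated with a minimal pure-Python sketch. It is not docarray's implementation: the helpers cosine_distance, min_max_rescale and top_matches are hypothetical names introduced here, and the sketch assumes dense, non-zero vectors given as plain lists.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def min_max_rescale(values, a, b):
    """Rescale values so that min(values) maps to a and max(values) maps to b."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All distances are equal; map everything to the lower bound.
        return [a for _ in values]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def top_matches(query, candidates, limit=20, normalization=None):
    """Return up to `limit` (index, distance) pairs, nearest first."""
    dists = [cosine_distance(query, c) for c in candidates]
    if normalization is not None:
        a, b = normalization
        dists = min_max_rescale(dists, a, b)
    order = sorted(range(len(candidates)), key=lambda i: dists[i])
    return [(i, dists[i]) for i in order[:limit]]


# Hypothetical toy data: one query vector and three candidate vectors.
matches = top_matches(
    [1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    limit=2,
    normalization=(0.0, 1.0),
)
```

Here the raw cosine distances are rescaled into [0, 1] before the two nearest candidates are kept, which mirrors the parameter descriptions above.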

embed(embed_model: AnyDNN, device: str = 'cpu', batch_size: int = 256) → T

Fill the embedding of Documents in place by using embed_model.

Parameters:
  • embed_model – the embedding model written in Keras/PyTorch/Paddle

  • device – the computational device for embed_model, can be either cpu or cuda.

  • batch_size – number of Documents in a batch for embedding

Return type:

T
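
The role of batch_size can be sketched in plain Python. This is an illustration of batching only, not the library's code: embed_in_batches and toy_model are hypothetical names, and the model is assumed to be any callable that maps a list of inputs to a list of embedding vectors.

```python
def embed_in_batches(items, embed_model, batch_size=256):
    """Apply embed_model to items in chunks of at most batch_size elements."""
    embeddings = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # The model sees one batch at a time, which bounds peak memory use.
        embeddings.extend(embed_model(batch))
    return embeddings


def toy_model(batch):
    """Hypothetical stand-in for a real model: embeds a string as its length."""
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], toy_model, batch_size=2)
```

With three inputs and batch_size=2, the model is called twice: once with two items and once with the remaining one.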

post(*args, **kwargs)

Post itself to a remote Flow/Sandbox and get the modified DocumentArray back.

Parameters:
  • host – a host string. Can be one of the following:

      - grpc://192.168.0.123:8080/endpoint
      - ws://192.168.0.123:8080/endpoint
      - http://192.168.0.123:8080/endpoint
      - jinahub://Hello/endpoint
      - jinahub+docker://Hello/endpoint
      - jinahub+docker://Hello/v0.0.1/endpoint
      - jinahub+docker://Hello/latest/endpoint
      - jinahub+sandbox://Hello/endpoint

  • show_progress – whether to show a progress bar

  • batch_size – number of Documents in each request

  • parameters – parameters to send in the request

Return type:

T

Returns:

the new DocumentArray returned from remote
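
The host strings listed above all follow a scheme://location/path shape. As a rough illustration of how such a string breaks down, the following sketch uses the standard library's urlsplit. It does not reproduce how docarray itself parses host, and split_host is a hypothetical helper.

```python
from urllib.parse import urlsplit


def split_host(host):
    """Split a host string into scheme, network location, port and path."""
    parts = urlsplit(host)
    return {
        "scheme": parts.scheme,   # e.g. 'grpc' or 'jinahub+docker'
        "netloc": parts.netloc,   # address:port, or the Executor name
        "port": parts.port,       # None when no port is given
        "path": parts.path,       # optional version tag plus the endpoint
    }


grpc_parts = split_host("grpc://192.168.0.123:8080/endpoint")
hub_parts = split_host("jinahub+docker://Hello/v0.0.1/endpoint")
```

For the Hub-style strings, the path portion carries an optional version tag followed by the endpoint, as in the examples in the parameter list.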