docarray.document.mixins.sugar module#
- class docarray.document.mixins.sugar.SingletonSugarMixin[source]#
Bases:
object
Provide sugary syntax for Document by inheriting methods from DocumentArray.
- match(darray: DocumentArray, metric: Union[str, Callable[[ArrayType, ArrayType], np.ndarray]] = 'cosine', limit: Optional[Union[int, float]] = 20, normalization: Optional[Tuple[float, float]] = None, metric_name: Optional[str] = None, batch_size: Optional[int] = None, exclude_self: bool = False, only_id: bool = False, use_scipy: bool = False, num_worker: Optional[int] = 1) T [source]#
Match the current Document against a set of Documents.
- Parameters:
darray – the other DocumentArray to match against
metric – the distance metric
limit – the maximum number of matches; defaults to 20 when not given.
normalization – a tuple [a, b] to be used with min-max normalization: the min distance is rescaled to a and the max distance to b, so all values fall in the range [a, b].
metric_name – if provided, then match result will be marked with this string.
batch_size – if provided, then darray is loaded in batches, each containing at most batch_size elements. When darray is big, this can significantly speed up the computation.
exclude_self – if set, Documents in darray with the same id as the left-hand values will not be considered as matches.
only_id – if set, the returned matches will only contain id.
use_scipy – if set, use scipy as the computation backend. Note that scipy does not support distance on sparse matrices.
num_worker – the number of parallel workers. If not given, then the number of CPUs in the system will be used.
Note
This argument is only effective when batch_size is set.
- Return type:
T
- Returns:
itself after modification
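To make the roles of metric, limit and normalization more concrete, the following is a minimal, standard-library sketch of the idea. It is not docarray's implementation: the helper names (cosine_distance, minmax_rescale, match_sketch) and the plain dict of vectors are assumptions made for illustration only.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def minmax_rescale(distances, a, b):
    """Map the smallest distance to a and the largest to b."""
    lo, hi = min(distances), max(distances)
    if hi == lo:  # all distances are equal, so there is nothing to spread out
        return [a for _ in distances]
    return [a + (b - a) * (d - lo) / (hi - lo) for d in distances]


def match_sketch(query, candidates, limit=20, normalization=None):
    """Rank candidates (hypothetical mapping id -> vector) against query."""
    ids = list(candidates)
    raw = [cosine_distance(query, candidates[i]) for i in ids]
    # Rescale only the reported values; ranking always uses the raw distance.
    shown = minmax_rescale(raw, *normalization) if normalization else raw
    order = sorted(range(len(ids)), key=lambda k: raw[k])
    return [(ids[k], shown[k]) for k in order[:limit]]


candidates = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
top = match_sketch([1.0, 0.0], candidates, limit=2, normalization=(0.0, 1.0))
```

Here top holds the two closest ids with their rescaled distances; the third candidate is dropped by limit.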
- embed(embed_model: AnyDNN, device: str = 'cpu', batch_size: int = 256) T [source]#
Fill the embedding of Documents in place by using embed_model.
- Parameters:
embed_model – the embedding model written in Keras/PyTorch/Paddle
device – the computational device for embed_model, can be either cpu or cuda.
batch_size – number of Documents in a batch for embedding
- Return type:
T
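The effect of batch_size and of filling embeddings in place can be pictured with a small standard-library sketch. Everything here is hypothetical and stands in for the real objects: documents are plain dicts, embed_fn replaces a Keras/PyTorch/Paddle model, and embed_sketch is not part of docarray.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_sketch(docs, embed_fn, batch_size=256):
    """Attach an 'embedding' entry to every dict in docs, batch by batch."""
    for batch in batched(docs, batch_size):
        vectors = embed_fn([d["text"] for d in batch])
        for doc, vec in zip(batch, vectors):
            doc["embedding"] = vec  # mutate the original dict in place
    return docs


batch_sizes = []  # records how many texts each model call received


def toy_model(texts):
    """Hypothetical stand-in model: embeds a text as its length."""
    batch_sizes.append(len(texts))
    return [[float(len(t))] for t in texts]


docs = [{"text": "ab"}, {"text": "cde"}, {"text": "f"}]
result = embed_sketch(docs, toy_model, batch_size=2)
```

With three documents and batch_size=2, the stand-in model is called twice, and the same dict objects that were passed in now carry their embeddings.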
- post(*args, **kwargs)[source]#
Post itself to a remote Flow/Sandbox and get the modified DocumentArray back.
- Parameters:
host – a host string. Can be one of the following:
    - grpc://192.168.0.123:8080/endpoint
    - ws://192.168.0.123:8080/endpoint
    - http://192.168.0.123:8080/endpoint
    - jinahub://Hello/endpoint
    - jinahub+docker://Hello/endpoint
    - jinahub+docker://Hello/v0.0.1/endpoint
    - jinahub+docker://Hello/latest/endpoint
    - jinahub+sandbox://Hello/endpoint
show_progress – whether to show a progress bar
batch_size – number of Documents in each request
parameters – parameters to send in the request
- Return type:
T
- Returns:
the new DocumentArray returned from remote
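The host strings listed above all follow a scheme://target/path shape. As a rough illustration of how such a string breaks down, the sketch below uses Python's standard urllib.parse.urlsplit; the split_host helper is hypothetical, and this is not a claim about how docarray parses the value internally.

```python
from urllib.parse import urlsplit


def split_host(host):
    """Split a host string into its scheme, target and path components."""
    parts = urlsplit(host)
    return {
        "scheme": parts.scheme,  # e.g. grpc, ws, http, jinahub+docker
        "target": parts.netloc,  # address:port, or an Executor name
        "path": parts.path,      # optional version tag plus the endpoint
    }


remote = split_host("grpc://192.168.0.123:8080/endpoint")
hub = split_host("jinahub+docker://Hello/v0.0.1/endpoint")
```

For the gRPC form, target carries the address and port; for the jinahub forms, target is the name (Hello in the examples above) and any version tag ends up at the start of path.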