docarray.array.mixins.evaluation module#

class docarray.array.mixins.evaluation.EvaluationMixin[source]#

Bases: object

A mixin that provides ranking evaluation functionality to DocumentArrayLike objects

evaluate(metrics, ground_truth=None, hash_fn=None, metric_names=None, strict=True, label_tag='label', num_relevant_documents_per_label=None, **kwargs)[source]#

Compute ranking evaluation metrics for a given DocumentArray when compared with a ground truth.

If one provides a ground_truth DocumentArray that is structurally identical to self, this function compares the matches of the documents inside the DocumentArray to this ground_truth. Alternatively, one can annotate the documents directly by adding labels in the form of tags under the key specified by the label_tag attribute. These tags must be added to the documents in self as well as to the documents in their matches.
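
Below is a minimal sketch of the label-based setup, assuming docarray v1 (docarray<2.0); the texts and label values are purely illustrative:

from docarray import Document, DocumentArray

# the query document and its matches all carry a tag under the 'label' key
da = DocumentArray([Document(text='query about fruit', tags={'label': 'fruit'})])
da[0].matches = DocumentArray(
    [
        Document(text='apple', tags={'label': 'fruit'}),      # relevant match
        Document(text='bicycle', tags={'label': 'vehicle'}),  # irrelevant match
    ]
)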

This method fills the evaluations field of the Documents inside this DocumentArray and returns the average of the computed metric values.

Parameters:
  • metrics (List[Union[str, Callable[..., float]]]) – List of metric names or metric functions to be computed

  • ground_truth (Optional[DocumentArray]) – The ground_truth DocumentArray that this DocumentArray is compared to.

  • hash_fn (Optional[Callable[[Document], str]]) – For the evaluation against a ground_truth DocumentArray, this function is used for generating hashes which are used to compare the documents. If not given, Document.id is used.

  • metric_names (Optional[List[str]]) – If provided, the results of the metric computation will be stored in the evaluations field of each Document under these names. If not provided, the names will be derived from the metric function names.

  • strict (bool) – If set, the left- and right-hand sides are required to be fully aligned, both in length and in what that length represents. This prevents you from accidentally evaluating against irrelevant matches.

  • label_tag (str) – Specifies the tag which contains the labels.

  • num_relevant_documents_per_label (Optional[Dict[Any, int]]) – Some metrics, e.g., recall@k, require the number of relevant documents. To apply those to a labeled dataset, one can provide a dictionary which maps labels to the total number of documents with this label.

  • kwargs – Additional keyword arguments to be passed to the metric functions.

Return type:

Dict[str, float]

Returns:

A dictionary which stores for each metric name the average evaluation score.
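
A minimal, hedged usage sketch for the label-based case above; 'precision_at_k' and 'reciprocal_rank' are built-in metric names in docarray v1, but the exact set of supported names depends on the installed version:

results = da.evaluate(
    metrics=['precision_at_k', 'reciprocal_rank'],
    label_tag='label',
)
print(results)            # e.g. {'precision_at_k': ..., 'reciprocal_rank': ...}
print(da[0].evaluations)  # per-document scores are stored in the evaluations field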

embed_and_evaluate(metrics, index_data=None, ground_truth=None, metric_names=None, strict=True, label_tag='label', embed_models=None, embed_funcs=None, device='cpu', batch_size=256, collate_fns=None, distance='cosine', limit=20, normalization=None, exclude_self=False, use_scipy=False, num_worker=1, match_batch_size=100000, query_sample_size=1000, **kwargs)[source]#

Computes ranking evaluation metrics for a given DocumentArray. This function performs embedding and matching in one pass, so you don’t need to call embed and match before it. Instead, it embeds the documents in self (and in index_data, when provided) and computes the nearest neighbours itself. For the index_data object this may be done in batches to reduce the memory consumption of the evaluation process. The evaluation itself can be done against a ground_truth DocumentArray or on the basis of labels, as with the evaluate() function.

Parameters:
  • metrics (List[Union[str, Callable[..., float]]]) – List of metric names or metric functions to be computed

  • index_data (Optional[DocumentArray]) – The other DocumentArray to match against. If not given, self will be matched against itself, meaning that every document in self is compared to all other documents in self to determine its nearest neighbours.

  • ground_truth (Optional[DocumentArray]) – The ground_truth DocumentArray that this DocumentArray is compared to.

  • metric_names (Optional[str]) – If provided, the results of the metric computation will be stored in the evaluations field of each Document under these names. If not provided, the names will be derived from the metric function names.

  • strict (bool) – If set, the left- and right-hand sides are required to be fully aligned, both in length and in what that length represents. This prevents you from accidentally evaluating against irrelevant matches.

  • label_tag (str) – Specifies the tag which contains the labels.

  • embed_models (Union[AnyDNN, Tuple[AnyDNN, AnyDNN], None]) – One or two embedding models written in Keras / PyTorch / Paddle for embedding self and index_data.

  • embed_funcs (Union[Callable, Tuple[Callable, Callable], None]) – As an alternative to embedding models, custom embedding functions can be provided.

  • device (str) – The computational device for embed_models and for matching; can be either cpu or cuda.

  • batch_size (Union[int, Tuple[int, int]]) – Number of documents in a batch for embedding.

  • collate_fns (Union[CollateFnType, None, Tuple[Optional[CollateFnType], Optional[CollateFnType]]]) – For each embedding function, the respective collate function creates a mini-batch of input(s) from the given DocumentArray. If not provided, a default built-in collate_fn uses the tensors of the documents to create input batches.

  • distance (Union[str, Callable[[ArrayType, ArrayType], ndarray]]) – The distance metric.

  • limit (Union[int, float, None]) – The maximum number of matches; when not given, defaults to 20.

  • normalization (Optional[Tuple[float, float]]) – A tuple (a, b) to be used with min-max normalization: the minimum distance is rescaled to a, the maximum distance to b, and all values are rescaled into the range [a, b].

  • exclude_self (bool) – If set, Documents in index_data with the same id as the left-hand values will not be considered as matches.

  • use_scipy (bool) – If set, use scipy as the computation backend. Note that scipy does not support distance computation on sparse matrices.

  • num_worker (int) – Specifies the number of workers for the execution of the match function.

  • kwargs – Additional keyword arguments to be passed to the metric functions.

  • query_sample_size (int) – For a large number of documents in self, the evaluation becomes infeasible, especially if index_data is large. Therefore, queries are sampled if the number of documents in self exceeds query_sample_size. Usually, this has only a small impact on the mean metric values returned by this function. To prevent sampling, set query_sample_size to None.

  • match_batch_size (int) – The number of documents which are embedded and matched at once. Set this to a lower value if you experience high memory consumption.

Return type:

Union[float, List[float], None]

Returns:

A dictionary which stores for each metric name the average evaluation score.
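
A hedged end-to-end sketch, assuming docarray v1 with PyTorch installed; the toy linear model, random tensors, and integer labels only stand in for a real embedding model and dataset:

import numpy as np
import torch
from docarray import Document, DocumentArray

queries = DocumentArray(
    Document(tensor=np.random.rand(128).astype('float32'), tags={'label': i % 2})
    for i in range(10)
)
index = DocumentArray(
    Document(tensor=np.random.rand(128).astype('float32'), tags={'label': i % 2})
    for i in range(1000)
)

model = torch.nn.Linear(128, 32)  # toy embedding model for illustration

result = queries.embed_and_evaluate(
    metrics=['precision_at_k'],
    index_data=index,
    embed_models=model,
    device='cpu',
    batch_size=8,
    limit=5,
    label_tag='label',
)
print(result)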