docarray.array.mixins.embed module

class docarray.array.mixins.embed.EmbedMixin

Bases: object

Helper functions for embedding with a model

embed(embed_model, device='cpu', batch_size=256, to_numpy=False, collate_fn=None)

Fill the embedding of each Document in place by using embed_model. To evaluate a model, one can use the embed_and_evaluate() function directly.

Parameters:
  • embed_model (AnyDNN) – The embedding model, written in Keras, PyTorch, or Paddle.

  • device (str) – The computational device for embed_model; either cpu or cuda.

  • batch_size (int) – Number of Documents in each batch passed to embed_model.

  • to_numpy (bool) – Whether to store embeddings back in each Document as numpy.ndarray instead of the original framework format.

  • collate_fn (Optional[CollateFnType]) – Creates a mini-batch of input(s) from the given DocumentArray. The default built-in collate_fn uses the tensors of the Documents.

Return type:

T

Returns:

The DocumentArray itself, after modification.
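The batching and in-place behaviour described above can be illustrated with a minimal, framework-free sketch. This is not docarray's implementation: ToyDoc, toy_model, default_collate and embed_in_place are hypothetical stand-ins, and the real method also handles device placement and framework-specific tensors.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Document: an input tensor and an embedding slot."""
    tensor: List[float]
    embedding: Optional[List[float]] = None


def default_collate(batch: Sequence[ToyDoc]) -> List[List[float]]:
    # Mirrors the documented default: use the tensors of the documents.
    return [doc.tensor for doc in batch]


def toy_model(inputs: List[List[float]]) -> List[List[float]]:
    # Hypothetical "model": maps each input to a 2-d embedding (sum, mean).
    return [[sum(x), sum(x) / len(x)] for x in inputs]


def embed_in_place(docs, model, batch_size=256, collate_fn=None):
    """Fill doc.embedding batch by batch and return the same container."""
    collate = collate_fn or default_collate
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        outputs = model(collate(batch))
        for doc, emb in zip(batch, outputs):
            doc.embedding = emb  # written back onto the same object
    return docs  # the same list object, now modified


docs = [ToyDoc(tensor=[float(i), float(i + 1)]) for i in range(5)]
result = embed_in_place(docs, toy_model, batch_size=2)
```

Going by the signature documented above, the corresponding call on a real DocumentArray would take the form da.embed(model, batch_size=2, to_numpy=True), with the embeddings then available on each Document.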

docarray.array.mixins.embed.get_framework(dnn_model)

Return the framework that powers a DNN model.

Note

This is not a robust implementation. It relies on the __module__ name; the key idea is to identify the framework behind dnn_model without actually importing that framework.

Parameters:

dnn_model – a DNN model

Return type:

str

Returns:

One of keras, torch, or paddle; a ValueError is raised if the framework cannot be determined.
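The idea of detecting a framework from the __module__ name, without importing the framework, might be sketched as follows. This is an illustration under assumptions, not the library's code: guess_framework and the fake classes are hypothetical, and treating a tensorflow module path as keras is an assumption made here.

```python
def guess_framework(model) -> str:
    """Guess the framework from the module path of the model's class hierarchy."""
    # Walk the MRO so that user-defined subclasses of framework classes
    # are still attributed to the framework they inherit from.
    for cls in type(model).__mro__:
        root = cls.__module__.split('.')[0]
        if root in ('keras', 'tensorflow'):  # assumption: tensorflow implies keras
            return 'keras'
        if root == 'torch':
            return 'torch'
        if root == 'paddle':
            return 'paddle'
    raise ValueError(f'cannot determine the framework of {model!r}')


# Hypothetical stand-ins: classes that only pretend to live in a framework module,
# so the example runs without any framework installed.
FakeTorchLinear = type('Linear', (), {'__module__': 'torch.nn.modules.linear'})
FakePaddleLayer = type('Layer', (), {'__module__': 'paddle.nn.layer.layers'})


class MyNet(FakeTorchLinear):
    """User-defined subclass; its own __module__ is not 'torch'."""


torch_name = guess_framework(MyNet())
paddle_name = guess_framework(FakePaddleLayer())
```

Because only strings are compared, no framework is imported. The trade-off is the one the note above points out: any class whose module path happens to start with a matching name would be misclassified.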