docarray.array.mixins.group module#

class docarray.array.mixins.group.GroupMixin[source]#

Bases: object

These helpers yield groups of DocumentArray from a source DocumentArray.

split_by_tag(tag)[source]#

Split the DocumentArray into multiple DocumentArray according to the tag value of each Document.

Parameters:

tag (str) – the tag name to split stored in tags.

Return type:

Dict[Any, DocumentArray]

Returns:

a dict where Documents with the same value on tag are grouped together, their orders are preserved from the original DocumentArray.

Note

If the tags of Document do not contains the specified tag, return an empty dict.

batch(batch_size, shuffle=False, show_progress=False)[source]#

Creates a Generator that yields DocumentArray of size batch_size until docs is fully traversed along the traversal_path. The None docs are filtered out and optionally the docs can be filtered by checking for the existence of a Document attribute. Note, that the last batch might be smaller than batch_size.

Parameters:
  • batch_size (int) – Size of each generated batch (except the last one, which might be smaller, default: 32)

  • shuffle (bool) – If set, shuffle the Documents before dividing into minibatches.

  • show_progress (bool) – if set, show a progress bar when batching documents.

Yield:

a Generator of DocumentArray, each in the length of batch_size

Return type:

Generator[DocumentArray, None, None]

batch_ids(batch_size, shuffle=False)[source]#

Creates a Generator that yields lists of ids of size batch_size until self is fully traversed. Note, that the last batch might be smaller than batch_size.

Parameters:
  • batch_size (int) – Size of each generated batch (except the last one, which might be smaller)

  • shuffle (bool) – If set, shuffle the Documents before dividing into minibatches.

Yield:

a Generator of list of IDs, each in the length of batch_size

Return type:

Generator[List[str], None, None]