docarray.array.mixins.group module#
- class docarray.array.mixins.group.GroupMixin[source]#
Bases:
object
These helpers yield groups of DocumentArray from a source DocumentArray.
- split_by_tag(tag)[source]#
Split the DocumentArray into multiple DocumentArrays according to the value that each Document stores under tag.
- Parameters:
tag (str) – the name of the tag, stored in tags, to split by.
- Return type:
Dict[Any,DocumentArray]
- Returns:
a dict where Documents with the same value on tag are grouped together; their order is preserved from the original DocumentArray.
Note
If the tags of the Documents do not contain the specified tag, an empty dict is returned.
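The grouping described above can be illustrated with a plain-Python sketch. It does not use docarray: each Document is stood in for by a dict with hypothetical "id" and "tags" keys, and split_by_tag_sketch is an illustrative name, not part of the library. It also assumes that a Document lacking the tag is skipped, which is one reading of the note above.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical stand-in for a Document: a dict with an "id" and a "tags" mapping.
Doc = Dict[str, Any]


def split_by_tag_sketch(docs: List[Doc], tag: str) -> Dict[Any, List[Doc]]:
    """Group docs by the value stored under `tag`, keeping source order."""
    groups: Dict[Any, List[Doc]] = defaultdict(list)
    for doc in docs:
        tags = doc.get("tags", {})
        if tag not in tags:
            # Assumption: a doc that lacks the tag is skipped, not an error.
            continue
        groups[tags[tag]].append(doc)
    # Return a plain dict so that missing keys do not silently create groups.
    return dict(groups)


docs = [
    {"id": "a", "tags": {"category": "news"}},
    {"id": "b", "tags": {"category": "sports"}},
    {"id": "c", "tags": {"category": "news"}},
    {"id": "d", "tags": {}},
]

by_category = split_by_tag_sketch(docs, "category")
news_ids = [d["id"] for d in by_category["news"]]   # order follows `docs`
missing = split_by_tag_sketch(docs, "author")       # no doc carries this tag
```

With a real DocumentArray the corresponding call would be da.split_by_tag('category'), returning one DocumentArray per distinct tag value.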
- batch(batch_size, shuffle=False, show_progress=False)[source]#
Creates a Generator that yields DocumentArrays of size batch_size until docs is fully traversed along the traversal_path. The None docs are filtered out and, optionally, the docs can be filtered by checking for the existence of a Document attribute. Note that the last batch might be smaller than batch_size.
- Parameters:
batch_size (int) – Size of each generated batch (except the last one, which might be smaller, default: 32)
shuffle (bool) – If set, shuffle the Documents before dividing into minibatches.
show_progress (bool) – If set, show a progress bar when batching documents.
- Yield:
a Generator of DocumentArrays, each of length batch_size (the last one may be shorter)
- Return type:
Generator[DocumentArray,None,None]
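The batching behaviour can be sketched in plain Python as follows. This is not docarray's implementation: batch_sketch is a hypothetical helper that works on any sequence, and its seed parameter is an assumption added only to make the shuffled order reproducible.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_sketch(
    items: Sequence[T],
    batch_size: int,
    shuffle: bool = False,
    seed: int = 0,
) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items until `items` is exhausted."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    order = list(range(len(items)))
    if shuffle:
        # `seed` is a hypothetical parameter, used here for reproducibility.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        # The final slice is shorter when len(items) is not a multiple
        # of batch_size.
        yield [items[i] for i in order[start:start + batch_size]]


batches = list(batch_sketch(list(range(7)), batch_size=3))
shuffled = list(batch_sketch(list(range(7)), batch_size=3, shuffle=True))
```

Seven items with batch_size=3 give two full batches and a final batch of one item. Shuffling changes which items land in which batch, not the batch sizes.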
- batch_ids(batch_size, shuffle=False)[source]#
Creates a Generator that yields lists of ids of size batch_size until self is fully traversed. Note that the last batch might be smaller than batch_size.
- Parameters:
batch_size (int) – Size of each generated batch (except the last one, which might be smaller)
shuffle (bool) – If set, shuffle the Documents before dividing into minibatches.
- Yield:
a Generator of lists of ids, each of length batch_size (the last one may be shorter)
- Return type:
Generator[List[str],None,None]
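Yielding ids instead of Documents can be useful when the Documents are looked up elsewhere, batch by batch. The sketch below shows that pattern in plain Python; batch_ids_sketch and the store dict are hypothetical stand-ins and do not reflect docarray's internals.

```python
from typing import Dict, Iterator, List


def batch_ids_sketch(ids: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` ids."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


# Hypothetical id -> content mapping standing in for a document store.
store: Dict[str, str] = {f"doc{i}": f"text {i}" for i in range(5)}

id_batches = list(batch_ids_sketch(list(store), batch_size=2))

# Resolve each batch of ids against the store only when it is needed.
texts_per_batch = [[store[doc_id] for doc_id in batch] for batch in id_batches]
```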