docarray.array.mixins.group module
- class docarray.array.mixins.group.GroupMixin
Bases: object
These helpers yield groups of DocumentArray from a source DocumentArray.
- split_by_tag(tag)
Split the DocumentArray into multiple DocumentArrays according to the tag value of each Document.
- Parameters:
tag (str) – the name of the tag, stored in tags, whose value is used to split the Documents.
- Return type:
Dict[Any, DocumentArray]
- Returns:
a dict in which Documents with the same value for tag are grouped together; their order is preserved from the original DocumentArray.
Note
If the tags of the Documents do not contain the specified tag, an empty dict is returned.
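The grouping behaviour described above can be illustrated with a minimal sketch. It does not call docarray: plain dicts stand in for Documents, and `split_by_tag_sketch` is a hypothetical helper, not the library's implementation. Skipping a Document that lacks the tag is an assumption consistent with the note above.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical stand-in for a Document: a plain dict with "id" and "tags" keys.
Doc = Dict[str, Any]


def split_by_tag_sketch(docs: List[Doc], tag: str) -> Dict[Any, List[Doc]]:
    """Group docs by the value stored under `tag`, keeping the source order."""
    groups: Dict[Any, List[Doc]] = defaultdict(list)
    for doc in docs:
        # Assumption: a doc without the tag is skipped instead of raising.
        if tag in doc["tags"]:
            groups[doc["tags"][tag]].append(doc)
    return dict(groups)


docs = [
    {"id": "a", "tags": {"category": "shoes"}},
    {"id": "b", "tags": {"category": "bags"}},
    {"id": "c", "tags": {"category": "shoes"}},
]

by_category = split_by_tag_sketch(docs, "category")
shoe_ids = [d["id"] for d in by_category["shoes"]]  # ids in original order
no_match = split_by_tag_sketch(docs, "colour")  # tag absent from every doc
```

Here `by_category` has one key per distinct tag value, and `no_match` is an empty dict because no stand-in Document carries the `colour` tag.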
- batch(batch_size, shuffle=False, show_progress=False)
Creates a Generator that yields DocumentArrays of size batch_size until docs is fully traversed along the traversal_path. None docs are filtered out and, optionally, the docs can be filtered by checking for the existence of a Document attribute. Note that the last batch might be smaller than batch_size.
- Parameters:
batch_size (int) – Size of each generated batch (except the last one, which might be smaller; default: 32).
shuffle (bool) – If set, shuffle the Documents before dividing them into minibatches.
show_progress (bool) – If set, show a progress bar when batching Documents.
- Yield:
a Generator of DocumentArrays, each of length batch_size
- Return type:
Generator[DocumentArray, None, None]
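As a rough illustration of the batching semantics (fixed-size batches, a possibly shorter last batch, optional shuffling), the following sketch works on any sequence. `batch_sketch` is a hypothetical helper that does not use docarray, and rejecting a non-positive batch_size is an assumption of this sketch.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batch_sketch(
    items: Sequence[T], batch_size: int, shuffle: bool = False
) -> Iterator[List[T]]:
    """Yield consecutive lists holding at most batch_size items each."""
    # Assumption: a non-positive batch size is treated as an error.
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    order = list(range(len(items)))
    if shuffle:
        random.shuffle(order)  # shuffle positions once, before slicing
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]


batch_lengths = [len(b) for b in batch_sketch(list(range(10)), batch_size=4)]
shuffled = [x for b in batch_sketch(list(range(10)), 3, shuffle=True) for x in b]
```

With ten items and a batch size of four, the last batch holds the two remaining items. With `shuffle=True`, every item still appears exactly once across the batches.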
- batch_ids(batch_size, shuffle=False)
Creates a Generator that yields lists of ids of size batch_size until self is fully traversed. Note that the last batch might be smaller than batch_size.
- Parameters:
batch_size (int) – Size of each generated batch (except the last one, which might be smaller).
shuffle (bool) – If set, shuffle the Documents before dividing them into minibatches.
- Yield:
a Generator of lists of IDs, each of length batch_size
- Return type:
Generator[List[str], None, None]
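Yielding ids instead of Documents is useful when the content is fetched separately, one batch at a time. The sketch below assumes a hypothetical in-memory `store` keyed by id, and `batch_ids_sketch` is an illustrative helper, not docarray's own code.

```python
from typing import Dict, Iterator, List


def batch_ids_sketch(ids: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each holding at most batch_size ids."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


# Hypothetical id -> content store standing in for a document collection.
store: Dict[str, str] = {f"doc-{i}": f"text {i}" for i in range(5)}

id_batches = list(batch_ids_sketch(list(store), batch_size=2))
# Resolve each batch of ids against the store only when it is needed.
texts_per_batch = [[store[doc_id] for doc_id in ids] for ids in id_batches]
```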