docarray.array.mixins.dataloader package#
Submodules#
Module contents#
- class docarray.array.mixins.dataloader.DataLoaderMixin[source]#
- Bases: object
- classmethod dataloader(path, func, batch_size, protocol='protobuf', compress=None, backend='thread', num_worker=None, pool=None, show_progress=False)[source]#
- Load array elements from path, batch them, map each batch with func in parallel, and finally yield each batch as a DocumentArray. A usage sketch follows at the end of this section.
- Parameters:
  - path (Union[str, Path]) – Path or filename where the data is stored.
  - func (Callable[[DocumentArray], T]) – a function that takes a DocumentArray as input and outputs anything. You can either modify elements in-place (only with the thread backend) or work later on the returned elements.
  - batch_size (int) – Size of each generated batch (except the last one, which might be smaller).
  - protocol (str) – protocol to use for (de)serializing the stored data.
  - compress (Optional[str]) – compression algorithm to use.
  - backend (str) – whether to use multi-process or multi-thread as the parallelization backend. In general, if your func is IO-bound then thread is probably good enough. If your func is CPU-bound then you may want to use process. In practice, you should experiment to find the best choice. However, if you wish to modify elements in-place, you should always use the thread backend, regardless of whether func is IO- or CPU-bound.
    Warning: when using the process backend, you should not expect func to modify elements in-place. This is because the multiprocessing backend passes variables via pickle and works in another process; the passed object and the original object do not share the same memory.
  - num_worker (Optional[int]) – the number of parallel workers. If not given, the number of CPUs in the system will be used.
  - pool (Union[Pool, ThreadPool, None]) – use an existing/external pool. If given, backend is ignored and you will be responsible for closing the pool.
  - show_progress (bool) – if set, show a progress bar.
 
- Return type: Generator[DocumentArray, None, None]
- Returns: a generator that yields each processed batch as a DocumentArray
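Below is a minimal usage sketch, not part of the original API reference. It assumes docarray's v1-style DocumentArray API and that a file readable by dataloader() can be produced with save_binary() using a matching protocol and compress setting; the file name docs.bin and the mapping function to_upper are invented for illustration.

```python
# Minimal sketch under the assumptions stated above, not a definitive recipe.
from docarray import Document, DocumentArray

# Persist some documents first so dataloader() has something to stream from.
# (Assumption: save_binary with a matching protocol produces a file dataloader can read.)
da = DocumentArray(Document(text=f'hello {i}') for i in range(1_000))
da.save_binary('docs.bin', protocol='protobuf', compress=None)


def to_upper(batch: DocumentArray) -> DocumentArray:
    # With backend='thread', in-place edits are visible in the yielded batch.
    # With backend='process' they would not be, because each batch is pickled
    # into a separate worker process.
    for doc in batch:
        doc.text = doc.text.upper()
    return batch


for batch in DocumentArray.dataloader(
    'docs.bin',
    func=to_upper,
    batch_size=64,
    protocol='protobuf',
    backend='thread',
    num_worker=4,
    show_progress=True,
):
    # Each yielded item is a DocumentArray with at most batch_size documents.
    print(len(batch), batch[0].text)
```

If you pass an existing Pool or ThreadPool via pool instead, backend is ignored and closing the pool afterwards is your responsibility.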