Welcome to DocArray!
⬆️ DocArray v2: We are currently working on v2 of DocArray. Keep reading here if you are interested in the current (stable) version, or check out the v2 alpha branch and v2 roadmap!
DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API.
🚪 Door to the multimodal world: a super-expressive data structure for representing complicated, mixed, and nested text, image, video, audio, and 3D mesh data. It is the foundational data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt, etc.
🧑‍🔬 Data science powerhouse: greatly accelerates data scientists' work on embedding, k-NN matching, querying, visualizing, and evaluating, via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.
🚡 Data in transit: optimized for network communication and ready to send over the wire at any time, with fast, compressed serialization to Protobuf, bytes, base64, JSON, CSV, and DataFrame. Perfect for streaming and out-of-memory data.
🔎 One-stop k-NN: a unified, consistent API for nearest-neighbor search across mainstream vector databases, including Elasticsearch, Redis, AnnLite, Qdrant, and Weaviate.
👒 For modern apps: GraphQL support makes your server flexible in its requests and responses; built-in data validation and JSON Schema (OpenAPI) support help you build reliable web services.
🐍 Pythonic experience: as easy as a Python list. If you can Python, you can DocArray. Intuitive idioms and type annotations simplify the code you write.
🛸 IDE integration: pretty-printing and visualization in Jupyter Notebook and Google Colab; comprehensive autocompletion and type hints in PyCharm and VS Code.
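To make the k-NN matching mentioned above concrete, here is a brute-force cosine-distance k-NN in plain NumPy. It is an illustrative sketch only, not DocArray's implementation: the function `cosine_knn` is hypothetical, and it assumes embeddings are dense, non-zero row vectors.

```python
import numpy as np


def cosine_knn(queries, index, k):
    """Hypothetical helper: for each query row, return the indices and
    cosine distances (1 - cosine similarity) of its k nearest index rows."""
    # L2-normalize rows so that a dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    x = index / np.linalg.norm(index, axis=1, keepdims=True)
    dist = 1.0 - q @ x.T                      # shape: (n_queries, n_index)
    idx = np.argsort(dist, axis=1)[:, :k]     # k smallest distances per query
    return idx, np.take_along_axis(dist, idx, axis=1)


index = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queries = np.array([[1.0, 0.1]])
idx, dist = cosine_knn(queries, index, k=2)
print(idx, dist)
```

Brute force compares every query with every indexed vector, which is simple and exact but scales linearly with the index size. That cost is why the vector databases listed above are used for larger collections.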
Read more on why you should use DocArray and how it compares to alternatives.
Install
Make sure you have Python 3.7+ and NumPy installed on Linux/Mac/Windows:
pip install docarray
No extra dependencies are installed.
conda install -c conda-forge docarray
No extra dependencies are installed.
pip install "docarray[common]"
The following dependencies are installed to enable the most common features: advanced serialization, compression in serialization, push/pull to Jina Cloud, visualizing image sprites, image data-related IO, and the embedding projector of DocumentArray (which uses two of these packages).
pip install "docarray[full]"
In addition to the common dependencies, the following dependencies are installed to enable full features: sparse embeddings and tensors, video processing and IO, 3D mesh processing and IO, and GraphQL support.
Alternatively, you can do a basic installation first and then install missing dependencies on demand.
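When installing on demand, it helps to know which optional modules are already importable. The following is a minimal sketch using only the standard library. The helper `missing_modules` is hypothetical and not part of DocArray. It expects top-level *import* names, which can differ from pip package names.

```python
import importlib.util


def missing_modules(names):
    """Hypothetical helper: return the top-level import names in `names`
    that cannot be found in the current environment.
    find_spec locates a top-level module without importing it."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: check a couple of modules before deciding what to pip install.
print(missing_modules(["json", "numpy"]))
```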
pip install "docarray[full,test]"
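This installs all requirements for reproducing the tests in your local development environment.
After installing, you can check the version and import the core classes from a Python shell: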
>>> import docarray
>>> docarray.__version__
'0.1.0'
>>> from docarray import Document, DocumentArray