Welcome to DocArray!

⬆️ DocArray v2: We are currently working on v2 of DocArray. Keep reading here if you are interested in the current (stable) version, or check out the v2 alpha branch and v2 roadmap!

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API.

🚪 Door to multimodal world: a super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, and 3D mesh data. It is the foundational data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt, etc.

🧑‍🔬 Data science powerhouse: greatly accelerates data scientists’ work on embedding, k-NN matching, querying, visualizing, and evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.

🚡 Data in transit: optimized for network communication and ready-to-wire at any time, with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, and DataFrame. Perfect for streaming and out-of-memory data.

🔎 One-stop k-NN: a unified and consistent API for nearest-neighbor search across mainstream vector databases, including Elasticsearch, Redis, AnnLite, Qdrant, and Weaviate.

👒 For modern apps: GraphQL support makes your server versatile on request and response; built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.

🐍 Pythonic experience: as easy as a Python list. If you can Python, you can DocArray. Intuitive idioms and type annotation simplify the code you write.

🛸 IDE integration: pretty-print and visualization on Jupyter notebook and Google Colab; comprehensive autocomplete and type hints in PyCharm and VS Code.

Read more on why you should use DocArray and how it compares to alternatives.

Install

The latest version is available on PyPI.

Make sure you have Python 3.7+ and NumPy installed on Linux/macOS/Windows:

Basic install with pip:

pip install docarray

Basic install with conda:

conda install -c conda-forge docarray

Either way, no extra dependencies are installed.

pip install "docarray[common]"

The following dependencies are installed to enable the most common features:

Package      Used in
protobuf     advanced serialization
lz4          compression in serialization
requests     push/pull to Jina Cloud
matplotlib   visualizing image sprites
Pillow       image data-related IO
fastapi      used in embedding projector of DocumentArray
uvicorn      used in embedding projector of DocumentArray

pip install "docarray[full]"

In addition to common, the following dependencies are installed to enable full features:

Package              Used in
scipy                sparse embedding, tensors
av                   video processing and IO
trimesh              3D mesh processing and IO
strawberry-graphql   GraphQL support

Alternatively, you can start with the basic installation and then install missing dependencies on demand.
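If you go the on-demand route, it can help to see which of the optional packages listed above are still missing from your environment. The following is a minimal sketch using only the standard library. The helper names `is_importable` and `missing_extras` are hypothetical and not part of DocArray, and the mapping from package names to import names (for example Pillow to `PIL`) is an assumption based on each package's usual import name.

```python
import importlib.util

# PyPI package name -> top-level import name.
# This mapping is an assumption for illustration; it is not taken from DocArray.
OPTIONAL_PACKAGES = {
    "protobuf": "google.protobuf",
    "lz4": "lz4",
    "requests": "requests",
    "matplotlib": "matplotlib",
    "Pillow": "PIL",
    "fastapi": "fastapi",
    "uvicorn": "uvicorn",
    "scipy": "scipy",
    "av": "av",
    "trimesh": "trimesh",
    "strawberry-graphql": "strawberry",
}


def is_importable(module_name):
    """Return True if `module_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. `google`) is absent.
        return False


def missing_extras(packages=OPTIONAL_PACKAGES):
    """Return the package names whose import name cannot be found."""
    return [pkg for pkg, module in packages.items() if not is_importable(module)]


missing = missing_extras()
print("Missing optional packages:", ", ".join(missing) or "none")
```

Anything the script reports can then be installed individually with pip, or all at once through the `common` or `full` extras.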

pip install "docarray[full,test]"

This installs everything required to run the tests in your local development environment.

>>> import docarray
>>> docarray.__version__
'0.1.0'
>>> from docarray import Document, DocumentArray
