Changelog#

DocArray follows semantic versioning. However, until the project reaches 1.0.0, breaking changes only bump the minor version. A release note is generated automatically on every release; it covers features, bug fixes, refactorings, and so on.

This chapter tracks only the most important breaking changes and explains the rationale behind them.

0.11.0: change DocumentArrayInMemory’s container to list#

Previously, the in-memory DocumentArray (DA) was backed by a dict; it is now backed by a list. The change only affects very low-level behavior, but it is breaking.

0.10.0: add high-level dataclass API to Document#

0.10 introduces a high-level dataclass API for Document, which involves minor changes to Protobuf, the selector syntax, etc. Its usage is intentionally undocumented because the feature is still being tested and refactored. Full documentation of its usage is expected to be available in 0.12.

The change is backward-compatible.

0.5.0: add storage backend to DocumentArray#

0.5 introduces an important feature that enables an external Document Store to serve as the backend of DocumentArray. It also refactors the implementation of DocumentArray. The change should be backward-compatible; the version bump is mainly due to the introduction of the new concept of “storage”.

0.4.0: change on the DocumentArray serialization format#

This change affects DocumentArray.load_binary, DocumentArray.from_bytes and DocumentArray.to_bytes. Users cannot load a DocumentArray that was stored with protocol='pickle' or protocol='protobuf' under an older version.

The major change in 0.4.0 is the serialization format of DocumentArray when protocol is set to pickle or protobuf. The new format enables streaming for large on-disk serialization; however, the format itself is not backward-compatible. See Wire format of pickle and protobuf for more details.

Migration guide:

  • If you are using protocol='pickle' or protocol='protobuf' for on-disk serialization, you need to re-generate the serialized file.

0.3.0: change on the default JSON/dict serialization strategy#

This is a breaking change and is not backward-compatible.

Document/DocumentArray now favors schema-ed JSON over “unschema-ed” JSON in both the JSON and dict IO interfaces. Specifically, 0.3.0 introduces protocol='jsonschema' (the default) and protocol='protobuf' to let users control the serialization behavior.

Migration guide:

  • Read the docs: From/to JSON.

  • If you are using .to_dict(), .to_json(), .from_dict(), .from_json() at the Document/DocumentArray level, be aware of the change in JSON output.

  • If you want to stick to old Protobuf-based JSON (not recommended, as it is “unschema-ed”), use .to_json(protocol='protobuf') and .from_json(protocol='protobuf').

  • Fine-grained control can be achieved by passing extra key-value args as described in From/to JSON.
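Because the default protocol changed, one defensive pattern is to name the protocol explicitly on both the writing and the reading side so the two always agree. The sketch below assumes only the .to_json(protocol=...) and .from_json(protocol=...) calls named above; the helper json_roundtrip is hypothetical and not part of DocArray.

```python
def json_roundtrip(doc, protocol='jsonschema'):
    """Serialize `doc` to JSON and load it back with the same protocol.

    Hypothetical helper, not a DocArray API. `doc` is assumed to expose
    `.to_json(protocol=...)` and a class-level `.from_json(..., protocol=...)`,
    as described in the migration guide above.
    """
    # Pin the protocol on the writing side ...
    payload = doc.to_json(protocol=protocol)
    # ... and use the very same protocol on the reading side.
    return type(doc).from_json(payload, protocol=protocol)
```

With DocArray installed, passing a Document and protocol='protobuf' would exercise the older Protobuf-based JSON, while leaving the argument out uses the new default.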

0.2.0: change on the content field name#

This is a breaking change and is not backward-compatible.

The Document schema is changed as follows:

| 0.1.x | 0.2 | Semantic |
|---|---|---|
| .blob | .tensor | To represent the ndarray of a Document |
| .buffer | .blob | To represent the binary representation of a Document |

This change was made because “BLOB” is widely understood to mean “binary large object” in the database field. It is a more natural term for binary data and a less natural one for an ndarray. Previously, only Caffe used blob to represent an ndarray.

Unifying the terminology also avoids confusion when integrating DocArray with databases.

All fluent interfaces of Document are also changed accordingly.

Here is a short migration guide for 0.1.x users:

| Old | New | Remark |
|---|---|---|
| Document.blob | Document.tensor | |
| Document.buffer | Document.blob | |
| DocumentArray.blobs, da[:, 'blob'] | DocumentArray.tensors, da[:, 'tensor'] | |
| DocumentArray.buffers, da[:, 'buffer'] | DocumentArray.blobs, da[:, 'blob'] | |
| Document.blob | Document.tensor | Applies to all functions in here |
| Document.buffer | Document.blob | Applies to all functions in here |

JSON Schema needs to be re-generated by following this.
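The rename is easy to get wrong in user code because blob is both an old name and a new name. As a minimal sketch, assuming you hold your own plain dicts keyed by the 0.1.x attribute names (this is not DocArray's serialization format, and migrate_content_keys is a hypothetical helper), the renaming can be done in a single pass so the two renames never overwrite each other:

```python
# Hypothetical helper, not part of DocArray: maps 0.1.x content field
# names to their 0.2 names, following the table above.
_RENAMES = {'blob': 'tensor', 'buffer': 'blob'}


def migrate_content_keys(old: dict) -> dict:
    """Return a copy of `old` with 0.1.x content keys renamed to 0.2 names."""
    # Building a new dict in one pass means the old 'blob' value lands under
    # 'tensor' and the old 'buffer' value lands under 'blob', without one
    # rename clobbering the other (as two sequential in-place renames in the
    # wrong order would).
    return {_RENAMES.get(key, key): value for key, value in old.items()}


legacy = {'id': 'doc1', 'blob': [[0.1, 0.2]], 'buffer': b'\x00\x01'}
migrated = migrate_content_keys(legacy)
print(migrated)
# {'id': 'doc1', 'tensor': [[0.1, 0.2]], 'blob': b'\x00\x01'}
```

The same ordering concern applies to a search-and-replace over source code: rename blob to tensor first, then buffer to blob.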