#
TableYou can freely convert between DocumentArray and pandas.Dataframe
, read more details in Serialization. Besides, you can load and write CSV file with DocumentArray.
Load CSV table#
You can easily load tabular data from csv
file into a DocumentArray. For example,
Username;Identifier;First name;Last name
booker12;9012;Rachel;Booker
grey07;2070;Laura;Grey
johnson81;4081;Craig;Johnson
jenkins46;9346;Mary;Jenkins
smith79;5079;Jamie;Smith
from docarray import DocumentArray
da = DocumentArray.from_csv('toy.csv')
Documents Summary
Length 5
Homogenous Documents True
Common Attributes ('id', 'tags')
Attributes Summary
Attribute Data type #Unique values Has empty value
──────────────────────────────────────────────────────────
id ('str',) 5 False
tags ('dict',) 5 False
You can observe that each row is loaded as a Document and the columns are loaded into Document.tags
.
In general, from_csv
will try its best to resolve the column names of the table and map them into the corresponding Document attributes. If such attempt fails, you can always resolve the field manually via:
from docarray import DocumentArray
da = DocumentArray.from_csv('toy.csv', field_resolver={'Identifier': 'id'})
Save as CSV file#
Saving a DocumentArray as a csv
file is easy.
da.save_csv('tmp.csv')
One thing needs to be careful is that tabular data is often not good for representing nested Document. Hence, nested Document will be stored in flatten.
If your Documents contain tags, and you want to store each tag in a separate column, then you can do:
from docarray import DocumentArray, Document
da = DocumentArray([Document(tags={'english': 'hello', 'german': 'hallo'}),
Document(tags={'english': 'world', 'german': 'welt'})])
da.save_csv('toy.csv', flatten_tags=True)
id,tag__english,tag__german
029388a4-3830-11ec-b6b2-1e008a366d48,hello,hallo
0293968c-3830-11ec-b6b2-1e008a366d48,world,welt
id,tags
418de220-3830-11ec-aad8-1e008a366d48,"{'german': 'hallo', 'english': 'hello'}"
418dec52-3830-11ec-aad8-1e008a366d48,"{'english': 'world', 'german': 'welt'}"