docarray.array.mixins.text module
- class docarray.array.mixins.text.TextToolsMixin
Bases: object
Helper functions for NLP, used by DocumentArray (DA) and DocumentArrayMemmap (DAM).
- get_vocabulary(min_freq=1, text_attrs=('text',))
Build a vocabulary from the text of all Documents and return it as a dict that maps each word to its index.
- Parameters:
text_attrs (Tuple[str, ...]) – the textual attributes from which the vocabulary is derived
min_freq (int) – the minimum frequency a word must have to be included in the vocabulary
- Return type:
Dict[str, int]
- Returns:
a vocabulary as a dictionary in which each key is a word and each value is that word's index. Indices start at 2: 0 is reserved for padding and 1 is reserved for the unknown token.
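
The indexing scheme described above can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: `build_vocabulary` is a hypothetical helper, and the lowercasing and `\w+` tokenization are assumptions made only for illustration. It shows how a `min_freq` filter and the reserved indices 0 and 1 might interact.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def build_vocabulary(texts: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Hypothetical helper: map each sufficiently frequent word to an index >= 2."""
    counts = Counter()
    for text in texts:
        # Assumed tokenization: lowercase, then split on runs of word characters.
        counts.update(re.findall(r"\w+", text.lower()))
    # Keep only words that occur at least `min_freq` times, in first-seen order.
    kept = (word for word, freq in counts.items() if freq >= min_freq)
    # Start at 2 so that 0 (padding) and 1 (unknown token) stay unused.
    return {word: idx for idx, word in enumerate(kept, start=2)}


vocab = build_vocabulary(["hello world", "hello there"])
frequent = build_vocabulary(["hello world", "hello there"], min_freq=2)
```

Under these assumptions, `vocab` assigns indices 2, 3 and 4 to `hello`, `world` and `there`, while `frequent` keeps only `hello`, the one word that appears at least twice.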