docarray.document.mixins.featurehash module
- class docarray.document.mixins.featurehash.FeatureHashMixin
Bases: object
Provide helper functions for feature hashing.
- embed_feature_hashing(n_dim=256, sparse=False, fields=('text', 'tags'), max_value=1000000)
Convert an arbitrary set of attributes into a fixed-dimensional matrix using the hashing trick.
- Parameters:
n_dim (int) – the dimensionality of each document in the output embedding. Small values make hash collisions more likely, while large values lead to larger overall parameter dimensions.
sparse (bool) – whether the resulting feature matrix should be a sparse csr_matrix or a dense ndarray. Note that the sparse option requires scipy.
fields (Tuple[str, ...]) – which attributes to consider for feature hashing.
- Return type: T
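Example: the hashing trick mentioned above can be illustrated with a minimal, standalone sketch. This is not the docarray implementation. The helper name `hash_features`, the use of MD5, and the signed variant of the trick are assumptions chosen for illustration, and the sketch returns a plain Python list rather than an ndarray or csr_matrix.

```python
import hashlib
from collections import Counter


def hash_features(tokens, n_dim=256):
    """Hypothetical helper: map a bag of tokens to a fixed-length vector.

    Each distinct token is hashed to one of ``n_dim`` columns, and its
    count is added to that column with a hash-derived sign.
    """
    vec = [0.0] * n_dim
    for token, count in Counter(tokens).items():
        # A stable hash (MD5 here, an assumption) keeps results
        # reproducible across runs, unlike Python's built-in hash().
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit integer
        col = h % n_dim                        # target column
        sign = 1.0 if (h >> 63) == 0 else -1.0  # top bit picks the sign
        vec[col] += sign * count
    return vec


vec = hash_features("the quick brown fox jumps over the lazy dog".split(), n_dim=16)
print(len(vec))  # always n_dim, regardless of vocabulary size
```

The output length depends only on `n_dim`, which is why a small `n_dim` forces unrelated tokens to share columns (collisions), while a large `n_dim` spreads them out at the cost of a wider vector.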