TextFileDocumentEmbeddingStorage
A file-based implementation of document storage utilizing embeddings for ranking and retrieval.
This class specializes in storing and ranking text documents in a file system using embeddings derived from their textual content. It integrates several components:
A TextDocumentReader to extract textual content from the provided documents.
An Embedder to generate vector embeddings from this textual content.
A file-based vector storage for storing documents alongside their embeddings.
The storage system allows document ranking based on similarity to a given query, ensuring efficient, persistent document search and retrieval.
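The flow described above (read text, embed it, persist the vector, rank by similarity) can be illustrated with a minimal, self-contained sketch. Every name in it (ToyReader, ToyEmbedder, LetterCountEmbedder, ToyFileEmbeddingStorage, the ".vec" file layout) is a hypothetical stand-in invented for illustration, not this library's API. The letter-frequency embedder and cosine similarity are assumptions chosen only to keep the example runnable without a model, and the documents themselves are kept in memory for brevity, whereas a real implementation would presumably persist them as well.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import kotlin.math.sqrt

// Hypothetical stand-in for a document reader: turns a document into text.
fun interface ToyReader<D> { fun text(document: D): String }

// Hypothetical stand-in for an embedder: embeds text and compares vectors.
interface ToyEmbedder {
    fun embed(text: String): DoubleArray
    fun similarity(a: DoubleArray, b: DoubleArray): Double
}

// Toy embedder (assumption): 26-dimensional letter-frequency vector,
// compared with cosine similarity.
object LetterCountEmbedder : ToyEmbedder {
    override fun embed(text: String): DoubleArray {
        val v = DoubleArray(26)
        for (c in text.lowercase()) if (c in 'a'..'z') v[c - 'a'] += 1.0
        return v
    }

    override fun similarity(a: DoubleArray, b: DoubleArray): Double {
        val dot = a.indices.sumOf { a[it] * b[it] }
        val norm = sqrt(a.sumOf { it * it }) * sqrt(b.sumOf { it * it })
        return if (norm == 0.0) 0.0 else dot / norm
    }
}

// Sketch of a file-backed embedding store: one "<id>.vec" file per document
// under the root directory, holding the comma-separated embedding.
class ToyFileEmbeddingStorage<D>(
    private val embedder: ToyEmbedder,
    private val reader: ToyReader<D>,
    private val root: Path,
) {
    // Simplification: documents stay in memory; only vectors go to disk.
    private val documents = mutableMapOf<String, D>()

    fun store(id: String, document: D) {
        Files.createDirectories(root)
        val vector = embedder.embed(reader.text(document))
        Files.writeString(root.resolve("$id.vec"), vector.joinToString(","))
        documents[id] = document
    }

    // Returns all stored documents paired with their similarity to the
    // query, most similar first.
    fun rank(query: String): List<Pair<D, Double>> {
        val queryVector = embedder.embed(query)
        return documents.map { (id, doc) ->
            val stored = Files.readString(root.resolve("$id.vec"))
                .split(",").map { it.toDouble() }.toDoubleArray()
            doc to embedder.similarity(queryVector, stored)
        }.sortedByDescending { it.second }
    }
}
```

With this sketch, storing a few strings through a ToyReader<String> that returns the string itself and then calling rank with a query yields the stored documents ordered by decreasing similarity. The real class delegates each of these roles to the components listed above.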
Parameters
The type of document to be stored and processed.
Converts text into vector embeddings and calculates similarity between embeddings.
Extracts text from documents of type Document for embedding purposes.
Platform-specific file system provider used for path manipulations.
Root directory under which all vector storage data is located.