TextFileDocumentEmbeddingStorage

A file-based implementation of document storage utilizing embeddings for ranking and retrieval.

This class specializes in storing and ranking text documents in a file system using embeddings derived from their textual content. It integrates several components:

  • A TextDocumentReader to extract textual content from the provided documents.

  • An Embedder to generate vector embeddings from this textual content.

  • A file-based vector storage for storing documents alongside their embeddings.

The storage system ranks documents by their similarity to a given query. Because documents and their embeddings are persisted to the file system, they remain available for search and retrieval after they are stored.
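The ranking idea described above can be illustrated with a minimal, self-contained sketch. It does not use this class or any library types: `vocabulary`, `embed`, `cosine`, and `rank` are hypothetical names, the word-count embedder stands in for a real Embedder, and cosine similarity is an assumed metric (the actual similarity measure is whatever the configured Embedder computes).

```kotlin
import kotlin.math.sqrt

// Hypothetical toy vocabulary; a real Embedder would call an embedding model.
val vocabulary = listOf("kotlin", "coroutine", "file", "vector")

// Toy embedding: counts how often each vocabulary word occurs in the text.
fun embed(text: String): DoubleArray {
    val words = text.lowercase().split(Regex("\\W+"))
    return DoubleArray(vocabulary.size) { i ->
        words.count { it == vocabulary[i] }.toDouble()
    }
}

// Cosine similarity (an assumed metric); returns 0.0 if either vector is all zeros.
fun cosine(a: DoubleArray, b: DoubleArray): Double {
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return if (normA == 0.0 || normB == 0.0) 0.0 else dot / (sqrt(normA) * sqrt(normB))
}

// Pairs each document with its similarity to the query, most similar first.
fun rank(query: String, documents: List<String>): List<Pair<String, Double>> {
    val queryVector = embed(query)
    return documents
        .map { doc -> doc to cosine(queryVector, embed(doc)) }
        .sortedByDescending { it.second }
}

fun main() {
    val docs = listOf("Kotlin coroutine basics", "Vector file formats", "Cooking pasta")
    rank("kotlin coroutine", docs).forEach { (doc, score) -> println("$score  $doc") }
}
```

In this sketch the document sharing the most vocabulary with the query is listed first. The real class exposes the equivalent result as a Flow of RankedDocument values rather than a list of pairs.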

Parameters

Document

The type of document to be stored and processed.

embedder

Converts text into vector embeddings and calculates similarity between embeddings.

documentProvider

Reads documents of type Document and extracts their text for embedding purposes.

fs

Platform-specific file system provider used for path manipulations.

root

Root directory where all vector storage will be located.

Constructors

constructor(embedder: Embedder, documentProvider: DocumentProvider<Path, Document>, fs: FileSystemProvider.ReadWrite<Path>, root: Path)

Functions

open override fun allDocuments(): Flow<Document>

Retrieves a flow of all documents stored in the system.

open suspend override fun delete(documentId: String): Boolean

Deletes the document with the specified ID from the storage.

open suspend override fun getPayload(documentId: String)

open override fun rankDocuments(query: String): Flow<RankedDocument<Document>>

Ranks documents based on their similarity to a given query string.

open suspend override fun read(documentId: String): Document?

Reads a document by its unique identifier.

open suspend override fun readWithPayload(documentId: String): DocumentWithPayload<Document, Unit>?

open suspend fun store(document: Document): String

open suspend override fun store(document: Document, data: Unit): String

Stores the given document after embedding it into a vector representation.
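
The store, read, and delete contract listed above (store returns an ID, read returns null for an unknown ID, delete reports whether anything was removed) can be sketched with plain JVM file APIs. This is an illustration under stated assumptions, not this class's implementation: `SketchFileStore`, the `<id>.txt` / `<id>.vec` file layout, and the random UUID identifiers are all hypothetical, and the sketch is synchronous and String-based, whereas the real functions are suspending and generic over Document.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.util.UUID

// Hypothetical on-disk layout: <root>/<id>.txt holds the document text and
// <root>/<id>.vec holds its embedding as comma-separated numbers.
class SketchFileStore(
    private val root: Path,
    private val embed: (String) -> List<Double>, // stand-in for a real Embedder
) {
    init {
        Files.createDirectories(root)
    }

    // Embeds the text, writes both files, and returns the generated ID.
    fun store(text: String): String {
        val id = UUID.randomUUID().toString()
        Files.writeString(root.resolve("$id.txt"), text)
        Files.writeString(root.resolve("$id.vec"), embed(text).joinToString(","))
        return id
    }

    // Returns the stored text, or null if no document has this ID.
    fun read(id: String): String? {
        val file = root.resolve("$id.txt")
        return if (Files.exists(file)) Files.readString(file) else null
    }

    // Removes the text and its embedding; true if anything was deleted.
    fun delete(id: String): Boolean {
        val removedText = Files.deleteIfExists(root.resolve("$id.txt"))
        val removedVector = Files.deleteIfExists(root.resolve("$id.vec"))
        return removedText || removedVector
    }
}

fun main() {
    val store = SketchFileStore(Files.createTempDirectory("sketch-store")) { text ->
        listOf(text.length.toDouble()) // toy one-dimensional "embedding"
    }
    val id = store.store("hello embeddings")
    println(store.read(id))
    println(store.delete(id))
    println(store.read(id))
}
```

Keeping the embedding next to the text means a later ranking pass can reuse the stored vector instead of re-embedding every document for each query.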