JVMFileDocumentEmbeddingStorage

A file-system-based storage implementation for managing and embedding documents represented by file paths.

This class extends EmbeddingBasedDocumentStorage and is specialized for JVM-based systems where documents are represented as file paths (Path). It combines a DocumentEmbedder for embedding the file content into vectors and a JVMFileVectorStorage for managing the storage and retrieval of these embeddings along with their associated documents.

The primary responsibility of this class is to facilitate:

  • Storing and embedding documents housed in a file system using a specified DocumentEmbedder.

  • Ranking documents based on similarity to query embeddings.

  • Managing file-based vector storage via JVMFileVectorStorage.
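
The responsibilities above can be illustrated with a minimal, self-contained sketch. It is not the implementation of JVMFileDocumentEmbeddingStorage: embedText, cosineSimilarity, and InMemoryFileIndex are hypothetical names, the letter-frequency "embedding" stands in for a real DocumentEmbedder, and the index is held in memory rather than in a JVMFileVectorStorage. Cosine similarity and descending sort order are assumptions made for the sketch, not behavior confirmed by this page.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import kotlin.math.sqrt

// Hypothetical stand-in for a DocumentEmbedder<Path>: counts letter frequencies.
// A real embedder would call an embedding model instead.
fun embedText(text: String): DoubleArray {
    val vector = DoubleArray(26)
    for (ch in text.lowercase()) {
        if (ch in 'a'..'z') vector[ch - 'a'] += 1.0
    }
    return vector
}

// Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros.
fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return if (normA == 0.0 || normB == 0.0) 0.0 else dot / (sqrt(normA) * sqrt(normB))
}

// Hypothetical in-memory index: maps each stored file path to its embedding.
class InMemoryFileIndex {
    private val embeddings = LinkedHashMap<Path, DoubleArray>()

    // Reads the file content and stores its embedding under the file path.
    fun store(document: Path) {
        embeddings[document] = embedText(Files.readString(document))
    }

    // Returns (path, similarity) pairs, most similar first.
    fun rank(query: String): List<Pair<Path, Double>> {
        val queryVector = embedText(query)
        return embeddings
            .map { (path, vector) -> path to cosineSimilarity(queryVector, vector) }
            .sortedByDescending { it.second }
    }
}

fun main() {
    val dir = Files.createTempDirectory("docs")
    val a = Files.writeString(dir.resolve("a.txt"), "aaaa bbbb")
    val b = Files.writeString(dir.resolve("b.txt"), "zzzz yyyy")

    val index = InMemoryFileIndex()
    index.store(a)
    index.store(b)

    // The query shares letters only with a.txt, so that file ranks first.
    println(index.rank("abab").first().first.fileName) // prints "a.txt"
}
```

The sketch keeps embedding and storage as separate concerns, mirroring how this class pairs an embedder with a vector storage.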

Parameters

embedder

The embedder responsible for generating vector representations of file-based documents.

root

The root directory path used as the base for file-based vector storage.

Constructors

constructor(embedder: DocumentEmbedder<Path>, root: Path)

Creates an instance of JVMFileDocumentEmbeddingStorage.

Functions

open override fun allDocuments(): Flow<Path>

Retrieves a flow of all documents stored in the system.

open suspend override fun delete(documentId: String): Boolean

Deletes the document with the specified ID from the storage.

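open suspend override fun getPayload(documentId: String)

Retrieves the payload associated with the document that has the specified ID. This storage uses Unit as its payload type, so no additional data is attached to a document.

open override fun rankDocuments(query: String): Flow<RankedDocument<Path>>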

Ranks documents based on their similarity to a given query string.

open suspend override fun read(documentId: String): Path?

Reads a document by its unique identifier.

open suspend override fun readWithPayload(documentId: String): DocumentWithPayload<Path, Unit>?

Reads a document together with its payload by the document's unique identifier. For this storage the payload type is Unit.

open suspend fun store(document: Path): String

open suspend override fun store(document: Path, data: Unit): String

Stores the given document after embedding it into a vector representation.
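
The signatures listed above suggest an ID-based lifecycle: store returns a String identifier, which read and delete then accept. The following sketch illustrates that contract with a hypothetical in-memory stand-in, InMemoryPathStorage, which performs no embedding and no file-based persistence. The UUID identifiers and the meaning of the Boolean returned by delete are assumptions; this page does not specify either.

```kotlin
import java.nio.file.Path
import java.util.UUID

// Hypothetical stand-in that mirrors the listed store/read/delete signatures.
class InMemoryPathStorage {
    private val documents = mutableMapOf<String, Path>()

    // Assumption: IDs are random UUIDs; the real ID format is not documented here.
    suspend fun store(document: Path): String {
        val id = UUID.randomUUID().toString()
        documents[id] = document
        return id
    }

    // Returns the stored path, or null when the ID is unknown.
    suspend fun read(documentId: String): Path? = documents[documentId]

    // Assumption: true means a document was found and removed.
    suspend fun delete(documentId: String): Boolean = documents.remove(documentId) != null
}

suspend fun main() {
    val storage = InMemoryPathStorage()
    val id = storage.store(Path.of("notes.txt"))

    println(storage.read(id))   // prints "notes.txt"
    println(storage.delete(id)) // prints "true"
    println(storage.read(id))   // prints "null"
    println(storage.delete(id)) // prints "false"
}
```

Callers of the real class would typically keep the returned ID, since read and delete address documents by that ID rather than by path.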