RoutingLLMPromptExecutor

Executes prompts with load balancing across multiple LLM clients.

Delegates client selection to LLMClientRouter, which determines which client should handle each request based on the requested model. This enables load distribution strategies like round-robin, weighted routing, or health-based selection.
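To illustrate the round-robin strategy mentioned above, here is a minimal, self-contained sketch of picking clients in rotation. This is an illustration of the technique only, not the internals of LLMClientRouter or RoundRobinRouter:

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Stand-in pool demonstrating round-robin selection over any set of
// clients; real routing also considers the requested model's provider.
class RoundRobinPool<T>(private val items: List<T>) {
    private val next = AtomicInteger(0)

    fun pick(): T {
        // getAndIncrement keeps concurrent picks safe; floorMod wraps around.
        val i = next.getAndIncrement()
        return items[Math.floorMod(i, items.size)]
    }
}

fun main() {
    val pool = RoundRobinPool(listOf("clientA", "clientB", "clientC"))
    repeat(4) { println(pool.pick()) }
    // Cycles clientA, clientB, clientC, then back to clientA
}
```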

Parameters

clientRouter

Router responsible for selecting appropriate clients for each request

fallback

Optional fallback configuration when no client is available for the requested model

Constructors

Creates an executor from a map of providers to their client lists. Uses RoundRobinRouter for load distribution.

Creates an executor from a list of clients. Clients are grouped by provider and routed using RoundRobinRouter.

constructor(vararg llmClients: LLMClient, fallback: RoutingLLMPromptExecutor.FallbackPromptExecutorSettings? = null)

Creates an executor from a vararg of clients. Clients are grouped by provider and routed using RoundRobinRouter.
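A typical construction using the vararg constructor might look like the sketch below. The client classes (OpenAILLMClient, AnthropicLLMClient) and the key and model values are hypothetical placeholders; substitute whatever LLMClient implementations and LLModel you actually use:

```kotlin
// Hypothetical clients standing in for real LLMClient implementations.
val executor = RoutingLLMPromptExecutor(
    OpenAILLMClient(apiKey = openAIKey),
    AnthropicLLMClient(apiKey = anthropicKey),
    // Optional: route to a fallback model when no client matches.
    fallback = RoutingLLMPromptExecutor.FallbackPromptExecutorSettings(
        fallbackModel = someFallbackModel,
    ),
)
```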

Types

data class FallbackPromptExecutorSettings(val fallbackModel: LLModel)

Configuration for a fallback execution strategy: the fallbackModel is used when no client is available for the requested model.

Functions

Link copied to clipboard
open override fun close()

open suspend override fun execute(prompt: Prompt, model: LLModel, tools: List<ToolDescriptor>): List<Message.Response>

Executes a given prompt using the specified tools and model, and returns a list of response messages.
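A sketch of a basic call, assuming `executor`, `prompt`, and `model` have already been set up. Since execute is a suspend function, it must run inside a coroutine:

```kotlin
import kotlinx.coroutines.runBlocking

// Sketch: execute a prompt with no tools and print the responses.
runBlocking {
    val responses: List<Message.Response> =
        executor.execute(prompt, model, tools = emptyList())
    responses.forEach { println(it) }
}
```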

open suspend override fun executeMultipleChoices(prompt: Prompt, model: LLModel, tools: List<ToolDescriptor>): List<LLMChoice>

Executes a given prompt using the specified tools and model and returns a list of model choices.

open override fun executeStreaming(prompt: Prompt, model: LLModel, tools: List<ToolDescriptor>): Flow<StreamFrame>

Executes the given prompt with the specified model and streams the response in chunks as a flow.
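Because executeStreaming returns a Flow, frames arrive incrementally as they are produced. A hedged consumption sketch, assuming `executor`, `prompt`, and `model` are already in scope:

```kotlin
import kotlinx.coroutines.runBlocking

// Sketch: collect stream frames one by one as the model responds.
runBlocking {
    executor.executeStreaming(prompt, model, tools = emptyList())
        .collect { frame -> println(frame) }
}
```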

suspend fun <T> PromptExecutor.executeStructured(prompt: Prompt, model: LLModel, config: StructuredRequestConfig<T>, fixingParser: StructureFixingParser? = null): Result<StructuredResponse<T>>
inline suspend fun <T> PromptExecutor.executeStructured(prompt: Prompt, model: LLModel, examples: List<T> = emptyList(), fixingParser: StructureFixingParser? = null): Result<StructuredResponse<T>>
suspend fun <T> PromptExecutor.executeStructured(prompt: Prompt, model: LLModel, serializer: KSerializer<T>, examples: List<T> = emptyList(), fixingParser: StructureFixingParser? = null): Result<StructuredResponse<T>>

Executes a prompt with structured output, enhancing it with schema instructions or native structured output parameter, and parses the response into the defined structure.
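A sketch of the reified overload, which infers the output schema from the target type. The WeatherReport class is an invented example, and kotlinx.serialization is assumed for the @Serializable annotation:

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.serialization.Serializable

// Hypothetical structured-output target type.
@Serializable
data class WeatherReport(val city: String, val temperatureC: Double)

runBlocking {
    // Sketch: the schema is derived from WeatherReport; parsing failures
    // surface through the returned Result.
    val result: Result<StructuredResponse<WeatherReport>> =
        executor.executeStructured(prompt, model)
    result.onSuccess { println(it) }
        .onFailure { println("Structured parse failed: $it") }
}
```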

Returns the basic JSON schema generator required for the given model; BasicJsonSchemaGenerator by default.

Returns the standard JSON schema generator required for the given model; StandardJsonSchemaGenerator by default.

open suspend override fun models(): List<LLModel>

Retrieves a list of available models from all LLM clients managed by this executor.
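A small sketch of enumerating the models aggregated from every routed client, assuming `executor` is already constructed:

```kotlin
import kotlinx.coroutines.runBlocking

// Sketch: models() aggregates the model lists of all managed clients.
runBlocking {
    val available: List<LLModel> = executor.models()
    available.forEach { println(it) }
}
```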

open suspend override fun moderate(prompt: Prompt, model: LLModel): ModerationResult

Moderates the provided multi-modal content using the specified model.

Parses a structured response from the assistant message using the provided structured output configuration and language model. If a fixing parser is specified in the configuration, it will be used; otherwise, the structure will be parsed directly.