prompt-executor-openai-client

A client implementation for executing prompts using OpenAI's GPT models, with support for images, audio, and PDF documents.

Overview

This module provides a client implementation for the OpenAI API, allowing you to execute prompts using GPT models. It handles authentication, request formatting, response parsing, and multimodal content encoding specific to OpenAI's API requirements.

Supported Models

Reasoning Models

| Model       | Speed   | Context | Input Support       | Output Support | Pricing (per 1M tokens, input-output) |
|-------------|---------|---------|---------------------|----------------|---------------------------------------|
| GPT-4o Mini | Medium  | 128K    | Text, Images, Tools | Text, Tools    | $1.1-$4.4                             |
| o3-mini     | Medium  | 200K    | Text, Tools         | Text, Tools    | $1.1-$4.4                             |
| o1-mini     | Slow    | 128K    | Text                | Text           | $1.1-$4.4                             |
| o3          | Slowest | 200K    | Text, Images, Tools | Text, Tools    | $10-$40                               |
| o1          | Slowest | 200K    | Text, Images, Tools | Text, Tools    | $15-$60                               |

Chat Models

| Model   | Speed  | Context | Input Support       | Output Support | Pricing (per 1M tokens, input-output) |
|---------|--------|---------|---------------------|----------------|---------------------------------------|
| GPT-4o  | Medium | 128K    | Text, Images, Tools | Text, Tools    | $2.5-$10                              |
| GPT-4.1 | Medium | 1M      | Text, Images, Tools | Text, Tools    | $2-$8                                 |

Audio Models

| Model             | Speed  | Context | Input Support      | Output Support     | Pricing (per 1M tokens, input-output) |
|-------------------|--------|---------|--------------------|--------------------|---------------------------------------|
| GPT-4o Mini Audio | Fast   | 128K    | Text, Audio, Tools | Text, Audio, Tools | Text: $0.15-$0.6; Audio: $10-$20      |
| GPT-4o Audio      | Medium | 128K    | Text, Audio, Tools | Text, Audio, Tools | Text: $2.5-$10; Audio: $40-$80        |

Cost-Optimized Models

| Model        | Speed     | Context | Input Support       | Output Support | Pricing (per 1M tokens, input-output) |
|--------------|-----------|---------|---------------------|----------------|---------------------------------------|
| o4-mini      | Medium    | 200K    | Text, Images, Tools | Text, Tools    | $1.1-$4.4                             |
| GPT-4.1-nano | Very fast | 1M      | Text, Images, Tools | Text, Tools    | $0.1-$0.4                             |
| GPT-4.1-mini | Fast      | 1M      | Text, Images, Tools | Text, Tools    | $0.4-$1.6                             |

Embedding Models

| Model                  | Speed  | Dimensions | Input Support | Pricing (per 1M tokens) |
|------------------------|--------|------------|---------------|-------------------------|
| text-embedding-3-small | Medium | 1536       | Text          | $0.02                   |
| text-embedding-3-large | Slow   | 3072       | Text          | $0.13                   |
| text-embedding-ada-002 | Slow   | 1536       | Text          | $0.1                    |

Media Content Support

| Content Type | Supported Formats    | Max Size | Notes                                |
|--------------|----------------------|----------|--------------------------------------|
| Images       | PNG, JPEG, WebP, GIF | 20MB     | Base64 encoded or URL                |
| Audio        | WAV, MP3             | 25MB     | Base64 encoded only (audio models)   |
| Documents    | PDF                  | 20MB     | Base64 encoded only (vision models)  |
| Video        | ❌ Not supported      | -        | -                                    |

Important Details:

  • Images: both URL and base64-encoded input are supported
  • Audio: WAV and MP3 formats only, base64 only
  • PDF documents: PDF format only, requires vision capability
  • Model requirements: audio input needs the Audio capability, PDF input needs the Vision.Image capability (see the capability-check sketch below)
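
Because unsupported media is rejected only once the request reaches the API, it can be useful to guard on a model's declared capabilities up front. The following is a minimal sketch: it assumes the model descriptor exposes a capabilities collection containing values such as LLMCapability.Audio and LLMCapability.Vision.Image (the capability names come from the note above; verify the exact property name and import paths against the actual model API).

// Sketch only: the `capabilities` property and import paths are assumptions to verify.
import ai.koog.prompt.llm.LLMCapability
import ai.koog.prompt.llm.LLModel

fun requirePdfSupport(model: LLModel) {
    // PDF input is documented above as requiring the Vision.Image capability
    require(LLMCapability.Vision.Image in model.capabilities) {
        "This model cannot accept PDF documents (missing Vision.Image capability)"
    }
}

fun requireAudioSupport(model: LLModel) {
    // Audio input is documented above as requiring the Audio capability
    require(LLMCapability.Audio in model.capabilities) {
        "This model cannot accept audio (missing Audio capability)"
    }
}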

Using in your project

Add the dependency to your project:

dependencies {
    implementation("ai.koog.prompt:prompt-executor-openai-client:$version")
}

Configure the client with your API key:

val openaiClient = OpenAILLMClient(
    apiKey = "your-openai-api-key",
)

Usage example

suspend fun main() {
    val client = OpenAILLMClient(
        apiKey = System.getenv("OPENAI_API_KEY"),
    )

    // Text-only example
    val response = client.execute(
        prompt = prompt {
            system("You are a helpful assistant")
            user("What time is it now?")
        },
        model = OpenAIModels.Chat.GPT4o
    )

    println(response)
}
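
For long responses you may want tokens as they arrive rather than a single blocking result. The sketch below assumes the client exposes an executeStreaming method returning a kotlinx.coroutines Flow<String>; the method name and signature are assumptions to verify against the actual client API.

// Sketch only: assumes executeStreaming(prompt, model): Flow<String> exists on the client.
client.executeStreaming(
    prompt = prompt {
        system("You are a helpful assistant")
        user("Tell me a short story")
    },
    model = OpenAIModels.Chat.GPT4o
).collect { chunk -> print(chunk) }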

Multimodal Examples

// Image analysis
val imageResponse = client.execute(
    prompt = prompt {
        user {
            text("What do you see in this image?")
            image("/path/to/image.jpg")
        }
    },
    model = OpenAIModels.Chat.GPT4o
)
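
The media table above notes that images can also be passed by URL rather than base64. The sketch below assumes the image builder accepts an HTTP(S) URL string in addition to a local path; only the path form appears elsewhere in this document, so verify this overload against the prompt DSL.

// Sketch only: assumes image(...) also accepts a URL string.
val urlImageResponse = client.execute(
    prompt = prompt {
        user {
            text("Describe this image")
            image("https://example.com/chart.png")
        }
    },
    model = OpenAIModels.Chat.GPT4o
)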

// Audio transcription (requires audio models)
val audioData = File("/path/to/audio.wav").readBytes()
val transcriptionResponse = client.execute(
    prompt = prompt {
        user {
            text("Transcribe this audio")
            audio(audioData, "wav")
        }
    },
    model = OpenAIModels.Audio.GPT4oAudio
)

// PDF document processing (requires vision models)
val pdfResponse = client.execute(
    prompt = prompt {
        user {
            text("Summarize this PDF document")
            document("/path/to/document.pdf")
        }
    },
    model = OpenAIModels.Chat.GPT4o
)

// Embedding example
val embedding = client.embed(
    text = "This is a sample text for embedding",
    model = OpenAIModels.Embeddings.TextEmbedding3Small
)
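
Embedding vectors are typically compared with cosine similarity. The helper below is plain, self-contained Kotlin; the only assumption is that embed returns the vector as a List<Double> (adapt the types if the client returns something else).

import kotlin.math.sqrt

// Cosine similarity between two equal-length embedding vectors.
fun cosineSimilarity(a: List<Double>, b: List<Double>): Double {
    require(a.size == b.size) { "Embeddings must have the same dimensionality" }
    val dot = a.indices.sumOf { a[it] * b[it] }
    val normA = sqrt(a.sumOf { it * it })
    val normB = sqrt(b.sumOf { it * it })
    return dot / (normA * normB)
}

// Usage sketch: embed two texts with the same model, then compare.
val e1 = client.embed(text = "cats are small felines", model = OpenAIModels.Embeddings.TextEmbedding3Small)
val e2 = client.embed(text = "kittens are young cats", model = OpenAIModels.Embeddings.TextEmbedding3Small)
println(cosineSimilarity(e1, e2))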

// Mixed content (image + PDF)
val mixedResponse = client.execute(
    prompt = prompt {
        user {
            text("Compare this image with the PDF:")
            image("/path/to/chart.png")
            document("/path/to/report.pdf")
            text("What insights can you provide?")
        }
    },
    model = OpenAIModels.Chat.GPT4o
)

Packages

common