prompt-executor-google-client

A client implementation for executing prompts using Google Gemini models with comprehensive multimodal support.

Overview

This module provides a client implementation for the Google Gemini API, allowing you to execute prompts using Gemini models. It handles authentication, request formatting, response parsing, and multimodal content encoding specific to Google's API requirements. Of the available provider clients, it offers the broadest multimodal support, accepting image, audio, video, and document inputs.

Supported Models

| Name | Speed | Context | Input Support | Output Support | Pricing (per 1M tokens) |
|------|-------|---------|---------------|----------------|-------------------------|
| Gemini 2.0 Flash | Fast | 1M | Audio, Image, Video, Text, Tools | Text, Tools | $0.10-$0.70 / $0.40 |
| Gemini 2.0 Flash-001 | Fast | 1M | Audio, Image, Video, Text, Tools | Text, Tools | $0.10-$0.70 / $0.40 |
| Gemini 2.0 Flash-Lite | Very fast | 1M | Audio, Image, Video, Text | Text | $0.075 / $0.30 |
| Gemini 1.5 Pro | Medium | 1M | Audio, Image, Video, Text, Tools | Text, Tools | $1.25-$2.50 / $5.00-$10.00 |
| Gemini 1.5 Pro Latest | Medium | 1M | Audio, Image, Video, Text, Tools | Text, Tools | $1.25-$2.50 / $5.00-$10.00 |
| Gemini 1.5 Pro-001 | Medium | 1M | Audio, Image, Video, Text, Tools | Text, Tools | $1.25-$2.50 / $5.00-$10.00 |
| Gemini 1.5 Pro-002 | Medium | 1M | Audio, Image, Video, Text, Tools | Text, Tools | $1.25-$2.50 / $5.00-$10.00 |
| Gemini 1.5 Flash | Fast | 1M | Audio, Image, Video, Text, Tools | Text, Tools | $0.075-$0.15 / $0.30-$0.60 |
| Gemini 1.5 Flash Latest | Fast | 1M | Audio, Image, Video, Text | Text | $0.075-$0.15 / $0.30-$0.60 |
| Gemini 1.5 Flash-001 | Fast | 1M | Audio, Image, Video, Text | Text | $0.075-$0.15 / $0.30-$0.60 |
| Gemini 1.5 Flash-002 | Fast | 1M | Audio, Image, Video, Text | Text | $0.075-$0.15 / $0.30-$0.60 |
| Gemini 1.5 Flash 8B | Very fast | 1M | Audio, Image, Video, Text | Text | $0.0375-$0.075 / $0.15-$0.30 |
| Gemini 1.5 Flash 8B Latest | Very fast | 1M | Audio, Image, Video, Text | Text | $0.0375-$0.075 / $0.15-$0.30 |
| Gemini 2.5 Pro Preview | Slow | 1M | Audio, Image, Video, Text, Tools | Text | $1.25-$2.50 / $10.00-$15.00 |
| Gemini 2.5 Flash Preview | Medium | 1M | Audio, Image, Video, Text | Text | $0.15-$1.00 / $0.60-$3.50 |
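
To make the pricing column concrete, the sketch below estimates the cost of a single request. It assumes that the figure before the slash is the input price and the figure after it is the output price, both per 1M tokens. The function `estimateCostUsd` and its parameter names are illustrative and not part of this module.

```kotlin
// Hypothetical helper (not part of this module): estimates request cost in USD
// from token counts and per-1M-token prices taken from the table above.
fun estimateCostUsd(
    inputTokens: Long,
    outputTokens: Long,
    inputPricePerMillion: Double,
    outputPricePerMillion: Double,
): Double {
    require(inputTokens >= 0 && outputTokens >= 0) { "Token counts must be non-negative" }
    val tokensPerMillion = 1_000_000.0
    return inputTokens / tokensPerMillion * inputPricePerMillion +
        outputTokens / tokensPerMillion * outputPricePerMillion
}

fun main() {
    // Gemini 2.0 Flash-Lite row, assuming 0.075 USD input / 0.30 USD output per 1M tokens.
    val cost = estimateCostUsd(
        inputTokens = 200_000,
        outputTokens = 10_000,
        inputPricePerMillion = 0.075,
        outputPricePerMillion = 0.30,
    )
    println("Estimated cost in USD: $cost")
}
```

Under these assumptions, 200,000 input tokens and 10,000 output tokens come to 0.2 × 0.075 + 0.01 × 0.30 = 0.018 USD. For rows that list a price range, pick the bound that applies to your input type before calling the helper.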

Media Content Support

| Content Type | Supported Formats | Max Size | Notes |
|--------------|-------------------|----------|-------|
| Images | PNG, JPEG, WebP, HEIC, HEIF | 20MB | Base64 encoded only (no URLs) |
| Audio | WAV, MP3, AIFF, AAC, OGG, FLAC | 20MB | Base64 encoded, for transcription/analysis |
| Video | All formats via MIME type | 20MB | Base64 encoded, video analysis |
| Documents | All formats via MIME type | 20MB | Base64 encoded, content passed directly |

Important Limitations:

  • No URL support: All media must be provided as base64-encoded data

  • Documents: Passed directly to model (no text extraction)

  • File validation: Format checked via MIME type detection
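
The constraints above (a 20MB limit, base64 encoding, and MIME-based format checks) can be illustrated with a small standalone sketch. This is not the client's actual implementation: `MAX_MEDIA_BYTES`, `imageMimeTypes`, `imageMimeTypeFor`, and `encodeMedia` are hypothetical names, and reading 20MB as 20 × 1024 × 1024 bytes is an assumption.

```kotlin
import java.util.Base64

// Size limit from the table above (20MB). Treating it as 20 * 1024 * 1024 bytes
// is an assumption; the source does not state the exact byte count.
const val MAX_MEDIA_BYTES: Int = 20 * 1024 * 1024

// Hypothetical lookup covering the image formats listed in the table.
val imageMimeTypes: Map<String, String> = mapOf(
    "png" to "image/png",
    "jpg" to "image/jpeg",
    "jpeg" to "image/jpeg",
    "webp" to "image/webp",
    "heic" to "image/heic",
    "heif" to "image/heif",
)

// Returns the MIME type for a supported image extension, or null if unsupported.
fun imageMimeTypeFor(fileName: String): String? =
    imageMimeTypes[fileName.substringAfterLast('.', "").lowercase()]

// Checks the size limit, then returns the base64 text for the payload.
fun encodeMedia(bytes: ByteArray): String {
    require(bytes.size <= MAX_MEDIA_BYTES) {
        "Media is ${bytes.size} bytes; the limit is $MAX_MEDIA_BYTES bytes"
    }
    return Base64.getEncoder().encodeToString(bytes)
}

fun main() {
    val mime = imageMimeTypeFor("chart.PNG")
    val encoded = encodeMedia("hello".toByteArray())
    println("$mime $encoded") // prints "image/png aGVsbG8="
}
```

Running a check like this before building a prompt surfaces oversized or unsupported files locally rather than as an API error.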

Using in your project

Add the dependency to your project:

dependencies {
    implementation("ai.koog.prompt:prompt-executor-google-client:$version")
}

Configure the client with your API key:

val googleClient = GoogleLLMClient(
    apiKey = "your-google-api-key",
)
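
Rather than hardcoding the key, it can be read from the environment. The following is a minimal sketch using only the Kotlin standard library; `loadApiKey` is a hypothetical helper, not part of this module, and the `GEMINI_API_KEY` variable name follows the usage example in this document.

```kotlin
// Hypothetical helper (not part of this module): looks up the API key and fails
// fast with a clear message when it is missing or blank.
fun loadApiKey(
    env: Map<String, String> = System.getenv(),
    name: String = "GEMINI_API_KEY",
): String =
    env[name]?.takeIf { it.isNotBlank() }
        ?: throw IllegalArgumentException("Environment variable $name is not set")

fun main() {
    // A fake environment keeps the example runnable without a real key.
    val key = loadApiKey(env = mapOf("GEMINI_API_KEY" to "test-key"))
    println(key.length) // prints 8
}
```

The returned string could then be passed as the `apiKey` argument shown above. Accepting the environment as a parameter keeps the lookup easy to test.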

Example of usage

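suspend fun main() {
    // Fail fast if the key is missing instead of passing null to the client
    val apiKey = System.getenv("GEMINI_API_KEY")
        ?: error("GEMINI_API_KEY environment variable is not set")

    val client = GoogleLLMClient(
        apiKey = apiKey,
    )

    // Text-only example
    val response = client.execute(
        prompt = prompt {
            system("You are a helpful assistant")
            user("What time is it now?")
        },
        model = GoogleModels.Gemini2_0Flash
    )

    println(response)
}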

Multimodal Examples

// Image analysis
val imageResponse = client.execute(
    prompt = prompt {
        user {
            text("What do you see in this image?")
            image("/path/to/image.jpg")
        }
    },
    model = GoogleModels.Gemini2_0Flash
)

// Video analysis
val videoData = File("/path/to/video.mp4").readBytes()
val videoResponse = client.execute(
    prompt = prompt {
        user {
            text("Describe what happens in this video")
            video(videoData, "mp4")
        }
    },
    model = GoogleModels.Gemini1_5Pro
)

// Audio transcription
val audioData = File("/path/to/audio.wav").readBytes()
val audioResponse = client.execute(
    prompt = prompt {
        user {
            text("Transcribe and analyze this audio")
            audio(audioData, "wav")
        }
    },
    model = GoogleModels.Gemini1_5Pro
)

// Document processing
val documentResponse = client.execute(
    prompt = prompt {
        user {
            text("Summarize this document")
            document("/path/to/document.pdf")
        }
    },
    model = GoogleModels.Gemini2_0Flash
)

// All media types combined
val comprehensiveResponse = client.execute(
    prompt = prompt {
        user {
            text("Analyze all this content and find connections:")
            image("/path/to/chart.png")
            video(videoData, "mp4")
            audio(audioData, "wav")
            document("/path/to/report.pdf")
            text("What insights can you provide?")
        }
    },
    model = GoogleModels.Gemini1_5Pro
)
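
The video and audio examples above pass raw bytes together with a format string such as "mp4" or "wav". One possible way to avoid repeating the format as a literal is to derive it from the file name. The sketch below uses only the Kotlin standard library; `MediaPayload` and `loadMedia` are hypothetical names and not part of this module.

```kotlin
import java.io.File

// Hypothetical container pairing raw bytes with the format string the examples pass.
class MediaPayload(val bytes: ByteArray, val format: String)

// Reads a file and infers the format from its extension (lowercased).
fun loadMedia(path: String): MediaPayload {
    val file = File(path)
    require(file.isFile) { "Not a readable file: $path" }
    val format = file.extension.lowercase()
    require(format.isNotEmpty()) { "Cannot infer a format from: $path" }
    return MediaPayload(file.readBytes(), format)
}

fun main() {
    // A temporary file stands in for a real video so the example is runnable.
    val temp = File.createTempFile("clip", ".MP4")
    temp.deleteOnExit()
    temp.writeBytes(byteArrayOf(1, 2, 3))

    val media = loadMedia(temp.path)
    println("${media.format} ${media.bytes.size}") // prints "mp4 3"
}
```

The result could then be used as `video(media.bytes, media.format)`, mirroring the calls in the examples above. An extension is only a hint about the content, so this does not replace the MIME type check described under the limitations.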
