ServiceTier
Service tier used to process a request.
Behavior overview:
AUTO: Defers to the Project’s configured service tier (which, unless changed, is DEFAULT).
DEFAULT: Standard pricing and performance for the selected model.
FLEX: Lower cost in exchange for slower responses and occasional resource unavailability; suited to non-production or lower-priority tasks such as evaluations, data enrichment, and asynchronous workloads. Tokens are billed at Batch-API-like rates and can benefit from prompt-cache discounts.
PRIORITY: Reliable, high-speed processing with predictably low latency, even during peak demand; available on a flexible, pay-as-you-go basis without advance provisioning.
Note: When a tier is requested, the response payload includes the service_tier actually used to serve the request. This value may differ from the one provided in the request.
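Because the served tier can differ from the requested one, callers may want to compare the two. The following is a minimal sketch of that check; `Tier`, `TierOutcome`, and `describe` are hypothetical names introduced for illustration and are not part of this library's API.

```kotlin
// Hypothetical stand-in for the documented enum (names are assumptions).
enum class Tier { AUTO, DEFAULT, FLEX, PRIORITY }

// Pairs the tier sent in the request with the service_tier read from the response.
data class TierOutcome(val requested: Tier, val served: Tier) {
    // AUTO defers to the project setting, so any served tier satisfies it.
    val honored: Boolean
        get() = requested == Tier.AUTO || requested == served
}

// Builds a short human-readable summary, e.g. for logging.
fun describe(outcome: TierOutcome): String =
    if (outcome.honored) "served as ${outcome.served}"
    else "requested ${outcome.requested} but served as ${outcome.served}"

fun main() {
    println(describe(TierOutcome(Tier.PRIORITY, Tier.DEFAULT)))
    println(describe(TierOutcome(Tier.AUTO, Tier.DEFAULT)))
}
```

Treating AUTO as always honored is a design choice of this sketch: AUTO names no specific tier, so there is nothing to mismatch.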
Entries
Use the standard pricing and performance for the selected model. Serialized as "default".
Lower-cost, opportunistic processing that may be slower and occasionally unavailable; ideal for non-production or background workloads. Tokens are priced similarly to Batch API with additional savings from prompt caching. Serialized as "flex".
High-speed, reliable processing with predictably low latency, including during peak demand; available pay-as-you-go without pre-provisioning. Serialized as "priority".
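The mapping between entries and their serialized strings can be sketched as below. This is an illustration only: `SerializedTier`, `wireName`, and `fromWireName` are hypothetical names, and the constant names are assumed; only the three serialized strings come from the entries above.

```kotlin
// Hypothetical stand-in enum; only the wire strings are taken from the documentation.
enum class SerializedTier(val wireName: String) {
    DEFAULT("default"),
    FLEX("flex"),
    PRIORITY("priority");

    companion object {
        // Looks up a constant by its serialized form; returns null for unknown values.
        fun fromWireName(value: String): SerializedTier? =
            values().firstOrNull { it.wireName == value }
    }
}

fun main() {
    println(SerializedTier.FLEX.wireName)
    println(SerializedTier.fromWireName("priority"))
    println(SerializedTier.fromWireName("unknown"))
}
```

Returning null for an unrecognized string, rather than throwing, lets a caller decide how to handle tiers that this sketch does not model.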
Properties
Functions
valueOf: Returns the enum constant of this type with the specified name. The string must exactly match an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
values: Returns an array containing the constants of this enum type, in the order they're declared.
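The two standard enum functions described above behave as in the following sketch. `DemoTier` and `parseTier` are hypothetical names used so the example is self-contained; the constant names are assumptions and may not match this enum's actual identifiers.

```kotlin
// Hypothetical enum used only to demonstrate the standard enum functions.
enum class DemoTier { DEFAULT, FLEX, PRIORITY }

// valueOf throws IllegalArgumentException when no constant has the given name,
// so this helper converts that failure into null.
fun parseTier(name: String): DemoTier? =
    try {
        DemoTier.valueOf(name)
    } catch (e: IllegalArgumentException) {
        null
    }

fun main() {
    // values() returns the constants in declaration order.
    println(DemoTier.values().joinToString())
    println(parseTier("FLEX"))   // exact identifier match
    println(parseTier("flex"))   // case differs, so no match
    println(parseTier(" FLEX"))  // extra whitespace, so no match
}
```

Note that the lookup is by declared identifier, not by serialized string, so a wire value such as "flex" is not expected to resolve through the name-based lookup.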