SimpleRegexBasedTokenizer

A simple regex-based tokenizer that splits text on whitespace and common punctuation.

This tokenizer provides a reasonable approximation of token counts for most LLMs, though it's not as accurate as model-specific tokenizers. It's efficient and doesn't require any external dependencies.

Note: Ollama does not provide token counts in its responses, so this client-side estimation is necessary for token counting.

Constructors

constructor()

Functions

open override fun countTokens(text: String): Int

Counts tokens by splitting on whitespace and common punctuation.
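
The splitting approach can be illustrated with a minimal, self-contained sketch. It is not the library's actual implementation: the function name approximateTokenCount and the delimiter set in DELIMITERS are hypothetical, chosen only to show how a count might be derived by splitting on whitespace and common punctuation.

```kotlin
// Hypothetical delimiter set (an assumption, not taken from the library):
// one or more whitespace characters or common punctuation marks.
private val DELIMITERS = Regex("""[\s,.;:!?()\[\]{}"']+""")

// Splits the text on the delimiters and counts the non-empty pieces.
// Kotlin's split keeps leading and trailing empty strings, so they are
// filtered out before counting.
fun approximateTokenCount(text: String): Int =
    text.split(DELIMITERS).count { it.isNotEmpty() }
```

Under these assumptions, approximateTokenCount("Hello, world! How are you?") yields 5, and an empty string yields 0. Model-specific tokenizers typically split words into subword units, so real token counts will usually differ from this estimate.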