SimpleRegexBasedTokenizer
A simple regex-based tokenizer that splits text on whitespace and common punctuation.
This tokenizer gives a rough approximation of token counts for most LLMs. It is less accurate than a model-specific tokenizer, but it is fast and requires no external dependencies.
Note: Ollama does not provide tokens in responses, so this client-side estimation is necessary for token counting.
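
As a rough illustration of the approach described above, here is a minimal Python sketch of a tokenizer that splits on whitespace and punctuation. The pattern and the function names `tokenize` and `count_tokens` are assumptions made for this example, not the actual implementation of SimpleRegexBasedTokenizer, and the counts it produces are only estimates.

```python
import re
from typing import List

# Assumed pattern (illustrative only): a token is either a run of word
# characters or a single punctuation character. Whitespace is discarded.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens."""
    return _TOKEN_PATTERN.findall(text)


def count_tokens(text: str) -> int:
    """Return the estimated token count for text."""
    return len(tokenize(text))


if __name__ == "__main__":
    sample = "Hello, world!"
    print(tokenize(sample))
    print(count_tokens(sample))
```

Because each punctuation mark counts as its own token, a contraction such as "don't" is counted as three tokens under this pattern. A model-specific tokenizer may split the same text differently, which is why the result should be treated as an approximation.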