TiktokenEncoder
Initializes the TiktokenEncoder with the specified vocabulary, regex pattern for matching tokens, and a token ID for unknown tokens.
Parameters
vocabulary
A mapping of ByteArrayKey to token ID that represents the token encoding vocabulary.
pattern
A regular expression used to match text patterns for initial token matching.
unkTokenId
The token ID used for unknown tokens not present in the vocabulary.