Package-level declarations
Types
Represents a key based on a byte array. This class is primarily designed to be used as a key in collections such as Maps, because its equals and hashCode methods are implemented on the byte array's content.
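To illustrate why content-based equality matters for map keys, here is a minimal sketch. `ContentKey` is a hypothetical stand-in written for this example; the library's actual class may be declared differently.

```kotlin
// Hypothetical stand-in for a byte-array key; not the library's actual class.
class ContentKey(private val bytes: ByteArray) {
    // Two keys are equal when their arrays hold the same bytes.
    override fun equals(other: Any?): Boolean =
        other is ContentKey && bytes.contentEquals(other.bytes)

    // The hash is derived from the bytes, so equal content hashes equally.
    override fun hashCode(): Int = bytes.contentHashCode()
}

// Looks up an equal-content key in a map keyed by raw ByteArray.
// Raw arrays compare by reference, so this lookup misses.
fun lookupWithRawArray(): Int? {
    val map = mapOf("ab".toByteArray() to 1)
    return map["ab".toByteArray()]
}

// Looks up an equal-content key in a map keyed by the wrapper.
// The wrapper compares by content, so this lookup hits.
fun lookupWithContentKey(): Int? {
    val map = mapOf(ContentKey("ab".toByteArray()) to 1)
    return map[ContentKey("ab".toByteArray())]
}
```

Here `lookupWithRawArray()` returns `null` while `lookupWithContentKey()` returns `1`, which is the behavior a vocabulary lookup keyed by token bytes depends on.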
open class TiktokenEncoder(vocabulary: Map<ByteArrayKey, Int>, pattern: Regex, unkTokenId: Int) : Tokenizer
A tokenizer implementation that uses a token-encoding vocabulary and a regex pattern to convert text into a sequence of token IDs. Byte pair encoding (BPE) is applied to text segments that are not found directly in the vocabulary.
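The flow described above can be sketched roughly as follows. This is not the library's implementation: `SketchKey` and `SketchEncoder` are hypothetical names, and the sketch assumes that token IDs double as merge ranks (a lower ID merges first) and that pieces still missing after merging map to the unknown-token ID.

```kotlin
// Hypothetical content-based key, standing in for the library's key type.
class SketchKey(private val bytes: ByteArray) {
    override fun equals(other: Any?): Boolean =
        other is SketchKey && bytes.contentEquals(other.bytes)
    override fun hashCode(): Int = bytes.contentHashCode()
}

// Sketch only. Assumption: token IDs also serve as BPE merge ranks.
class SketchEncoder(
    private val vocabulary: Map<SketchKey, Int>,
    private val pattern: Regex,
    private val unkTokenId: Int,
) {
    fun encode(text: String): List<Int> {
        val ids = mutableListOf<Int>()
        // The regex splits the text into segments that are encoded independently.
        for (match in pattern.findAll(text)) {
            val bytes = match.value.toByteArray(Charsets.UTF_8)
            // Fast path: the whole segment is itself a vocabulary entry.
            val direct = vocabulary[SketchKey(bytes)]
            if (direct != null) ids.add(direct) else ids.addAll(bpe(bytes))
        }
        return ids
    }

    private fun bpe(segment: ByteArray): List<Int> {
        // Start from single bytes, then repeatedly merge the best-ranked adjacent pair.
        val parts = segment.map { byteArrayOf(it) }.toMutableList()
        while (parts.size > 1) {
            var bestIndex = -1
            var bestRank = Int.MAX_VALUE
            for (i in 0 until parts.size - 1) {
                val rank = vocabulary[SketchKey(parts[i] + parts[i + 1])] ?: continue
                if (rank < bestRank) {
                    bestRank = rank
                    bestIndex = i
                }
            }
            if (bestIndex == -1) break // no adjacent pair is in the vocabulary
            parts[bestIndex] = parts[bestIndex] + parts[bestIndex + 1]
            parts.removeAt(bestIndex + 1)
        }
        // Assumption: pieces absent from the vocabulary become the unknown token.
        return parts.map { vocabulary[SketchKey(it)] ?: unkTokenId }
    }
}
```

With a toy vocabulary of `a`=0, `b`=1, `c`=2, `ab`=3, the pattern `\S+`, and an unknown-token ID of 99, encoding `"abc ab zb"` yields `[3, 2, 3, 99, 1]`: `abc` is merged to `ab` + `c`, `ab` is found directly, and `z` falls back to the unknown token.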