Package-level declarations

Types

class ByteArrayKey(bytes: ByteArray)

Wraps a byte array so that it can be used as a key in collections such as Maps. Its equals and hashCode methods are based on the byte array's content, so two keys holding the same bytes compare equal and hash to the same value.
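The following minimal sketch illustrates why such a wrapper is useful. On the JVM, a plain ByteArray compares by reference, so a second array with the same bytes does not find the entry in a Map. ContentKey and lookupDemo are hypothetical names used only for this illustration; the actual ByteArrayKey implementation may differ in detail.

```kotlin
// Hypothetical stand-in for ByteArrayKey (assumption, not the library's source).
class ContentKey(private val bytes: ByteArray) {
    // Equal when the other key wraps the same byte content.
    override fun equals(other: Any?): Boolean =
        other is ContentKey && bytes.contentEquals(other.bytes)

    // Hash derived from the content, consistent with equals.
    override fun hashCode(): Int = bytes.contentHashCode()
}

// Looks up the same bytes in a map keyed by raw arrays and in a map keyed by wrappers.
fun lookupDemo(): Pair<Int?, Int?> {
    val raw = mapOf("hi".toByteArray(Charsets.UTF_8) to 1)
    val wrapped = mapOf(ContentKey("hi".toByteArray(Charsets.UTF_8)) to 1)

    val rawHit = raw["hi".toByteArray(Charsets.UTF_8)]                     // different array instance
    val wrappedHit = wrapped[ContentKey("hi".toByteArray(Charsets.UTF_8))] // same content
    return rawHit to wrappedHit
}
```

Under these assumptions, the raw-array lookup misses while the wrapped lookup succeeds.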

open class TiktokenEncoder(vocabulary: Map<ByteArrayKey, Int>, pattern: Regex, unkTokenId: Int) : Tokenizer

A tokenizer implementation that uses a regex pattern to split text into segments and a token encoding vocabulary to map those segments to a series of token IDs. Segments not directly found in the vocabulary are tokenized with byte pair encoding (BPE).
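The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the TiktokenEncoder source: List<Byte> stands in for ByteArrayKey so the block is self-contained, the token ID doubles as the merge rank (lower IDs merge first, a common convention in tiktoken-style vocabularies), and the names Bytes, bytesOf, bpeEncode, and encodeSketch are hypothetical.

```kotlin
// List<Byte> compares by content, so it can stand in for ByteArrayKey in this sketch.
typealias Bytes = List<Byte>

fun bytesOf(s: String): Bytes = s.toByteArray(Charsets.UTF_8).toList()

// Greedy BPE: repeatedly merge the adjacent pair whose concatenation has the lowest ID.
fun bpeEncode(segment: Bytes, vocab: Map<Bytes, Int>, unkTokenId: Int): List<Int> {
    // Start with one part per byte.
    val parts = segment.map { listOf(it) }.toMutableList()
    while (parts.size > 1) {
        var bestIndex = -1
        var bestRank = Int.MAX_VALUE
        for (i in 0 until parts.size - 1) {
            val rank = vocab[parts[i] + parts[i + 1]] ?: continue
            if (rank < bestRank) {
                bestRank = rank
                bestIndex = i
            }
        }
        if (bestIndex < 0) break // no adjacent pair is in the vocabulary
        parts[bestIndex] = parts[bestIndex] + parts[bestIndex + 1]
        parts.removeAt(bestIndex + 1)
    }
    // Parts that are still unknown fall back to the unknown-token ID.
    return parts.map { vocab[it] ?: unkTokenId }
}

fun encodeSketch(text: String, vocab: Map<Bytes, Int>, pattern: Regex, unkTokenId: Int): List<Int> {
    val ids = mutableListOf<Int>()
    for (match in pattern.findAll(text)) {
        val segment = bytesOf(match.value)
        val direct = vocab[segment]
        if (direct != null) {
            ids.add(direct) // whole segment is a vocabulary entry
        } else {
            ids.addAll(bpeEncode(segment, vocab, unkTokenId))
        }
    }
    return ids
}
```

With a toy vocabulary of a=0, b=1, c=2, ab=3, the pattern `\S+`, and an unknown-token ID of -1, the input "abc ab z" would yield [3, 2, 3, -1]: "abc" is merged into "ab" plus "c", "ab" is found directly, and "z" is unknown.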

Functions

Converts the current string into a ByteArrayKey by encoding it as a UTF-8 byte array.
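As a rough sketch of what such a conversion involves, the block below defines a hypothetical extension function. The function's actual name and signature are not shown on this page, so toKeySketch and KeySketch are illustrative stand-ins rather than the library's API.

```kotlin
// Hypothetical content-comparing key type, standing in for ByteArrayKey.
class KeySketch(val bytes: ByteArray) {
    override fun equals(other: Any?): Boolean =
        other is KeySketch && bytes.contentEquals(other.bytes)

    override fun hashCode(): Int = bytes.contentHashCode()
}

// Hypothetical extension: encode the receiver string as UTF-8 and wrap the bytes.
fun String.toKeySketch(): KeySketch = KeySketch(toByteArray(Charsets.UTF_8))
```

Because the encoding is UTF-8, the key's length is counted in bytes rather than characters: "\u00e9" (a single accented character) becomes a two-byte key, and equal strings produce equal keys.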