Utilities for Tokenizers

This page lists the utility classes and functions used by the tokenizers, chiefly the class [~tokenization_utils_base.PreTrainedTokenizerBase], which implements the methods common to [PreTrainedTokenizer] and [PreTrainedTokenizerFast], and the mixin [~tokenization_utils_base.SpecialTokensMixin].

Most of these are useful only if you are studying the tokenizer code in the library.

PreTrainedTokenizerBase

autodoc tokenization_utils_base.PreTrainedTokenizerBase - __call__ - all

SpecialTokensMixin

autodoc tokenization_utils_base.SpecialTokensMixin

Enums and namedtuples

autodoc tokenization_utils_base.TruncationStrategy

autodoc tokenization_utils_base.CharSpan

autodoc tokenization_utils_base.TokenSpan
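
As a rough illustration of how these small types are shaped, the sketch below defines local stand-ins using only the standard library. It assumes that CharSpan and TokenSpan are named tuples with `start` and `end` fields (with `end` exclusive) and that TruncationStrategy is a string-valued enum with the four members shown. These stand-ins are not the library's code; the autodoc entries above remain the authoritative definitions, and the real objects are imported from `transformers.tokenization_utils_base`.

```python
from enum import Enum
from typing import NamedTuple


# Local stand-ins (assumptions, not the library's implementation).
# With transformers installed, the real types would be imported with:
#   from transformers.tokenization_utils_base import (
#       CharSpan, TokenSpan, TruncationStrategy)


class CharSpan(NamedTuple):
    """Character span in the original string; `end` is exclusive."""
    start: int
    end: int


class TokenSpan(NamedTuple):
    """Token span in an encoded sequence; `end` is exclusive."""
    start: int
    end: int


class TruncationStrategy(Enum):
    """Possible values for a tokenizer's `truncation` argument."""
    ONLY_FIRST = "only_first"
    ONLY_SECOND = "only_second"
    LONGEST_FIRST = "longest_first"
    DO_NOT_TRUNCATE = "do_not_truncate"


text = "Hello world"

# A span can slice the original text directly because `end` is exclusive.
span = CharSpan(start=6, end=11)
word = text[span.start:span.end]

# Named tuples unpack like ordinary tuples.
start, end = span

# An enum member can be looked up from its string value.
strategy = TruncationStrategy("longest_first")

# The number of tokens covered by a span is end minus start.
token_span = TokenSpan(start=1, end=3)
num_tokens = token_span.end - token_span.start
```

Because the spans are plain named tuples, they can be compared, unpacked, and used as dictionary keys without any tokenizer-specific machinery.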