2025-10-09 16:47:16 +08:00

Utilities for Generation

This page lists all the utility functions used by [~generation.GenerationMixin.generate].

Generate Outputs

The output of [~generation.GenerationMixin.generate] is an instance of a subclass of [~utils.ModelOutput]. This output is a data structure containing all the information returned by [~generation.GenerationMixin.generate], but it can also be used as a tuple or a dictionary.

Here's an example:

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)

The generation_output object is a [~generation.GenerateDecoderOnlyOutput]. As the documentation of that class below shows, it has the following attributes:

  • sequences: the generated sequences of tokens
  • scores (optional): the prediction scores of the language modeling head, for each generation step
  • hidden_states (optional): the hidden states of the model, for each generation step
  • attentions (optional): the attention weights of the model, for each generation step

Here we have the scores since we passed along output_scores=True, but we don't have hidden_states and attentions because we didn't pass output_hidden_states=True or output_attentions=True.

You can access each attribute as you normally would, and if that attribute has not been returned by the model, you will get None. Here, for instance, generation_output.scores contains all the generated prediction scores of the language modeling head, and generation_output.attentions is None.

When using our generation_output object as a tuple, it only keeps the attributes that don't have None values. Here, for instance, it has two elements, sequences then scores, so

generation_output[:2]

will return the tuple (generation_output.sequences, generation_output.scores).

When using our generation_output object as a dictionary, it only keeps the attributes that don't have None values. Here, for instance, it has two keys, sequences and scores.

All output types are documented below.

autodoc generation.GenerateDecoderOnlyOutput

autodoc generation.GenerateEncoderDecoderOutput

autodoc generation.GenerateBeamDecoderOnlyOutput

autodoc generation.GenerateBeamEncoderDecoderOutput

LogitsProcessor

A [LogitsProcessor] can be used to modify the prediction scores of a language model head for generation.
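To make the idea concrete, here is a minimal sketch of a custom processor. BanTokenLogitsProcessor and the token id it bans are hypothetical, chosen only for illustration and not part of the library. The sketch relies on the __call__(input_ids, scores) interface and on LogitsProcessorList, and assumes PyTorch is installed.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList


class BanTokenLogitsProcessor(LogitsProcessor):
    """Hypothetical processor that makes one token id impossible to pick."""

    def __init__(self, banned_token_id: int):
        self.banned_token_id = banned_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Work on a copy so the caller's tensor is left untouched.
        scores = scores.clone()
        scores[:, self.banned_token_id] = float("-inf")
        return scores


processors = LogitsProcessorList([BanTokenLogitsProcessor(banned_token_id=3)])

input_ids = torch.tensor([[0, 1, 2]])  # tokens generated so far (dummy values)
scores = torch.zeros(1, 5)             # next-token scores over a toy vocabulary of 5

processed = processors(input_ids, scores)
print(processed)  # the score of token 3 is now -inf, the others are unchanged
```

A list built this way can be handed to generate through its logits_processor argument.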

autodoc AlternatingCodebooksLogitsProcessor - __call__

autodoc ClassifierFreeGuidanceLogitsProcessor - __call__

autodoc EncoderNoRepeatNGramLogitsProcessor - __call__

autodoc EncoderRepetitionPenaltyLogitsProcessor - __call__

autodoc EpsilonLogitsWarper - __call__

autodoc EtaLogitsWarper - __call__

autodoc ExponentialDecayLengthPenalty - __call__

autodoc ForcedBOSTokenLogitsProcessor - __call__

autodoc ForcedEOSTokenLogitsProcessor - __call__

autodoc InfNanRemoveLogitsProcessor - __call__

autodoc LogitNormalization - __call__

autodoc LogitsProcessor - __call__

autodoc LogitsProcessorList - __call__

autodoc MinLengthLogitsProcessor - __call__

autodoc MinNewTokensLengthLogitsProcessor - __call__

autodoc MinPLogitsWarper - __call__

autodoc NoBadWordsLogitsProcessor - __call__

autodoc NoRepeatNGramLogitsProcessor - __call__

autodoc PrefixConstrainedLogitsProcessor - __call__

autodoc RepetitionPenaltyLogitsProcessor - __call__

autodoc SequenceBiasLogitsProcessor - __call__

autodoc SuppressTokensAtBeginLogitsProcessor - __call__

autodoc SuppressTokensLogitsProcessor - __call__

autodoc SynthIDTextWatermarkLogitsProcessor - __call__

autodoc TemperatureLogitsWarper - __call__

autodoc TopKLogitsWarper - __call__

autodoc TopPLogitsWarper - __call__

autodoc TypicalLogitsWarper - __call__

autodoc UnbatchedClassifierFreeGuidanceLogitsProcessor - __call__

autodoc WhisperTimeStampLogitsProcessor - __call__

autodoc WatermarkLogitsProcessor - __call__

StoppingCriteria

A [StoppingCriteria] can be used to change when generation stops (other than on the EOS token). Please note that this is available only in our PyTorch implementations.
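As an illustration, the sketch below defines a custom criterion that stops a sequence once its last token matches a chosen id. StopOnTokenCriteria and the token ids are hypothetical, not part of the library. The sketch assumes a recent version of the library, in which a criterion returns one boolean per batch row, and calls the criterion directly on dummy tensors rather than through generate.

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList


class StopOnTokenCriteria(StoppingCriteria):
    """Hypothetical criterion: a row is done when its last token equals stop_token_id."""

    def __init__(self, stop_token_id: int):
        self.stop_token_id = stop_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> torch.BoolTensor:
        # Compare the most recent token of every sequence in the batch.
        return input_ids[:, -1] == self.stop_token_id


criterion = StopOnTokenCriteria(stop_token_id=9)

input_ids = torch.tensor([[1, 2, 9], [1, 2, 3]])  # two dummy sequences
scores = torch.zeros(2, 10)                       # unused by this criterion

is_done = criterion(input_ids, scores)
print(is_done.tolist())  # [True, False]

# Criteria are grouped in a StoppingCriteriaList before being given to generate.
criteria = StoppingCriteriaList([criterion])
```

The resulting list can be passed to generate through its stopping_criteria argument.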

autodoc StoppingCriteria - __call__

autodoc StoppingCriteriaList - __call__

autodoc MaxLengthCriteria - __call__

autodoc MaxTimeCriteria - __call__

autodoc StopStringCriteria - __call__

autodoc EosTokenCriteria - __call__

Constraints

A [Constraint] can be used to force the generation to include specific tokens or sequences in the output. Please note that this is available only in our PyTorch implementations.
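To show how a constraint tracks its own progress, the sketch below steps a PhrasalConstraint by hand. The token ids are arbitrary placeholders, and the method behavior shown reflects the class as documented below at the time of writing, so treat it as an assumption to verify against your installed version. During generation these calls are made by the constrained beam search logic, not by user code.

```python
from transformers import PhrasalConstraint

# Force the (made-up) token sequence 5, 9, 2 to appear in the output.
constraint = PhrasalConstraint([5, 9, 2])

# advance() reports which token would make progress next.
print(constraint.advance())    # 5

# update() returns a (stepped, completed, reset) triple.
first_step = constraint.update(5)
print(first_step)              # (True, False, False)
print(constraint.remaining())  # 2

constraint.update(9)
last_step = constraint.update(2)
print(last_step)               # (True, True, False)
print(constraint.completed)    # True
```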

autodoc Constraint

autodoc PhrasalConstraint

autodoc DisjunctiveConstraint

autodoc ConstraintListState

BeamSearch

autodoc BeamScorer - process - finalize

autodoc ConstrainedBeamSearchScorer - process - finalize

Streamers

autodoc TextStreamer

autodoc TextIteratorStreamer

autodoc AsyncTextIteratorStreamer

Caches

autodoc CacheLayerMixin - update - get_seq_length - get_mask_sizes - get_max_cache_shape - reset - reorder_cache - lazy_initialization

autodoc DynamicLayer - update - lazy_initialization - crop - batch_repeat_interleave - batch_select_indices

autodoc StaticLayer - update - lazy_initialization

autodoc StaticSlidingWindowLayer - update - lazy_initialization

autodoc QuantoQuantizedLayer - update - lazy_initialization

autodoc HQQQuantizedLayer - update - lazy_initialization

autodoc Cache - update - early_initialization - get_seq_length - get_mask_sizes - get_max_cache_shape - reset - reorder_cache - crop - batch_repeat_interleave - batch_select_indices

autodoc DynamicCache - to_legacy_cache - from_legacy_cache

autodoc QuantizedCache

autodoc QuantoQuantizedCache

autodoc HQQQuantizedCache

autodoc OffloadedCache

autodoc StaticCache

autodoc OffloadedStaticCache

autodoc HybridCache

autodoc HybridChunkedCache

autodoc SlidingWindowCache

autodoc EncoderDecoderCache - to_legacy_cache - from_legacy_cache

Watermark Utils

autodoc WatermarkingConfig - __call__

autodoc WatermarkDetector - __call__

autodoc BayesianDetectorConfig

autodoc BayesianDetectorModel - forward

autodoc SynthIDTextWatermarkingConfig

autodoc SynthIDTextWatermarkDetector - __call__

Compile Utils

autodoc CompileConfig - __call__