# Utilities for Generation

This page lists all the utility functions used by [`~generation.GenerationMixin.generate`].

## Generate Outputs

The output of [`~generation.GenerationMixin.generate`] is an instance of a subclass of
[`~utils.ModelOutput`]. This output is a data structure containing all the information returned
by [`~generation.GenerationMixin.generate`], which can also be used as a tuple or dictionary.

Here's an example:
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
```
The `generation_output` object is a [`~generation.GenerateDecoderOnlyOutput`]. As we can see in
the documentation of that class below, it has the following attributes:

- `sequences`: the generated sequences of tokens
- `scores` (optional): the prediction scores of the language modeling head, for each generation step
- `hidden_states` (optional): the hidden states of the model, for each generation step
- `attentions` (optional): the attention weights of the model, for each generation step
Here we have the `scores` since we passed along `output_scores=True`, but we don't have `hidden_states` and
`attentions` because we didn't pass `output_hidden_states=True` or `output_attentions=True`.

You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get `None`. Here, for instance, `generation_output.scores` are all the generated prediction scores of the
language modeling head, and `generation_output.attentions` is `None`.
When using our `generation_output` object as a tuple, it only keeps the attributes that don't have `None` values.
Here, for instance, it has two elements, `sequences` then `scores`, so `generation_output[:2]` will return the tuple
`(generation_output.sequences, generation_output.scores)`.

When using our `generation_output` object as a dictionary, it only keeps the attributes that don't have `None`
values. Here, for instance, it has two keys that are `sequences` and `scores`.
We document here all output types.
[[autodoc]] generation.GenerateDecoderOnlyOutput

[[autodoc]] generation.GenerateEncoderDecoderOutput

[[autodoc]] generation.GenerateBeamDecoderOnlyOutput

[[autodoc]] generation.GenerateBeamEncoderDecoderOutput
## LogitsProcessor

A [`LogitsProcessor`] can be used to modify the prediction scores of a language model head for
generation.
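All processors share one interface: they take the current `input_ids` and the next-token `scores`, and return modified scores; several can be chained with a [`LogitsProcessorList`]. A minimal sketch with dummy tensors (the vocabulary size and token ids are illustrative):

```python
import torch
from transformers import LogitsProcessorList, MinLengthLogitsProcessor, TopKLogitsWarper

processors = LogitsProcessorList([
    MinLengthLogitsProcessor(min_length=10, eos_token_id=0),  # forbid EOS while too short
    TopKLogitsWarper(top_k=2),                                # keep only the 2 best logits
])

input_ids = torch.tensor([[1, 2, 3]])  # 3 tokens so far, below min_length
scores = torch.randn(1, 10)            # dummy next-token logits over a 10-token vocab

new_scores = processors(input_ids, scores)
# EOS (id 0) is now -inf, and only the top-2 logits remain finite
```

Passing `logits_processor=processors` to [`~generation.GenerationMixin.generate`] applies the same chain at every generation step.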
[[autodoc]] AlternatingCodebooksLogitsProcessor
    - __call__

[[autodoc]] ClassifierFreeGuidanceLogitsProcessor
    - __call__

[[autodoc]] EncoderNoRepeatNGramLogitsProcessor
    - __call__

[[autodoc]] EncoderRepetitionPenaltyLogitsProcessor
    - __call__

[[autodoc]] EpsilonLogitsWarper
    - __call__

[[autodoc]] EtaLogitsWarper
    - __call__

[[autodoc]] ExponentialDecayLengthPenalty
    - __call__

[[autodoc]] ForcedBOSTokenLogitsProcessor
    - __call__

[[autodoc]] ForcedEOSTokenLogitsProcessor
    - __call__

[[autodoc]] InfNanRemoveLogitsProcessor
    - __call__

[[autodoc]] LogitNormalization
    - __call__

[[autodoc]] LogitsProcessor
    - __call__

[[autodoc]] LogitsProcessorList
    - __call__

[[autodoc]] MinLengthLogitsProcessor
    - __call__

[[autodoc]] MinNewTokensLengthLogitsProcessor
    - __call__

[[autodoc]] MinPLogitsWarper
    - __call__

[[autodoc]] NoBadWordsLogitsProcessor
    - __call__

[[autodoc]] NoRepeatNGramLogitsProcessor
    - __call__

[[autodoc]] PrefixConstrainedLogitsProcessor
    - __call__

[[autodoc]] RepetitionPenaltyLogitsProcessor
    - __call__

[[autodoc]] SequenceBiasLogitsProcessor
    - __call__

[[autodoc]] SuppressTokensAtBeginLogitsProcessor
    - __call__

[[autodoc]] SuppressTokensLogitsProcessor
    - __call__

[[autodoc]] SynthIDTextWatermarkLogitsProcessor
    - __call__

[[autodoc]] TemperatureLogitsWarper
    - __call__

[[autodoc]] TopKLogitsWarper
    - __call__

[[autodoc]] TopPLogitsWarper
    - __call__

[[autodoc]] TypicalLogitsWarper
    - __call__

[[autodoc]] UnbatchedClassifierFreeGuidanceLogitsProcessor
    - __call__

[[autodoc]] WhisperTimeStampLogitsProcessor
    - __call__

[[autodoc]] WatermarkLogitsProcessor
    - __call__
## StoppingCriteria

A [`StoppingCriteria`] can be used to change when to stop generation (other than an EOS token). Please note that this is exclusively available to our PyTorch implementations.
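A criterion is called with the current `input_ids` and `scores` at each step and signals whether generation should stop. A minimal sketch with dummy tensors:

```python
import torch
from transformers import MaxLengthCriteria, StoppingCriteriaList

criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=5)])

input_ids = torch.ones((1, 5), dtype=torch.long)  # already 5 tokens long
scores = torch.randn(1, 10)                       # dummy logits, unused by this criterion

is_done = criteria(input_ids, scores)  # truthy once max_length is reached
```

Custom criteria passed via `stopping_criteria=criteria` to [`~generation.GenerationMixin.generate`] are checked alongside the built-in ones.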
[[autodoc]] StoppingCriteria
    - __call__

[[autodoc]] StoppingCriteriaList
    - __call__

[[autodoc]] MaxLengthCriteria
    - __call__

[[autodoc]] MaxTimeCriteria
    - __call__

[[autodoc]] StopStringCriteria
    - __call__

[[autodoc]] EosTokenCriteria
    - __call__
## Constraints

A [`Constraint`] can be used to force the generation to include specific tokens or sequences in the output. Please note that this is exclusively available to our PyTorch implementations.
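As a rough sketch of the interface constrained beam search relies on: a constraint tracks its own progress as tokens are produced. The token ids below are arbitrary placeholders, not real vocabulary entries:

```python
from transformers import PhrasalConstraint

# Force the phrase given by these (placeholder) token ids to appear in order
constraint = PhrasalConstraint(token_ids=[5, 6, 7])

assert constraint.does_advance(5)  # 5 is the next token the phrase needs
stepped, completed, reset = constraint.update(5)
# `completed` stays False until all three tokens have been produced in order;
# producing a non-matching token instead would set `reset` and restart progress
```

In practice such constraints are passed to [`~generation.GenerationMixin.generate`] via the `constraints` argument rather than driven by hand.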
[[autodoc]] Constraint

[[autodoc]] PhrasalConstraint

[[autodoc]] DisjunctiveConstraint

[[autodoc]] ConstraintListState
## BeamSearch

[[autodoc]] BeamScorer
    - process
    - finalize

[[autodoc]] ConstrainedBeamSearchScorer
    - process
    - finalize
## Streamers

[[autodoc]] TextStreamer

[[autodoc]] TextIteratorStreamer

[[autodoc]] AsyncTextIteratorStreamer
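A rough sketch of the iterator pattern: normally `model.generate(..., streamer=streamer)` runs in a background thread while the main thread consumes text chunks; here we feed token ids by hand to keep the example model-free:

```python
from threading import Thread

import torch
from transformers import AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
streamer = TextIteratorStreamer(tokenizer)

def produce():
    # stand-in for model.generate(..., streamer=streamer)
    for token_id in tokenizer("Hello world").input_ids:
        streamer.put(torch.tensor([token_id]))
    streamer.end()  # signals the iterator that generation is finished

Thread(target=produce).start()
text = "".join(chunk for chunk in streamer)  # consume chunks as they arrive
```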
## Caches

[[autodoc]] CacheLayerMixin
    - update
    - get_seq_length
    - get_mask_sizes
    - get_max_cache_shape
    - reset
    - reorder_cache
    - lazy_initialization

[[autodoc]] DynamicLayer
    - update
    - lazy_initialization
    - crop
    - batch_repeat_interleave
    - batch_select_indices

[[autodoc]] StaticLayer
    - update
    - lazy_initialization

[[autodoc]] StaticSlidingWindowLayer
    - update
    - lazy_initialization

[[autodoc]] QuantoQuantizedLayer
    - update
    - lazy_initialization

[[autodoc]] HQQQuantizedLayer
    - update
    - lazy_initialization

[[autodoc]] Cache
    - update
    - early_initialization
    - get_seq_length
    - get_mask_sizes
    - get_max_cache_shape
    - reset
    - reorder_cache
    - crop
    - batch_repeat_interleave
    - batch_select_indices

[[autodoc]] DynamicCache
    - to_legacy_cache
    - from_legacy_cache

[[autodoc]] QuantizedCache

[[autodoc]] QuantoQuantizedCache

[[autodoc]] HQQQuantizedCache

[[autodoc]] OffloadedCache

[[autodoc]] StaticCache

[[autodoc]] OffloadedStaticCache

[[autodoc]] HybridCache

[[autodoc]] HybridChunkedCache

[[autodoc]] SlidingWindowCache

[[autodoc]] EncoderDecoderCache
    - to_legacy_cache
    - from_legacy_cache
## Watermark Utils

[[autodoc]] WatermarkingConfig
    - __call__

[[autodoc]] WatermarkDetector
    - __call__

[[autodoc]] BayesianDetectorConfig

[[autodoc]] BayesianDetectorModel
    - forward

[[autodoc]] SynthIDTextWatermarkingConfig

[[autodoc]] SynthIDTextWatermarkDetector
    - __call__
## Compile Utils

[[autodoc]] CompileConfig
    - __call__