--- license: other language: - en pipeline_tag: text-generation inference: false tags: - transformers - gguf - imatrix - Azzurro --- Quantizations of https://huggingface.co/MoxoffSpA/Azzurro # From original readme ## Usage Be sure to install these dependencies before running the program ```python !pip install transformers torch sentencepiece ``` ```python from transformers import AutoModelForCausalLM, AutoTokenizer device = "cpu" # if you want to use the gpu make sure to have cuda toolkit installed and change this to "cuda" model = AutoModelForCausalLM.from_pretrained("MoxoffSpA/Azzurro") tokenizer = AutoTokenizer.from_pretrained("MoxoffSpA/Azzurro") question = """Quanto è alta la torre di Pisa?""" context = """ La Torre di Pisa è un campanile del XII secolo, famoso per la sua inclinazione. Alta circa 56 metri. """ prompt = f"Domanda: {question}, contesto: {context}" messages = [ {"role": "user", "content": prompt} ] encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt") model_inputs = encodeds.to(device) model.to(device) generated_ids = model.generate( model_inputs, # The input to the model max_new_tokens=128, # Limiting the maximum number of new tokens generated do_sample=True, # Enabling sampling to introduce randomness in the generation temperature=0.1, # Setting temperature to control the randomness, lower values make it more deterministic top_p=0.95, # Using nucleus sampling with top-p filtering for more coherent generation eos_token_id=tokenizer.eos_token_id # Specifying the token that indicates the end of a sequence ) decoded_output = tokenizer.decode(generated_ids[0], skip_special_tokens=True) trimmed_output = decoded_output.strip() print(trimmed_output) ```