---
library_name: transformers
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-generation
language:
  - en
tags:
  - nvidia
  - llama-3
  - pytorch
---

# Llama-3.1-Nemotron-Nano-8B-v1

Note: the chat template forces reasoning on via the system prompt (`detailed thinking on`). Supplying any additional system prompt raises an error.

Example:

```python
from transformers import AutoTokenizer

model_id = "cminst/Llama-Nemotron-8B-templatefixes"

# Load the tokenizer; its bundled chat template is used as-is
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ---- Test conversation ----
messages = [
    {"role": "user", "content": "Solve x*(sin(x)+2)=0"}
]

# Render the prompt as a string (tokenize=False returns text, not tensors)
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

print("START")
print(prompt, end="")
print("END")
```

gives:

```
START
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

detailed thinking on<|eot_id|><|start_header_id|>user<|end_header_id|>

Solve x*(sin(x)+2)=0<|eot_id|><|start_header_id|>assistant<|end_header_id|>

END
```
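
To make the behaviour in the note concrete, here is a minimal pure-Python sketch of what the template appears to do. It is not the actual Jinja template shipped with the model: `render_prompt` is a hypothetical helper, the `ValueError` type and message are assumptions, and the blank lines after each header are assumed to follow the standard Llama 3 layout.

```python
# Hypothetical stand-in for the chat template; NOT the model's real Jinja template.
SYSTEM_PROMPT = "detailed thinking on"  # system text seen in the rendered output


def render_prompt(messages, add_generation_prompt=True):
    """Build a Llama-3-style prompt with the reasoning system prompt forced on."""

    def block(role, content):
        # Assumed standard Llama 3 layout: header, blank line, content, <|eot_id|>
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    parts = ["<|begin_of_text|>", block("system", SYSTEM_PROMPT)]
    for message in messages:
        if message["role"] == "system":
            # Assumption: the real template may raise a different error type
            raise ValueError("additional system prompts are not supported")
        parts.append(block(message["role"], message["content"]))
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_prompt([{"role": "user", "content": "Solve x*(sin(x)+2)=0"}])
print(prompt, end="")
```

Passing a message with `"role": "system"` to this sketch raises the error instead of rendering, which mirrors the restriction stated in the note.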