---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

> [!WARNING]
> **WARNING:** This is a language model that has been instruction-tuned for conversational settings that use function calling. It has not been aligned with human preferences. As a result, it may generate outputs that are inappropriate, misleading, biased, or unsafe. These risks can be mitigated through additional post-training stages, which are strongly recommended before deployment in any production system, especially for high-stakes applications.

### How to use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "BSC-LT/salamandra-7b-instruct"

text = "What is the weather like in Paris today?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

message = [ { "role": "user", "content": text } ]

# Tool schema that the chat template renders into the prompt
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": [
            "location"
        ],
        "additionalProperties": False
    }
}]

prompt = tokenizer.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools
)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1000)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

#### Output:

```text
{"name": "get_weather", "arguments": {"location": "Paris, France"}}
```

### Deploy with vLLM

**Deploy the model using the vLLM Docker image.**

```bash
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=" \
    -p 80:80 \
    vllm/vllm-openai:latest \
    --model BSC-LT/salamandra-7b-instruct-tools \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --max_model_len 8196 \
    --port 80
```

**Then query it with the OpenAI API client.**

```bash
pip install openai
```

```python
from openai import OpenAI

client = OpenAI(
    # The port must match the one published by the container (-p 80:80)
    base_url="http://localhost:80/v1/",
    api_key="hf_xxxx"
)

models = client.models.list()
model = models.data[0].id

# Same tool as above; the Chat Completions API expects the
# definition nested under a "function" key
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        }
    }
}]

system_message = ""
messages = [{ "role": "system", "content": system_message}] if system_message else []
messages.append( {"role":"user", "content": "What is the weather like in Paris today?"})
print(messages)

chat_completion = client.chat.completions.create(
    model=model,
    tools=tools,
    messages=messages,
    stream=False,
    max_tokens=1000,
    temperature=0.1,
    frequency_penalty=0.2,
)

msg = chat_completion.choices[0].message

# --- HANDLE TOOL CALL OR NORMAL CONTENT ---
if not getattr(msg, "tool_calls", None):
    # Normal assistant message
    print(msg.content)
    messages.append({ "role": "assistant", "content": msg.content })
else:
    # Assistant tool call message
    print(msg.tool_calls)
    messages.append({"role": "assistant", "tool_calls": msg.tool_calls})

    # --- Fake tool execution example ---
    tool_call = msg.tool_calls[0]

    # Example: handle the get_weather tool
    if tool_call.function.name == "get_weather":
        # Fake tool result (this would come from your actual backend)
        fake_tool_result = '{"temperature": 18, "unit": "C", "description": "Partly cloudy in Paris"}'

        # Append the tool result message so the model can use it in the next turn
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": fake_tool_result,
        })
```