Qwen2.5-14B-Function-Calling-xLAM is a language model optimized with supervised fine-tuning (SFT), which trains the model to follow instructions by learning from high-quality demonstration data.
## Key Features

- **High-Quality Fine-Tuning:** Trained on N/A carefully curated examples
- **Efficient Training:** Uses LoRA (Low-Rank Adaptation) with 4-bit quantization
- **Strong Performance:** Achieves N/A token accuracy on the evaluation set
- **Optimized for Inference:** Available in multiple formats, including GGUF quantizations
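To make the efficiency claims above concrete, here is some back-of-envelope arithmetic. The 14B parameter count comes from the model name; the layer size and LoRA rank below are illustrative assumptions, not this model's actual configuration.

```python
# Rough arithmetic for 4-bit quantization and LoRA savings.
# Assumptions: 14B params (from the model name); the 5120 x 5120
# projection and rank 16 are hypothetical example values.

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30

n_params = 14e9
bf16_gb = weight_memory_gb(n_params, 16)  # full-precision bf16 weights
int4_gb = weight_memory_gb(n_params, 4)   # 4-bit quantized weights

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix."""
    return rank * (d_in + d_out)

full_matrix = 5120 * 5120
adapter = lora_params(5120, 5120, 16)

print(f"bf16 weights: ~{bf16_gb:.1f} GB, 4-bit weights: ~{int4_gb:.1f} GB")
print(f"LoRA trains {adapter} params per matrix instead of {full_matrix}")
```

The two levers compound: 4-bit quantization shrinks the frozen base weights roughly 4x versus bf16, while LoRA keeps the trainable parameter count a small fraction of each weight matrix.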
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the sum of 2 + 2?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
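Since the model is tuned for function calling, a typical workflow is to describe available tools as JSON schemas and parse the tool call the model emits. The schema shape and the model's output format below are illustrative assumptions (xLAM-style models commonly emit a JSON list of calls); check the model's chat template for the exact format it expects.

```python
import json

# Hypothetical tool schema (OpenAI-style); the exact format this model
# expects is an assumption here, not taken from the model card.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Assumed raw model output: a JSON list of {"name", "arguments"} calls.
raw_response = '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'

def parse_tool_calls(text: str) -> list:
    """Parse the model's JSON tool-call list, keeping only known tools."""
    calls = json.loads(text)
    known = {t["name"] for t in tools}
    return [c for c in calls if c.get("name") in known]

for call in parse_tool_calls(raw_response):
    print(call["name"], call["arguments"])
```

Filtering against the declared tool names guards against the model hallucinating a function you never offered.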
## Using Pipeline
```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the concept of machine learning."}]

output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
```
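Both generation examples sample with a temperature (0.7 in the first snippet). Temperature divides the logits before the softmax, so values below 1 sharpen the distribution toward the most likely token. A minimal sketch of the mechanism, independent of any model:

```python
import math
import random

def sample_with_temperature(logits, temperature=0.7, rng=random.random):
    """Sample a token index from logits softened by `temperature`."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution.
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

idx = sample_with_temperature([2.0, 1.0, 0.1], temperature=0.7)
print(idx)
```

As temperature approaches 0 this converges to greedy decoding (always the argmax); higher temperatures flatten the distribution and increase output diversity.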