---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen2.5-14b-instruct
- sft
- fine-tuned
- trl
- lora
- text-generation
- conversational
- instruction-following
base_model: Qwen/Qwen2.5-14B-Instruct
datasets:
- Salesforce/xlam-function-calling-60k
model-index:
- name: Qwen2.5-14B-Function-Calling-xLAM
  results: []
---

# Qwen2.5-14B-Function-Calling-xLAM

This model is a fine-tuned version of [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct), trained on the [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) dataset using **SFT** with LoRA adapters.

## Overview

**Qwen2.5-14B-Function-Calling-xLAM** is a language model optimized with supervised fine-tuning (SFT), which trains the model to follow instructions by learning from high-quality demonstration data.

### Key Features

- **High-Quality Fine-Tuning**: Trained on carefully curated function-calling demonstrations (exact sample count not reported)
- **Efficient Training**: Uses LoRA (Low-Rank Adaptation) with 4-bit quantization
- **Evaluation**: Token accuracy on the evaluation set is not reported on this card
- **Optimized for Inference**: Available in multiple formats, including GGUF quantizations

## Model Details

| Property | Value |
|----------|-------|
| **Developed by** | [ermiaazarkhalili](https://huggingface.co/ermiaazarkhalili) |
| **License** | Apache-2.0 |
| **Language** | English |
| **Base Model** | [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) |
| **Model Size** | 14B parameters |
| **Tensor Type** | BF16 |
| **Context Length** | 2,048 tokens |
| **Training Method** | SFT with LoRA |

## Training Information

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Learning Rate | 0.0002 |
| Batch Size | 2 per device |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 16 |
| Number of Epochs | 1 |
| Max Sequence Length | 2,048 tokens |
| LR Scheduler | Linear warmup + cosine annealing |
| Warmup Ratio | 0.1 |
| Precision | BF16 mixed precision |
| Gradient Checkpointing | Enabled |
| Random Seed | 42 |

### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 |

### Training Hardware

| Property | Value |
|----------|-------|
| Hardware | NVIDIA H100 MIG |

## Dataset

This model was trained on the [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) dataset.

| Split | Samples |
|-------|---------|
| Training | N/A |
| Evaluation | N/A |
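Each record in the dataset pairs a natural-language query with a JSON list of available tools and the expected call(s). A minimal sketch of inspecting one record; the field names `query`, `tools`, and `answers` are taken from the public dataset card, so verify them against the actual schema:

```python
import json
from datasets import load_dataset

# Stream one record from the xLAM function-calling dataset
# (requires accepting the dataset's license on the Hub).
ds = load_dataset("Salesforce/xlam-function-calling-60k", split="train", streaming=True)
record = next(iter(ds))

# `tools` and `answers` are stored as JSON-encoded strings.
print(record["query"])
print(json.dumps(json.loads(record["tools"]), indent=2))
print(json.dumps(json.loads(record["answers"]), indent=2))
```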
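Using the hyperparameters listed above, a comparable run could be set up with TRL and PEFT roughly as follows. This is a sketch, not the author's exact training script: parameter names follow recent TRL releases (the card lists TRL 0.24.0), and the step that renders each record into chat-formatted text is deliberately left out.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base_id = "Qwen/Qwen2.5-14B-Instruct"

# 4-bit NF4 quantization, as listed in the LoRA configuration table.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# LoRA settings from the table above.
peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Training hyperparameters from the configuration table.
args = SFTConfig(
    output_dir="qwen2.5-14b-xlam-sft",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_length=2048,
    bf16=True,
    gradient_checkpointing=True,
    seed=42,
)

# In practice the records must first be converted into chat-formatted
# text (query + tools as the prompt, answers as the target); omitted here.
train_dataset = load_dataset("Salesforce/xlam-function-calling-60k", split="train")

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```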
## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the sum of 2 + 2?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```

### Using Pipeline

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM",
    device_map="auto"
)

messages = [{"role": "user", "content": "Explain the concept of machine learning."}]
output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
```

### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM",
    quantization_config=quantization_config,
    device_map="auto"
)
```

## GGUF Versions

For CPU or mixed CPU/GPU inference, GGUF quantized versions are available at:
[ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF)

### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF:Q4_K_M "Hello!"
```
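For programmatic GGUF inference, the `llama-cpp-python` bindings can pull a quantized file directly from the Hub. A sketch, assuming a `Q4_K_M` file is published in the GGUF repository (the Ollama tag above suggests it is):

```python
from llama_cpp import Llama

# Download a quantized file from the GGUF repo and load it.
llm = Llama.from_pretrained(
    repo_id="ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM-GGUF",
    filename="*Q4_K_M.gguf",  # glob pattern; assumes a Q4_K_M quant exists
    n_ctx=2048,               # matches the fine-tuning context length
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```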
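## Function-Calling Example

Since the model is fine-tuned for function calling, tool schemas can be passed through the chat template: `apply_chat_template` in recent transformers releases accepts a `tools` argument, and Qwen2.5's template renders the schemas into the prompt. The output format of this particular fine-tune (xLAM-style JSON vs. Qwen's native tool-call tags) is not documented on this card, so inspect the raw output before writing a parser. The `get_weather` tool below is hypothetical, for illustration only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A tool schema in the JSON-schema style accepted by apply_chat_template.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
text = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```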
## Limitations

- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to the base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with a 2,048-token limit
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails

## Intended Use

### Recommended Uses

- Research on language model fine-tuning
- Educational purposes
- Personal projects
- Prototyping conversational AI

### Out-of-Scope Uses

- Production systems without additional safety measures
- Medical, legal, or financial advice
- Generating harmful or misleading content

## Training Framework

- **TRL**: 0.24.0
- **Transformers**: 4.57.3
- **PyTorch**: 2.9.0
- **Datasets**: 4.3.0
- **PEFT**: 0.18.0
- **BitsAndBytes**: 0.49.0

## Citation

```bibtex
@misc{ermiaazarkhalili_qwen2.5_14b_function_calling_xlam,
  author = {ermiaazarkhalili},
  title = {Qwen2.5-14B-Function-Calling-xLAM: Fine-tuned Qwen2.5-14B-Instruct on xlam-function-calling-60k},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM}}
}
```

## Acknowledgments

- Base model developers at Qwen
- [Hugging Face TRL Team](https://github.com/huggingface/trl) for the training library
- Dataset creators and contributors
- Compute Canada / DRAC for HPC resources

## Contact

For questions or issues, please open an issue on the model repository.