Qwen2.5-14B-Function-Calling-xLAM is a language model optimized with supervised fine-tuning (SFT), which trains the model to follow instructions by learning from high-quality demonstration data.
## Key Features

- **High-Quality Fine-Tuning:** Trained on N/A carefully curated examples
- **Efficient Training:** Uses LoRA (Low-Rank Adaptation) with 4-bit quantization
- **Strong Performance:** Achieves N/A token accuracy on the evaluation set
- **Optimized for Inference:** Available in multiple formats, including GGUF quantizations
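To make the efficiency claims above concrete, here is some back-of-envelope arithmetic. The 14B parameter count comes from the model name; the layer size and LoRA rank below are illustrative assumptions, not this model's actual configuration.

```python
# Rough arithmetic for 4-bit quantization and LoRA savings.
# Assumptions: 14B params (from the model name); the 5120 x 5120
# projection and rank 16 are hypothetical example values.

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30

n_params = 14e9
bf16_gb = weight_memory_gb(n_params, 16)  # full-precision bf16 weights
int4_gb = weight_memory_gb(n_params, 4)   # 4-bit quantized weights

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix."""
    return rank * (d_in + d_out)

full_matrix = 5120 * 5120
adapter = lora_params(5120, 5120, 16)

print(f"bf16 weights: ~{bf16_gb:.1f} GB, 4-bit weights: ~{int4_gb:.1f} GB")
print(f"LoRA trains {adapter} params per matrix instead of {full_matrix}")
```

The two levers compound: 4-bit quantization shrinks the frozen base weights roughly 4x versus bf16, while LoRA keeps the trainable parameter count a small fraction of each weight matrix.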
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the sum of 2 + 2?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
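Since the model is tuned for function calling, a typical workflow is to describe available tools as JSON schemas and parse the tool call the model emits. The schema shape and the model's output format below are illustrative assumptions (xLAM-style models commonly emit a JSON list of calls); check the model's chat template for the exact format it expects.

```python
import json

# Hypothetical tool schema (OpenAI-style); the exact format this model
# expects is an assumption here, not taken from the model card.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Assumed raw model output: a JSON list of {"name", "arguments"} calls.
raw_response = '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'

def parse_tool_calls(text: str) -> list:
    """Parse the model's JSON tool-call list, keeping only known tools."""
    calls = json.loads(text)
    known = {t["name"] for t in tools}
    return [c for c in calls if c.get("name") in known]

for call in parse_tool_calls(raw_response):
    print(call["name"], call["arguments"])
```

Filtering against the declared tool names guards against the model hallucinating a function you never offered.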
## Using Pipeline
```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ermiaazarkhalili/Qwen2.5-14B-Function-Calling-xLAM",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the concept of machine learning."}]

output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
```
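Both generation examples sample with a temperature (0.7 in the first snippet). Temperature divides the logits before the softmax, so values below 1 sharpen the distribution toward the most likely token. A minimal sketch of the mechanism, independent of any model:

```python
import math
import random

def sample_with_temperature(logits, temperature=0.7, rng=random.random):
    """Sample a token index from logits softened by `temperature`."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution.
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

idx = sample_with_temperature([2.0, 1.0, 0.1], temperature=0.7)
print(idx)
```

As temperature approaches 0 this converges to greedy decoding (always the argmax); higher temperatures flatten the distribution and increase output diversity.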