---
license: llama3
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- metallama38binstruct
- sft
- fine-tuned
- trl
- lora
- text-generation
- conversational
- instruction-following
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- Salesforce/xlam-function-calling-60k
model-index:
- name: Llama-3-8B-Function-Calling-xLAM
  results: []
---
# Llama-3-8B-Function-Calling-xLAM

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), trained on the [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) dataset using **SFT** with LoRA adapters.

## Overview

**Llama-3-8B-Function-Calling-xLAM** is a language model optimized for function calling via supervised fine-tuning (SFT), which trains the model to follow instructions by learning from high-quality demonstration data.
### Key Features

- **High-Quality Fine-Tuning**: Trained on curated function-calling demonstrations from the xLAM dataset
- **Efficient Training**: Uses LoRA (Low-Rank Adaptation) with 4-bit quantization
- **Optimized for Inference**: Available in multiple formats, including GGUF quantizations
## Model Details

| Property | Value |
|----------|-------|
| **Developed by** | [ermiaazarkhalili](https://huggingface.co/ermiaazarkhalili) |
| **License** | Llama 3 |
| **Language** | English |
| **Base Model** | [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
| **Model Size** | 8B parameters |
| **Tensor Type** | BF16 |
| **Context Length** | 2,048 tokens |
| **Training Method** | SFT with LoRA |
## Training Information

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Learning Rate | 0.0002 |
| Batch Size | 2 per device |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 16 |
| Number of Epochs | 1 |
| Max Sequence Length | 2,048 tokens |
| LR Scheduler | Linear warmup + cosine annealing |
| Warmup Ratio | 0.1 |
| Precision | BF16 mixed precision |
| Gradient Checkpointing | Enabled |
| Random Seed | 42 |
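The effective batch size in the table is simply the product of the per-device batch and the accumulation steps. A quick sanity check, assuming a single device (the hardware note below mentions one H100 MIG partition):

```python
per_device_batch = 2
grad_accum_steps = 8
num_devices = 1  # assumption: a single H100 MIG partition

# Gradients from 8 micro-batches of 2 are accumulated before each
# optimizer step, giving the effective batch size reported above.
effective_batch = per_device_batch * grad_accum_steps * num_devices
print(effective_batch)  # 16
```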
### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 |
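The table above maps directly onto a PEFT-style adapter configuration. A minimal sketch, assuming the keyword names of `peft.LoraConfig` — this mirrors the reported values and is not the author's exact training script:

```python
# LoRA hyperparameters mirroring the table above. Keys follow the
# peft.LoraConfig keyword names (a sketch, not the original script).
lora_kwargs = {
    "r": 64,                 # LoRA rank
    "lora_alpha": 128,       # scaling factor; alpha / r = 2.0
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# With peft installed, the adapter config would be built as:
#   from peft import LoraConfig
#   lora_config = LoraConfig(**lora_kwargs)
print(lora_kwargs["lora_alpha"] / lora_kwargs["r"])  # 2.0
```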
### Training Hardware

Training ran on an NVIDIA H100 GPU (MIG partition).
## Dataset

This model was trained on the [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) dataset, a collection of about 60,000 function-calling demonstrations.
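Each record pairs a natural-language query with available tool definitions and the target calls; in the published dataset the `tools` and `answers` fields are JSON-encoded strings. A sketch of decoding one record — the field names are taken from the dataset card, but the record content below is illustrative, not an actual dataset entry:

```python
import json

# Illustrative record shaped like the xLAM function-calling data
# (field names assumed from the dataset card; values are made up).
record = {
    "query": "What's the weather in Paris?",
    "tools": json.dumps([{
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {"city": {"type": "str", "description": "City name"}},
    }]),
    "answers": json.dumps([{
        "name": "get_weather",
        "arguments": {"city": "Paris"},
    }]),
}

# The nested fields are JSON strings, so they need a second decode.
tools = json.loads(record["tools"])
answers = json.loads(record["answers"])
print(answers[0]["name"], answers[0]["arguments"])  # get_weather {'city': 'Paris'}
```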
## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the sum of 2 + 2?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
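Since the xLAM training data targets JSON-formatted tool calls, a common pattern is to list the available tools in the system prompt and parse the model's JSON reply. The helper below is a hypothetical sketch of that parsing step; the model's exact output format is an assumption here, so validate it against real generations:

```python
import json

def parse_tool_calls(generated: str):
    """Extract a JSON list of {"name": ..., "arguments": ...} tool calls
    from model output. Hypothetical helper: the exact output format is
    an assumption, so check it against real generations."""
    start = generated.find("[")
    end = generated.rfind("]")
    if start == -1 or end == -1 or end < start:
        return []  # no JSON array found; treat as a plain-text reply
    try:
        calls = json.loads(generated[start:end + 1])
    except json.JSONDecodeError:
        return []
    # Keep only well-formed call objects.
    return [c for c in calls if isinstance(c, dict) and "name" in c]

# Example: a reply shaped like the xLAM answer format
reply = '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'
print(parse_tool_calls(reply))
```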
### Using Pipeline

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM",
    device_map="auto",
)
messages = [{"role": "user", "content": "Explain the concept of machine learning."}]
output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
```
### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM",
    quantization_config=quantization_config,
    device_map="auto"
)
```
## GGUF Versions

For CPU or mixed CPU/GPU inference, GGUF quantized versions are available at
[ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM-GGUF](https://huggingface.co/ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM-GGUF).

### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM-GGUF:Q4_K_M "Hello!"
```
## Limitations

- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to the base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with a 2,048-token limit
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails
## Intended Use

### Recommended Uses

- Research on language model fine-tuning
- Educational purposes
- Personal projects
- Prototyping conversational AI

### Out-of-Scope Uses

- Production systems without additional safety measures
- Medical, legal, or financial advice
- Generating harmful or misleading content
## Training Framework

- **TRL**: 0.24.0
- **Transformers**: 4.57.3
- **PyTorch**: 2.9.0
- **Datasets**: 4.3.0
- **PEFT**: 0.18.0
- **BitsAndBytes**: 0.49.0
## Citation

```bibtex
@misc{ermiaazarkhalili_llama_3_8b_function_calling_xlam,
  author       = {ermiaazarkhalili},
  title        = {Llama-3-8B-Function-Calling-xLAM: Fine-tuned Meta-Llama-3-8B-Instruct on xlam-function-calling-60k},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM}}
}
```
## Acknowledgments

- Base model developers at meta-llama
- [Hugging Face TRL Team](https://github.com/huggingface/trl) for the training library
- Dataset creators and contributors
- Compute Canada / DRAC for HPC resources

## Contact

For questions or issues, please open an issue on the model repository.