---
license: llama3
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- metallama38binstruct
- sft
- fine-tuned
- trl
- lora
- text-generation
- conversational
- instruction-following
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- Salesforce/xlam-function-calling-60k
model-index:
- name: Llama-3-8B-Function-Calling-xLAM
  results: []
---
# Llama-3-8B-Function-Calling-xLAM

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), trained on the [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) dataset using **SFT** with LoRA adapters.

## Overview

**Llama-3-8B-Function-Calling-xLAM** is a language model optimized for function calling via supervised fine-tuning (SFT), which trains the model to follow instructions by learning from high-quality demonstration data.
### Key Features

- **High-Quality Fine-Tuning**: Trained on curated function-calling demonstrations from the xLAM dataset
- **Efficient Training**: Uses LoRA (Low-Rank Adaptation) with 4-bit quantization
- **Optimized for Inference**: Available in multiple formats, including GGUF quantizations
## Model Details

| Property | Value |
|----------|-------|
| **Developed by** | [ermiaazarkhalili](https://huggingface.co/ermiaazarkhalili) |
| **License** | Llama 3 |
| **Language** | English |
| **Base Model** | [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
| **Model Size** | 8B parameters |
| **Tensor Type** | BF16 |
| **Context Length** | 2,048 tokens |
| **Training Method** | SFT with LoRA |
## Training Information

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Learning Rate | 0.0002 |
| Batch Size | 2 per device |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 16 |
| Number of Epochs | 1 |
| Max Sequence Length | 2,048 tokens |
| LR Scheduler | Linear warmup + cosine annealing |
| Warmup Ratio | 0.1 |
| Precision | BF16 mixed precision |
| Gradient Checkpointing | Enabled |
| Random Seed | 42 |
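The effective batch size in the table is simply the product of the per-device batch and the accumulation steps. A quick sanity check, assuming a single device (the hardware note below mentions one H100 MIG partition):

```python
per_device_batch = 2
grad_accum_steps = 8
num_devices = 1  # assumption: a single H100 MIG partition

# Gradients from 8 micro-batches of 2 are accumulated before each
# optimizer step, giving the effective batch size reported above.
effective_batch = per_device_batch * grad_accum_steps * num_devices
print(effective_batch)  # 16
```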
### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 |
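The table above maps directly onto a PEFT-style adapter configuration. A minimal sketch, assuming the keyword names of `peft.LoraConfig` — this mirrors the reported values and is not the author's exact training script:

```python
# LoRA hyperparameters mirroring the table above. Keys follow the
# peft.LoraConfig keyword names (a sketch, not the original script).
lora_kwargs = {
    "r": 64,                 # LoRA rank
    "lora_alpha": 128,       # scaling factor; alpha / r = 2.0
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# With peft installed, the adapter config would be built as:
#   from peft import LoraConfig
#   lora_config = LoraConfig(**lora_kwargs)
print(lora_kwargs["lora_alpha"] / lora_kwargs["r"])  # 2.0
```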
### Training Hardware

Training ran on an NVIDIA H100 GPU (MIG partition).
## Dataset

This model was trained on the [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) dataset, a collection of about 60,000 function-calling demonstrations.
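Each record pairs a natural-language query with available tool definitions and the target calls; in the published dataset the `tools` and `answers` fields are JSON-encoded strings. A sketch of decoding one record — the field names are taken from the dataset card, but the record content below is illustrative, not an actual dataset entry:

```python
import json

# Illustrative record shaped like the xLAM function-calling data
# (field names assumed from the dataset card; values are made up).
record = {
    "query": "What's the weather in Paris?",
    "tools": json.dumps([{
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {"city": {"type": "str", "description": "City name"}},
    }]),
    "answers": json.dumps([{
        "name": "get_weather",
        "arguments": {"city": "Paris"},
    }]),
}

# The nested fields are JSON strings, so they need a second decode.
tools = json.loads(record["tools"])
answers = json.loads(record["answers"])
print(answers[0]["name"], answers[0]["arguments"])  # get_weather {'city': 'Paris'}
```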
## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the sum of 2 + 2?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
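Since the xLAM training data targets JSON-formatted tool calls, a common pattern is to list the available tools in the system prompt and parse the model's JSON reply. The helper below is a hypothetical sketch of that parsing step; the model's exact output format is an assumption here, so validate it against real generations:

```python
import json

def parse_tool_calls(generated: str):
    """Extract a JSON list of {"name": ..., "arguments": ...} tool calls
    from model output. Hypothetical helper: the exact output format is
    an assumption, so check it against real generations."""
    start = generated.find("[")
    end = generated.rfind("]")
    if start == -1 or end == -1 or end < start:
        return []  # no JSON array found; treat as a plain-text reply
    try:
        calls = json.loads(generated[start:end + 1])
    except json.JSONDecodeError:
        return []
    # Keep only well-formed call objects.
    return [c for c in calls if isinstance(c, dict) and "name" in c]

# Example: a reply shaped like the xLAM answer format
reply = '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'
print(parse_tool_calls(reply))
```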
### Using Pipeline

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM",
    device_map="auto",
)
messages = [{"role": "user", "content": "Explain the concept of machine learning."}]
output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
```
### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM",
    quantization_config=quantization_config,
    device_map="auto"
)
```
## GGUF Versions

For CPU or mixed CPU/GPU inference, GGUF quantized versions are available at
[ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM-GGUF](https://huggingface.co/ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM-GGUF).

### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM-GGUF:Q4_K_M "Hello!"
```
## Limitations

- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to the base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with a 2,048-token limit
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails
## Intended Use

### Recommended Uses

- Research on language model fine-tuning
- Educational purposes
- Personal projects
- Prototyping conversational AI

### Out-of-Scope Uses

- Production systems without additional safety measures
- Medical, legal, or financial advice
- Generating harmful or misleading content
## Training Framework

- **TRL**: 0.24.0
- **Transformers**: 4.57.3
- **PyTorch**: 2.9.0
- **Datasets**: 4.3.0
- **PEFT**: 0.18.0
- **BitsAndBytes**: 0.49.0
## Citation

```bibtex
@misc{ermiaazarkhalili_llama_3_8b_function_calling_xlam,
  author       = {ermiaazarkhalili},
  title        = {Llama-3-8B-Function-Calling-xLAM: Fine-tuned Meta-Llama-3-8B-Instruct on xlam-function-calling-60k},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Llama-3-8B-Function-Calling-xLAM}}
}
```
## Acknowledgments

- Base model developers at meta-llama
- [Hugging Face TRL Team](https://github.com/huggingface/trl) for the training library
- Dataset creators and contributors
- Compute Canada / DRAC for HPC resources

## Contact

For questions or issues, please open an issue on the model repository.