---
license: llama3
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
- poker
- game-theory
- fine-tuned
- sft
datasets:
- RZ412/PokerBench
language:
- en
pipeline_tag: text-generation
---

# Llama 3 8B - PokerBench SFT

Llama 3.1 8B Instruct fine-tuned for poker decision-making using LoRA, trained on the PokerBench dataset.

## Training Details

- **Base Model**: Meta-Llama-3.1-8B-Instruct
- **Training Data**: PokerBench (RZ412/PokerBench)
- **Method**: LoRA fine-tuning (adapter merged into the base model)
- **Training Steps**: 5,000
- **Batch Size**: 128
- **Learning Rate**: 1e-6

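As a quick sanity check on training scale (assuming each step consumes one full batch of 128 examples; this bookkeeping is not stated on the card):

```python
# Rough count of training examples processed, assuming one full
# batch per optimizer step.
steps = 5_000
batch_size = 128
examples_seen = steps * batch_size
print(examples_seen)  # 640000
```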
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("YiPz/llama3-8b-pokerbench-sft")
tokenizer = AutoTokenizer.from_pretrained("YiPz/llama3-8b-pokerbench-sft")

messages = [
    {"role": "system", "content": "You are an expert poker player. Respond with your action in <action></action> tags."},
    {"role": "user", "content": "Your poker scenario..."},
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
# do_sample=True is required for temperature to take effect
outputs = model.generate(inputs, max_new_tokens=32, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Output Format

Actions are returned in `<action></action>` tags:

- `<action>fold</action>`
- `<action>call</action>`
- `<action>check</action>`
- `<action>raise 15</action>`
- `<action>bet 10</action>`

## GGUF Versions

Quantized GGUF builds for llama.cpp/Ollama are available at [YiPz/llama3-8b-pokerbench-sft-gguf](https://huggingface.co/YiPz/llama3-8b-pokerbench-sft-gguf).

## License

Subject to the Llama 3 license.