Project initialization; model provided by the ModelHub XC community.
Model: YiPz/llama3-8b-pokerbench-sft Source: Original Platform
---
license: llama3
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
- poker
- game-theory
- fine-tuned
- sft
datasets:
- RZ412/PokerBench
language:
- en
pipeline_tag: text-generation
---

# Llama 3 8B - PokerBench SFT

A LoRA fine-tune of Llama 3.1 8B Instruct for poker decision-making, trained on the PokerBench dataset.

## Training Details

- **Base Model**: Meta-Llama-3.1-8B-Instruct
- **Training Data**: PokerBench (RZ412/PokerBench)
- **Method**: LoRA fine-tuning (adapters merged into the base weights)
- **Training Steps**: 5,000
- **Batch Size**: 128
- **Learning Rate**: 1e-6

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("YiPz/llama3-8b-pokerbench-sft")
tokenizer = AutoTokenizer.from_pretrained("YiPz/llama3-8b-pokerbench-sft")

messages = [
    {"role": "system", "content": "You are an expert poker player. Respond with your action in <action></action> tags."},
    {"role": "user", "content": "Your poker scenario..."},
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
# Sampling must be enabled for `temperature` to take effect; without
# `do_sample=True`, generation is greedy and temperature is ignored.
outputs = model.generate(inputs, max_new_tokens=32, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Output Format

Actions are returned in `<action></action>` tags:

- `<action>fold</action>`
- `<action>call</action>`
- `<action>check</action>`
- `<action>raise 15</action>`
- `<action>bet 10</action>`
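Downstream code usually needs the action as structured data rather than raw text. A small helper along these lines (`parse_action` is illustrative, not part of this repo) can extract the verb and optional amount from the tags:

```python
import re

# Matches the five actions listed above; raise/bet may carry an integer amount.
_ACTION_RE = re.compile(r"<action>\s*(fold|call|check|raise|bet)(?:\s+(\d+))?\s*</action>")

def parse_action(text):
    """Return (verb, amount) from a response containing <action></action> tags.

    amount is None for fold/call/check; returns None if no tag is found.
    """
    m = _ACTION_RE.search(text)
    if m is None:
        return None
    verb, amount = m.group(1), m.group(2)
    return (verb, int(amount) if amount is not None else None)

print(parse_action("<action>raise 15</action>"))  # ('raise', 15)
print(parse_action("<action>fold</action>"))      # ('fold', None)
```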

## GGUF Versions

Quantized GGUF versions for llama.cpp/Ollama: [YiPz/llama3-8b-pokerbench-sft-gguf](https://huggingface.co/YiPz/llama3-8b-pokerbench-sft-gguf)
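For Ollama, a minimal Modelfile along these lines can wrap a downloaded quant; the GGUF filename below is an assumption, so substitute the actual file from the GGUF repo:

```
# Modelfile (illustrative; adjust the FROM path to the quant you downloaded)
FROM ./llama3-8b-pokerbench-sft.Q4_K_M.gguf
PARAMETER temperature 0.1
SYSTEM "You are an expert poker player. Respond with your action in <action></action> tags."
```

Then build and run it with `ollama create pokerbench -f Modelfile` followed by `ollama run pokerbench`.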

## License

Use of this model is subject to the Llama 3 license.