---
license: llama3
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
- poker
- game-theory
- fine-tuned
- sft
datasets:
- RZ412/PokerBench
language:
- en
pipeline_tag: text-generation
---

# Llama 3.1 8B - PokerBench SFT

Llama 3.1 8B Instruct fine-tuned for poker decision-making with LoRA, trained on the PokerBench dataset.

## Training Details

- **Base Model**: Meta-Llama-3.1-8B-Instruct
- **Training Data**: PokerBench (RZ412/PokerBench)
- **Method**: LoRA fine-tuning (adapters merged into the base weights)
- **Training Steps**: 5,000
- **Batch Size**: 128
- **Learning Rate**: 1e-6

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("YiPz/llama3-8b-pokerbench-sft")
tokenizer = AutoTokenizer.from_pretrained("YiPz/llama3-8b-pokerbench-sft")

messages = [
    {"role": "system", "content": "You are an expert poker player. Respond with your action in tags."},
    {"role": "user", "content": "Your poker scenario..."},
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
# temperature only takes effect with sampling enabled
outputs = model.generate(inputs, max_new_tokens=32, do_sample=True, temperature=0.1)
# decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Output Format

Actions are returned in `` tags:

- `fold`
- `call`
- `check`
- `raise 15`
- `bet 10`

## GGUF Versions

Quantized GGUF versions for llama.cpp/Ollama: [YiPz/llama3-8b-pokerbench-sft-gguf](https://huggingface.co/YiPz/llama3-8b-pokerbench-sft-gguf)

## License

Use of this model is subject to the Llama 3 license.
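Since `raise` and `bet` actions carry a chip amount while `fold`/`call`/`check` do not, downstream code typically needs to split the model's reply into an action and an optional amount. A minimal parsing sketch (the `parse_action` helper is illustrative, not part of this model; the card does not specify the exact wrapper tag, so any XML-style tags are stripped generically):

```python
import re

def parse_action(text):
    """Parse a reply like 'raise 15' or a tag-wrapped variant into
    (action, amount); amount is None for fold/call/check."""
    # The exact tag name is not specified in the card, so strip any
    # <...> markup generically before matching.
    cleaned = re.sub(r"<[^>]+>", " ", text).strip().lower()
    m = re.match(r"(fold|call|check|raise|bet)(?:\s+(\d+(?:\.\d+)?))?", cleaned)
    if not m:
        raise ValueError(f"unrecognized action: {text!r}")
    action, amount = m.group(1), m.group(2)
    return action, float(amount) if amount else None

print(parse_action("raise 15"))  # ('raise', 15.0)
print(parse_action("fold"))      # ('fold', None)
```

Validating against the small closed action set also guards against occasional off-format generations before the action is fed into a game engine.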