---
license: apache-2.0
base_model: Qwen/Qwen3-4B
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- trl
- sft
- qwen3
- pokemon-showdown
- game-ai
- rocm
- amd
language:
- en
---

# Pokemon Showdown Agent v6

`Pokemon Showdown Agent v6` is a `Qwen/Qwen3-4B` fine-tune for **next-action prediction from raw Pokemon Showdown replay logs**. Given a battle-log prefix and the side it controls, the model is trained to emit a short action command such as `move Earthquake` or `switch Corviknight`.

This release is the merged checkpoint from the `v6` pipeline built with **Unsloth + TRL + AMD ROCm**. The tutorial version of the workflow uses a much smaller streamed subset for fast reproduction; this model is the larger production-oriented artifact.

## What makes v6 different

- It learns directly from messy raw replay logs instead of hand-written state summaries.
- It targets a strict action format suitable for agent pipelines: `move [move-name]` or `switch [pokemon-name]`.
- It was developed around AMD ROCm workflows, with `bfloat16` recommended for stable inference.
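Because the action format is strict, a downstream harness can validate model outputs with a small parser. This is a minimal sketch (not part of the release), which also accepts the optional `terastallize` suffix mentioned in the recommended system prompt:

```python
import re

# Matches "move <name>" or "switch <name>", with an optional
# trailing " terastallize" as described in the prompt format.
ACTION_RE = re.compile(r"^(move|switch) (.+?)( terastallize)?$")

def parse_action(text: str):
    """Parse a model output into (kind, name, tera), or None if malformed."""
    m = ACTION_RE.match(text.strip())
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3) is not None

print(parse_action("move Earthquake"))             # ('move', 'Earthquake', False)
print(parse_action("switch Corviknight"))          # ('switch', 'Corviknight', False)
print(parse_action("move Tera Blast terastallize"))  # ('move', 'Tera Blast', True)
```

Rejecting anything that fails to parse (returning `None`) is a cheap first line of defense before the legality checks discussed below.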

## Official notebook

Use the cleaned release notebook `pokemon_showdown_agent_v6_release.ipynb` for the reproducible tutorial flow.

## Intended use

Use this model when you want to:

- Predict the next action from a raw Pokemon Showdown log prefix.
- Build a text-only battle agent or evaluation harness.
- Study agent alignment from real replay trajectories.

This model is **not** a full simulator policy by itself. For ladder play or automated battle loops, you still need legality checks, environment wrappers, and battle-state management outside the model.
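One such legality check can be sketched as a guard that only accepts the model's action when the battle environment reports it as legal. Everything here is illustrative: a real harness would derive `legal_actions` from Pokemon Showdown's request state, which this model card does not cover.

```python
def choose_action(model_output: str, legal_actions: set[str], fallback: str) -> str:
    """Accept the model's action only if it is currently legal; else fall back."""
    action = model_output.strip()
    # Ignore an optional "terastallize" suffix for the legality comparison.
    base = action.removesuffix(" terastallize").strip()
    return action if base in legal_actions else fallback

# Hypothetical legal options for the current turn.
legal = {"move Earthquake", "move Stealth Rock", "switch Corviknight"}
print(choose_action("move Earthquake", legal, "move Stealth Rock"))  # move Earthquake
print(choose_action("move Fly", legal, "move Stealth Rock"))         # move Stealth Rock
```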

## Prompt format

The model expects a chat-style prompt with:

- A `system` message specifying which side the model is playing as.
- A `user` message containing the raw replay log prefix up to the current turn marker.

Recommended system prompt:

```text
You are a Pokemon Showdown battle AI. You play as {side}. Given the battle log, output your next action. Format: move <name> OR switch <name>. Append terastallize if you terastallize this turn.
```
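Building that `user` message from a full replay means cutting the raw log at the current `|turn|N` marker. A minimal helper, assuming standard Showdown replay text, might look like:

```python
def log_prefix(raw_log: str, turn: int) -> str:
    """Keep everything up to and including the |turn|N marker."""
    marker = f"|turn|{turn}"
    lines = []
    for line in raw_log.splitlines():
        lines.append(line)
        if line.strip() == marker:
            break
    return "\n".join(lines)

raw = (
    "|start\n"
    "|switch|p1a: Garchomp|Garchomp, M|100/100\n"
    "|turn|1\n"
    "|move|p1a: Garchomp|Earthquake|p2a: Corviknight\n"
    "|turn|2\n"
    "|move|p1a: Garchomp|Swords Dance|p1a: Garchomp"
)
print(log_prefix(raw, 2))  # ends at "|turn|2"; turn 2's moves are excluded
```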

## Quick start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "GoldenGrapeGentleman1/pokemon-showdown-agent-v6"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are a Pokemon Showdown battle AI. You play as p2. "
            "Given the battle log, output your next action. "
            "Format: move <name> OR switch <name>. "
            "Append terastallize if you terastallize this turn."
        ),
    },
    {
        "role": "user",
        "content": (
            "|player|p1|Player1|266|1500\n"
            "|player|p2|Player2|1|1500\n"
            "|teamsize|p1|6\n"
            "|teamsize|p2|6\n"
            "|gen|9\n"
            "|tier|[Gen 9] OU\n"
            "|\n"
            "|start\n"
            "|switch|p1a: Garchomp|Garchomp, M|100/100\n"
            "|switch|p2a: Corviknight|Corviknight, M|100/100\n"
            "|turn|1\n"
            "|move|p1a: Garchomp|Earthquake|p2a: Corviknight\n"
            "|-immune|p2a: Corviknight\n"
            "|turn|2"
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,  # greedy decoding; sampling knobs like temperature have no effect here
        pad_token_id=tokenizer.eos_token_id,
    )

decoded = tokenizer.decode(outputs[0], skip_special_tokens=False)
response = decoded.split("<|im_start|>assistant\n")[-1].replace("<|im_end|>", "").strip()
# Qwen3 may emit a <think>...</think> block before the action; strip it.
if response.startswith("<think>"):
    response = response.split("</think>", 1)[-1].strip()
print(response)
```

## Training data

The full `v6` preprocessing pipeline was built from the public dataset:

- Source dataset: [`milkkarten/pokemon-showdown-replays-merged`](https://huggingface.co/datasets/milkkarten/pokemon-showdown-replays-merged)

Project preprocessing summary:

- `100,000` train games
- `10,000` test games
- `2,330,115` train samples
- `236,349` test samples
- `min_rating = 1200`
- `max_chars = 12000`

The companion tutorial notebook uses a smaller streamed subset with a higher rating filter so readers can reproduce the workflow quickly without downloading the full corpus.
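The rating and length filters above can be expressed as a simple predicate. This is a hedged sketch: the `rating` and `log` column names and the `1500` cutoff are assumptions for illustration (check the dataset card for the real schema); v6 itself used `min_rating = 1200` over the full corpus.

```python
def keep(example: dict, min_rating: int = 1500, max_chars: int = 12000) -> bool:
    """Rating and length filter mirroring the preprocessing summary above."""
    rating = example.get("rating") or 0
    log = example.get("log") or ""
    return rating >= min_rating and len(log) <= max_chars

# With Hugging Face datasets, the same predicate applies to a stream, e.g.:
#   ds = load_dataset("milkkarten/pokemon-showdown-replays-merged",
#                     split="train", streaming=True).filter(keep)

print(keep({"rating": 1620, "log": "|start\n|turn|1"}))  # True
print(keep({"rating": 1100, "log": "|start"}))           # False
```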

## Training recipe

- Base model: `Qwen/Qwen3-4B`
- Fine-tuning method: LoRA SFT with Unsloth
- LoRA rank / alpha: `64 / 128`
- Full training context length: up to `4096` tokens
- Frameworks: Unsloth, TRL, Transformers, Datasets
- Deployment recommendation on AMD: prefer `bfloat16` inference for stability
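The rank/alpha pair above corresponds to a PEFT configuration along these lines. The `target_modules` list is an assumption (the attention and MLP projections commonly targeted for Qwen-style models), not something the release confirms:

```python
from peft import LoraConfig

# Sketch of the LoRA setup implied by the recipe; r=64, lora_alpha=128
# come from the model card, target_modules is an assumption.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
```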

## Limitations

- This is a research checkpoint, not a complete battle engine.
- The model can still produce illegal or strategically weak actions.
- Prompt wording matters; changing the system format can reduce output reliability.
- Included evaluation artifacts are sanity checks, not a full competitive benchmark.

## Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth)
- [TRL](https://github.com/huggingface/trl)
- [Qwen](https://huggingface.co/Qwen)
- [Pokemon Showdown](https://pokemonshowdown.com/)

## Citation

If you build on this work, please cite the upstream tooling as well:

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```