---
license: apache-2.0
base_model: Qwen/Qwen3-4B
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- trl
- sft
- qwen3
- pokemon-showdown
- game-ai
- rocm
- amd
language:
- en
---

# Pokemon Showdown Agent v6

`Pokemon Showdown Agent v6` is a `Qwen/Qwen3-4B` fine-tune for **next-action prediction from raw Pokemon Showdown replay logs**. Given a battle-log prefix and the side it controls, the model is trained to emit a short action command such as `move Earthquake` or `switch Corviknight`.

This release is the merged checkpoint from the `v6` pipeline built with **Unsloth + TRL + AMD ROCm**. The tutorial version of the workflow uses a much smaller streamed subset for fast reproduction; this model is the larger production-oriented artifact.

## What makes v6 different

- It learns directly from messy raw replay logs instead of hand-written state summaries.
- It targets a strict action format suitable for agent pipelines: `move [move-name]` or `switch [pokemon-name]`.
- It was developed around AMD ROCm workflows, with `bfloat16` recommended for stable inference.
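Because the action format is strict, a downstream harness can validate model outputs with a small parser. This is a minimal sketch (not part of the release), which also accepts the optional `terastallize` suffix mentioned in the recommended system prompt:

```python
import re

# Matches "move <name>" or "switch <name>", with an optional
# trailing " terastallize" as described in the prompt format.
ACTION_RE = re.compile(r"^(move|switch) (.+?)( terastallize)?$")

def parse_action(text: str):
    """Parse a model output into (kind, name, tera), or None if malformed."""
    m = ACTION_RE.match(text.strip())
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3) is not None

print(parse_action("move Earthquake"))             # ('move', 'Earthquake', False)
print(parse_action("switch Corviknight"))          # ('switch', 'Corviknight', False)
print(parse_action("move Tera Blast terastallize"))  # ('move', 'Tera Blast', True)
```

Rejecting anything that fails to parse (returning `None`) is a cheap first line of defense before the legality checks discussed below.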

## Official notebook

Use the cleaned release notebook `pokemon_showdown_agent_v6_release.ipynb` for the reproducible tutorial flow.

## Intended use

Use this model when you want to:

- Predict the next action from a raw Pokemon Showdown log prefix.
- Build a text-only battle agent or evaluation harness.
- Study agent alignment from real replay trajectories.

This model is **not** a full simulator policy by itself. For ladder play or automated battle loops, you still need legality checks, environment wrappers, and battle-state management outside the model.
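One such legality check can be sketched as a guard that only accepts the model's action when the battle environment reports it as legal. Everything here is illustrative: a real harness would derive `legal_actions` from Pokemon Showdown's request state, which this model card does not cover.

```python
def choose_action(model_output: str, legal_actions: set[str], fallback: str) -> str:
    """Accept the model's action only if it is currently legal; else fall back."""
    action = model_output.strip()
    # Ignore an optional "terastallize" suffix for the legality comparison.
    base = action.removesuffix(" terastallize").strip()
    return action if base in legal_actions else fallback

# Hypothetical legal options for the current turn.
legal = {"move Earthquake", "move Stealth Rock", "switch Corviknight"}
print(choose_action("move Earthquake", legal, "move Stealth Rock"))  # move Earthquake
print(choose_action("move Fly", legal, "move Stealth Rock"))         # move Stealth Rock
```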

## Prompt format

The model expects a chat-style prompt with:

- A `system` message specifying which side the model is playing as.
- A `user` message containing the raw replay log prefix up to the current turn marker.

Recommended system prompt:

```text
You are a Pokemon Showdown battle AI. You play as {side}. Given the battle log, output your next action. Format: move <name> OR switch <name>. Append terastallize if you terastallize this turn.
```
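Building that `user` message from a full replay means cutting the raw log at the current `|turn|N` marker. A minimal helper, assuming standard Showdown replay text, might look like:

```python
def log_prefix(raw_log: str, turn: int) -> str:
    """Keep everything up to and including the |turn|N marker."""
    marker = f"|turn|{turn}"
    lines = []
    for line in raw_log.splitlines():
        lines.append(line)
        if line.strip() == marker:
            break
    return "\n".join(lines)

raw = (
    "|start\n"
    "|switch|p1a: Garchomp|Garchomp, M|100/100\n"
    "|turn|1\n"
    "|move|p1a: Garchomp|Earthquake|p2a: Corviknight\n"
    "|turn|2\n"
    "|move|p1a: Garchomp|Swords Dance|p1a: Garchomp"
)
print(log_prefix(raw, 2))  # ends at "|turn|2"; turn 2's moves are excluded
```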

## Quick start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "GoldenGrapeGentleman1/pokemon-showdown-agent-v6"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are a Pokemon Showdown battle AI. You play as p2. "
            "Given the battle log, output your next action. "
            "Format: move <name> OR switch <name>. "
            "Append terastallize if you terastallize this turn."
        ),
    },
    {
        "role": "user",
        "content": (
            "|player|p1|Player1|266|1500\n"
            "|player|p2|Player2|1|1500\n"
            "|teamsize|p1|6\n"
            "|teamsize|p2|6\n"
            "|gen|9\n"
            "|tier|[Gen 9] OU\n"
            "|\n"
            "|start\n"
            "|switch|p1a: Garchomp|Garchomp, M|100/100\n"
            "|switch|p2a: Corviknight|Corviknight, M|100/100\n"
            "|turn|1\n"
            "|move|p1a: Garchomp|Earthquake|p2a: Corviknight\n"
            "|-immune|p2a: Corviknight\n"
            "|turn|2"
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,  # greedy decoding; sampling knobs like temperature have no effect here
        pad_token_id=tokenizer.eos_token_id,
    )

decoded = tokenizer.decode(outputs[0], skip_special_tokens=False)
response = decoded.split("<|im_start|>assistant\n")[-1].replace("<|im_end|>", "").strip()
# Qwen3 may emit a <think>...</think> block before the action; strip it.
if response.startswith("<think>"):
    response = response.split("</think>", 1)[-1].strip()
print(response)
```

## Training data

The full `v6` preprocessing pipeline was built from the public dataset:

- Source dataset: [`milkkarten/pokemon-showdown-replays-merged`](https://huggingface.co/datasets/milkkarten/pokemon-showdown-replays-merged)

Project preprocessing summary:

- `100,000` train games
- `10,000` test games
- `2,330,115` train samples
- `236,349` test samples
- `min_rating = 1200`
- `max_chars = 12000`

The companion tutorial notebook uses a smaller streamed subset with a higher rating filter so readers can reproduce the workflow quickly without downloading the full corpus.
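The rating and length filters above can be expressed as a simple predicate. This is a hedged sketch: the `rating` and `log` column names and the `1500` cutoff are assumptions for illustration (check the dataset card for the real schema); v6 itself used `min_rating = 1200` over the full corpus.

```python
def keep(example: dict, min_rating: int = 1500, max_chars: int = 12000) -> bool:
    """Rating and length filter mirroring the preprocessing summary above."""
    rating = example.get("rating") or 0
    log = example.get("log") or ""
    return rating >= min_rating and len(log) <= max_chars

# With Hugging Face datasets, the same predicate applies to a stream, e.g.:
#   ds = load_dataset("milkkarten/pokemon-showdown-replays-merged",
#                     split="train", streaming=True).filter(keep)

print(keep({"rating": 1620, "log": "|start\n|turn|1"}))  # True
print(keep({"rating": 1100, "log": "|start"}))           # False
```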

## Training recipe

- Base model: `Qwen/Qwen3-4B`
- Fine-tuning method: LoRA SFT with Unsloth
- LoRA rank / alpha: `64 / 128`
- Full training context length: up to `4096` tokens
- Frameworks: Unsloth, TRL, Transformers, Datasets
- Deployment recommendation on AMD: prefer `bfloat16` inference for stability
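The rank/alpha pair above corresponds to a PEFT configuration along these lines. The `target_modules` list is an assumption (the attention and MLP projections commonly targeted for Qwen-style models), not something the release confirms:

```python
from peft import LoraConfig

# Sketch of the LoRA setup implied by the recipe; r=64, lora_alpha=128
# come from the model card, target_modules is an assumption.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
```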

## Limitations

- This is a research checkpoint, not a complete battle engine.
- The model can still produce illegal or strategically weak actions.
- Prompt wording matters; changing the system format can reduce output reliability.
- Included evaluation artifacts are sanity checks, not a full competitive benchmark.

## Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth)
- [TRL](https://github.com/huggingface/trl)
- [Qwen](https://huggingface.co/Qwen)
- [Pokemon Showdown](https://pokemonshowdown.com/)

## Citation

If you build on this work, please cite the upstream tooling as well:

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```