| license | base_model | library_name | pipeline_tag | tags | language |
|---|---|---|---|---|---|
| apache-2.0 | Qwen/Qwen3-4B | transformers | text-generation | | |
# Pokemon Showdown Agent v6

Pokemon Showdown Agent v6 is a Qwen/Qwen3-4B fine-tune for next-action prediction from raw Pokemon Showdown replay logs. Given a battle-log prefix and the side it controls, the model is trained to emit a short action command such as `move Earthquake` or `switch Corviknight`.

This release is the merged checkpoint from the v6 pipeline, built with Unsloth + TRL + AMD ROCm. The tutorial version of the workflow uses a much smaller streamed subset for fast reproduction; this model is the larger, production-oriented artifact.
## What makes v6 different

- It learns directly from messy raw replay logs instead of hand-written state summaries.
- It targets a strict action format suitable for agent pipelines: `move [move-name]` or `switch [pokemon-name]` (see the validator sketch below).
- It was developed around AMD ROCm workflows, with `bfloat16` recommended for stable inference.
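
As a minimal sketch (not part of the release), the hypothetical `parse_action` helper below validates that grammar, assuming `terastallize` is appended as a plain suffix per the recommended system prompt:

```python
import re

# Hypothetical validator for the strict v6 action grammar:
#   "move <name>" or "switch <name>", optionally followed by "terastallize".
ACTION_RE = re.compile(r"^(move|switch)\s+(.+?)(?:\s+(terastallize))?$")

def parse_action(text: str):
    """Return (verb, target, terastallize_flag), or None if malformed."""
    m = ACTION_RE.match(text.strip())
    if m is None:
        return None
    verb, target, tera = m.groups()
    return verb, target, tera is not None

assert parse_action("move Earthquake") == ("move", "Earthquake", False)
assert parse_action("move Tera Blast terastallize") == ("move", "Tera Blast", True)
assert parse_action("use Earthquake") is None
```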
## Official notebook

Use the cleaned release notebook `pokemon_showdown_agent_v6_release.ipynb` for the reproducible tutorial flow.
## Intended use
Use this model when you want to:
- Predict the next action from a raw Pokemon Showdown log prefix.
- Build a text-only battle agent or evaluation harness.
- Study agent alignment from real replay trajectories.
This model is not a full simulator policy by itself. For ladder play or automated battle loops, you still need legality checks, environment wrappers, and battle-state management outside the model.
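
As a hypothetical example of one such outer check, the `choose_action` wrapper below gates the model's text against the options the simulator currently allows; the `legal_moves` / `legal_switches` sets are assumptions about your own environment wrapper, and `parse_action` is the validator sketched earlier:

```python
# Hypothetical legality gate: the model only predicts text, so the caller
# must track which options the simulator currently allows.
def choose_action(model_output: str,
                  legal_moves: set[str],
                  legal_switches: set[str]) -> str | None:
    parsed = parse_action(model_output)  # validator from the sketch above
    if parsed is None:
        return None  # malformed output -> caller falls back to a default
    verb, target, _tera = parsed
    if verb == "move" and target in legal_moves:
        return model_output
    if verb == "switch" and target in legal_switches:
        return model_output
    return None  # syntactically valid but illegal this turn
```

On a `None` result, an agent loop would typically retry (e.g. with sampling enabled) or fall back to a safe default action.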
## Prompt format

The model expects a chat-style prompt with:

- A `system` message specifying which side the model is playing as.
- A `user` message containing the raw replay log prefix up to the current turn marker.

Recommended system prompt:

```text
You are a Pokemon Showdown battle AI. You play as {side}. Given the battle log, output your next action. Format: move <name> OR switch <name>. Append terastallize if you terastallize this turn.
```
## Quick start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "GoldenGrapeGentleman1/pokemon-showdown-agent-v6"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # bfloat16 recommended, especially on AMD ROCm
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are a Pokemon Showdown battle AI. You play as p2. "
            "Given the battle log, output your next action. "
            "Format: move <name> OR switch <name>. "
            "Append terastallize if you terastallize this turn."
        ),
    },
    {
        "role": "user",
        # Raw replay log prefix, cut off at the current turn marker.
        "content": (
            "|player|p1|Player1|266|1500\n"
            "|player|p2|Player2|1|1500\n"
            "|teamsize|p1|6\n"
            "|teamsize|p2|6\n"
            "|gen|9\n"
            "|tier|[Gen 9] OU\n"
            "|\n"
            "|start\n"
            "|switch|p1a: Garchomp|Garchomp, M|100/100\n"
            "|switch|p2a: Corviknight|Corviknight, M|100/100\n"
            "|turn|1\n"
            "|move|p1a: Garchomp|Earthquake|p2a: Corviknight\n"
            "|-immune|p2a: Corviknight\n"
            "|turn|2"
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,  # greedy decoding; temperature is ignored in this mode
        pad_token_id=tokenizer.eos_token_id,
    )

# Keep special tokens so we can split on the assistant marker, then strip them.
decoded = tokenizer.decode(outputs[0], skip_special_tokens=False)
response = decoded.split("<|im_start|>assistant\n")[-1].replace("<|im_end|>", "").strip()
# Qwen3 may emit an optional <think>...</think> reasoning block; drop it.
if response.startswith("<think>"):
    response = response.split("</think>", 1)[-1].strip()
print(response)
```
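
The printed response is a bare action string, so an agent loop can route it straight through the `parse_action` / `choose_action` sketches above before sending anything to a simulator; greedy decoding (`do_sample=False`) keeps that string deterministic for a given log prefix.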
## Training data
The full v6 preprocessing pipeline was built from the public dataset:
- Source dataset: `milkkarten/pokemon-showdown-replays-merged`
Project preprocessing summary:
- 100,000 train games
- 10,000 test games
- 2,330,115 train samples
- 236,349 test samples
- `min_rating = 1200`
- `max_chars = 12000`
The companion tutorial notebook uses a smaller streamed subset with a higher rating filter so readers can reproduce the workflow quickly without downloading the full corpus.
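
A rough sketch of that streamed-subset pattern with Hugging Face `datasets`; the `rating` field name and the `1500` cutoff are illustrative guesses at the schema, not values taken from the notebook:

```python
from datasets import load_dataset

# Stream the corpus instead of downloading it; filter to higher-rated games
# and take a small slice. Field names are assumptions about the schema.
stream = load_dataset(
    "milkkarten/pokemon-showdown-replays-merged",
    split="train",
    streaming=True,
)
high_rated = stream.filter(lambda ex: (ex.get("rating") or 0) >= 1500)
subset = list(high_rated.take(1_000))  # small sample for fast iteration
```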
## Training recipe

- Base model: `Qwen/Qwen3-4B`
- Fine-tuning method: LoRA SFT with Unsloth
- LoRA rank / alpha: `64 / 128` (see the configuration sketch below)
- Full training context length: up to `4096` tokens
- Frameworks: Unsloth, TRL, Transformers, Datasets
- Deployment recommendation on AMD: prefer `bfloat16` inference for stability
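
For orientation, a configuration sketch that matches those headline numbers (rank 64, alpha 128, 4096-token context); the `target_modules` list and the remaining settings are assumptions, not the release's exact training script:

```python
from unsloth import FastLanguageModel

# Load the base model at the recipe's context length, then attach LoRA
# adapters with the stated rank/alpha. Everything else is an assumption.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B",
    max_seq_length=4096,
    dtype=None,  # let Unsloth choose; bfloat16 is recommended on AMD ROCm
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```

Training itself would then proceed through TRL's `SFTTrainer` on the preprocessed samples.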
## Limitations
- This is a research checkpoint, not a complete battle engine.
- The model can still produce illegal or strategically weak actions.
- Prompt wording matters; changing the system format can reduce output reliability.
- Included evaluation artifacts are sanity checks, not a full competitive benchmark.
## Acknowledgements
## Citation

If you build on this work, please cite the upstream tooling as well:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```