
---
license: apache-2.0
base_model: Qwen/Qwen3-4B
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- trl
- sft
- qwen3
- pokemon-showdown
- game-ai
- rocm
- amd
language:
- en
---

Pokemon Showdown Agent v6

Pokemon Showdown Agent v6 is a Qwen/Qwen3-4B fine-tune for next-action prediction from raw Pokemon Showdown replay logs. Given a battle-log prefix and the side it controls, the model is trained to emit a short action command such as move Earthquake or switch Corviknight.

This release is the merged checkpoint from the v6 pipeline built with Unsloth + TRL + AMD ROCm. The tutorial version of the workflow uses a much smaller streamed subset for fast reproduction; this model is the larger production-oriented artifact.

What makes v6 different

  • It learns directly from messy raw replay logs instead of hand-written state summaries.
  • It targets a strict action format suitable for agent pipelines: move [move-name] or switch [pokemon-name].
  • It was developed around AMD ROCm workflows, with bfloat16 recommended for stable inference.
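The strict action format makes model outputs trivial to validate and parse. A minimal sketch of a parser for that grammar (the `parse_action` helper is illustrative, not part of the release):

```python
import re

# Matches the strict v6 action grammar:
#   move <name> [terastallize]  |  switch <name>
ACTION_RE = re.compile(r"^(move|switch)\s+(.+?)(\s+terastallize)?$", re.IGNORECASE)

def parse_action(text: str):
    """Parse a model response into (kind, name, terastallize), or None if malformed."""
    match = ACTION_RE.match(text.strip())
    if match is None:
        return None
    kind, name, tera = match.groups()
    return kind.lower(), name.strip(), tera is not None

print(parse_action("move Earthquake"))              # ('move', 'Earthquake', False)
print(parse_action("move Tera Blast terastallize")) # ('move', 'Tera Blast', True)
print(parse_action("switch Corviknight"))           # ('switch', 'Corviknight', False)
```

Anything that fails to parse (e.g. free-form prose) comes back as None, which an agent loop can treat as a retry or fallback signal.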

Official notebook

Use the cleaned release notebook pokemon_showdown_agent_v6_release.ipynb for the reproducible tutorial flow.

Intended use

Use this model when you want to:

  • Predict the next action from a raw Pokemon Showdown log prefix.
  • Build a text-only battle agent or evaluation harness.
  • Study agent alignment from real replay trajectories.

This model is not a full simulator policy by itself. For ladder play or automated battle loops, you still need legality checks, environment wrappers, and battle-state management outside the model.
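One such legality check can be sketched as follows; the legal-option lists would come from your own battle-state tracker, and the fallback policy (first legal move) is an assumption, not part of the model:

```python
def choose_action(predicted, legal_moves, legal_switches):
    """Accept the model's predicted action only if it is legal; otherwise fall back.

    `predicted` is a (kind, name) tuple parsed from the model output.
    Matching is case-insensitive; the fallback is the first legal option.
    """
    kind, name = predicted
    pool = legal_moves if kind == "move" else legal_switches
    for option in pool:
        if option.lower() == name.lower():
            return kind, option
    # Illegal prediction: fall back to the first available move, then switch.
    if legal_moves:
        return "move", legal_moves[0]
    return "switch", legal_switches[0]

print(choose_action(("move", "earthquake"), ["Earthquake", "Swords Dance"], ["Corviknight"]))
# ('move', 'Earthquake')
```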

Prompt format

The model expects a chat-style prompt with:

  • A system message specifying which side the model is playing as.
  • A user message containing the raw replay log prefix up to the current turn marker.

Recommended system prompt:

You are a Pokemon Showdown battle AI. You play as {side}. Given the battle log, output your next action. Format: move <name> OR switch <name>. Append terastallize if you terastallize this turn.
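Building the user message amounts to slicing the raw replay log at the current turn marker. A small illustrative helper (not shipped with the model):

```python
def log_prefix_up_to_turn(log: str, turn: int) -> str:
    """Return the replay log up to and including the `|turn|<n>` marker."""
    marker = f"|turn|{turn}"
    lines = log.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == marker:
            return "\n".join(lines[: i + 1])
    # Marker not found: return the whole log unchanged.
    return log

log = (
    "|start\n"
    "|switch|p1a: Garchomp|Garchomp, M|100/100\n"
    "|turn|1\n"
    "|move|p1a: Garchomp|Earthquake|p2a: Corviknight\n"
    "|turn|2\n"
    "|move|..."
)
print(log_prefix_up_to_turn(log, 2))  # everything up to and including "|turn|2"
```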

Quick start

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "GoldenGrapeGentleman1/pokemon-showdown-agent-v6"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are a Pokemon Showdown battle AI. You play as p2. "
            "Given the battle log, output your next action. "
            "Format: move <name> OR switch <name>. "
            "Append terastallize if you terastallize this turn."
        ),
    },
    {
        "role": "user",
        "content": (
            "|player|p1|Player1|266|1500\n"
            "|player|p2|Player2|1|1500\n"
            "|teamsize|p1|6\n"
            "|teamsize|p2|6\n"
            "|gen|9\n"
            "|tier|[Gen 9] OU\n"
            "|\n"
            "|start\n"
            "|switch|p1a: Garchomp|Garchomp, M|100/100\n"
            "|switch|p2a: Corviknight|Corviknight, M|100/100\n"
            "|turn|1\n"
            "|move|p1a: Garchomp|Earthquake|p2a: Corviknight\n"
            "|-immune|p2a: Corviknight\n"
            "|turn|2"
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

decoded = tokenizer.decode(outputs[0], skip_special_tokens=False)
response = decoded.split("<|im_start|>assistant\n")[-1].replace("<|im_end|>", "").strip()
if response.startswith("<think>"):
    response = response.split("</think>", 1)[-1].strip()
print(response)

Training data

The full v6 preprocessing pipeline was built from a public Pokemon Showdown replay dataset.

Project preprocessing summary:

  • 100,000 train games
  • 10,000 test games
  • 2,330,115 train samples
  • 236,349 test samples
  • min_rating = 1200
  • max_chars = 12000

The companion tutorial notebook uses a smaller streamed subset with a higher rating filter so readers can reproduce the workflow quickly without downloading the full corpus.

Training recipe

  • Base model: Qwen/Qwen3-4B
  • Fine-tuning method: LoRA SFT with Unsloth
  • LoRA rank / alpha: 64 / 128
  • Full training context length: up to 4096
  • Frameworks: Unsloth, TRL, Transformers, Datasets
  • Deployment recommendation on AMD: prefer bfloat16 inference for stability
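The recipe above maps onto a standard Unsloth LoRA setup. A sketch under the stated hyperparameters, as a configuration fragment only (this is not the exact training script; argument names follow the public Unsloth API, and the target-module list is a common default, not confirmed by this release):

```python
from unsloth import FastLanguageModel

# Load the base model at the full training context length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B",
    max_seq_length=4096,
)

# Attach LoRA adapters with the v6 rank/alpha.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```

Training itself would then go through a TRL SFTTrainer, as listed under the frameworks above.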

Limitations

  • This is a research checkpoint, not a complete battle engine.
  • The model can still produce illegal or strategically weak actions.
  • Prompt wording matters; changing the system format can reduce output reliability.
  • Included evaluation artifacts are sanity checks, not a full competitive benchmark.

Citation

If you build on this work, please cite the upstream tooling as well:

@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}