Files

ModelHub XC df2b17584a 初始化项目，由ModelHub XC社区提供模型

Model: miaolu3/qwen3-8b-alfworld-rl-step570
Source: Original Platform

2026-05-31 17:46:12 +08:00

1.1 KiB

Raw Blame History

library_name, license, base_model, pipeline_tag, tags

library_name

license

base_model

pipeline_tag

Qwen3-8B ALFWorld RL (step 570)

Qwen3-8B fine-tuned with reinforcement learning on the ALFWorld text-world benchmark. This is a snapshot at training step 570.

Base model: Qwen/Qwen3-8B
Task: ALFWorld (TextWorld variant; 6 task types, valid_seen / valid_unseen)
Training step: 570
Validation success rate (valid_seen, T=0.4, 8 rollouts/game): ~0.957
Inference: Qwen3 chat template with thinking enabled (enable_thinking=True). The model reasons inside <think>...</think> and emits the chosen action inside <action>...</action> tags, conditioned on the current observation and admissible-action list.
Tokenizer: identical to Qwen/Qwen3-8B (vocab 151643).

Intended use

Distillation source for smaller students (Qwen3-0.6B / Qwen2.5-0.5B), and as a strong baseline for ALFWorld policy work. Released alongside the training recipe at https://github.com/MiaoLu3/verl.

1.1 KiB Raw Blame History

Qwen3-8B ALFWorld RL (step 570)

Intended use

1.1 KiB

Raw Blame History