---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B
tags:
- qwen3
- ssd
- self-distillation
- rlve
---

# Qwen3-4B SSD (RLVE Eval20, N=20) — global step 500

Weights merged from the VERL FSDP SFT checkpoint **`global_step_500`** (500 optimizer steps, 1-epoch schedule) of **Simple Self-Distillation (SSD)** applied to **Qwen/Qwen3-4B**: sample N=20 self-generated responses per prompt from the frozen base model, then SFT on those samples.

## Training data

Parquet SFT corpus (16k rows, `messages` column): [CL-From-Nothing/RLVE-Eval20-Qwen3-4B-SSD-N20-SFT-Train](https://huggingface.co/datasets/CL-From-Nothing/RLVE-Eval20-Qwen3-4B-SSD-N20-SFT-Train).

Companion 1.7B model: [CL-From-Nothing/Qwen3-1-7B-SSD-RLVE-Eval20-N20-global-step-500](https://huggingface.co/CL-From-Nothing/Qwen3-1-7B-SSD-RLVE-Eval20-N20-global-step-500).

## Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CL-From-Nothing/Qwen3-4B-SSD-RLVE-Eval20-N20-global-step-500"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
```
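The SSD recipe above (sample N self-generated responses per prompt from the frozen base model, then SFT on those samples) can be sketched roughly as follows. This is a minimal illustration, not the actual training pipeline: `generate_fn` stands in for sampling from the frozen base model, and the prompt list is a toy placeholder rather than the RLVE Eval20 data.

```python
# Minimal sketch of the SSD data-collection step: for each prompt,
# draw N samples from the frozen base model and store them as
# chat-style `messages` rows, matching the format of the SFT corpus.

def build_ssd_rows(prompts, generate_fn, n_samples=20):
    """generate_fn(prompt) -> one sampled completion; a stand-in for
    base-model sampling (e.g. with temperature > 0)."""
    rows = []
    for prompt in prompts:
        for _ in range(n_samples):
            rows.append({
                "messages": [
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": generate_fn(prompt)},
                ]
            })
    return rows

# Toy stand-in generator; the real recipe samples from Qwen/Qwen3-4B.
rows = build_ssd_rows(["What is 2+2?"], lambda p: "4", n_samples=20)
print(len(rows))  # 20 rows: N=20 samples for one prompt
```

The resulting rows would then be written to parquet and fed to the VERL FSDP SFT trainer.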