---
library_name: transformers
base_model: Qwen/Qwen3-8B
tags:
- generated_from_trainer
- axolotl
- sera
- sft
- qwen3
license: apache-2.0
---

# sera-subset-mixed-316-axolotl__Qwen3-8B-v8

SFT of Qwen/Qwen3-8B on a 316-row random mixed subset of `ethanlshen/sera-subset` (stage1 unresolved + stage2 resolved), trained with axolotl following the upstream SERA recipe.

See `baselines/sera/README.md` in the open-thoughts/OpenThoughts-Agent repo for the full reproduction details, hyperparameters, and iteration history (this is iteration i9, version v8).

## Hyperparameters

- learning_rate: 1e-5
- batch_size: 32 (global; micro=1, grad_accum=1, dp=32)
- num_epochs: 3
- warmup_steps: 48
- adam_beta1: 0.9, adam_beta2: 0.95
- weight_decay: 0.01
- sequence_len: 32768
- chat_template: chatml
- bf16, deepspeed zero3 (no CPU offload)
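
The list above maps roughly onto an axolotl config like the sketch below. This is an illustration, not the actual training config: exact key names follow common axolotl conventions, and the `datasets` and `deepspeed` paths are placeholders/assumptions.

```yaml
# Hypothetical axolotl config fragment reflecting the hyperparameters above.
base_model: Qwen/Qwen3-8B
chat_template: chatml
sequence_len: 32768

# datasets: path/name below is a placeholder, not the real preprocessing setup
datasets:
  - path: ethanlshen/sera-subset
    type: chat_template

learning_rate: 1e-5
num_epochs: 3
warmup_steps: 48
micro_batch_size: 1              # global batch 32 = micro 1 x grad_accum 1 x 32-way data parallel
gradient_accumulation_steps: 1

adam_beta1: 0.9
adam_beta2: 0.95
weight_decay: 0.01

bf16: true
deepspeed: deepspeed_configs/zero3.json   # ZeRO-3, no CPU offload; path is an assumption
```

See the upstream repo referenced above for the config actually used in training.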