Initialize the project; model provided by the ModelHub XC community
Model: laion/sera-subset-mixed-316-axolotl__Qwen3-8B-v8
Source: Original Platform
README.md (new file, 29 lines)
---
library_name: transformers
base_model: Qwen/Qwen3-8B
tags:
- generated_from_trainer
- axolotl
- sera
- sft
- qwen3
license: apache-2.0
---

# sera-subset-mixed-316-axolotl__Qwen3-8B-v8

SFT of Qwen/Qwen3-8B on a 316-row random mixed subset of `ethanlshen/sera-subset` (stage1 unresolved + stage2 resolved), trained with axolotl following the upstream SERA recipe.
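The mixing step could look roughly like the following `datasets` sketch; the split names (`stage1_unresolved`, `stage2_resolved`), the seed, and the output path are illustrative assumptions, not the actual SERA recipe.

```python
# Hypothetical reconstruction of the 316-row mixed subset; the split names,
# seed, and output path are assumptions, not taken from the SERA recipe.
from datasets import concatenate_datasets, load_dataset

ds = load_dataset("ethanlshen/sera-subset")  # actual config/split names may differ
mixed = concatenate_datasets([ds["stage1_unresolved"], ds["stage2_resolved"]])
subset = mixed.shuffle(seed=0).select(range(316))  # random 316-row draw
subset.to_json("sera-mixed-316.jsonl")  # file then referenced by the axolotl dataset config
```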
See `baselines/sera/README.md` in the open-thoughts/OpenThoughts-Agent repo for the full reproduction details, hyperparameters, and iteration history (this is iteration i9, version v8).
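## Usage

A minimal inference sketch (assumed, not part of the upstream recipe): it relies on the standard `transformers` chat-template API and the repo id above; the dtype and generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "laion/sera-subset-mixed-316-axolotl__Qwen3-8B-v8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # card lists bf16 training; bf16 inference assumed
    device_map="auto",
)

# Training used the chatml template, so apply_chat_template should
# produce the matching prompt format from the saved tokenizer config.
messages = [{"role": "user", "content": "Briefly explain supervised fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```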
## Hyperparameters

- learning_rate: 1e-5
- batch_size: 32 (global; micro=1, grad_accum=1, dp=32; see the sketch after this list)
- num_epochs: 3
- warmup_steps: 48
- adam_beta1: 0.9, adam_beta2: 0.95
- weight_decay: 0.01
- sequence_len: 32768
- chat_template: chatml
- bf16, deepspeed zero3 (no CPU offload)
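The global batch size is the product of the three factors listed in parentheses; a quick sanity-check sketch (variable names are illustrative, not keys from the actual training config):

```python
# Global batch = per-GPU micro batch x grad accumulation x data-parallel ranks.
# Names are illustrative, not keys from the actual training config.
micro_batch_size = 1             # sequences per GPU per optimizer step
gradient_accumulation_steps = 1  # no accumulation
data_parallel_world_size = 32    # GPUs under DeepSpeed ZeRO-3

global_batch_size = (
    micro_batch_size * gradient_accumulation_steps * data_parallel_world_size
)
assert global_batch_size == 32  # matches the batch_size listed above
```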