Initialize the project; model provided by the ModelHub XC community
Model: parallel-reasoner/threadweaver-qwen3-8b-131072-sft8x Source: Original Platform
---
base_model: Qwen/Qwen3-8B
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen3
- threadweaver
- trl
- sft
- long-context
- flex-attention
---

# threadweaver-qwen3-8b-131072-sft8x

This repository contains the final exported model from the `task_sft8x` run in `expts/02_tw_repro`.

## Source checkpoint

- Local final checkpoint: `task_sft8x/output/checkpoint/checkpoint-482`
- Final checkpoint timestamp: 2026-04-12 04:59 UTC
- Base initialization: local `Qwen3-8B-131072` variant derived from `Qwen/Qwen3-8B`
- Architecture: `Qwen3ForCausalLM`
- Max position embeddings: `131072`
- Attention implementation used during training: `flex_attention`
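Since the run trained with `flex_attention`, you may want to mirror that attention backend at inference. A minimal sketch of the loading keyword arguments, as an assumption on recent `transformers` releases (the `attn_implementation` kwarg needs a recent `transformers` and PyTorch >= 2.5; it is optional, and dropping it falls back to the default backend):

```python
# Keyword arguments to pass to AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs).
# attn_implementation="flex_attention" mirrors the training-time setup but is optional;
# removing the key uses the default attention implementation instead.
load_kwargs = {
    "torch_dtype": "auto",
    "device_map": "auto",
    "attn_implementation": "flex_attention",
}
```

Whether `flex_attention` is available for this architecture depends on your installed `transformers` version, so treat this as a configuration hint rather than a requirement.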

## Files

This upload contains the inference-ready exported model weights, tokenizer files, and generation config from the final checkpoint. Optimizer and RNG state files are intentionally not included.

## Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "parallel-reasoner/threadweaver-qwen3-8b-131072-sft8x"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
```