| base_model | library_name | pipeline_tag | tags |
|---|---|---|---|
| Qwen/Qwen3-8B | transformers | text-generation | |
threadweaver-qwen3-8b-131072-sft8x
This repository contains the final exported model from the task_sft8x run in expts/02_tw_repro.
Source checkpoint
- Local final checkpoint: task_sft8x/output/checkpoint/checkpoint-482
- Final checkpoint timestamp: 2026-04-12 04:59 UTC
- Base initialization: local Qwen3-8B-131072 variant derived from Qwen/Qwen3-8B
- Architecture: Qwen3ForCausalLM
- Max position embeddings: 131072
- Attention implementation used during training: flex_attention
Files
This upload contains the inference-ready exported model weights, tokenizer files, and generation config from the final checkpoint. Optimizer and RNG state files are intentionally not included.
Quick start
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "parallel-reasoner/threadweaver-qwen3-8b-131072-sft8x"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Example generation (prompt is illustrative)
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Description