ModelHub XC 843d409127 初始化项目,由ModelHub XC社区提供模型
Model: parallel-reasoner/threadweaver-qwen3-8b-131072-sft8x
Source: Original Platform
2026-05-05 12:12:40 +08:00

base_model, library_name, pipeline_tag, tags
base_model library_name pipeline_tag tags
Qwen/Qwen3-8B transformers text-generation
qwen3
threadweaver
trl
sft
long-context
flex-attention

threadweaver-qwen3-8b-131072-sft8x

This repository contains the final exported model from the task_sft8x run in expts/02_tw_repro.

Source checkpoint

  • Local final checkpoint: task_sft8x/output/checkpoint/checkpoint-482
  • Final checkpoint timestamp: 2026-04-12 04:59 UTC
  • Base initialization: local Qwen3-8B-131072 variant derived from Qwen/Qwen3-8B
  • Architecture: Qwen3ForCausalLM
  • Max position embeddings: 131072
  • Attention implementation used during training: flex_attention

Files

This upload contains the inference-ready exported model weights, tokenizer files, and generation config from the final checkpoint. Optimizer and RNG state files are intentionally not included.

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "parallel-reasoner/threadweaver-qwen3-8b-131072-sft8x"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
Description
Model synced from source: parallel-reasoner/threadweaver-qwen3-8b-131072-sft8x
Readme 2 MiB