Model: syj4205/broken-model-fixed Source: Original Platform
| library_name | pipeline_tag | base_model |
|---|---|---|
| transformers | text-generation | |
Qwen3-8B Fixed Model
This repository is a fixed version of yunmorning/broken-model.
The original model could not serve a functional /chat/completions API because of two critical issues (Fix 1 and Fix 2 below). A third, metadata-only correction (Fix 3) is included for accuracy.
Changes Made
Fix 1: Added chat_template to tokenizer_config.json
Before: chat_template field did not exist
After: Added official Qwen3 Jinja2 chat template
Why: OpenAI-compatible API servers (vLLM, FriendliAI, etc.) rely on chat_template to convert the messages array into model input via tokenizer.apply_chat_template(). Without this field, the /chat/completions endpoint cannot format prompts and fails entirely.
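As a rough illustration of why the missing field matters, the sketch below checks a parsed tokenizer_config.json for chat_template and shows the kind of string a ChatML-style template produces from a messages array. Both has_chat_template and render_chatml are hypothetical helpers written for this example. render_chatml is a simplified stand-in, not the official Qwen3 template, which also handles system prompts, tools, and thinking mode.

```python
import json


def has_chat_template(tokenizer_config):
    """Return True if the parsed tokenizer_config.json defines a chat_template."""
    return bool(tokenizer_config.get("chat_template"))


def render_chatml(messages, add_generation_prompt=True):
    """Simplified stand-in for a ChatML-style template (assumption, not the official one)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# A config shaped like the broken one: no chat_template key at all.
broken_config = json.loads('{"tokenizer_class": "Qwen2Tokenizer"}')
print(has_chat_template(broken_config))  # False

prompt = render_chatml([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

Without a template, a server has no rule for turning the messages array into a prompt string like the one printed here.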
Fix 2: Corrected shard mapping in model.safetensors.index.json
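Before: Layer 7's q_proj, k_proj, and v_proj weights pointed to the wrong shard (model-00001-of-00005.safetensors)
After: All three point to model-00002-of-00005.safetensors, where the tensors are actually stored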
| Tensor | Before | After |
|---|---|---|
| model.layers.7.self_attn.q_proj.weight | model-00001-of-00005.safetensors | model-00002-of-00005.safetensors |
| model.layers.7.self_attn.k_proj.weight | model-00001-of-00005.safetensors | model-00002-of-00005.safetensors |
| model.layers.7.self_attn.v_proj.weight | model-00001-of-00005.safetensors | model-00002-of-00005.safetensors |
Why: All other tensors in Layer 7 correctly point to model-00002. This mismatch causes a weight loading error at inference time since the tensors cannot be found in the referenced shard.
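One way to catch this kind of mismatch is to compare the index against the shard files themselves. The sketch below uses only the standard library and assumes the index and all shards sit in the same local directory. The helper names read_tensor_names and find_mismapped_tensors are hypothetical. It relies on the safetensors file layout: an 8-byte little-endian header length followed by a JSON header keyed by tensor name.

```python
import json
import struct
from pathlib import Path


def read_tensor_names(shard_path):
    """Return the set of tensor names stored in one .safetensors shard."""
    with open(shard_path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata entry, not a tensor
    return set(header)


def find_mismapped_tensors(model_dir):
    """List (tensor, shard) pairs where the index names a shard lacking that tensor."""
    model_dir = Path(model_dir)
    index = json.loads((model_dir / "model.safetensors.index.json").read_text())
    names_by_shard = {}
    mismapped = []
    for tensor, shard in sorted(index["weight_map"].items()):
        if shard not in names_by_shard:
            # Read each shard header once and cache the result.
            names_by_shard[shard] = read_tensor_names(model_dir / shard)
        if tensor not in names_by_shard[shard]:
            mismapped.append((tensor, shard))
    return mismapped
```

Calling find_mismapped_tensors on a local snapshot of the repository should return an empty list when every indexed tensor is present in its referenced shard. Only headers are read, so the check does not load any weights.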
Fix 3: Corrected base_model metadata in README.md
Before: base_model: meta-llama/Meta-Llama-3.1-8B
After: base_model: Qwen/Qwen3-8B
Why: The actual weights (lm_head.weight shape [151936, 4096]) and config (model_type: qwen3, architectures: Qwen3ForCausalLM) confirm this is a Qwen3-8B model. LLaMA 3.1-8B has a vocab size of 128,256, which does not match. This is a metadata-only fix with no effect on inference.
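The vocabulary-size argument can be written as a small check. This is a minimal sketch: candidate_base_models is a hypothetical helper, and the lookup table contains only the two vocabulary sizes quoted above, so it illustrates the reasoning and does not identify models in general.

```python
# Vocabulary sizes quoted in the text above.
KNOWN_VOCAB_SIZES = {
    "Qwen/Qwen3-8B": 151936,
    "meta-llama/Meta-Llama-3.1-8B": 128256,
}


def candidate_base_models(lm_head_shape):
    """Return base models whose vocab size equals the first dim of lm_head.weight."""
    vocab_size = lm_head_shape[0]
    return [name for name, size in KNOWN_VOCAB_SIZES.items() if size == vocab_size]


print(candidate_base_models([151936, 4096]))  # ['Qwen/Qwen3-8B']
```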
What Was NOT Changed
| Item | Reason |
|---|---|
| config.json | All architecture values match Qwen3-8B spec exactly |
| tokenizer_class: "Qwen2Tokenizer" | Qwen3 intentionally reuses Qwen2Tokenizer (same BPE) |
| eos_token_id: [151645, 151643] | Matches official Qwen3 generation config |
Verification
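```python
import torch
from transformers import pipeline

# Load the fixed checkpoint; device_map="auto" places weights on available devices.
pipe = pipeline(
    "text-generation",
    model="syj4205/broken-model-fixed",
    dtype=torch.bfloat16,
    device_map="auto",
)

# Passing a messages list makes the pipeline apply the tokenizer's chat_template (Fix 1).
messages = [{"role": "user", "content": "Who are you?"}]
result = pipe(messages, max_new_tokens=40)

# The last entry of the returned conversation is the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```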
Output:
```
<think> Okay, the user asked, "Who are you?" I need to respond in a friendly and
informative way. Let me start by introducing my name, Qwen...</think>
I'm Qwen, a large language model developed by Alibaba Cloud.
```
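The reply above contains a reasoning segment wrapped in <think> tags followed by the final answer. If only the answer is needed, the two can be separated with a small helper. split_thinking is a hypothetical function written for this example, and it assumes at most one <think>...</think> block at the start of the reply.

```python
import re


def split_thinking(text):
    """Split a reply into (reasoning, answer) around the first <think>...</think> block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block present: treat the whole reply as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


reasoning, answer = split_thinking("<think> Let me introduce myself. </think>\nI'm Qwen.")
print(answer)  # I'm Qwen.
```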