---
base_model: Qwen/Qwen3-8B
library_name: transformers
pipeline_tag: text-generation
---

# broken-model-fixed

This repository is a fixed copy of [yunmorning/broken-model](https://huggingface.co/yunmorning/broken-model), which was reported as unable to serve a functional `/chat/completions` API endpoint.

## Root Cause

The model weights, architecture config, and tokenizer vocabulary are all valid Qwen3-8B artifacts. However, the `tokenizer_config.json` was **missing the `chat_template` field**.

The `chat_template` is a Jinja2 template that tells inference servers (vLLM, TGI, FriendliAI, etc.) how to convert a list of chat messages (`[{"role": "user", "content": "Hello"}]`) into the model's expected prompt format (ChatML, with `<|im_start|>`/`<|im_end|>` delimiters). Without it, any `/chat/completions` request fails with:

```
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set
```

## Changes Made

### 1. `tokenizer_config.json` — added `chat_template` (critical fix)

Added the standard Qwen3-8B chat template (4168 characters) from the reference model [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). This was the **only** missing field; all other tokenizer config values were already correct and identical to the reference.

The template handles:

- System, user, and assistant message formatting using ChatML (`<|im_start|>`/`<|im_end|>`)
- Tool/function calling via `<tool_call>` and `</tool_call>` tags
- Reasoning/thinking blocks via `<think>` tags
- Generation prompt injection for inference

### 2. `README.md` — corrected `base_model` metadata
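The role of the template can be illustrated with a minimal sketch of the ChatML rendering it performs (illustration only; the real Qwen3 template is a ~4k-character Jinja2 template that also covers tools, `<think>` blocks, and system prompts):

```python
# Minimal sketch of what a ChatML-style chat_template renders.
# Illustration only: the real template is Jinja2 and handles far more cases.
def render_chatml(messages, add_generation_prompt=True):
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply next.
        prompt += "<|im_start|>assistant\n"
    return prompt

print(render_chatml([{"role": "user", "content": "Hello"}]))
# → <|im_start|>user
#   Hello<|im_end|>
#   <|im_start|>assistant
```

With the fix applied, the equivalent real rendering is done by `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` in Transformers, which is exactly the call that raised the `ValueError` above when `chat_template` was absent.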
The original README declared `base_model: meta-llama/Llama-3.1-8B`, but the model is actually Qwen3-8B, based on:

- `config.json`: `architectures: ["Qwen3ForCausalLM"]`, `model_type: "qwen3"`
- Weight tensors contain Qwen3-specific layers (`q_norm`, `k_norm`)
- `tokenizer_config.json`: `tokenizer_class: "Qwen2Tokenizer"`, vocab size 151936
- 36 hidden layers and `intermediate_size` 12288 (matching Qwen3-8B, not Llama-3.1-8B, which has 32 layers and `intermediate_size` 14336)

## Verification on FriendliAI

When importing the **broken** model (`yunmorning/broken-model`) into FriendliAI, both the "Tool call" and "Reasoning parser" features were detected as **"Not supported"**, and endpoint creation failed with an internal system error.

After applying the fix, importing `majimenez/broken-model-fixed` into FriendliAI correctly detects both features as **"Supported"**. FriendliAI parses the `chat_template` to detect these capabilities:

- **Tool call**: the template contains `<tool_call>`/`</tool_call>` and `<tools>`/`</tools>` formatting logic
- **Reasoning parser**: the template contains `<think>`/`</think>` block handling with `enable_thinking` support

Without any `chat_template` at all, neither feature can be detected or used.

### End-to-end confirmation

- **Broken model** (`yunmorning/broken-model`): failed to load entirely on FriendliAI — endpoint creation returned an internal system error.
- **Fixed model** (`majimenez/broken-model-fixed`): deployed successfully on FriendliAI and responded correctly to `/chat/completions` requests, producing both reasoning (`<think>` blocks) and body response text.

### Files NOT changed

All other files were verified correct and left unmodified:

- `config.json` — identical to reference Qwen/Qwen3-8B
- `generation_config.json` — identical to reference Qwen/Qwen3-8B
- `tokenizer.json`, `vocab.json`, `merges.txt` — tokenizer vocabulary files
- `model-*.safetensors` / `model.safetensors.index.json` — model weights
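The template-based capability detection described in the verification section above can be approximated by a simple scan of the template text. This is a hypothetical sketch (FriendliAI's actual detection logic is not public); it only illustrates why a missing template makes both features undetectable:

```python
# Hypothetical sketch of chat_template capability detection.
# FriendliAI's real logic is not public; this models the observed behavior:
# no template => neither "Tool call" nor "Reasoning parser" can be detected.
def detect_capabilities(chat_template):
    if not chat_template:  # None or empty string: nothing to inspect
        return {"tool_call": False, "reasoning_parser": False}
    return {
        "tool_call": "<tool_call>" in chat_template and "</tool_call>" in chat_template,
        "reasoning_parser": "<think>" in chat_template and "</think>" in chat_template,
    }

print(detect_capabilities(None))
# → {'tool_call': False, 'reasoning_parser': False}
```

Running this against the restored Qwen3-8B template reports both capabilities as present, matching what FriendliAI shows for the fixed model.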