---
base_model: Qwen/Qwen3-8B
library_name: transformers
pipeline_tag: text-generation
---

# broken-model-fixed

This repository is a fixed copy of [yunmorning/broken-model](https://huggingface.co/yunmorning/broken-model), which was reported as unable to serve a functional `/chat/completions` API endpoint.

## Root Cause

The model weights, architecture config, and tokenizer vocabulary are all valid Qwen3-8B artifacts. However, the `tokenizer_config.json` was **missing the `chat_template` field**.

The `chat_template` is a Jinja2 template that tells inference servers (vLLM, TGI, FriendliAI, etc.) how to convert a list of chat messages (`[{"role": "user", "content": "Hello"}]`) into the model's expected prompt format (ChatML, with `<|im_start|>`/`<|im_end|>` delimiters). Without it, any `/chat/completions` request fails with:

```
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set
```

## Changes Made

### 1. `tokenizer_config.json` — added `chat_template` (critical fix)

Added the standard Qwen3-8B chat template (4168 characters) from the reference model [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). This was the **only** missing field; all other tokenizer config values were already correct and identical to the reference.

The template handles:

- System, user, and assistant message formatting using ChatML (`<|im_start|>`/`<|im_end|>`)
- Tool/function calling via `<tool_call>` and `</tool_call>` tags
- Reasoning/thinking blocks via `<think>` tags
- Generation prompt injection for inference

### 2. `README.md` — corrected `base_model` metadata
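The role of the template can be illustrated with a minimal sketch of the ChatML rendering it performs (illustration only; the real Qwen3 template is a ~4k-character Jinja2 template that also covers tools, `<think>` blocks, and system prompts):

```python
# Minimal sketch of what a ChatML-style chat_template renders.
# Illustration only: the real template is Jinja2 and handles far more cases.
def render_chatml(messages, add_generation_prompt=True):
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply next.
        prompt += "<|im_start|>assistant\n"
    return prompt

print(render_chatml([{"role": "user", "content": "Hello"}]))
# → <|im_start|>user
#   Hello<|im_end|>
#   <|im_start|>assistant
```

With the fix applied, the equivalent real rendering is done by `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` in Transformers, which is exactly the call that raised the `ValueError` above when `chat_template` was absent.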
The original README declared `base_model: meta-llama/Llama-3.1-8B`, but the model is actually Qwen3-8B, based on:

- `config.json`: `architectures: ["Qwen3ForCausalLM"]`, `model_type: "qwen3"`
- Weight tensors contain Qwen3-specific layers (`q_norm`, `k_norm`)
- `tokenizer_config.json`: `tokenizer_class: "Qwen2Tokenizer"`, vocab size 151936
- 36 hidden layers and `intermediate_size` 12288 (matching Qwen3-8B, not Llama-3.1-8B, which has 32 layers and `intermediate_size` 14336)

## Verification on FriendliAI

When importing the **broken** model (`yunmorning/broken-model`) into FriendliAI, both the "Tool call" and "Reasoning parser" features were detected as **"Not supported"**, and endpoint creation failed with an internal system error.

After applying the fix, importing `majimenez/broken-model-fixed` into FriendliAI correctly detects both features as **"Supported"**. FriendliAI parses the `chat_template` to detect these capabilities:

- **Tool call**: the template contains `<tool_call>`/`</tool_call>` and `<tools>`/`</tools>` formatting logic
- **Reasoning parser**: the template contains `<think>`/`</think>` block handling with `enable_thinking` support

Without any `chat_template` at all, neither feature can be detected or used.

### End-to-end confirmation

- **Broken model** (`yunmorning/broken-model`): failed to load entirely on FriendliAI — endpoint creation returned an internal system error.
- **Fixed model** (`majimenez/broken-model-fixed`): deployed successfully on FriendliAI and responded correctly to `/chat/completions` requests, producing both reasoning (`<think>` blocks) and body response text.

### Files NOT changed

All other files were verified correct and left unmodified:

- `config.json` — identical to reference Qwen/Qwen3-8B
- `generation_config.json` — identical to reference Qwen/Qwen3-8B
- `tokenizer.json`, `vocab.json`, `merges.txt` — tokenizer vocabulary files
- `model-*.safetensors` / `model.safetensors.index.json` — model weights
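The template-based capability detection described in the verification section above can be approximated by a simple scan of the template text. This is a hypothetical sketch (FriendliAI's actual detection logic is not public); it only illustrates why a missing template makes both features undetectable:

```python
# Hypothetical sketch of chat_template capability detection.
# FriendliAI's real logic is not public; this models the observed behavior:
# no template => neither "Tool call" nor "Reasoning parser" can be detected.
def detect_capabilities(chat_template):
    if not chat_template:  # None or empty string: nothing to inspect
        return {"tool_call": False, "reasoning_parser": False}
    return {
        "tool_call": "<tool_call>" in chat_template and "</tool_call>" in chat_template,
        "reasoning_parser": "<think>" in chat_template and "</think>" in chat_template,
    }

print(detect_capabilities(None))
# → {'tool_call': False, 'reasoning_parser': False}
```

Running this against the restored Qwen3-8B template reports both capabilities as present, matching what FriendliAI shows for the fixed model.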