---
base_model:
- unsloth/Qwen3-4B-Thinking-2507
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
license: apache-2.0
language:
- en
datasets:
- khazarai/kimi-2.5-high-reasoning-250x
pipeline_tag: text-generation
---

# Qwen3-4B-Kimi2.5-Reasoning-Distilled

## Benchmark

| Model | Score |
| :--- | :--- |
| khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled | 76.09 |
| Qwen/Qwen3-4B-Thinking-2507 | 73.73 |

- **Benchmark**: khazarai/Multi-Domain-Reasoning-Benchmark
- **Total Questions**: 100
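
As a rough illustration of how such a benchmark run could be wired up (the field names `question` and `answer` and the containment scoring rule are assumptions, not the benchmark's actual grading protocol):

```python
from datasets import load_dataset
from transformers import pipeline

# Hypothetical harness: dataset schema and scoring rule are assumptions.
bench = load_dataset("khazarai/Multi-Domain-Reasoning-Benchmark", split="train")
generator = pipeline(
    "text-generation",
    model="khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled",
    device_map="auto",
)

scores = []
for row in bench:
    out = generator([{"role": "user", "content": row["question"]}], max_new_tokens=2048)
    reply = out[0]["generated_text"][-1]["content"]
    scores.append(1.0 if row["answer"] in reply else 0.0)  # assumed containment scoring

print(f"Score: {100 * sum(scores) / len(scores):.2f}")
```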

`Qwen3-4B-Kimi2.5-Reasoning-Distilled` is a fine-tuned language model optimized for structured, long-form reasoning. It is derived from the Qwen3-4B-Thinking-2507 base model and fine-tuned on a specialized distillation dataset generated by Kimi-2.5-Thinking.

This model is designed to bridge the gap between small, efficient models (the 0.6B–4B range) and the complex reasoning capabilities typically found in much larger models. It excels at breaking problems down, self-correcting, and providing detailed analytical answers.

**Base Model**: Qwen3-4B-Thinking-2507

**Training Technique**: Unsloth + QLoRA
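
The training script itself is not part of this card; below is a minimal sketch of an Unsloth + QLoRA setup of the kind described above, following the standard Unsloth fine-tuning pattern. All hyperparameters (sequence length, LoRA rank and alpha, batch size, learning rate, epochs) are illustrative assumptions, not the values used to train this model.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit so LoRA adapters train on quantized weights (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B-Thinking-2507",
    max_seq_length=8192,  # assumption: long enough for full reasoning traces
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and alpha here are illustrative.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("khazarai/kimi-2.5-high-reasoning-250x", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumption: a pre-formatted text column
    max_seq_length=8192,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=3,
        output_dir="outputs",
    ),
)
trainer.train()
```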

## Dataset

The model was fine-tuned on the [khazarai/kimi-2.5-high-reasoning-250x](https://huggingface.co/datasets/khazarai/kimi-2.5-high-reasoning-250x) dataset.

Dataset composition:

- Total Samples: 250
- Total Tokens: 1,114,407
- Teacher Model: Kimi-2.5-Thinking
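
The composition numbers can be sanity-checked by loading the dataset directly; the `text` column name below is an assumption about the dataset schema.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("khazarai/kimi-2.5-high-reasoning-250x", split="train")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3-4B-Thinking-2507")

print(dataset)  # number of rows and column names

# Rough token count across the corpus; assumes a "text" column.
total_tokens = sum(len(tokenizer(row["text"]).input_ids) for row in dataset)
print(f"Total tokens: {total_tokens:,}")
```
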
## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# Load the fine-tuned model and tokenizer onto GPU 0.
tokenizer = AutoTokenizer.from_pretrained("khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled")
model = AutoModelForCausalLM.from_pretrained(
    "khazarai/Qwen3-4B-Kimi2.5-Reasoning-Distilled",
    device_map={"": 0},
)

question = """
You are the Head of Strategy at a mid-sized AI startup. Your company has two options for the next 2 years:
- Option A: Invest heavily in fine-tuning open-source LLMs for niche domains (e.g., healthcare, legal).
- Option B: Build a proprietary foundation model from scratch.

You have the following constraints:
- Budget: $20M total
- Team: 25 engineers (strong in ML, limited infra experience)
- Time horizon: 24 months
- Market: Highly competitive, dominated by large players (OpenAI, Google, Anthropic)

Tasks:

1. Construct a decision framework (e.g., expected value, risk-adjusted return, or strategic positioning).
2. Identify key uncertainties and how you would model them.
3. Recommend one option and justify it rigorously.
4. Describe a contingency plan if your chosen strategy fails within 12 months.
"""

# Build the chat prompt; enable_thinking=True keeps the model's reasoning trace.
messages = [
    {"role": "user", "content": question},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Generate with the sampling settings recommended for Qwen3 thinking mode,
# streaming tokens to stdout as they are produced.
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=2048,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```
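
Because this is a thinking model, the generation begins with a reasoning trace that ends at a `</think>` tag (for the Thinking-2507 family, the chat template opens the trace, so the output contains only the closing tag). A minimal post-processing sketch, continuing from the snippet above and assuming the tag survives decoding:

```python
# Capture the full completion instead of streaming, then split the
# reasoning trace from the final answer.
inputs = tokenizer(text, return_tensors="pt").to("cuda")
output_ids = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
completion = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)

# Assumption: the </think> marker survives decoding, per the Qwen3 convention.
reasoning, _, final_answer = completion.partition("</think>")
print(final_answer.strip())
```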

## Acknowledgements

Thanks to **Unsloth** for the incredibly fast and memory-efficient training framework.