---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
license: apache-2.0
language:
- en
- fr
tags:
- qwen3
- fine-tuned
- conversational
- distillation
- thinking
- reasoning
datasets:
- TeichAI/glm-4.7-2000x
pipeline_tag: text-generation
---

# Qwen3-0.6B-DISTILL-glm-4.7-think

This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B), trained on reasoning-focused conversational data distilled from GLM 4.7 by Z.ai.

## Model Details

- **Base Model:** Qwen/Qwen3-0.6B
- **Fine-tuning Dataset:** TeichAI/glm-4.7-2000x
- **Context Length:** 1048576 tokens
- **Special Feature:** Thinking/Reasoning with `<think>` tags

## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hello, how are you?"}]
# Build the chat prompt; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
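
Since the model writes its reasoning inside `<think>` tags, it can be useful to separate that reasoning from the final answer after decoding. The following is a minimal sketch, not part of the model's API: `split_thinking` is a hypothetical helper, and it assumes the tags survive decoding as literal `<think>` and `</think>` text, which may depend on the tokenizer configuration.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Hypothetical helper: assumes the reasoning ends at a literal
    "</think>" marker. The opening "<think>" is optional, because
    some chat templates place it in the prompt instead of the output.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat everything as the answer
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Made-up sample string, used only to illustrate the helper
sample = "<think>\nThe user greets me.\n</think>\n\nHello! I'm doing well."
reasoning, answer = split_thinking(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Splitting on the closing tag rather than matching the full pair keeps the helper tolerant of outputs where only `</think>` appears in the generated text.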

## Training Details

- **Epochs:** 2
- **Learning Rate:** 2e-5
- **Batch Size:** 8 (with gradient accumulation)
- **Precision:** FP16
- **Hardware:** Multi-GPU with DeepSpeed ZeRO-3
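
For readers who want to approximate this setup, the settings above could map onto a DeepSpeed config and `transformers.TrainingArguments` keyword arguments roughly as sketched below. The card does not state the number of GPUs, the per-device batch size, or the accumulation steps, so the split of the batch size used here (1 × 4 × 2 = 8) is purely an assumption, as is the `ds_config.json` path.

```python
import json

# Values taken from the card
NUM_EPOCHS = 2
LEARNING_RATE = 2e-5

# Assumed, NOT from the card: one way a batch size of 8 might be composed
per_device_batch_size = 1
gradient_accumulation_steps = 4
num_gpus = 2
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Minimal DeepSpeed config: ZeRO stage 3 with FP16
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": per_device_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "train_batch_size": effective_batch_size,
}

# Keyword arguments that could be passed to transformers.TrainingArguments
# (an output_dir and other settings would still be needed in practice)
training_kwargs = {
    "num_train_epochs": NUM_EPOCHS,
    "learning_rate": LEARNING_RATE,
    "per_device_train_batch_size": per_device_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "fp16": True,
    "deepspeed": "ds_config.json",  # hypothetical path to the config above
}

print(json.dumps(ds_config, indent=2))
print("Effective batch size:", effective_batch_size)
```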

## License

Apache 2.0