Initialize project; model provided by the ModelHub XC community

Model: glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think
Source: Original Platform
ModelHub XC
2026-06-04 12:28:31 +08:00
commit 25a5cefae5
13 changed files with 151950 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,57 @@
---
base_model: Qwen/Qwen3-0.6B
library_name: transformers
license: apache-2.0
language:
- en
- fr
tags:
- qwen3
- fine-tuned
- conversational
- distillation
- thinking
- reasoning
datasets:
- TeichAI/glm-4.7-2000x
pipeline_tag: text-generation
---
# Qwen3-0.6B-DISTILL-glm-4.7-think
This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B), trained on reasoning-heavy conversational data generated by GLM 4.7 from Z.ai.
## Model Details
- **Base Model:** Qwen/Qwen3-0.6B
- **Fine-tuning Dataset:** TeichAI/glm-4.7-2000x
- **Context Length:** 1048576 tokens
- **Special Feature:** Thinking/reasoning output wrapped in `<think>` tags
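
Because the model's reasoning is marked with `<think>` tags, downstream code often wants to separate it from the final answer. The snippet below is a minimal sketch of one way to do that, assuming the output contains at most one `<think>...</think>` block. The helper name `split_thinking` and the sample string are hypothetical and are not part of this repository.

```python
import re

# Matches a single <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) for text that may contain one <think> block."""
    match = THINK_RE.search(text)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is the visible answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hypothetical model output, written by hand for illustration.
sample = "<think>The user greets me; reply politely.</think>\n\nHello! I'm doing well."
reasoning, answer = split_thinking(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

If generation stops before the closing tag is produced (for example, because `max_new_tokens` is too small), this sketch treats the whole text as the answer, so callers may want to handle that case separately.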
## Usage
### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hello, how are you?"}]
# Build the chat prompt; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
)
outputs = model.generate(**inputs, max_new_tokens=256)
# The decoded text includes the prompt followed by the model's reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
- **Epochs:** 2
- **Learning Rate:** 2e-5
- **Batch Size:** 8 (with gradient accumulation)
- **Precision:** FP16
- **Hardware:** Multi-GPU with DeepSpeed ZeRO-3
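
The list above does not say whether the batch size of 8 is per device or the effective size after gradient accumulation. As a rough sketch, the snippet below shows how the listed settings (FP16, ZeRO-3, gradient accumulation) could map onto a DeepSpeed-style config dictionary, and how the effective batch size follows from its parts. The per-device size, accumulation steps, and GPU count are hypothetical values chosen only so the product is 8; the actual training configuration is not published here. Epochs and learning rate would normally be set in the trainer's arguments and are left out.

```python
import json


def effective_batch_size(per_device: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return per_device * grad_accum_steps * num_gpus


def build_deepspeed_config(per_device: int, grad_accum_steps: int, num_gpus: int) -> dict:
    """Assemble a minimal ZeRO-3 + FP16 config dict (illustrative only)."""
    return {
        "train_micro_batch_size_per_gpu": per_device,
        "gradient_accumulation_steps": grad_accum_steps,
        "train_batch_size": effective_batch_size(per_device, grad_accum_steps, num_gpus),
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": 3},
    }


# Hypothetical split, not taken from the model card:
# 2 GPUs x 2 samples per device x 2 accumulation steps = 8.
config = build_deepspeed_config(per_device=2, grad_accum_steps=2, num_gpus=2)
print(json.dumps(config, indent=2))
```

DeepSpeed expects `train_batch_size` to equal the product of the micro batch size, the accumulation steps, and the number of GPUs, which is why the sketch derives it rather than setting it independently.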
## License
Apache 2.0