ModelHub XC 25a5cefae5 Initialize project (model provided by the ModelHub XC community)
Model: glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think
Source: Original Platform
2026-06-04 12:28:31 +08:00

base_model: Qwen/Qwen3-0.6B
library_name: transformers
license: apache-2.0
language: en, fr
tags: granite, fine-tuned, conversational, distillation, thinking, reasoning
datasets: TeichAI/glm-4.7-2000x
pipeline_tag: text-generation

Qwen3-0.6B-DISTILL-glm-4.7-think

This model is a fine-tuned version of Qwen/Qwen3-0.6B, trained on reasoning-heavy conversational data generated by Z.ai's GLM 4.7.

Model Details

  • Base Model: Qwen/Qwen3-0.6B
  • Fine-tuning Dataset: TeichAI/glm-4.7-2000x
  • Context Length: 1048576 tokens
  • Special Feature: writes its step-by-step reasoning inside <think>...</think> tags before the final answer

Usage

Transformers

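from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "glogwa68/Qwen3-0.6B-DISTILL-glm-4.7-think"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format the conversation with the model's chat template and open the assistant turn
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

# The model reasons inside <think> tags before answering, so longer tasks may need a larger max_new_tokens
outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))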
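The model writes its reasoning between <think> and </think> before answering. You may therefore want to separate the two parts of a completion. The helper below is a minimal sketch and is not part of the model's or the library's API. split_thinking is a hypothetical name. It assumes the decoded string still contains the closing </think> marker. The opening <think> may be absent if the chat template already placed it in the prompt. Decode only the newly generated tokens (for example outputs[0][inputs.shape[-1]:]) so the prompt text does not end up in the reasoning part.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Hypothetical helper: split a decoded completion into (reasoning, answer).

    Assumes the reasoning ends with a single </think> marker; an opening
    <think> tag is removed if present.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag (e.g. generation hit max_new_tokens): treat everything as the answer
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


completion = "<think>\nThe user greets me; reply politely.\n</think>\n\nI'm doing well, thanks!"
reasoning, answer = split_thinking(completion)
print("reasoning:", reasoning)
print("answer:", answer)
```

Splitting on the closing tag rather than matching a full <think>...</think> pair keeps the helper working whether or not the opening tag appears in the generated text.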

Training Details

  • Epochs: 2
  • Learning Rate: 2e-5
  • Batch Size: 8 (with gradient accumulation)
  • Precision: FP16
  • Hardware: Multi-GPU with DeepSpeed ZeRO-3
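The card does not say whether 8 is the per-device batch size or the effective batch size per optimizer step. It also does not give the number of GPUs or gradient-accumulation steps. The sketch below only shows how those quantities combine. The per-device sizes, GPU counts and the 2,000-example dataset size are hypothetical values chosen for illustration, not the author's configuration.

```python
import math


def effective_batch_size(per_device: int, accumulation_steps: int, num_gpus: int) -> int:
    """Number of samples that contribute to a single optimizer step."""
    return per_device * accumulation_steps * num_gpus


def optimizer_steps(num_examples: int, effective_batch: int, epochs: int) -> int:
    """Approximate optimizer steps, counting a final partial batch as one step."""
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * epochs


# Hypothetical splits that all reach an effective batch size of 8
for per_device, accum, gpus in [(1, 4, 2), (2, 2, 2), (1, 2, 4)]:
    print(per_device, accum, gpus, "->", effective_batch_size(per_device, accum, gpus))

# Hypothetical dataset size combined with the card's 2 epochs
print("optimizer steps:", optimizer_steps(num_examples=2000, effective_batch=8, epochs=2))
```

With a hypothetical 2,000 training examples, an effective batch of 8 and 2 epochs, this gives 500 optimizer steps. Training frameworks may round a final partial accumulation window differently.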

License

Apache 2.0