初始化项目,由ModelHub XC社区提供模型

Model: reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-13 10:41:16 +08:00
commit 4b45599a57
8 changed files with 557 additions and 0 deletions

36
.gitattributes vendored Normal file
View File

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

322
README.md Normal file
View File

@@ -0,0 +1,322 @@
---
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3-0.6B
datasets:
- 0xZee/dataset-CoT-Advanced-Calculus-268
- 0xZee/dataset-CoT-Modern-Physics-177
- 0xZee/dataset-CoT-Theoretical-Mechanics-307
- 0xZee/dataset-CoT-Linear-Algebra-667
- 0xZee/dataset-CoT-Electromagnetism-580
- 0xZee/dataset-CoT-Molecular-Biology-71
- 0xZee/dataset-CoT-Physiology-114
- 0xZee/dataset-CoT-Classical-Mechanics-343
- 0xZee/dataset-CoT-Differential-Equations-636
- 0xZee/dataset-CoT-Physics-2254
- 0xZee/dataset-CoT-Engineering-574
- 0xZee/dataset-CoT-mathematics
- Alignment-Lab-AI/Lawyer-Instruct
tags:
- causal-lm
- text-generation
- distillation
- knowledge-distillation
- sft
- reasoning
- chain-of-thought
- mathematics
- physics
- engineering
- legal
- stem
- convergentintel
- edge
---
# Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT
A 0.6B parameter model built in two stages: knowledge distillation from a 30B Thinking teacher to establish a structured reasoning backbone, then supervised fine-tuning on legal instruction data. 50x compression. Under 500MB quantized. Runs on a phone.
The training order is the thesis: teach the model *how to reason* first (distillation from Thinking teacher), then teach it *what to reason about* (legal SFT). The Thinking teacher's extended deliberation traces transfer deeper reasoning structure than an Instruct teacher — critical when the student has only 0.6B parameters to work with.
> *"Structure beats scale, collaboration beats hierarchy, observation beats theory."*
> — Convergent Intelligence LLC: Research Division
## Training Pipeline
### Stage 1: Knowledge Distillation (STEM Reasoning Backbone)
Qwen3-0.6B distilled from [Qwen3-30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507) — a Mixture-of-Experts model with 30B total parameters, ~3B active per token, using the Thinking variant that generates extended internal reasoning traces.
**Why the Thinking teacher matters at 0.6B:** The Thinking variant produces higher-entropy softmax distributions than the Instruct variant — it considers more reasoning paths before committing. At distillation temperature T=2.0, the 0.6B student sees a richer landscape of alternative derivation strategies. With only 0.6B parameters, every bit of transferred structure counts. The Thinking teacher gives more.
**Data:** 6,122 STEM chain-of-thought samples across 12 domains:
| Domain | Samples |
|---|---|
| Physics | 2,254 |
| Linear Algebra | 667 |
| Differential Equations | 636 |
| Electromagnetism | 580 |
| Mathematics | 576 |
| Engineering | 574 |
| Classical Mechanics | 343 |
| Theoretical Mechanics | 307 |
| Advanced Calculus | 268 |
| Modern Physics | 177 |
| Physiology | 114 |
| Molecular Biology | 71 |
All from [0xZee](https://huggingface.co/0xZee). Shuffled seed 42, split 95/5 train/eval.
**Loss function:**
1. **Proof-Weighted Cross-Entropy (55%)** — 2.5x weight on derivation tokens, decaying to 1.5x. Forces the student to allocate its limited capacity to reasoning steps, not answer formatting.
2. **Knowledge Distillation KL Divergence (45%)** — T=2.0, scaled by T². Transfers the Thinking teacher's full deliberation landscape.
**Training format:**
```
Solve the following problem carefully and show a rigorous derivation.
Problem:
{question}
Proof:
{CoT}
Final Answer:
{response}
```
**Stage 1 hyperparameters:**
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Training samples | 5,815 |
| Effective batch size | 8 |
| Learning rate | 1.5e-5 → 1e-6 (cosine) |
| Temperature | 2.0 |
| Proof weight | 2.5 → 1.5 |
| Precision | bf16 |
---
### Stage 2: Supervised Fine-Tuning (Legal Domain)
The distilled model was fine-tuned on [Alignment-Lab-AI/Lawyer-Instruct](https://huggingface.co/datasets/Alignment-Lab-AI/Lawyer-Instruct) using TRL's SFTTrainer.
**Why legal on top of STEM:** Legal reasoning is structurally isomorphic to mathematical reasoning — premise identification, logical chaining, exception handling, structured argumentation toward a conclusion. A model that learned rigorous derivation transfers that structure to legal analysis rather than learning legal templates from scratch.
**Training format:**
```
### Instruction:
{instruction}
### Response:
{output}
```
**Stage 2 hyperparameters:**
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Effective batch size | 8 |
| Learning rate | 5e-6 (lower than Stage 1 to preserve backbone) |
| Gradient checkpointing | Enabled |
| Precision | bf16 |
## Model Details
| Attribute | Value |
|---|---|
| **Architecture** | Qwen3 (causal LM, RoPE, GQA) |
| **Parameters** | 0.6B |
| **Base model** | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) |
| **Teacher model** | [Qwen/Qwen3-30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507) |
| **Compression ratio** | 50x |
| **Stage 1 data** | 6,122 STEM CoT samples (12 datasets) |
| **Stage 2 data** | [Alignment-Lab-AI/Lawyer-Instruct](https://huggingface.co/datasets/Alignment-Lab-AI/Lawyer-Instruct) |
| **Context length** | 1024 tokens (training) |
| **License** | Apache 2.0 |
| **Developer** | Reaperdoesntrun / [Convergent Intelligence LLC](https://convergentintel.com): Research Division |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
device_map="auto",
)
# Legal instruction-following
prompt = """### Instruction:
What is the difference between a felony and a misdemeanor?
### Response:
"""
# STEM derivation (Stage 1 format still works)
prompt_stem = """Solve the following problem carefully and show a rigorous derivation.
Problem:
Compute the determinant of the matrix [[1, 2], [3, 4]].
Proof:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### GGUF
Quantized versions at [reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF).
## Prompt Formats
**STEM derivation (Stage 1):**
```
Solve the following problem carefully and show a rigorous derivation.
Problem:
[Your problem]
Proof:
```
**Instruction-following (Stage 2):**
```
### Instruction:
[Your question]
### Response:
```
## Intended Uses
**Good for:** Ultra-lightweight reasoning on mobile/edge/IoT, legal and STEM instruction-following, educational tutoring, embedded inference, component in multi-model pipelines, anywhere you need reasoning in under 500MB.
**Not for:** Formal proof verification, actual legal counsel, safety-critical analysis, complex multi-step proofs (>8 steps), or long-context tasks beyond 1024 tokens.
## Limitations
0.6B is a hard capacity constraint. The model trades depth for deployability. It will make reasoning errors that a larger model would not. Multi-step derivations beyond ~8 steps degrade. Legal reasoning covers general concepts but lacks the nuance of larger models. Performance is weakest on underrepresented domains (molecular biology, physiology). Always verify outputs.
## Mathematical Foundations: Discrepancy Calculus (DISC)
This model is part of a distillation chain built on Discrepancy Calculus — a measure-theoretic framework where the teacher's output distribution is decomposed via the Mesh Fundamental Identity into smooth (AC), jump, and Cantor components. The discrepancy operator $Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|} dt$ quantifies local structural mismatch that standard KL divergence averages away.
Full theory: *"On the Formal Analysis of Discrepancy Calculus"* (Colca, 2026; Convergent Intelligence LLC: Research Division). Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org/10.57967/hf/8165).
## Related Models
| Model | Description |
|---|---|
| [Qwen3-0.6B-STEM-Proof-Distilled-Thinking](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-STEM-Proof-Distilled-Thinking) | Stage 1 only — pure STEM backbone |
| [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF) | This model quantized for edge deployment |
| [Qwen3-1.7B-STEM-Proof-Distilled](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-STEM-Proof-Distilled) | Larger 1.7B variant (Instruct teacher) |
| [Qwen3-1.7B-Distilled-30B-A3B-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT) | Larger 1.7B variant + legal SFT |
## Citation
```bibtex
@misc{colca2026thinking06bsft,
title={Two-Stage Reasoning Transfer at 0.6B: Thinking Teacher Distillation + Legal SFT},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT},
note={Convergent Intelligence LLC: Research Division}
}
```
---
*Convergent Intelligence LLC: Research Division*
*"Where classical analysis fails to see, we begin."*
---
## Convergent Intelligence Portfolio
*Part of the [Qwen3 0.6B Distillation Series](https://huggingface.co/reaperdoesntknow) by [Convergent Intelligence LLC: Research Division](https://huggingface.co/reaperdoesntknow)*
#
## Mathematical Foundations: Discrepancy Calculus (DISC)
This model is part of a distillation chain built on Discrepancy Calculus — a measure-theoretic framework where the teacher's output distribution is decomposed via the Mesh Fundamental Identity into smooth (AC), jump, and Cantor components. The discrepancy operator $Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|} dt$ quantifies local structural mismatch that standard KL divergence averages away.
Full theory: *"On the Formal Analysis of Discrepancy Calculus"* (Colca, 2026; Convergent Intelligence LLC: Research Division). Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org/10.57967/hf/8165).
## Related Models
| Model | Downloads | Format |
|-------|-----------|--------|
| [Qwen3-0.6B-Distilled-30B-A3B](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B) | 36 | HF |
| [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF) | 203 | GGUF |
### Top Models from Our Lab
| Model | Downloads |
|-------|-----------|
| [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil) | 501 |
| [LFM2.5-1.2B-Distilled-SFT](https://huggingface.co/reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT) | 342 |
| [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT) | 302 |
| [Qwen3-1.7B-Coder-Distilled-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT-GGUF) | 194 |
| [Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF) | 175 |
**Total Portfolio: 41 models | 2,781 total downloads**
*Last updated: 2026-03-28 12:56 UTC*
<!-- DISTILQWEN-SPOTLIGHT-START -->
## DistilQwen Collection
This model is part of the **[DistilQwen](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)** proof-weighted distillation series.
Collection: **9 models** | **2,788 downloads**
### Teacher Variant Comparison
| Teacher | Student Size | Strength | Models |
|---------|-------------|----------|--------|
| Qwen3-30B-A3B (Instruct) | 1.7B | Instruction following, structured output, legal reasoning | 3 (833 DL) |
| Qwen3-30B-A3B (Thinking) | 0.6B | Extended deliberation, higher-entropy distributions, proof derivation | 3 (779 DL) **← this model** |
| Qwen3-30B-A3B (Coder) | 1.7B | Structured decomposition, STEM derivation, logical inference | 2 (825 DL) |
### Methodology
**The only BF16 collection in the portfolio.** While the broader Convergent Intelligence catalog (43 models, 12,000+ downloads) was trained on CPU at FP32 for $24 total compute, the DistilQwen series was trained on H100 at BF16 with a 30B-parameter teacher. Same methodology, premium hardware. This is what happens when you give the pipeline real compute.
All models use proof-weighted knowledge distillation: 55% cross-entropy with decaying proof weights (2.5× → 1.5×), 45% KL divergence at T=2.0. The proof weight amplifies loss on reasoning-critical tokens, forcing the student to allocate capacity to structural understanding rather than surface-level pattern matching.
Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org/10.57967/hf/8165)
### Related in this series
- [Qwen3-0.6B-Distilled-30B-A3B](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B) (236 downloads)
- [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF) (316 downloads)
<!-- DISTILQWEN-SPOTLIGHT-END -->
<!-- cix-keeper-ts:2026-06-12T13:16:20Z -->
<!-- card-refresh: 2026-03-30 -->

89
chat_template.jinja Normal file
View File

@@ -0,0 +1,89 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}

63
config.json Normal file
View File

@@ -0,0 +1,63 @@
{
"architectures": [
"Qwen3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": null,
"dtype": "bfloat16",
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_types": [
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention",
"full_attention"
],
"max_position_embeddings": 40960,
"max_window_layers": 28,
"model_type": "qwen3",
"num_attention_heads": 16,
"num_hidden_layers": 28,
"num_key_value_heads": 8,
"pad_token_id": 151643,
"rms_norm_eps": 1e-06,
"rope_parameters": {
"rope_theta": 1000000,
"rope_type": "default"
},
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "5.0.0",
"use_cache": false,
"use_sliding_window": false,
"vocab_size": 151936
}

12
generation_config.json Normal file
View File

@@ -0,0 +1,12 @@
{
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.6,
"top_k": 20,
"top_p": 0.95,
"transformers_version": "5.0.0"
}

3
model.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c1d2b15ffb4239c2d83a24ca59e35b8cb427a03dcdc3c667bba985e64907b258
size 1503300328

3
tokenizer.json Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
size 11422650

29
tokenizer_config.json Normal file
View File

@@ -0,0 +1,29 @@
{
"add_prefix_space": false,
"backend": "tokenizers",
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"is_local": false,
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}