---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
tags:
- text2sql
- sql
- nlp
- distillation
- qwen3
datasets:
- distil-labs/text2sql-synthetic
language:
- en
pipeline_tag: text-generation
---

# Distil-Qwen3-0.6B-Text2SQL

A fine-tuned Qwen3-0.6B model that converts natural language questions into SQL queries. Trained by knowledge distillation from DeepSeek-V3, this compact 0.6B-parameter model delivers strong Text2SQL performance while staying lightweight and fast enough for local deployment.

## Results

| Metric | DeepSeek-V3 (Teacher) | Qwen3-0.6B (Base) | **This Model** |
|--------|:---------------------:|:-----------------:|:--------------:|
| LLM-as-a-Judge | 76% | 36% | **74%** |
| Exact Match | 38% | 24% | **40%** |
| ROUGE | 88.6% | 69.3% | **88.5%** |
| METEOR | 90.4% | 65.3% | **88.5%** |

The fine-tuned model achieves **74% LLM-as-a-Judge accuracy** with only 0.6B parameters. That is roughly a **2x improvement** over the base model (36%) and within 2 points of the 685B-parameter teacher (76%), at a fraction of the size.

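The headline ratios can be checked directly from the results table. The short sketch below recomputes each metric relative to the base model and to the teacher; the dictionary layout and the helper name `relative_scores` are illustrative only and not part of the model's tooling.

```python
# Scores copied from the results table (in percent).
scores = {
    "LLM-as-a-Judge": {"teacher": 76.0, "base": 36.0, "student": 74.0},
    "Exact Match": {"teacher": 38.0, "base": 24.0, "student": 40.0},
    "ROUGE": {"teacher": 88.6, "base": 69.3, "student": 88.5},
    "METEOR": {"teacher": 90.4, "base": 65.3, "student": 88.5},
}


def relative_scores(row):
    """Return (student / base, student / teacher) for one metric."""
    return row["student"] / row["base"], row["student"] / row["teacher"]


for metric, row in scores.items():
    vs_base, vs_teacher = relative_scores(row)
    print(f"{metric}: {vs_base:.2f}x base, {vs_teacher:.1%} of teacher")
```

For LLM-as-a-Judge this works out to 74 / 36, or about 2.06x the base score, which is the "2x improvement" quoted for that metric.
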
## Quick Start

### Using Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("distil-labs/distil-qwen3-0.6b-text2sql")
tokenizer = AutoTokenizer.from_pretrained("distil-labs/distil-qwen3-0.6b-text2sql")

schema = """CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT,
    salary INTEGER
);"""

question = "How many employees earn more than 50000?"

messages = [
    {
        "role": "system",
        "content": """You are a problem solving model working on task_description XML block:
<task_description>You are given a database schema and a natural language question. Generate the SQL query that answers the question.

Input:
- Schema: One or two table definitions in SQL DDL format
- Question: Natural language question about the data

Output:
- A single SQL query that answers the question
- No explanations, comments, or additional text

Rules:
- Use only tables and columns from the provided schema
- Use uppercase SQL keywords (SELECT, FROM, WHERE, etc.)
- Use SQLite-compatible syntax</task_description>
You will be given a single task in the question XML block
Solve only the task in question block.
Generate only the answer, do not generate anything else"""
    },
    {
        "role": "user",
        "content": f"""Now for the real task, solve the task in question block.
Generate only the solution, do not generate anything else
<question>Schema:
{schema}

Question: {question}</question>"""
    }
]

# Render the chat messages into the model's prompt format
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

# do_sample=False selects greedy (deterministic) decoding, the intended effect
# of temperature=0, which transformers rejects when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

### Using Ollama (GGUF version)

For local inference, use the quantized GGUF version included in this repository:

```bash
# Create the Ollama model from the Modelfile (run from a local copy of this repository)
ollama create distil-qwen3-0.6b-text2sql -f Modelfile

# Run inference
ollama run distil-qwen3-0.6b-text2sql
```

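Once the Ollama model has been created, it can also be queried programmatically. The sketch below uses Ollama's local REST endpoint (`/api/chat`, default port 11434) with only the Python standard library. It assumes an Ollama server is running on the same machine; the helper names `build_chat_payload` and `ask_ollama` are illustrative, and since this card does not say whether the Modelfile already embeds the system prompt, the example sends only the user message.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_chat_payload(model: str, user_prompt: str) -> dict:
    """Build a non-streaming chat request with temperature 0."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "stream": False,                # return one JSON object, not a stream
        "options": {"temperature": 0},  # keep the output as deterministic as possible
    }


def ask_ollama(model: str, user_prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(model, user_prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

With a server running, `ask_ollama("distil-qwen3-0.6b-text2sql", prompt)` should return the generated SQL as a string, where `prompt` follows the schema-plus-question layout used in the Transformers example.
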
## Model Details

| Property | Value |
|----------|-------|
| Base Model | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) |
| Parameters | 0.6 billion |
| Architecture | Qwen3ForCausalLM |
| Context Length | 40,960 tokens |
| Precision | bfloat16 |
| Training Data | ~10,000 synthetic examples |
| Teacher Model | DeepSeek-V3 |

## Training

This model was trained using the [Distil Labs](https://distillabs.ai) platform:

1. **Seed Data**: 50 hand-validated Text2SQL examples covering various SQL complexities
2. **Synthetic Generation**: Expanded to ~10,000 examples using DeepSeek-V3
3. **Fine-tuning**: 4 epochs on the synthetic dataset
4. **Evaluation**: LLM-as-a-Judge with semantic equivalence checking

### Training Hyperparameters

- Epochs: 4
- Learning Rate: 5e-5 (cosine schedule)
- Batch Size: 1 (with gradient accumulation)
- Total Steps: ~40,000

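These numbers are mutually consistent: 4 epochs over roughly 10,000 examples at batch size 1 gives about 40,000 steps. The sketch below reproduces that arithmetic and a plain cosine decay from the peak learning rate. It is an illustration under assumptions, not the actual training code: warmup, the final learning rate, and the gradient accumulation factor are not stated here, so the sketch assumes no warmup, decay to zero, and steps counted per example.

```python
import math

PEAK_LR = 5e-5         # from the hyperparameter list
EPOCHS = 4             # from the hyperparameter list
NUM_EXAMPLES = 10_000  # approximate size of the synthetic dataset
BATCH_SIZE = 1         # from the hyperparameter list

# Steps counted per example (assumption): 4 * 10,000 / 1 = 40,000
TOTAL_STEPS = EPOCHS * NUM_EXAMPLES // BATCH_SIZE


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR):
    """Assumed schedule: no warmup, cosine decay from peak_lr down to 0."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_lr(step))
```

Under these assumptions the learning rate starts at 5e-5, passes through half that value at the midpoint, and reaches approximately zero at the final step.
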
## Task Format

### Input Format

```
Schema:
CREATE TABLE table_name (
    column_name DATA_TYPE [CONSTRAINTS],
    ...
);

Question: Natural language question about the data
```

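To keep prompts consistent across calls, the input format can be wrapped in a small helper. The function below is a sketch that mirrors the user message from the Transformers example; the name `build_user_prompt` is hypothetical and not part of any library.

```python
def build_user_prompt(schema: str, question: str) -> str:
    """Format a schema and a question into the user message shown in Quick Start."""
    return (
        "Now for the real task, solve the task in question block.\n"
        "Generate only the solution, do not generate anything else\n"
        f"<question>Schema:\n{schema.strip()}\n\n"
        f"Question: {question.strip()}</question>"
    )


example_schema = """CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT,
    salary INTEGER
);"""

print(build_user_prompt(example_schema, "How many employees earn more than 50000?"))
```

The returned string can be used as the `content` of the `user` message, alongside the unchanged system prompt.
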
### Output Format

A single SQL query with:
- Uppercase SQL keywords (SELECT, FROM, WHERE, etc.)
- SQLite-compatible syntax
- No explanations or additional text

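Because the output is meant to be a single SQLite-compatible statement, it can be checked before use by compiling it against an empty in-memory database built from the same schema. The sketch below uses Python's standard `sqlite3` module; the function name `is_valid_sqlite_query` is illustrative, and the check only confirms that the query parses and references existing tables and columns, not that it answers the question correctly.

```python
import sqlite3


def is_valid_sqlite_query(schema: str, sql: str) -> bool:
    """Return True if `sql` compiles as a single statement against `schema`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)  # create the (empty) tables from the DDL
        # EXPLAIN QUERY PLAN makes SQLite parse and plan the statement without
        # running it, so unknown tables or columns raise an error. execute()
        # also refuses input that contains more than one statement.
        conn.execute("EXPLAIN QUERY PLAN " + sql.strip().rstrip(";"))
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False
    finally:
        conn.close()


schema = """CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT,
    salary INTEGER
);"""

print(is_valid_sqlite_query(schema, "SELECT COUNT(*) FROM employees WHERE salary > 50000;"))
print(is_valid_sqlite_query(schema, "SELECT bonus FROM employees;"))
```

A check like this is a cheap guard in front of a real database: queries that fail to compile can be rejected or regenerated before anything is executed.
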
### Supported SQL Features

- **Simple**: SELECT, WHERE, COUNT, SUM, AVG, MAX, MIN
- **Medium**: JOIN, GROUP BY, HAVING, ORDER BY, LIMIT
- **Complex**: Subqueries, multiple JOINs, UNION

## Use Cases

- Natural language interfaces to databases
- SQL query assistance and autocompletion
- Database chatbots and conversational BI
- Educational tools for learning SQL
- Edge deployment and mobile applications

## Limitations

- Optimized for SQLite syntax
- Best with 1-2 table schemas
- May struggle with highly complex nested subqueries
- Trained on English questions only

## License

This model is released under the Apache 2.0 license.

## Links

- [Distil Labs Website](https://distillabs.ai)
- [GitHub](https://github.com/distil-labs)
- [Hugging Face](https://huggingface.co/distil-labs)

## Citation

```bibtex
@misc{distil-qwen3-0.6b-text2sql,
  author = {Distil Labs},
  title = {Distil-Qwen3-0.6B-Text2SQL: A Compact Fine-tuned Model for Natural Language to SQL},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/distil-labs/distil-qwen3-0.6b-text2sql}
}
```