ModelHub XC ab17300538 Initialize project; model provided by the ModelHub XC community
Model: tensorhydra/qwen3-8b-aimo3-tir
Source: Original Platform
2026-04-12 19:07:59 +08:00


---
language:
- en
license: other
license_name: qwen-research
base_model: Qwen/Qwen3-8B
tags:
- qwen
- lora
- merged
- math
- reasoning
- tool-integrated-reasoning
- aimo
- safetensors
datasets:
- jeannkouagou/aimo3-tool-integrated-reasoning
pipeline_tag: text-generation
library_name: transformers
---
# Qwen3-8B AIMO3 Tool-Integrated Reasoning
## Model Summary
A LoRA fine-tune of Qwen3-8B for **tool-integrated reasoning**, trained on the AIMO3 competition dataset (synthesized by GPT-OSS-120B). The LoRA adapters have been **merged** into the base model and the result is saved in SafeTensors format, so it loads like any standard Transformers checkpoint.
| Property | Details |
|---|---|
| Base Model | Qwen3-8B |
| Fine-tuning Method | LoRA (merged) |
| Format | SafeTensors (BF16) |
| Parameters | ~8B |
| Disk Size | ~16GB |
| Max Context | 8192 tokens |
---
## Model Details
### LoRA Configuration
| Hyperparameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Bias | none |
| Target Modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
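
As a rough sanity check on the configuration above, the number of trainable LoRA parameters can be estimated from the rank and the shapes of the targeted layers. The sketch below is illustrative only: the layer dimensions are assumptions based on the published Qwen3-8B architecture and are not stated in this card, so verify them against `config.json` before relying on the result.

```python
# Rough count of trainable LoRA parameters for the configuration above.
# ASSUMPTION: the dimensions below follow the published Qwen3-8B
# architecture; they are not stated in this card. Check config.json.
HIDDEN = 4096          # hidden_size (assumed)
INTERMEDIATE = 12288   # intermediate_size (assumed)
NUM_LAYERS = 36        # num_hidden_layers (assumed)
Q_OUT = 32 * 128       # num_attention_heads * head_dim (assumed)
KV_OUT = 8 * 128       # num_key_value_heads * head_dim (assumed)

RANK = 16   # r, from the table above
ALPHA = 32  # alpha, from the table above

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """A is (rank x in) and B is (out x rank): rank * (in + out) weights."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"LoRA scaling (alpha / r): {scaling}")
print(f"Trainable LoRA parameters: {total:,}")
print(f"Adapter size in BF16: ~{total * 2 / 1e6:.0f} MB")
```

Under these assumed dimensions the adapters come to roughly 44 million parameters, a small fraction of the ~8B base weights.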
### Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Precision | BFloat16 (no quantization) |
| Epochs | 2 (configured) |
| Steps | 8750 (~1 epoch; training was stopped at this point) |
| Per-device Batch Size | 2 |
| Gradient Accumulation Steps | 8 (effective batch: 16) |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine with warmup |
| Warmup Ratio | 0.03 |
| Weight Decay | 0.01 |
| Max Gradient Norm | 1.0 |
| Max Sequence Length | 8192 |
| Optimizer | AdamW (Fused) |
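
The arithmetic behind the batch size, step count, and learning-rate schedule can be sketched as follows. This is a minimal illustration, not the training code: it assumes the schedule was sized for the 2 configured epochs (twice the 8750 steps actually run) and that the cosine decay ends at zero, neither of which is confirmed by this card. `lr_at` is a hypothetical helper name.

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 8
NUM_GPUS = 1          # single H100
STEPS_RUN = 8750      # steps actually trained (~1 epoch)
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03

# ASSUMPTION: the scheduler was configured for 2 epochs, i.e. twice the
# number of steps that were actually run.
PLANNED_STEPS = 2 * STEPS_RUN

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
sequences_seen = effective_batch * STEPS_RUN
warmup_steps = round(WARMUP_RATIO * PLANNED_STEPS)


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed floor)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (PLANNED_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch: {effective_batch}")
print(f"sequences seen in {STEPS_RUN} steps: {sequences_seen:,}")
print(f"warmup steps: {warmup_steps}")
for step in (0, warmup_steps, STEPS_RUN, PLANNED_STEPS):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

If the assumption about the planned step count holds, stopping at step 8750 means the learning rate had only decayed to about half of its peak when training ended.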
### Hardware & Infrastructure
- **Platform**: Kaggle
- **GPU**: Single NVIDIA H100 (80GB)
- **Attention**: Flash Attention 2
- **Optimizations**: Gradient checkpointing, TF32, fused optimizer
---
## Training Data
- **Dataset**: [AIMO3 Tool-Integrated Reasoning Dataset](https://www.kaggle.com/datasets/jeannkouagou/aimo3-tool-integrated-reasoning) (synthesized by GPT-OSS-120B)
- **Split**: 97.5% train / 2.5% validation
- **Format**: CSV with problem-solution pairs
**Supported column names:**
- Input: `problem`, `question`, `input`, `prompt`
- Output: `solution`, `answer`, `output`, `response`, `completion`
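
One way the column lookup and the train/validation split could work is sketched below. The helper names (`pick_column`, `load_pairs`, `split_pairs`), the first-match-wins order, the shuffle seed, and the sample CSV are all assumptions made for illustration; the actual notebook may resolve columns and split differently.

```python
import csv
import io
import random

INPUT_COLUMNS = ["problem", "question", "input", "prompt"]
OUTPUT_COLUMNS = ["solution", "answer", "output", "response", "completion"]


def pick_column(header, candidates):
    """Return the first candidate name that appears in the CSV header."""
    for name in candidates:
        if name in header:
            return name
    raise ValueError(f"none of {candidates} found in header {header}")


def load_pairs(csv_text):
    """Parse CSV text into (input, output) pairs using the supported names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    in_col = pick_column(reader.fieldnames, INPUT_COLUMNS)
    out_col = pick_column(reader.fieldnames, OUTPUT_COLUMNS)
    return [(row[in_col], row[out_col]) for row in reader]


def split_pairs(pairs, val_fraction=0.025, seed=0):
    """Shuffle deterministically, then hold out val_fraction for validation."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Tiny made-up CSV, only to show the call pattern.
sample = "question,answer\nWhat is 2 + 2?,4\nWhat is 3 * 3?,9\n"
pairs = load_pairs(sample)
train, val = split_pairs(pairs)
print(len(train), "train /", len(val), "validation")
```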
### Instruction Format
Training uses a ChatML-style format:
```
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```
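
The template above can be produced with plain string formatting. The following is a minimal sketch; `format_prompt` and `format_example` are hypothetical helper names, not functions from the training notebook.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_prompt(prompt):
    """User turn plus the opening of the assistant turn (used at inference)."""
    return f"{IM_START}user\n{prompt}{IM_END}\n{IM_START}assistant\n"


def format_example(prompt, response):
    """Full training example: the prompt followed by the closed assistant turn."""
    return format_prompt(prompt) + f"{response}{IM_END}"


print(format_example("What is 2 + 2?", "4"))
```

Keeping the inference prompt as a prefix of the training example means generation starts exactly where the assistant response began during fine-tuning.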
### Training Loss
Training was stopped after 8750 steps (~1 epoch). The plot below shows the training and validation loss curves for the entire run.
![Training Loss Plot](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F32392416%2Ffc9c0a306042ea3b976ed8749150a48d%2Floss_plot.png?generation=1774706655261800&alt=media)
---
## Usage
### Load the Model
Since the LoRA adapters are already merged, PEFT is **not required**:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Repo ID as listed for this model; replace with a local path if needed.
model = AutoModelForCausalLM.from_pretrained(
    "tensorhydra/qwen3-8b-aimo3-tir",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "tensorhydra/qwen3-8b-aimo3-tir",
    trust_remote_code=True,
)
```
### Inference
```python
prompt = "Solve this problem: What is 2 + 2?"
# Build the prompt in the same ChatML format used during training.
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
# skip_special_tokens=False keeps the ChatML markers in the printed text.
response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)
```
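
Because the output is decoded with `skip_special_tokens=False`, the printed text contains the prompt and the ChatML markers as well as the answer. A small post-processing step can isolate the assistant's reply. This is a sketch under the assumption that the decoded text follows the ChatML layout from the Instruction Format section; `extract_reply` is a hypothetical helper.

```python
ASSISTANT_MARKER = "<|im_start|>assistant\n"
END_MARKER = "<|im_end|>"


def extract_reply(decoded):
    """Return the text of the last assistant turn in a ChatML transcript."""
    start = decoded.rfind(ASSISTANT_MARKER)
    if start == -1:
        # No assistant marker found: fall back to the whole text.
        return decoded.strip()
    reply = decoded[start + len(ASSISTANT_MARKER):]
    end = reply.find(END_MARKER)
    if end != -1:
        # Drop the end-of-turn marker and anything generated after it.
        reply = reply[:end]
    return reply.strip()


decoded = (
    "<|im_start|>user\nSolve this problem: What is 2 + 2?<|im_end|>\n"
    "<|im_start|>assistant\n2 + 2 = 4<|im_end|>"
)
print(extract_reply(decoded))
```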
### Batch Inference
```python
prompts = [
    "Solve: 15 + 27 = ?",
    "What is the derivative of x^2?",
    "Calculate the area of a circle with radius 5",
]
formatted_prompts = [
    f"<|im_start|>user\n{p}<|im_end|>\n<|im_start|>assistant\n"
    for p in prompts
]
# Decoder-only models should be left-padded for batched generation.
tokenizer.padding_side = "left"
inputs = tokenizer(formatted_prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
for response in tokenizer.batch_decode(outputs, skip_special_tokens=False):
    print(response)
    print("-" * 80)
```
### Quantized Inference (Lower VRAM)
```python
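from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Requires the bitsandbytes package. Load ONE of the two variants below.

# 8-bit (~8GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
    "tensorhydra/qwen3-8b-aimo3-tir",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# 4-bit (~4GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
    "tensorhydra/qwen3-8b-aimo3-tir",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)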
```
---
## Memory Requirements
| Mode | Approx. VRAM (weights only) |
|---|---|
| BF16 (full) | ~16GB |
| 8-bit quantized | ~8GB |
| 4-bit quantized | ~4GB |
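
The figures in the table follow from the parameter count multiplied by the bytes stored per parameter. The sketch below reproduces that arithmetic; it covers model weights only and ignores the KV cache, activations, and quantization overhead, so actual usage will be somewhat higher. `weight_memory_gb` is a hypothetical helper name.

```python
NUM_PARAMS = 8e9  # "~8B" from the summary table

# Bytes stored per parameter in each loading mode.
BYTES_PER_PARAM = {
    "BF16 (full)": 2.0,
    "8-bit quantized": 1.0,
    "4-bit quantized": 0.5,
}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {
    mode: weight_memory_gb(NUM_PARAMS, size)
    for mode, size in BYTES_PER_PARAM.items()
}
for mode, gb in estimates.items():
    print(f"{mode}: ~{gb:.0f} GB")
```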
---
## Repository Structure
```
model/
├── config.json
├── generation_config.json
├── model.safetensors.index.json
├── model-00001-of-0000X.safetensors
├── ...
├── tokenizer_config.json
├── tokenizer.json
└── special_tokens_map.json
```
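
The `model.safetensors.index.json` file maps each tensor name to the shard that stores it, which makes it easy to check that every shard is present before loading. The sketch below shows one way to read it. The tensor and shard names in `example_index` are made up for illustration and do not reflect the actual shard count of this repository; `shard_files` and `tensors_per_shard` are hypothetical helpers.

```python
import json
from collections import Counter


def shard_files(index):
    """Sorted list of the distinct shard files named in the index."""
    return sorted(set(index["weight_map"].values()))


def tensors_per_shard(index):
    """Number of tensors stored in each shard."""
    return dict(Counter(index["weight_map"].values()))


# Illustrative index with hypothetical names. For a real checkpoint, use:
#   with open("model/model.safetensors.index.json") as f:
#       index = json.load(f)
example_index = json.loads("""
{
  "metadata": {"total_size": 0},
  "weight_map": {
    "model.embed_tokens.weight": "example-shard-1.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "example-shard-1.safetensors",
    "lm_head.weight": "example-shard-2.safetensors"
  }
}
""")

print(shard_files(example_index))
print(tensors_per_shard(example_index))
```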
---
## Intended Use
- Mathematical reasoning and problem solving
- Tool-integrated step-by-step reasoning
- Educational and research applications
- Production deployment (merged model, no PEFT dependency)
## Limitations
- Fine-tuned on a narrow reasoning domain; may not generalize well to other tasks
- Hard context limit of 8192 tokens
- Performance is bounded by the quality and distribution of the synthetic training data
- Full merged model requires ~16GB storage (vs. ~100-200MB for LoRA adapters alone)
---
## Links
- **Dataset**: [jeannkouagou/aimo3-tool-integrated-reasoning](https://www.kaggle.com/datasets/jeannkouagou/aimo3-tool-integrated-reasoning)
- **Fine-tuning Notebook**: [tensorhydra/qwen3-8b-aimo3-finetune](https://www.kaggle.com/code/tensorhydra/qwen3-8b-aimo3-finetune)
---
## Citation
```bibtex
@misc{qwen-lora-aimo3,
title = {Qwen-8B LoRA Fine-tuned for Tool-Integrated Reasoning},
author = {tensorhydra},
year = {2025},
howpublished = {Kaggle Model Hub},
note = {Merged LoRA model in SafeTensors format}
}
```
---
## Acknowledgements
- **Base model**: [Qwen3-8B](https://huggingface.co/Qwen) by Alibaba Cloud
- **Training frameworks**: Hugging Face Transformers & PEFT
- **Dataset synthesis**: GPT-OSS-120B
- **Serialization**: SafeTensors
- **Training platform**: Kaggle (H100 GPU)
## License
This model inherits the license of the base Qwen3-8B model. Please refer to the [Qwen license terms](https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE) before use.