---
language:
- en
license: other
license_name: qwen-research
base_model: Qwen/Qwen2.5-8B
tags:
- qwen
- lora
- merged
- math
- reasoning
- tool-integrated-reasoning
- aimo
- safetensors
datasets:
- jeannkouagou/aimo3-tool-integrated-reasoning
pipeline_tag: text-generation
library_name: transformers
---

# Qwen3-8B AIMO3 Tool-Integrated Reasoning

## Model Summary

A LoRA fine-tune of Qwen-8B trained for **tool-integrated reasoning** on the AIMO3 competition dataset (generated by GPT-OSS-120B). The LoRA adapters have been **merged** into the base model and saved in SafeTensors format for straightforward deployment.

| Property | Details |
|---|---|
| Base Model | Qwen-8B |
| Fine-tuning Method | LoRA (merged) |
| Format | SafeTensors (BF16) |
| Parameters | ~8B |
| Disk Size | ~16GB |
| Max Context | 8192 tokens |

---

## Model Details

### LoRA Configuration

| Hyperparameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Bias | none |
| Target Modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |

### Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Precision | BFloat16 (no quantization) |
| Epochs (configured) | 2 |
| Steps (completed) | 8750 (~1 epoch; training was stopped at this point) |
| Per-device Batch Size | 2 |
| Gradient Accumulation Steps | 8 (effective batch: 16) |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine with warmup |
| Warmup Ratio | 0.03 |
| Weight Decay | 0.01 |
| Max Gradient Norm | 1.0 |
| Max Sequence Length | 8192 |
| Optimizer | AdamW (Fused) |

### Hardware & Infrastructure

- **Platform**: Kaggle
- **GPU**: Single NVIDIA H100 (80GB)
- **Attention**: Flash Attention 2
- **Optimizations**: Gradient checkpointing, TF32, fused optimizer

---

## Training Data

- **Dataset**: [AIMO3 Tool-Integrated Reasoning Dataset](https://www.kaggle.com/datasets/jeannkouagou/aimo3-tool-integrated-reasoning) (synthesized by GPT-OSS-120B)
- **Split**: 97.5% train / 2.5% validation
- **Format**: CSV with problem–solution pairs

**Supported
column names:**

- Input: `problem`, `question`, `input`, `prompt`
- Output: `solution`, `answer`, `output`, `response`, `completion`

### Instruction Format

Training uses a ChatML-style format:

```
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```

### Training Loss

Training ran for 8750 steps (~1 epoch) before it was stopped. The plot below shows the training and validation loss curves for the entire session.

![Training Loss Plot](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F32392416%2Ffc9c0a306042ea3b976ed8749150a48d%2Floss_plot.png?generation=1774706655261800&alt=media)

---

## Usage

### Load the Model

Since the LoRA adapters are already merged, PEFT is **not required**:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "tensorhydra/qwen-8b-aimo3-reasoning",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "tensorhydra/qwen-8b-aimo3-reasoning",
    trust_remote_code=True
)
```

### Inference

```python
prompt = "Solve this problem: What is 2 + 2?"
# Wrap the prompt in the same ChatML-style format used during training
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
# skip_special_tokens=False keeps the ChatML markers in the printed text
response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)
```

### Batch Inference

```python
prompts = [
    "Solve: 15 + 27 = ?",
    "What is the derivative of x^2?",
    "Calculate the area of a circle with radius 5"
]
formatted_prompts = [
    f"<|im_start|>user\n{p}<|im_end|>\n<|im_start|>assistant\n" for p in prompts
]

# Decoder-only models should be left-padded so generation continues from the prompt
tokenizer.padding_side = "left"
inputs = tokenizer(formatted_prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

for response in tokenizer.batch_decode(outputs, skip_special_tokens=False):
    print(response)
    print("-" * 80)
```

### Quantized Inference (Lower VRAM)

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Both options require the bitsandbytes package.
# 8-bit (~8GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
    "tensorhydra/qwen-8b-aimo3-reasoning",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)

# 4-bit (~4GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
    "tensorhydra/qwen-8b-aimo3-reasoning",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto"
)
```

---

## Memory Requirements

Figures are approximate and cover the model weights only; activations and the KV cache need additional memory.

| Mode | VRAM |
|---|---|
| BF16 (full) | ~16GB |
| 8-bit quantized | ~8GB |
| 4-bit quantized | ~4GB |

---

## Repository Structure

```
model/
├── config.json
├── generation_config.json
├── model.safetensors.index.json
├── model-00001-of-0000X.safetensors
├── ...
├── tokenizer_config.json
├── tokenizer.json
└── special_tokens_map.json
```

---

## Intended Use

- Mathematical reasoning and problem solving
- Tool-integrated step-by-step reasoning
- Educational and research applications
- Production deployment (merged model, no PEFT dependency)

## Limitations

- Fine-tuned on a narrow reasoning domain; may not generalize well to other tasks
- Hard context limit of 8192 tokens
- Performance is bounded by the quality and distribution of the synthetic training data
- Full merged model requires ~16GB storage (vs.
~100–200MB for LoRA adapters alone)

---

## Links

- **Dataset**: [jeannkouagou/aimo3-tool-integrated-reasoning](https://www.kaggle.com/datasets/jeannkouagou/aimo3-tool-integrated-reasoning)
- **Fine-tuning Notebook**: [tensorhydra/qwen3-8b-aimo3-finetune](https://www.kaggle.com/code/tensorhydra/qwen3-8b-aimo3-finetune)

---

## Citation

```bibtex
@misc{qwen-lora-aimo3,
  title = {Qwen-8B LoRA Fine-tuned for Tool-Integrated Reasoning},
  author = {tensorhydra},
  year = {2025},
  howpublished = {Kaggle Model Hub},
  note = {Merged LoRA model in SafeTensors format}
}
```

---

## Acknowledgements

- **Base model**: [Qwen-8B](https://huggingface.co/Qwen) by Alibaba Cloud
- **Training frameworks**: Hugging Face Transformers & PEFT
- **Dataset synthesis**: GPT-OSS-120B
- **Serialization**: SafeTensors
- **Training platform**: Kaggle (H100 GPU)

## License

This model inherits the license of the base Qwen-8B model. Please refer to the [Qwen license terms](https://huggingface.co/Qwen/Qwen2.5-8B/blob/main/LICENSE) before use.
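
---

## Appendix: Prompt Formatting Sketch

The ChatML-style template described under Instruction Format can be assembled and parsed with two small helpers. This is a minimal sketch, not part of the released code: the function names `build_chatml_prompt` and `extract_assistant_reply` are hypothetical, and it assumes the decoded output still contains the `<|im_start|>` / `<|im_end|>` markers (that is, decoding with `skip_special_tokens=False`).

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(user_message: str) -> str:
    """Hypothetical helper: wrap a user message and open the assistant turn."""
    return (
        f"{IM_START}user\n{user_message}{IM_END}\n"
        f"{IM_START}assistant\n"
    )


def extract_assistant_reply(decoded: str) -> str:
    """Hypothetical helper: return the text of the last assistant turn."""
    marker = f"{IM_START}assistant\n"
    start = decoded.rfind(marker)
    if start == -1:
        # No assistant marker found; fall back to the raw text
        return decoded.strip()
    reply = decoded[start + len(marker):]
    # Cut at the end-of-turn token if the model emitted one
    end = reply.find(IM_END)
    return (reply if end == -1 else reply[:end]).strip()


if __name__ == "__main__":
    prompt = build_chatml_prompt("What is 2 + 2?")
    print(repr(prompt))
    # Simulated decoded output, standing in for tokenizer.decode(...)
    print(extract_assistant_reply(prompt + "2 + 2 = 4" + IM_END))
```

The string returned by `build_chatml_prompt` can be passed to the tokenizer in place of a hand-written f-string, which keeps single and batch inference on the same template.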