---
base_model: unsloth/Qwen3-0.6B-Base
library_name: transformers
model_name: Qwen3-0.6B-instruction-finetuned
tags:
- generated_from_trainer
- unsloth
- trl
- sft
licence: license
datasets:
- andresnowak/Instruction-finetuning-mixture-mnlp
language:
- en
---

# Model Card for Qwen3-0.6B-instruction-finetuned

This model is a fine-tuned version of [unsloth/Qwen3-0.6B-Base](https://huggingface.co/unsloth/Qwen3-0.6B-Base).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"

# Load the fine-tuned model as a text-generation pipeline on the GPU
generator = pipeline("text-generation", model="andresnowak/Qwen3-0.6B-instruction-finetuned", device="cuda")

# Pass the question as a single user turn and return only the newly generated text
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

This model was trained with supervised instruction fine-tuning using a language-modelling objective, with the loss computed on both the prompt and the completion. During training we also applied some random prompt templates to make the model more robust to the ways questions can be phrased, in addition to the dataset already being high quality and containing many such examples. We did this because we weren't allowed to use chat templates for the evaluation.

This model probably had two problems during training. The first is that we did not filter the dataset to keep only examples whose combined prompt and completion fit within a length of 2048 (the maximum length we use); longer examples were truncated instead.

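
To make the difference between filtering and truncating concrete, here is a minimal sketch. It is not the training code used for this model: the `prompt` and `completion` field names, the helper functions, and the toy whitespace tokenizer are all assumptions made for illustration. With a real Transformers tokenizer you would pass something like `lambda text: tokenizer(text)["input_ids"]` instead.

```python
from typing import Callable, Dict, List

MAX_LENGTH = 2048  # maximum length mentioned in this card


def combined_length(example: Dict[str, str], tokenize: Callable[[str], List]) -> int:
    """Number of tokens in the prompt plus the completion (hypothetical field names)."""
    return len(tokenize(example["prompt"])) + len(tokenize(example["completion"]))


def filter_by_length(examples, tokenize, max_length: int = MAX_LENGTH):
    """Filtering: drop every example that does not fit, so no answer is cut off."""
    return [ex for ex in examples if combined_length(ex, tokenize) <= max_length]


def truncate_to_length(example, tokenize, max_length: int = MAX_LENGTH):
    """Truncation: keep the example but discard all tokens past max_length."""
    tokens = tokenize(example["prompt"]) + tokenize(example["completion"])
    return tokens[:max_length]


def whitespace_tokenize(text: str) -> List[str]:
    """Toy stand-in for a real tokenizer, used only to keep the sketch self-contained."""
    return text.split()


examples = [
    {"prompt": "What is 2 + 2?", "completion": "4"},
    {"prompt": "Summarise this text:", "completion": "word " * 3000},
]

kept = filter_by_length(examples, whitespace_tokenize)
truncated = truncate_to_length(examples[1], whitespace_tokenize)
print(len(kept), len(truncated))  # 1 2048
```

With filtering, the long example is removed entirely. With truncation, it stays in the training set but its completion ends abruptly, so the model is trained on an incomplete answer.
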
Also, this model uses left-side padding in the tokenizer, as flash-attention 2 needs this.

```yaml
environment:
  seed: 42
  use_template: True

model:
  name: Qwen/Qwen3-0.6B-Base
  hub_model_id: andresnowak/Qwen3-0.6B-instruction-finetuned

dataset:
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: codeAlpaca
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: noRobots
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: openMathGsm8k
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: codeV2
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: flanV2
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: ifData
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: mathAlgebra
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: mathGrade
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: oasst1
    size: 0.6
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: sciriff
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: tableGpt
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: tirMath
    size: 0.4
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: wildChat
    size: 0.7
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: mathV5
    size: 0.2

dataset_evaluation:
  - name: cais/mmlu
    config: validation
    subjects: ["abstract_algebra", "anatomy", "astronomy", "college_biology", "college_chemistry", "college_computer_science", "college_mathematics", "college_physics", "computer_security", "conceptual_physics", "electrical_engineering", "elementary_mathematics", "high_school_biology", "high_school_chemistry", "high_school_computer_science", "high_school_mathematics", "high_school_physics", "high_school_statistics", "machine_learning"]

training:
  learning_rate: 1e-5
  per_device_train_batch_size: 16
  per_device_eval_batch_size: 16
  gradient_accumulation_steps: 8
  num_train_epochs: 2
  weight_decay: 0.00
  warmup_ratio: 0.03
  max_grad_norm: 0.5
  lr_scheduler: "linear"
```

This model was trained with SFT.

## Evaluation results

The performance is as follows:

| Benchmark          | Accuracy (Acc) | Normalized Accuracy (Acc Norm) |
| :----------------- | :------------- | :----------------------------- |
| ARC Challenge      | 46.0%          | 45.3%                          |
| ARC Easy           | 59.3%          | 54.2%                          |
| GPQA               | 29.9%          | 27.0%                          |
| Math QA            | 24.0%          | 24.8%                          |
| MCQA Evals         | 37.9%          | 34.9%                          |
| MMLU               | 47.2%          | 47.2%                          |
| MMLU Pro           | 13.2%          | 12.0%                          |
| MuSR               | 43.5%          | 42.1%                          |
| NLP4Education      | 38.8%          | 36.5%                          |
| **Overall**        | **37.8%**      | **36.0%**                      |

The tests were run with the prompt below (only MuSR used a different one, in which `Question:` and `Narrative:` are added):

```
This question assesses challenging STEM problems as found on graduate standardized tests. Carefully evaluate the options and select the correct answer.

---
[Insert Question Here]
---
[Insert Choices Here, e.g.:
A. Option 1
B. Option 2
C. Option 3
D. Option 4]
---
Your response should include the letter and the exact text of the correct choice.
Example: B. Entropy increases.
Answer:
```

The testing was done on answers in the format ``` [Letter]. [Text answer]```

### Framework versions

- TRL: 0.15.2
- Transformers: 4.51.3
- PyTorch: 2.5.1+cu121
- Datasets: 3.6.0
- Tokenizers: 0.21.0

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
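
As a follow-up to the evaluation prompt shown in the Evaluation results section, the sketch below assembles that prompt and the expected answer string for one multiple-choice question. It is not the evaluation harness that produced the numbers in this card: the function names, the `LETTERS` constant, and the exact placement of line breaks are assumptions made for illustration.

```python
from typing import List

# Instruction text copied from the evaluation prompt in this card
INSTRUCTION = (
    "This question assesses challenging STEM problems as found on graduate "
    "standardized tests. Carefully evaluate the options and select the correct answer."
)
LETTERS = "ABCDEFGHIJ"  # hypothetical: supports up to ten answer choices


def build_prompt(question: str, choices: List[str]) -> str:
    """Fill the question and lettered choices into the prompt template."""
    choice_lines = "\n".join(
        f"{LETTERS[i]}. {choice}" for i, choice in enumerate(choices)
    )
    return (
        f"{INSTRUCTION}\n\n"
        f"---\n{question}\n"
        f"---\n{choice_lines}\n"
        "---\n"
        "Your response should include the letter and the exact text of the correct choice.\n"
        "Example: B. Entropy increases.\n"
        "Answer:"
    )


def build_target(choices: List[str], answer_index: int) -> str:
    """Expected answer in the ' [Letter]. [Text answer]' format, with a leading space."""
    return f" {LETTERS[answer_index]}. {choices[answer_index]}"


choices = ["Entropy decreases.", "Entropy increases.", "Entropy is zero.", "Entropy is constant."]
prompt = build_prompt("What happens to the entropy of an isolated system over time?", choices)
target = build_target(choices, 1)
print(prompt + target)
```

Keeping the prompt and the answer string as separate values makes it straightforward to score only the answer portion, whichever scoring method is used.
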