---
base_model: unsloth/Qwen3-0.6B-Base
library_name: transformers
model_name: Qwen3-0.6B-instruction-finetuned
tags:
- generated_from_trainer
- unsloth
- trl
- sft
licence: license
datasets:
- andresnowak/Instruction-finetuning-mixture-mnlp
language:
- en
---

# Model Card for Qwen3-0.6B-instruction-finetuned

This model is a fine-tuned version of [unsloth/Qwen3-0.6B-Base](https://huggingface.co/unsloth/Qwen3-0.6B-Base).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="andresnowak/Qwen3-0.6B-instruction-finetuned", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

This model was trained with supervised instruction fine-tuning using a language-modelling objective, so the loss is computed on both the prompt and the completion. During training we also applied some random prompt templates,
to make the model more robust to the different ways a question can be asked. The dataset is already high quality and contains many such examples, but we added the templates because we weren't
allowed to use chat templates for the evaluation.
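
The template idea can be sketched as follows. This is only an illustration: the templates actually used in training are not listed in this card, so `TEMPLATES` and `apply_random_template` are hypothetical names and contents.

```python
import random

# Hypothetical templates; the real ones used for training are not given in this card.
TEMPLATES = [
    "{question}\nAnswer:",
    "Question: {question}\nAnswer:",
    "### Instruction:\n{question}\n### Response:",
]


def apply_random_template(question: str, rng: random.Random) -> str:
    """Wrap a question in one template chosen uniformly at random."""
    template = rng.choice(TEMPLATES)
    return template.format(question=question)


rng = random.Random(42)  # 42 mirrors the `seed` value in the training config
prompt = apply_random_template("What is 2 + 2?", rng)
completion = " 4"

# With a language-modelling loss, prompt and completion are concatenated
# and every token of the resulting text contributes to the loss.
training_text = prompt + completion
```

Because the template is drawn per example, the same question can appear under several surface forms across the dataset, which is the robustness effect described above.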

However, this model probably had two problems during training. One is that we didn't filter the dataset down to examples whose prompt and completion combined have a length of at most 2048 (the maximum sequence length we use) and instead
truncated the longer examples. This model also uses left-side padding in the tokenizer, as Flash Attention 2 needs this.
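
A minimal sketch of the alternative to truncation (dropping over-length examples) and of left-side padding is shown below. It assumes examples are dictionaries with `prompt` and `completion` fields and uses a whitespace split as a stand-in tokenizer; `fits`, `left_pad`, and `toy_tokenize` are hypothetical helpers, not part of the training code.

```python
from typing import Callable, Dict, List

MAX_LENGTH = 2048  # maximum sequence length stated above


def fits(example: Dict[str, str], tokenize: Callable[[str], List[str]]) -> bool:
    """Return True if prompt and completion together fit within MAX_LENGTH tokens."""
    return len(tokenize(example["prompt"] + example["completion"])) <= MAX_LENGTH


def left_pad(ids: List[int], length: int, pad_id: int) -> List[int]:
    """Pad on the left so the real tokens end at the last position."""
    return [pad_id] * (length - len(ids)) + ids


# Stand-in tokenizer (whitespace split); a real run would use the model's tokenizer.
toy_tokenize = str.split

examples = [
    {"prompt": "short question ", "completion": "short answer"},
    {"prompt": "word " * 3000, "completion": "answer"},  # too long, gets dropped
]

# Filtering keeps complete examples only, instead of cutting long ones off.
kept = [ex for ex in examples if fits(ex, toy_tokenize)]

padded = left_pad([5, 6, 7], length=5, pad_id=0)
```

Filtering avoids training on completions that were cut off mid-answer, at the cost of discarding the longest examples.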

```yaml

environment:
  seed: 42
  use_template: True

model:
  name: Qwen/Qwen3-0.6B-Base
  hub_model_id: andresnowak/Qwen3-0.6B-instruction-finetuned

dataset:
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: codeAlpaca
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: noRobots
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: openMathGsm8k
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: codeV2
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: flanV2
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: ifData
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: mathAlgebra
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: mathGrade
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: oasst1
    size: 0.6
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: sciriff
    size: 0.8
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: tableGpt
    size: 0.3
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: tirMath
    size: 0.4
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: wildChat
    size: 0.7
  - name: andresnowak/Instruction-finetuning-mixture-mnlp
    config: mathV5
    size: 0.2

dataset_evaluation:
  - name: cais/mmlu
    config: validation
    subjects: ["abstract_algebra", "anatomy", "astronomy", "college_biology", "college_chemistry", "college_computer_science", "college_mathematics", "college_physics", "computer_security", "conceptual_physics", "electrical_engineering", "elementary_mathematics", "high_school_biology", "high_school_chemistry", "high_school_computer_science", "high_school_mathematics", "high_school_physics", "high_school_statistics", "machine_learning"]

training:
  learning_rate: 1e-5
  per_device_train_batch_size: 16
  per_device_eval_batch_size: 16
  gradient_accumulation_steps: 8
  num_train_epochs: 2
  weight_decay: 0.00
  warmup_ratio: 0.03
  max_grad_norm: 0.5
  lr_scheduler: "linear"
```
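
To make the `training` values concrete, the sketch below computes the effective batch size and a linear warmup-then-decay learning rate from them. The number of devices and the total number of optimizer steps are not stated in this card, so `num_devices` and `total_steps` are assumptions chosen for illustration, and `linear_lr` is a hypothetical helper that follows the usual linear schedule shape rather than the exact training code.

```python
import math

# Values taken from the `training` section of the config.
learning_rate = 1e-5
per_device_train_batch_size = 16
gradient_accumulation_steps = 8
warmup_ratio = 0.03

# Assumptions for illustration only; neither is stated in the card.
num_devices = 1
total_steps = 1000

# Number of sequences contributing to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay back to 0."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return learning_rate * max(0.0, remaining)


print(effective_batch_size)
print(linear_lr(0), linear_lr(warmup_steps), linear_lr(total_steps))
```

Under the single-device assumption, each optimizer step sees 16 × 8 = 128 sequences.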

This model was trained with SFT.

## Evaluation results

The performance is as follows:

| Benchmark          | Accuracy (Acc) | Normalized Accuracy (Acc Norm) |
| :----------------- | :------------- | :----------------------------- |
| ARC Challenge      | 46.0%          | 45.3%                          |
| ARC Easy           | 59.3%          | 54.2%                          |
| GPQA               | 29.9%          | 27.0%                          |
| Math QA            | 24.0%          | 24.8%                          |
| MCQA Evals         | 37.9%          | 34.9%                          |
| MMLU               | 47.2%          | 47.2%                          |
| MMLU Pro           | 13.2%          | 12.0%                          |
| MuSR               | 43.5%          | 42.1%                          |
| NLP4Education      | 38.8%          | 36.5%                          |
| **Overall**        | **37.8%**      | **36.0%**                      |
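
The **Overall** row is consistent with an unweighted mean over the nine benchmarks, as the short check below shows. The card does not say how the overall score was computed, so treating it as a simple average is an assumption that happens to reproduce the reported numbers.

```python
# Per-benchmark scores copied from the table above (in percent).
acc = {
    "ARC Challenge": 46.0, "ARC Easy": 59.3, "GPQA": 29.9,
    "Math QA": 24.0, "MCQA Evals": 37.9, "MMLU": 47.2,
    "MMLU Pro": 13.2, "MuSR": 43.5, "NLP4Education": 38.8,
}
acc_norm = {
    "ARC Challenge": 45.3, "ARC Easy": 54.2, "GPQA": 27.0,
    "Math QA": 24.8, "MCQA Evals": 34.9, "MMLU": 47.2,
    "MMLU Pro": 12.0, "MuSR": 42.1, "NLP4Education": 36.5,
}

# Unweighted mean: every benchmark counts equally, regardless of its size.
overall_acc = sum(acc.values()) / len(acc)
overall_acc_norm = sum(acc_norm.values()) / len(acc_norm)

print(round(overall_acc, 1), round(overall_acc_norm, 1))
```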

The tests were done with the following prompt (only MuSR used a different one, in which `Question:` and `Narrative:` are added):
```
This question assesses challenging STEM problems as found on graduate standardized tests. Carefully evaluate the options and select the correct answer.

---
[Insert Question Here]
---
[Insert Choices Here, e.g.:
A. Option 1
B. Option 2
C. Option 3
D. Option 4]
---

Your response should include the letter and the exact text of the correct choice.
Example: B. Entropy increases.
Answer:
```

Testing was done on answers of the form `[Letter]. [Text answer]`.
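
As an illustration, the evaluation prompt and the expected answer string could be assembled as in the sketch below. `build_prompt` and `build_target` are hypothetical helpers written for this card, not the evaluation harness itself, and details such as the whitespace between `Answer:` and the target are assumptions.

```python
from typing import List

INSTRUCTION = (
    "This question assesses challenging STEM problems as found on graduate "
    "standardized tests. Carefully evaluate the options and select the correct answer."
)
LETTERS = "ABCDEFGHIJ"


def build_prompt(question: str, choices: List[str]) -> str:
    """Fill the evaluation prompt with a question and its lettered choices."""
    options = "\n".join(f"{LETTERS[i]}. {text}" for i, text in enumerate(choices))
    return (
        f"{INSTRUCTION}\n\n---\n{question}\n---\n{options}\n---\n\n"
        "Your response should include the letter and the exact text of the correct choice.\n"
        "Example: B. Entropy increases.\n"
        "Answer:"
    )


def build_target(choices: List[str], correct_index: int) -> str:
    """Expected answer in the `[Letter]. [Text answer]` form."""
    return f"{LETTERS[correct_index]}. {choices[correct_index]}"


choices = ["3", "4", "5", "6"]
prompt = build_prompt("What is 2 + 2?", choices)
target = build_target(choices, correct_index=1)
print(prompt)
print(target)
```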

### Framework versions

- TRL: 0.15.2
- Transformers: 4.51.3
- PyTorch: 2.5.1+cu121
- Datasets: 3.6.0
- Tokenizers: 0.21.0

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```