Initialize project; model provided by the ModelHub XC community
Model: jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning Source: Original Platform
37
.gitattributes
vendored
Normal file
@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
training_metrics.png filter=lfs diff=lfs merge=lfs -text
123
README.md
Normal file
@@ -0,0 +1,123 @@
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B
tags:
- reinforcement-learning
- rloo
- math-reasoning
- pipelinerl
datasets:
- gsm8k_train
- math_train
pipeline_tag: text-generation
---

# Qwen2.5-1.5B-RLOO-math-reasoning

This model is a fine-tuned version of [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B) using **RLOO (REINFORCE Leave-One-Out) without a KL penalty** for mathematical reasoning.

Trained with [PipelineRL](https://github.com/ServiceNow/PipelineRL).

## Training Details

### Datasets

| Split | Datasets |
|-------|----------|
| Train | `gsm8k_train`, `math_train` |
| Test | `gsm8k_test`, `math_500` |

### RL Algorithm

| Parameter | Value |
|-----------|-------|
| Algorithm | RLOO (REINFORCE Leave-One-Out) |
| Advantage Baseline | Leave-one-out mean reward over the group |
| Extra Inference | None |
| Group Structure | Required |
| Policy Loss | `reinforce` |
| KL Coefficient | `0.0` |
| Epsilon (clip) | `0.02` |
| Discount Factor (`gamma`) | `1.0` |
| Divide Advantage by Std | `False` |
| Filter Zero Advantage Groups | `False` |
| Rollouts per Problem | `16` |

RLOO uses the leave-one-out mean of the other responses in the group as the baseline and trains the policy with a REINFORCE-style loss.

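The leave-one-out baseline can be sketched in a few lines (a minimal illustration, not the PipelineRL implementation; `rloo_advantages` is a hypothetical helper name):

```python
def rloo_advantages(group_rewards):
    """RLOO advantage: each rollout's reward minus the mean reward of the
    OTHER rollouts in the same group (here, 16 rollouts per problem)."""
    n = len(group_rewards)
    total = sum(group_rewards)
    # (total - r) / (n - 1) is the leave-one-out mean for rollout r
    return [r - (total - r) / (n - 1) for r in group_rewards]

# Two correct and two incorrect rollouts with 1/0 rewards:
print(rloo_advantages([1.0, 0.0, 1.0, 0.0]))  # [2/3, -2/3, 2/3, -2/3]
```

Because the baseline excludes the current rollout, the estimator stays unbiased without any extra inference (no value network, no reference rollout).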
### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Base Model | `Qwen/Qwen2.5-1.5B` |
| Learning Rate | `1e-06` |
| LR Scheduler | `cosine` |
| Warmup Steps | `25` |
| Max Training Steps | `1500` |
| Micro Batch Size | `4` |
| Gradient Accumulation | `64` |
| Effective Batch Size | `256` |
| Sequence Length | `8192` |
| Gradient Clipping | `0.3` |
| Weight Decay | `0.01` |
| Optimizer | `adamw_torch` |
| Precision | `bf16` |
| DeepSpeed | ZeRO Stage 3 |

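A quick consistency check on the Effective Batch Size row (assuming the single finetune GPU shown in the job layout of `training_config.yaml`):

```python
micro_batch_size = 4           # per-device train_batch_size
grad_accumulation_passes = 64  # gradient_accumulation_passes
finetune_gpus = 1              # one finetune worker (GPU 3) in this run

effective_batch = micro_batch_size * grad_accumulation_passes * finetune_gpus
print(effective_batch)  # 256, matching the table
```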
## Evaluation Results

Pass@k on math reasoning benchmarks (N=32 samples per problem, temperature=1.0):

| Dataset | pass@1 | pass@2 | pass@4 | pass@8 | pass@16 | pass@32 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| GSM8K (test) | 78.44 | 85.37 | 89.97 | 92.93 | 94.80 | 96.06 |
| MATH-500 | 60.14 | 68.63 | 75.63 | 81.47 | 86.24 | 89.80 |
| **Overall** | **73.41** | **80.77** | **86.03** | **89.78** | **92.45** | **94.34** |

*GSM8K test: 1319 problems · MATH-500: 500 problems · Overall: 1819 problems (overall weighted by problem count).*

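These numbers are consistent with the standard unbiased pass@k estimator over N=32 samples and a problem-count-weighted overall row (a sketch of the presumed computation, not the exact evaluation code):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k), where n samples were
    drawn per problem and c of them were correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(32, 8, 1))  # 0.25: the fraction of single draws that succeed

# Overall pass@1 as the problem-count-weighted mean of the two benchmarks:
overall = (78.44 * 1319 + 60.14 * 500) / (1319 + 500)
print(round(overall, 2))  # 73.41, matching the Overall row
```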
## Training Curves

![Training Metrics](training_metrics.png)

## W&B Run

Full training logs: [https://wandb.ai/jaygala24-team/rl-post-training/runs/qwen2.5_1.5b_rloo_no_kl_3a1f_4xh100_236657_finetune_27b80841](https://wandb.ai/jaygala24-team/rl-post-training/runs/qwen2.5_1.5b_rloo_no_kl_3a1f_4xh100_236657_finetune_27b80841)

## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning", revision="step-0200")  # optional branch, e.g. "step-0400"
tokenizer = AutoTokenizer.from_pretrained("jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning", revision="step-0200")

prompt = "Please reason step by step, and put your final answer within \\boxed{}.\n\nWhat is the sum of 123 and 456?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(model="jaygala24/Qwen2.5-1.5B-RLOO-math-reasoning", revision="step-0200")  # optional branch, e.g. "step-0400"
sampling_params = SamplingParams(temperature=0.7, max_tokens=4096)

prompt = "Please reason step by step, and put your final answer within \\boxed{}.\n\nWhat is the sum of 123 and 456?"
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

## Framework

- [PipelineRL](https://github.com/ServiceNow/PipelineRL)
- [Transformers](https://github.com/huggingface/transformers)
- [DeepSpeed](https://github.com/microsoft/DeepSpeed) (ZeRO Stage 3)
24
added_tokens.json
Normal file
@@ -0,0 +1,24 @@
{
  "</tool_call>": 151658,
  "<tool_call>": 151657,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
54
chat_template.jinja
Normal file
@@ -0,0 +1,54 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0]['role'] == 'system' %}
{{- messages[0]['content'] }}
{%- else %}
{{- 'You are a helpful assistant.' }}
{%- endif %}
{{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0]['role'] == 'system' %}
{{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
{%- else %}
{{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- for message in messages %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{{- '<|im_start|>' + message.role }}
{%- if message.content %}
{{- '\n' + message.content }}
{%- endif %}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '\n<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{{- tool_call.arguments | tojson }}
{{- '}\n</tool_call>' }}
{%- endfor %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- message.content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
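For the common no-tools path, the template above reduces to plain ChatML. A minimal Python mirror of the rendered format (an illustration only; in practice use `tokenizer.apply_chat_template`, and `render_chatml` is a hypothetical name):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages the way chat_template.jinja does when no tools are
    passed: a system turn (defaulted if absent), then each turn wrapped in
    <|im_start|>role ... <|im_end|>."""
    parts = []
    if not messages or messages[0]["role"] != "system":
        parts.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so generation continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(render_chatml([{"role": "user", "content": "What is 2 + 2?"}]))
```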
29
config.json
Normal file
@@ -0,0 +1,29 @@
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 8960,
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 12,
  "num_hidden_layers": 28,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": true,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048,
  "transformers_version": "4.52.4"
}
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1bba3a31cde4f3fd84e3592074aa02eb722b748d4b44aaa8782b7522d3a62886
size 3087467144
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896
207
tokenizer_config.json
Normal file
@@ -0,0 +1,207 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
287
training_config.yaml
Normal file
@@ -0,0 +1,287 @@
finetune:
  data: null
  model_class: causal-language-modeling
  config_name: ${..model_path}
  optim: adamw_torch
  load_as_bf16: true
  fp32_lm_head: ${..fp32_lm_head}
  fp32_layer_prefix: ${..fp32_layer_prefix}
  use_flash_attention: true
  attn_implementation: flash_attention_2
  auto_device_map: false
  lora:
    enabled: false
    task_type: CAUSAL_LM
    base_model_8bit: false
    base_model_4bit: false
    r: 16
    alpha: 16
    dropout: 0.05
    bias: none
    target_modules: []
  force_restart: ${..force_restart}
  resume_dataloader: false
  train_batch_size: 4
  valid_batch_size: 4
  weight_decay: 0.01
  learning_rate: 1.0e-06
  gradient_clipping_threshold: 0.3
  lr_scheduler_type: cosine
  num_warmup_steps: 25
  gradient_accumulation_passes: 64
  gradient_checkpointing: true
  reentrant_checkpointing: false
  max_train_steps: 1500
  interrupt_train_steps: -1
  max_eval_steps: -1
  seq_length: 8192
  seq_packing: true
  output_dir: ${..output_dir}/finetune
  seed: ${..seed}
  save_checkpoint_steps: 100
  keep_intermediate_checkpoints: true
  trust_remote_code: false
  cuda_empty_cache: true
  sft_config_name: null
  n_examples: 0
  log_each_n_steps: 1
  also_save_steps: []
  use_safetensors: true
  save_final_training_state: true
  seq_parallel: 1
  objective: rl
  input: training_data
  send_weight_updates: true
  queue_size: 32
  max_lag: null
  weight_update_interval: 1
  pop_old_data: ${..pop_old_data}
  attempts: 8
  eval_callback:
    _target_: pipelinerl.finetune.utils.dummy_eval_callback
    config_name: ''
  rl:
    policy_loss: reinforce
    divide_advantage_by_std: false
    kl_coef: 0.0
    final_kl_coef: 0.0
    entropy_bonus: 0.0
    reward_minus_kl_coef: 0.0
    epsilon_low: 0.02
    epsilon_high: 0.02
    use_advantages: true
    relu_log_p_weights: false
    clamp_log_ratio_ref_new_value: 5
    temperature: ${...llm.parameters.temperature}
    aggregate_loss: sum
    overlong_filtering: false
    adv_estimator: rloo
    filter_zero_advantage_groups: false
rewards:
  correct_answer_finished: 1.0
  correct_answer_not_finished: 1.0
  wrong_answer_finished: 0
  wrong_answer_not_finished: 0
  no_answer_finished: 0
  no_answer_not_finished: 0
  unparsable_finished: 0
  unparsable_not_finished: 0
streams:
  backend: files
seed: 42
fp32_lm_head: false
fp32_layer_prefix: lm_head
actor:
  log_each_n_secs: 0
  llm_max_rollouts: 256
  rollout_workers: 1
  discount_factor: 1
  problem_queue_size: 256
  result_queue_size: 256
  throughput_window_size: 50
  shared_memory_entry_size: 10000000
  rollout_policy: pipelinerl.domains.math.generate_math_rollout
  system_prompt: Please reason step by step, and put your final answer within \boxed{}.
  task_template: '{task}'
  task_prompt: ''
environment: null
preprocess:
  input: actor
  output: training_data
  n_workers: 8
  chunk_n_groups: 2
  raw_queue_size: 8
  input_queue_size: 32
  output_queue_size: 32
  dataset_buffer_size: 0
  ring_buffer_size: 128
  max_ready_samples_per_lead: 64
  pop_old_data: ${..pop_old_data}
  shared_memory_entry_size: 100000000
  log_every_n_samples: 128
llm:
  parameters:
    max_tokens: 4096
    temperature: 1.0
test_llm:
  parameters:
    max_tokens: 4096
    temperature: 1.0
    top_p: 0.95
    top_k: 50
vllm_config:
  use_v1: false
  quantization: null
  vllm_kwargs:
    dtype: bfloat16
    gpu-memory-utilization: 0.92
    max-num-seqs: 64
    max-num-batched-tokens: 16384
    enable-chunked-prefill: ''
    return-tokens-as-token-ids: ''
    tensor-parallel-size: 1
    pipeline-parallel-size: 1
    generation-config: vllm
    max_model_len: 8192
    num-scheduler-steps: 8
    disable-log-requests: ''
    disable-frontend-multiprocessing: ''
world:
  replicas: 1
  actor_fraction: 3
  preprocessor_fraction: 0
  finetune_fraction: 1
  env_replicas: 1
  actor_group_port: 9000
  environment_start_port: 7777
jobs:
- kind: actor_llm
  idx: 0
  replica_idx: 0
  local_idx: 0
  node_rank: 0
  hostname: localhost
  port: 8080
  gpus:
  - 0
  url: http://localhost:8080
  environment_key: null
  environment_index: null
- kind: actor_llm
  idx: 1
  replica_idx: 1
  local_idx: 1
  node_rank: 0
  hostname: localhost
  port: 8081
  gpus:
  - 1
  url: http://localhost:8081
  environment_key: null
  environment_index: null
- kind: actor_llm
  idx: 2
  replica_idx: 2
  local_idx: 2
  node_rank: 0
  hostname: localhost
  port: 8082
  gpus:
  - 2
  url: http://localhost:8082
  environment_key: null
  environment_index: null
- kind: actor
  idx: 3
  replica_idx: 0
  local_idx: 0
  node_rank: 0
  hostname: localhost
  port: null
  gpus: []
  url: ''
  environment_key: null
  environment_index: null
- kind: preprocessor
  idx: 4
  replica_idx: 0
  local_idx: 0
  node_rank: 0
  hostname: localhost
  port: null
  gpus: []
  url: ''
  environment_key: null
  environment_index: null
- kind: environment
  idx: 5
  replica_idx: 0
  local_idx: 0
  node_rank: 0
  hostname: localhost
  port: 7777
  gpus: []
  url: ''
  environment_key: math
  environment_index: 0
- kind: finetune
  idx: 6
  replica_idx: 0
  local_idx: 0
  node_rank: 0
  hostname: localhost
  port: null
  gpus:
  - 3
  url: ''
  environment_key: null
  environment_index: null
eval_every_n_versions: 78000
model_path: Qwen/Qwen2.5-1.5B
accelerate_config: null
use_deepspeed: true
deepspeed_config: deepspeed_stage3_bf16
use_fsdp: false
fsdp:
  param_dtype: fp32
  reduce_dtype: fp32
  buffer_dtype: fp32
output_dir: results/qwen2.5_1.5b_rloo_no_kl_3a1f_4xh100_236657
force_restart: false
pop_old_data: true
max_lag: null
attempts: 16
train_subset: null
debug:
  mode: ''
  streams_from: null
place_inference_workers: true
use_existing_llms: false
me:
  job_idx: null
wandb:
  use_wandb: true
  fail_on_init_error: false
  init_timeout: 120
  wandb_id: null
  wandb_name: null
  wandb_entity_name: jaygala24-team
  wandb_project_name: rl-post-training
  wandb_resume: always
  wandb_use_basename: true
  wandb_workspace_root: results
  wandb_group: qwen2.5_1.5b_rloo_no_kl_3a1f_4xh100_236657
  wandb_dir: null
  tags: []
environments:
- key: math
  mode: remote
  _target_: pipelinerl.domains.math.MathEnvironment
  environment_key: math
  dataset_loader: pipelinerl.domains.math.load_datasets
  train_dataset_names:
  - gsm8k_train
  - math_train
  test_dataset_names:
  - gsm8k_test
  - math_500
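The `${..key}` strings in training_config.yaml are OmegaConf-style relative interpolations: one leading dot refers to the current node, and each additional dot climbs one level toward the root. A simplified illustration of that resolution rule for a single key (hypothetical helper, not the OmegaConf library):

```python
def resolve_relative(root, node_path, ref):
    """Resolve a relative interpolation like '${..model_path}': strip the
    ${...} wrapper, count leading dots, climb (dots - 1) levels up from the
    node holding the value, then look up the key (single key only)."""
    assert ref.startswith("${.") and ref.endswith("}")
    body = ref[2:-1]                          # e.g. "..model_path"
    dots = len(body) - len(body.lstrip("."))  # number of leading dots
    key = body[dots:]
    path = node_path[: len(node_path) - (dots - 1)]
    node = root
    for part in path:
        node = node[part]
    return node[key]

cfg = {
    "model_path": "Qwen/Qwen2.5-1.5B",
    "finetune": {"config_name": "${..model_path}"},
}
# finetune.config_name climbs one level up and resolves against the root:
print(resolve_relative(cfg, ["finetune"], cfg["finetune"]["config_name"]))
```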
3
training_metrics.png
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:38a6e5eaa8b862b4930ef074c6e3a8172e20a94fcbf7d6bb4fc17e786f04e130
size 186370
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long