Initialize project; model provided by the ModelHub XC community
Model: staeiou/bartleby-qwen3-1.7b_dpo Source: Original Platform
134
logs.log
Normal file
@@ -0,0 +1,134 @@
==========================================
DPO From Existing Checkpoint
==========================================
Source: staeiou/bartleby-qwen3-1.7b_v5/
Output: staeiou/bartleby-qwen3-1.7b_dpo
DPO Data: data/training_data_dpo.jsonl
Train: bs=2 grad_accum=16 lr=5e-7 epochs=1 beta=0.1

→ No local vLLM detected, proceeding with DPO
→ Starting DPO-only fine-tuning...
LD_LIBRARY_PATH="/opt/venv/lib/python3.10/site-packages/nvidia/cu13/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
FULL_FINETUNING=1 \
RUN_SFT=0 \
BASE_MODEL=staeiou/bartleby-qwen3-1.7b_v5/ \
MODEL_DIR=staeiou/bartleby-qwen3-1.7b_dpo \
DPO_DATA=data/training_data_dpo.jsonl \
MAX_SEQ_LENGTH=1024 \
VAL_FRACTION=0.05 \
DPO_BETA=0.1 \
DPO_NUM_TRAIN_EPOCHS=1 \
DPO_LEARNING_RATE=5e-7 \
DPO_LR_SCHEDULER_TYPE=cosine \
DPO_WARMUP_RATIO=0.03 \
DPO_WEIGHT_DECAY=0.05 \
DPO_MAX_GRAD_NORM=1.0 \
DPO_PER_DEVICE_TRAIN_BATCH_SIZE=2 \
DPO_GRADIENT_ACCUMULATION_STEPS=16 \
DPO_EVAL_STEPS=100 \
DPO_SAVE_STEPS=100 \
DPO_LOGGING_STEPS=10 \
DPO_MAX_LENGTH=1024 \
DPO_MAX_PROMPT_LENGTH=512 \
DPO_MAX_COMPLETION_LENGTH=512 \
python finetune.py
Skipping import of cpp extensions due to incompatible torch version 2.9.0+cu128 for torchao version 0.15.0+cu129 Please see https://github.com/pytorch/ao/issues/2919 for more info
/workspace/bartleby-1b/finetune.py:129: UserWarning: WARNING: Unsloth should be imported before [trl, transformers, peft] to ensure all optimizations are applied. Your code may run slower or encounter memory issues without these optimizations.

Please restructure your imports with 'import unsloth' at the top of your file.
from unsloth import FastLanguageModel
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Your Flash Attention 2 installation seems to be broken. Using Xformers instead. No performance changes will be seen.
🦥 Unsloth Zoo will now patch everything to make training faster!
================================================================================
BARTLEBY FULL FINETUNE — 16-BIT — AUTO TEMPLATE+MASK DETECT — LAST-ANSWER MULTITURN
================================================================================
MODEL : staeiou/bartleby-qwen3-1.7b_v5/
DATA : data/training_data_v2_filtered.jsonl
GOLD : data/gold_seed_training_data.jsonl
SFT_OUTPUT : staeiou/bartleby-qwen3-1.7b_dpo
OUTPUT : staeiou/bartleby-qwen3-1.7b_dpo
CACHE_DIR : /workspace/.cache/huggingface/datasets
SEQ : 1024
PACKING : False
LOAD_4BIT : False (forced 16-bit base)
FULL_FT : True
RUN_SFT : False
REMOTE_CODE: False
FAMILY : qwen
TRL_COMPAT : ConstantLengthDataset patched=True
TRL_DPO : mergekit_detection_patched=True
TRL_DPO2 : llm_blender_detection_patched=True
TRL_DPO3 : weave_detection_patched=True
ADAPTERS : disabled
TRAIN : bs=4 grad_accum=4 eff_bs=16
EPOCHS : 4.0
LR : 0.0002 scheduler=cosine warmup=0.05 weight_decay=0.01 max_grad_norm=1.0
MULTITURN : num=0 max_turns=5 (only last assistant supervised)
GOLD_REPEAT: 5
DPO : enabled=True (using DPO dataset data/training_data_dpo.jsonl)
DPO_TRAIN : bs=2 grad_accum=16 lr=5e-07 epochs=1.0 beta=0.1
DPO_SEQ : max_length=1024 prompt=512 completion=512
GPU : Single GPU (CUDA_VISIBLE_DEVICES=0)
================================================================================

[1/1] Skipping SFT and preparing DPO-only run from existing checkpoint...
⚠️ Qwen chat template surgery applied: disabled automatic <think> tag insertion

[8/9] Loading DPO dataset...
Loaded DPO pairs: 45
DPO split -> train=42 val=3
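
Note on the split above: 45 pairs at VAL_FRACTION=0.05 gives 2.25 validation examples, so val=3 is consistent with rounding the validation share up. The log does not show how finetune.py computes the split, so the ceiling rule and the `split_sizes` helper below are assumptions made for illustration only.

```python
import math


def split_sizes(n_pairs: int, val_fraction: float) -> tuple[int, int]:
    """Return (train, val) sizes, rounding the validation share up.

    Ceiling rounding is an assumed rule that happens to match the log;
    it is not taken from finetune.py.
    """
    n_val = math.ceil(n_pairs * val_fraction)
    return n_pairs - n_val, n_val


# Values taken from the log: 45 DPO pairs, VAL_FRACTION=0.05.
train_size, val_size = split_sizes(45, 0.05)
print(f"train={train_size} val={val_size}")  # train=42 val=3
```

With only 3 validation pairs, any evaluation metric from this run would be very noisy.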

[9/9] Running DPO...
warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
Loading policy model for DPO...
==((====))== Unsloth 2026.3.5: Fast Qwen3 patching. Transformers: 5.3.0. vLLM: 0.13.0.
\\ /| NVIDIA RTX 5000 Ada Generation. Num GPUs = 1. Max memory: 31.475 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.9.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.0
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using bfloat16 full finetuning which cuts memory usage by 50%.
To enable float32 training, use `float32_mixed_precision = True` during FastLanguageModel.from_pretrained
Loading weights: 100%|█████████████████████████| 310/310 [00:00<00:00, 1009.50it/s]
Loading reference model for DPO...
==((====))== Unsloth 2026.3.5: Fast Qwen3 patching. Transformers: 5.3.0. vLLM: 0.13.0.
\\ /| NVIDIA RTX 5000 Ada Generation. Num GPUs = 1. Max memory: 31.475 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.9.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.0
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using bfloat16 full finetuning which cuts memory usage by 50%.
To enable float32 training, use `float32_mixed_precision = True` during FastLanguageModel.from_pretrained
Loading weights: 100%|██████████████████████████| 310/310 [00:00<00:00, 942.10it/s]
[trl.trainer.dpo_trainer|WARNING]You passed `model_init_kwargs` to the `DPOConfig`, but your model is already instantiated. The `model_init_kwargs` will be ignored.
[trl.trainer.dpo_trainer|WARNING]You passed `ref_model_init_kwargs` to the `DPOConfig`, but your model is already instantiated. The `ref_model_init_kwargs` will be ignored.
num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
[datasets.arrow_dataset|WARNING]num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
Extracting prompt in train dataset (num_proc=42): 100%|█| 42/42 [00:01<00:00, 25.91
num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
[datasets.arrow_dataset|WARNING]num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
Applying chat template to train dataset (num_proc=42): 31%|▎| 13/42 [00:17<00:36,
Applying chat template to train dataset (num_proc=42): 100%|█| 42/42 [00:54<00:00,
num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
[datasets.arrow_dataset|WARNING]num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
Tokenizing train dataset (num_proc=42): 100%|█| 42/42 [00:54<00:00, 1.30s/ example
num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
[datasets.arrow_dataset|WARNING]num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
Extracting prompt in eval dataset (num_proc=3): 100%|█| 3/3 [00:00<00:00, 7.59 exa
num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
[datasets.arrow_dataset|WARNING]num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
Applying chat template to eval dataset (num_proc=3): 100%|█| 3/3 [00:04<00:00, 1.5
num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
[datasets.arrow_dataset|WARNING]num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
Tokenizing eval dataset (num_proc=3): 100%|███| 3/3 [00:04<00:00, 1.49s/ examples]
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 42 | Num Epochs = 1 | Total steps = 2
O^O/ \_/ \ Batch size per device = 2 | Gradient accumulation steps = 16
\ / Data Parallel GPUs = 1 | Total batch size (2 x 16 x 1) = 32
"-____-" Trainable parameters = 1,720,574,976 of 1,720,574,976 (100.00% trained)
Writing model shards: 100%|██████████████████████████| 1/1 [00:04<00:00, 4.71s/it]
{'train_runtime': '41.2', 'train_samples_per_second': '1.019', 'train_steps_per_second': '0.049', 'train_loss': '0.6924', 'epoch': '1'}
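
Note on the step count and throughput above: the figures follow from the batch settings. 42 examples at a per-device batch of 2 make 21 micro-batches; with 16 accumulation steps that rounds up to 2 optimizer steps, the second covering only the 5 leftover micro-batches. The sketch below reproduces that arithmetic. The rounding-up rule is inferred from the logged "Total steps = 2", not read from the trainer's source, and `optimizer_steps` is a hypothetical helper.

```python
import math


def optimizer_steps(n_examples: int, per_device_bs: int, grad_accum: int,
                    n_gpus: int = 1, epochs: int = 1) -> int:
    """Optimizer updates for a run, rounding partial accumulation windows up.

    The rounding rule is an assumption chosen to match the logged step count.
    """
    micro_batches = math.ceil(n_examples / (per_device_bs * n_gpus))
    return math.ceil(micro_batches / grad_accum) * epochs


# Values taken from the log.
effective_bs = 2 * 16 * 1                # per-device bs x grad_accum x GPUs
steps = optimizer_steps(42, 2, 16)       # 21 micro-batches -> 2 updates
runtime_s = 41.2                         # train_runtime

print(effective_bs)                      # 32
print(steps)                             # 2
print(round(steps / runtime_s, 3))       # 0.049 (train_steps_per_second)
print(round(42 / runtime_s, 3))          # 1.019 (train_samples_per_second)
```

Because the whole run is two updates, DPO_EVAL_STEPS=100, DPO_SAVE_STEPS=100 and DPO_LOGGING_STEPS=10 are never reached, which is why no intermediate loss or eval lines appear in the log.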
100%|████████████████████████████████████████████████| 2/2 [00:41<00:00, 20.60s/it]
Writing model shards: 100%|██████████████████████████| 1/1 [00:05<00:00, 5.85s/it]
Done.
✓ DPO-only fine-tuning complete!
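
Note on the reported train_loss: 0.6924 sits just below ln 2 ≈ 0.6931, the value the standard sigmoid DPO loss takes when the policy and the reference model assign identical log-probabilities. That suggests the policy barely moved from the starting checkpoint, which is unsurprising for two updates at lr=5e-7. The sketch below assumes the sigmoid loss variant (the log does not state the loss type); `dpo_sigmoid_loss` and the log-probability values are illustrative, not taken from finetune.py.

```python
import math


def dpo_sigmoid_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                     ref_chosen_logp: float, ref_rejected_logp: float,
                     beta: float = 0.1) -> float:
    """Per-pair sigmoid DPO loss: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) is the same as log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))


# Policy identical to reference: the margin is 0 and the loss is ln 2.
baseline = dpo_sigmoid_loss(-50.0, -60.0, -50.0, -60.0)
print(round(baseline, 4))  # 0.6931

# Policy slightly favouring the chosen answer: the loss drops below ln 2.
nudged = dpo_sigmoid_loss(-49.0, -60.0, -50.0, -60.0)
print(nudged < baseline)   # True
```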