==========================================
DPO From Existing Checkpoint
==========================================
Source: staeiou/bartleby-qwen3-1.7b_v5/
Output: staeiou/bartleby-qwen3-1.7b_dpo
DPO Data: data/training_data_dpo.jsonl
Train: bs=2 grad_accum=16 lr=5e-7 epochs=1 beta=0.1

→ No local vLLM detected, proceeding with DPO
→ Starting DPO-only fine-tuning...
LD_LIBRARY_PATH="/opt/venv/lib/python3.10/site-packages/nvidia/cu13/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
FULL_FINETUNING=1 \
RUN_SFT=0 \
BASE_MODEL=staeiou/bartleby-qwen3-1.7b_v5/ \
MODEL_DIR=staeiou/bartleby-qwen3-1.7b_dpo \
DPO_DATA=data/training_data_dpo.jsonl \
MAX_SEQ_LENGTH=1024 \
VAL_FRACTION=0.05 \
DPO_BETA=0.1 \
DPO_NUM_TRAIN_EPOCHS=1 \
DPO_LEARNING_RATE=5e-7 \
DPO_LR_SCHEDULER_TYPE=cosine \
DPO_WARMUP_RATIO=0.03 \
DPO_WEIGHT_DECAY=0.05 \
DPO_MAX_GRAD_NORM=1.0 \
DPO_PER_DEVICE_TRAIN_BATCH_SIZE=2 \
DPO_GRADIENT_ACCUMULATION_STEPS=16 \
DPO_EVAL_STEPS=100 \
DPO_SAVE_STEPS=100 \
DPO_LOGGING_STEPS=10 \
DPO_MAX_LENGTH=1024 \
DPO_MAX_PROMPT_LENGTH=512 \
DPO_MAX_COMPLETION_LENGTH=512 \
python finetune.py
Skipping import of cpp extensions due to incompatible torch version 2.9.0+cu128 for torchao version 0.15.0+cu129 Please see https://github.com/pytorch/ao/issues/2919 for more info
/workspace/bartleby-1b/finetune.py:129: UserWarning: WARNING: Unsloth should be imported before [trl, transformers, peft] to ensure all optimizations are applied. Your code may run slower or encounter memory issues without these optimizations.

Please restructure your imports with 'import unsloth' at the top of your file.
  from unsloth import FastLanguageModel
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Your Flash Attention 2 installation seems to be broken. Using Xformers instead. No performance changes will be seen.
🦥 Unsloth Zoo will now patch everything to make training faster!
================================================================================
BARTLEBY FULL FINETUNE — 16-BIT — AUTO TEMPLATE+MASK DETECT — LAST-ANSWER MULTITURN
================================================================================
MODEL : staeiou/bartleby-qwen3-1.7b_v5/
DATA : data/training_data_v2_filtered.jsonl
GOLD : data/gold_seed_training_data.jsonl
SFT_OUTPUT : staeiou/bartleby-qwen3-1.7b_dpo
OUTPUT : staeiou/bartleby-qwen3-1.7b_dpo
CACHE_DIR : /workspace/.cache/huggingface/datasets
SEQ : 1024
PACKING : False
LOAD_4BIT : False (forced 16-bit base)
FULL_FT : True
RUN_SFT : False
REMOTE_CODE: False
FAMILY : qwen
TRL_COMPAT : ConstantLengthDataset patched=True
TRL_DPO : mergekit_detection_patched=True
TRL_DPO2 : llm_blender_detection_patched=True
TRL_DPO3 : weave_detection_patched=True
ADAPTERS : disabled
TRAIN : bs=4 grad_accum=4 eff_bs=16
EPOCHS : 4.0
LR : 0.0002 scheduler=cosine warmup=0.05 weight_decay=0.01 max_grad_norm=1.0
MULTITURN : num=0 max_turns=5 (only last assistant supervised)
GOLD_REPEAT: 5
DPO : enabled=True (using DPO dataset data/training_data_dpo.jsonl)
DPO_TRAIN : bs=2 grad_accum=16 lr=5e-07 epochs=1.0 beta=0.1
DPO_SEQ : max_length=1024 prompt=512 completion=512
GPU : Single GPU (CUDA_VISIBLE_DEVICES=0)
================================================================================

[1/1] Skipping SFT and preparing DPO-only run from existing checkpoint...
⚠️ Qwen chat template surgery applied: disabled automatic <think> tag insertion

[8/9] Loading DPO dataset...
Loaded DPO pairs: 45
DPO split -> train=42 val=3

[9/9] Running DPO...
warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
Loading policy model for DPO...
==((====))==  Unsloth 2026.3.5: Fast Qwen3 patching. Transformers: 5.3.0. vLLM: 0.13.0.
   \\   /|    NVIDIA RTX 5000 Ada Generation. Num GPUs = 1. Max memory: 31.475 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using bfloat16 full finetuning which cuts memory usage by 50%.
To enable float32 training, use `float32_mixed_precision = True` during FastLanguageModel.from_pretrained
Loading weights: 100%|█████████████████████████| 310/310 [00:00<00:00, 1009.50it/s]
Loading reference model for DPO...
==((====))==  Unsloth 2026.3.5: Fast Qwen3 patching. Transformers: 5.3.0. vLLM: 0.13.0.
   \\   /|    NVIDIA RTX 5000 Ada Generation. Num GPUs = 1. Max memory: 31.475 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using bfloat16 full finetuning which cuts memory usage by 50%.
To enable float32 training, use `float32_mixed_precision = True` during FastLanguageModel.from_pretrained
Loading weights: 100%|██████████████████████████| 310/310 [00:00<00:00, 942.10it/s]
[trl.trainer.dpo_trainer|WARNING]You passed `model_init_kwargs` to the `DPOConfig`, but your model is already instantiated. The `model_init_kwargs` will be ignored.
[trl.trainer.dpo_trainer|WARNING]You passed `ref_model_init_kwargs` to the `DPOConfig`, but your model is already instantiated. The `ref_model_init_kwargs` will be ignored.
num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
[datasets.arrow_dataset|WARNING]num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
Extracting prompt in train dataset (num_proc=42): 100%|█| 42/42 [00:01<00:00, 25.91
num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
[datasets.arrow_dataset|WARNING]num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
Applying chat template to train dataset (num_proc=42): 31%|▎| 13/42 [00:17<00:36,
Applying chat template to train dataset (num_proc=42): 100%|█| 42/42 [00:54<00:00,
num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
[datasets.arrow_dataset|WARNING]num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
Tokenizing train dataset (num_proc=42): 100%|█| 42/42 [00:54<00:00, 1.30s/ example
num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
[datasets.arrow_dataset|WARNING]num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
Extracting prompt in eval dataset (num_proc=3): 100%|█| 3/3 [00:00<00:00, 7.59 exa
num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
[datasets.arrow_dataset|WARNING]num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
Applying chat template to eval dataset (num_proc=3): 100%|█| 3/3 [00:04<00:00, 1.5
num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
[datasets.arrow_dataset|WARNING]num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
Tokenizing eval dataset (num_proc=3): 100%|███| 3/3 [00:04<00:00, 1.49s/ examples]
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 42 | Num Epochs = 1 | Total steps = 2
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 16
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 16 x 1) = 32
 "-____-"     Trainable parameters = 1,720,574,976 of 1,720,574,976 (100.00% trained)
Writing model shards: 100%|██████████████████████████| 1/1 [00:04<00:00, 4.71s/it]
{'train_runtime': '41.2', 'train_samples_per_second': '1.019', 'train_steps_per_second': '0.049', 'train_loss': '0.6924', 'epoch': '1'}
100%|████████████████████████████████████████████████| 2/2 [00:41<00:00, 20.60s/it]
Writing model shards: 100%|██████████████████████████| 1/1 [00:05<00:00, 5.85s/it]
Done.
✓ DPO-only fine-tuning complete!
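
The step count and throughput figures in this log follow from the batch settings. Here is a minimal Python sketch of that arithmetic. It assumes the trainer rounds a partial final accumulation window up to a full optimizer step; that rule is chosen only because it reproduces the logged "Total steps = 2". The helper name `optimizer_steps` is hypothetical and is not part of finetune.py.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, num_gpus=1, epochs=1):
    """Return (effective batch size, total optimizer steps).

    Assumption: a partially filled accumulation window at the end of an
    epoch still counts as one optimizer step (round up).
    """
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# Values copied from the log: 42 training pairs, bs=2, grad_accum=16, 1 GPU, 1 epoch.
effective_batch, total_steps = optimizer_steps(42, 2, 16)

# Throughput as reported by the trainer: samples and steps divided by train_runtime.
train_runtime = 41.2
samples_per_second = round(42 / train_runtime, 3)
steps_per_second = round(total_steps / train_runtime, 3)

print(effective_batch, total_steps, samples_per_second, steps_per_second)
```

With 42 pairs and an effective batch of 32, the second optimizer step sees only the remaining 10 pairs, so the run amounts to two weight updates in total.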
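
The reported train_loss of 0.6924 can be compared with the value the DPO objective takes before the policy has moved away from the reference model. The sketch below assumes the standard sigmoid DPO loss with no label smoothing; the log does not state which loss type was used. The function name `dpo_sigmoid_loss` and the log-probabilities are illustrative and are not taken from the run.

```python
import math


def dpo_sigmoid_loss(beta, pol_chosen, pol_rejected, ref_chosen, ref_rejected):
    """Per-pair sigmoid DPO loss from sequence log-probabilities.

    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    loss   = -log(sigmoid(margin)) = log(1 + exp(-margin))
    """
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))


BETA = 0.1  # DPO_BETA from the log

# Policy identical to the reference: the margin is zero, so the loss is ln 2.
baseline = dpo_sigmoid_loss(BETA, -50.0, -60.0, -50.0, -60.0)

# Illustrative only: the policy raises the chosen log-probability by 1 nat.
improved = dpo_sigmoid_loss(BETA, -49.0, -60.0, -50.0, -60.0)

print(f"baseline={baseline:.4f} improved={improved:.4f}")
```

Under these assumptions the baseline is ln 2 ≈ 0.6931, so a mean loss of 0.6924 over two optimizer steps suggests the preference margin shifted only slightly. That would be consistent with lr=5e-7 and 42 training pairs.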