Initialize project; model provided by the ModelHub XC community

Model: staeiou/bartleby-qwen3-1.7b_dpo Source: Original Platform
2026-04-29 12:28:49 +08:00
commit c45c81cccf
11 changed files with 395 additions and 0 deletions
--- a/logs.log
+++ b/logs.log
@@ -0,0 +1,134 @@
+==========================================
+DPO From Existing Checkpoint
+==========================================
+Source: staeiou/bartleby-qwen3-1.7b_v5/
+Output: staeiou/bartleby-qwen3-1.7b_dpo
+DPO Data: data/training_data_dpo.jsonl
+Train: bs=2 grad_accum=16 lr=5e-7 epochs=1 beta=0.1
+
+→ No local vLLM detected, proceeding with DPO
+→ Starting DPO-only fine-tuning...
+LD_LIBRARY_PATH="/opt/venv/lib/python3.10/site-packages/nvidia/cu13/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
+FULL_FINETUNING=1 \
+RUN_SFT=0 \
+BASE_MODEL=staeiou/bartleby-qwen3-1.7b_v5/ \
+MODEL_DIR=staeiou/bartleby-qwen3-1.7b_dpo \
+DPO_DATA=data/training_data_dpo.jsonl \
+MAX_SEQ_LENGTH=1024 \
+VAL_FRACTION=0.05 \
+DPO_BETA=0.1 \
+DPO_NUM_TRAIN_EPOCHS=1 \
+DPO_LEARNING_RATE=5e-7 \
+DPO_LR_SCHEDULER_TYPE=cosine \
+DPO_WARMUP_RATIO=0.03 \
+DPO_WEIGHT_DECAY=0.05 \
+DPO_MAX_GRAD_NORM=1.0 \
+DPO_PER_DEVICE_TRAIN_BATCH_SIZE=2 \
+DPO_GRADIENT_ACCUMULATION_STEPS=16 \
+DPO_EVAL_STEPS=100 \
+DPO_SAVE_STEPS=100 \
+DPO_LOGGING_STEPS=10 \
+DPO_MAX_LENGTH=1024 \
+DPO_MAX_PROMPT_LENGTH=512 \
+DPO_MAX_COMPLETION_LENGTH=512 \
+python finetune.py
+Skipping import of cpp extensions due to incompatible torch version 2.9.0+cu128 for torchao version 0.15.0+cu129             Please see https://github.com/pytorch/ao/issues/2919 for more info
+/workspace/bartleby-1b/finetune.py:129: UserWarning: WARNING: Unsloth should be imported before [trl, transformers, peft] to ensure all optimizations are applied. Your code may run slower or encounter memory issues without these optimizations.
+
+Please restructure your imports with 'import unsloth' at the top of your file.
+  from unsloth import FastLanguageModel
+🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
+Unsloth: Your Flash Attention 2 installation seems to be broken. Using Xformers instead. No performance changes will be seen.
+🦥 Unsloth Zoo will now patch everything to make training faster!
+================================================================================
+BARTLEBY FULL FINETUNE — 16-BIT — AUTO TEMPLATE+MASK DETECT — LAST-ANSWER MULTITURN
+================================================================================
+MODEL      : staeiou/bartleby-qwen3-1.7b_v5/
+DATA       : data/training_data_v2_filtered.jsonl
+GOLD       : data/gold_seed_training_data.jsonl
+SFT_OUTPUT : staeiou/bartleby-qwen3-1.7b_dpo
+OUTPUT     : staeiou/bartleby-qwen3-1.7b_dpo
+CACHE_DIR  : /workspace/.cache/huggingface/datasets
+SEQ        : 1024
+PACKING    : False
+LOAD_4BIT  : False (forced 16-bit base)
+FULL_FT    : True
+RUN_SFT    : False
+REMOTE_CODE: False
+FAMILY     : qwen
+TRL_COMPAT : ConstantLengthDataset patched=True
+TRL_DPO    : mergekit_detection_patched=True
+TRL_DPO2   : llm_blender_detection_patched=True
+TRL_DPO3   : weave_detection_patched=True
+ADAPTERS   : disabled
+TRAIN      : bs=4 grad_accum=4 eff_bs=16
+EPOCHS     : 4.0
+LR         : 0.0002 scheduler=cosine warmup=0.05 weight_decay=0.01 max_grad_norm=1.0
+MULTITURN  : num=0 max_turns=5 (only last assistant supervised)
+GOLD_REPEAT: 5
+DPO        : enabled=True (using DPO dataset data/training_data_dpo.jsonl)
+DPO_TRAIN  : bs=2 grad_accum=16 lr=5e-07 epochs=1.0 beta=0.1
+DPO_SEQ    : max_length=1024 prompt=512 completion=512
+GPU        : Single GPU (CUDA_VISIBLE_DEVICES=0)
+================================================================================
+
+[1/1] Skipping SFT and preparing DPO-only run from existing checkpoint...
+⚠️  Qwen chat template surgery applied: disabled automatic <think> tag insertion
+
+[8/9] Loading DPO dataset...
+Loaded DPO pairs: 45
+DPO split -> train=42 val=3
+
+[9/9] Running DPO...
+warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
+Loading policy model for DPO...
+==((====))==  Unsloth 2026.3.5: Fast Qwen3 patching. Transformers: 5.3.0. vLLM: 0.13.0.
+   \\   /|    NVIDIA RTX 5000 Ada Generation. Num GPUs = 1. Max memory: 31.475 GB. Platform: Linux.
+O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.0
+\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
+ "-____-"     Free license: http://github.com/unslothai/unsloth
+Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
+Unsloth: Using bfloat16 full finetuning which cuts memory usage by 50%.
+To enable float32 training, use `float32_mixed_precision = True` during FastLanguageModel.from_pretrained
+Loading weights: 100%|█████████████████████████| 310/310 [00:00<00:00, 1009.50it/s]
+Loading reference model for DPO...
+==((====))==  Unsloth 2026.3.5: Fast Qwen3 patching. Transformers: 5.3.0. vLLM: 0.13.0.
+   \\   /|    NVIDIA RTX 5000 Ada Generation. Num GPUs = 1. Max memory: 31.475 GB. Platform: Linux.
+O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.5.0
+\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
+ "-____-"     Free license: http://github.com/unslothai/unsloth
+Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
+Unsloth: Using bfloat16 full finetuning which cuts memory usage by 50%.
+To enable float32 training, use `float32_mixed_precision = True` during FastLanguageModel.from_pretrained
+Loading weights: 100%|██████████████████████████| 310/310 [00:00<00:00, 942.10it/s]
+[trl.trainer.dpo_trainer|WARNING]You passed `model_init_kwargs` to the `DPOConfig`, but your model is already instantiated. The `model_init_kwargs` will be ignored.
+[trl.trainer.dpo_trainer|WARNING]You passed `ref_model_init_kwargs` to the `DPOConfig`, but your model is already instantiated. The `ref_model_init_kwargs` will be ignored.
+num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
+[datasets.arrow_dataset|WARNING]num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
+Extracting prompt in train dataset (num_proc=42): 100%|█| 42/42 [00:01<00:00, 25.91
+num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
+[datasets.arrow_dataset|WARNING]num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
+Applying chat template to train dataset (num_proc=42): 100%|█| 42/42 [00:54<00:00, 
+num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
+[datasets.arrow_dataset|WARNING]num_proc must be <= 42. Reducing num_proc to 42 for dataset of size 42.
+Tokenizing train dataset (num_proc=42): 100%|█| 42/42 [00:54<00:00,  1.30s/ example
+num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
+[datasets.arrow_dataset|WARNING]num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
+Extracting prompt in eval dataset (num_proc=3): 100%|█| 3/3 [00:00<00:00,  7.59 exa
+num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
+[datasets.arrow_dataset|WARNING]num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
+Applying chat template to eval dataset (num_proc=3): 100%|█| 3/3 [00:04<00:00,  1.5
+num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
+[datasets.arrow_dataset|WARNING]num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
+Tokenizing eval dataset (num_proc=3): 100%|███| 3/3 [00:04<00:00,  1.49s/ examples]
+==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
+   \\   /|    Num examples = 42 | Num Epochs = 1 | Total steps = 2
+O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 16
+\        /    Data Parallel GPUs = 1 | Total batch size (2 x 16 x 1) = 32
+ "-____-"     Trainable parameters = 1,720,574,976 of 1,720,574,976 (100.00% trained)
+Writing model shards: 100%|██████████████████████████| 1/1 [00:04<00:00,  4.71s/it]
+{'train_runtime': '41.2', 'train_samples_per_second': '1.019', 'train_steps_per_second': '0.049', 'train_loss': '0.6924', 'epoch': '1'}
+100%|████████████████████████████████████████████████| 2/2 [00:41<00:00, 20.60s/it]
+Writing model shards: 100%|██████████████████████████| 1/1 [00:05<00:00,  5.85s/it]
+Done.
+✓ DPO-only fine-tuning complete!
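
Note on the numbers in this log: the run reports 42 training pairs, a per-device batch size of 2, 16 gradient accumulation steps, "Total steps = 2", and a final train_loss of 0.6924. The sketch below reproduces the effective batch size and step count from those values, and computes the sigmoid DPO loss for a pair where the policy and reference models agree exactly. The helper names (`optimizer_steps`, `dpo_pair_loss`) are hypothetical and not taken from finetune.py, and the step formula assumes that a final partial accumulation window still triggers one optimizer update, which matches the logged step count but is not confirmed by the log itself.

```python
import math


def optimizer_steps(num_examples, per_device_bs, grad_accum, num_gpus=1, epochs=1):
    """Return (effective batch size, optimizer steps).

    Assumption: a trailing partial accumulation window still counts as
    one optimizer update, so the per-epoch count is rounded up.
    """
    effective_bs = per_device_bs * grad_accum * num_gpus
    steps = math.ceil(num_examples / effective_bs) * epochs
    return effective_bs, steps


def dpo_pair_loss(beta, policy_margin, ref_margin):
    """Sigmoid DPO loss for one preference pair.

    Each margin is log p(chosen) - log p(rejected) under the given model.
    loss = -log(sigmoid(beta * (policy_margin - ref_margin)))
    """
    z = beta * (policy_margin - ref_margin)
    return math.log1p(math.exp(-z))


# Values taken from the log: 42 train pairs, bs=2, grad_accum=16, beta=0.1.
eff_bs, steps = optimizer_steps(42, per_device_bs=2, grad_accum=16)

# When policy and reference assign identical margins, the loss is ln 2.
initial_loss = dpo_pair_loss(beta=0.1, policy_margin=0.0, ref_margin=0.0)

print(eff_bs, steps, round(initial_loss, 4))  # 32 2 0.6931
```

The logged train_loss of 0.6924 sits just below ln 2 ≈ 0.6931, the value the loss takes when the policy and reference are identical. With only two optimizer steps at lr=5e-7, that is consistent with the policy having moved very little from the starting checkpoint, though the log alone does not show per-step losses or reward margins.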