Initialize the project; model provided by the ModelHub XC community
Model: donoway/BoolQ_Llama-3.2-1B-26t8ytsb Source: Original Platform
37
.gitattributes
vendored
Normal file
@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
checkpoint-15/tokenizer.json filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
114
README.md
Normal file
@@ -0,0 +1,114 @@
---
library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-1B
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: BoolQ_Llama-3.2-1B-26t8ytsb
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# BoolQ_Llama-3.2-1B-26t8ytsb

This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.6420
- Model Preparation Time: 0.0059
- Mdl: 7746.5496
- Accumulated Loss: 5369.4990
- Correct Preds: 2337.0
- Total Preds: 3270.0
- Accuracy: 0.7147
- Correct Gen Preds: 2302.0
- Gen Accuracy: 0.7040
- Correct Gen Preds 9642: 1525.0
- Correct Preds 9642: 1556.0
- Total Labels 9642: 2026.0
- Accuracy 9642: 0.7680
- Gen Accuracy 9642: 0.7527
- Correct Gen Preds 2822: 768.0
- Correct Preds 2822: 781.0
- Total Labels 2822: 1231.0
- Accuracy 2822: 0.6344
- Gen Accuracy 2822: 0.6239
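
Several of the figures in this list appear to be related by simple arithmetic. The sketch below checks that using only numbers from the list: accuracy as correct predictions over total predictions, loss as the accumulated loss divided by total predictions, and Mdl as the accumulated loss converted from nats to bits. These relationships are inferred from the reported values rather than documented by the card, and the meaning of the label suffixes `9642` and `2822` is not stated, so treat all of this as an assumption.

```python
import math

# Values copied from the evaluation results listed in the model card.
correct_preds, total_preds = 2337.0, 3270.0
accumulated_loss = 5369.4990
reported = {"loss": 1.6420, "mdl": 7746.5496, "accuracy": 0.7147}

# Assumption: accuracy is correct predictions / total predictions.
accuracy = correct_preds / total_preds

# Assumption: the reported loss is the accumulated loss averaged per prediction.
mean_loss = accumulated_loss / total_preds

# Assumption: Mdl is the accumulated loss expressed in bits (nats / ln 2).
mdl_bits = accumulated_loss / math.log(2)

# Per-label accuracy, keyed by the label suffix used in the card.
per_label = {
    "9642": 1556.0 / 2026.0,
    "2822": 781.0 / 1231.0,
}

print(f"accuracy  {accuracy:.4f} (reported {reported['accuracy']})")
print(f"mean loss {mean_loss:.4f} (reported {reported['loss']})")
print(f"mdl bits  {mdl_bits:.4f} (reported {reported['mdl']})")
for label, value in per_label.items():
    print(f"accuracy {label}: {value:.4f}")
```

If the assumptions hold, each derived value agrees with the reported one to the four decimal places shown in the card.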

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 120
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 100
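
To make the `cosine` scheduler and `lr_scheduler_warmup_ratio` settings concrete, here is a minimal sketch of a linear-warmup, cosine-decay schedule in plain Python. It assumes 300 total optimizer steps (the `max_steps` value recorded in `checkpoint-15/trainer_state.json`) and a warmup length equal to the ratio times the total, rounded up. The helper name `lr_at` is hypothetical, and this is an illustration rather than the Trainer's own code.

```python
import math

BASE_LR = 2e-05       # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.01   # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 300     # assumption: max_steps from trainer_state.json

# Assumption: warmup steps = ceil(ratio * total steps), which gives 3 here.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Learning rate after `step` scheduler updates (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Cosine decay from the base learning rate down to 0 at TOTAL_STEPS.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in range(6):
    print(step, lr_at(step))
```

Evaluated at steps 0 through 4, this sketch yields 0, about 6.67e-06, about 1.33e-05, 2e-05 and about 1.99994e-05, which lines up with the `learning_rate` values logged for steps 1 to 5 in `checkpoint-15/trainer_state.json`. The one-step offset is presumably because the logged value is the rate in effect for that step.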

### Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Mdl | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 9642 | Correct Preds 9642 | Total Labels 9642 | Accuracy 9642 | Gen Accuracy 9642 | Correct Gen Preds 2822 | Correct Preds 2822 | Total Labels 2822 | Accuracy 2822 | Gen Accuracy 2822 |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|:----------:|:----------------:|:-------------:|:-----------:|:--------:|:-----------------:|:------------:|:----------------------:|:------------------:|:-----------------:|:-------------:|:-----------------:|:----------------------:|:------------------:|:-----------------:|:-------------:|:-----------------:|
| No log | 0 | 0 | 0.7080 | 0.0059 | 3339.8933 | 2315.0376 | 2032.0 | 3270.0 | 0.6214 | 2040.0 | 0.6239 | 2007.0 | 2008.0 | 2026.0 | 0.9911 | 0.9906 | 24.0 | 24.0 | 1231.0 | 0.0195 | 0.0195 |
| 0.5194 | 1.0 | 3 | 0.7779 | 0.0059 | 3669.7865 | 2543.7022 | 1857.0 | 3270.0 | 0.5679 | 1701.0 | 0.5202 | 670.0 | 760.0 | 2026.0 | 0.3751 | 0.3307 | 1022.0 | 1097.0 | 1231.0 | 0.8911 | 0.8302 |
| 0.2429 | 2.0 | 6 | 0.8811 | 0.0059 | 4156.8448 | 2881.3053 | 2088.0 | 3270.0 | 0.6385 | 1960.0 | 0.5994 | 1723.0 | 1837.0 | 2026.0 | 0.9067 | 0.8504 | 228.0 | 251.0 | 1231.0 | 0.2039 | 0.1852 |
| 0.0895 | 3.0 | 9 | 0.7633 | 0.0059 | 3601.1554 | 2496.1307 | 2252.0 | 3270.0 | 0.6887 | 2036.0 | 0.6226 | 1357.0 | 1527.0 | 2026.0 | 0.7537 | 0.6698 | 670.0 | 725.0 | 1231.0 | 0.5890 | 0.5443 |
| 0.3695 | 4.0 | 12 | 2.0120 | 0.0059 | 9491.7849 | 6579.2040 | 2288.0 | 3270.0 | 0.6997 | 2274.0 | 0.6954 | 1905.0 | 1917.0 | 2026.0 | 0.9462 | 0.9403 | 360.0 | 371.0 | 1231.0 | 0.3014 | 0.2924 |
| 0.0001 | 5.0 | 15 | 1.6420 | 0.0059 | 7746.5496 | 5369.4990 | 2337.0 | 3270.0 | 0.7147 | 2302.0 | 0.7040 | 1525.0 | 1556.0 | 2026.0 | 0.7680 | 0.7527 | 768.0 | 781.0 | 1231.0 | 0.6344 | 0.6239 |
| 0.0 | 6.0 | 18 | 1.8149 | 0.0059 | 8562.1293 | 5934.8158 | 2298.0 | 3270.0 | 0.7028 | 2231.0 | 0.6823 | 1282.0 | 1340.0 | 2026.0 | 0.6614 | 0.6328 | 940.0 | 958.0 | 1231.0 | 0.7782 | 0.7636 |
| 0.0006 | 7.0 | 21 | 1.9537 | 0.0059 | 9216.6870 | 6388.5206 | 2267.0 | 3270.0 | 0.6933 | 2202.0 | 0.6734 | 1241.0 | 1299.0 | 2026.0 | 0.6412 | 0.6125 | 952.0 | 968.0 | 1231.0 | 0.7864 | 0.7734 |
| 0.0 | 8.0 | 24 | 2.0340 | 0.0059 | 9595.7529 | 6651.2691 | 2302.0 | 3270.0 | 0.7040 | 2246.0 | 0.6869 | 1366.0 | 1414.0 | 2026.0 | 0.6979 | 0.6742 | 871.0 | 888.0 | 1231.0 | 0.7214 | 0.7076 |
| 0.0 | 9.0 | 27 | 2.1642 | 0.0059 | 10209.8590 | 7076.9350 | 2299.0 | 3270.0 | 0.7031 | 2241.0 | 0.6853 | 1413.0 | 1464.0 | 2026.0 | 0.7226 | 0.6974 | 819.0 | 835.0 | 1231.0 | 0.6783 | 0.6653 |
| 0.0 | 10.0 | 30 | 2.2394 | 0.0059 | 10564.7680 | 7322.9392 | 2289.0 | 3270.0 | 0.7 | 2231.0 | 0.6823 | 1436.0 | 1485.0 | 2026.0 | 0.7330 | 0.7088 | 786.0 | 804.0 | 1231.0 | 0.6531 | 0.6385 |
| 0.0 | 11.0 | 33 | 2.2853 | 0.0059 | 10781.2871 | 7473.0188 | 2289.0 | 3270.0 | 0.7 | 2239.0 | 0.6847 | 1459.0 | 1500.0 | 2026.0 | 0.7404 | 0.7201 | 771.0 | 789.0 | 1231.0 | 0.6409 | 0.6263 |
| 0.0 | 12.0 | 36 | 2.3170 | 0.0059 | 10930.7882 | 7576.6450 | 2293.0 | 3270.0 | 0.7012 | 2249.0 | 0.6878 | 1474.0 | 1513.0 | 2026.0 | 0.7468 | 0.7275 | 766.0 | 780.0 | 1231.0 | 0.6336 | 0.6223 |
| 0.0 | 13.0 | 39 | 2.3358 | 0.0059 | 11019.6269 | 7638.2233 | 2291.0 | 3270.0 | 0.7006 | 2250.0 | 0.6881 | 1480.0 | 1519.0 | 2026.0 | 0.7498 | 0.7305 | 761.0 | 772.0 | 1231.0 | 0.6271 | 0.6182 |
| 0.0 | 14.0 | 42 | 2.3477 | 0.0059 | 11075.6473 | 7677.0537 | 2288.0 | 3270.0 | 0.6997 | 2249.0 | 0.6878 | 1484.0 | 1521.0 | 2026.0 | 0.7507 | 0.7325 | 756.0 | 767.0 | 1231.0 | 0.6231 | 0.6141 |
| 0.0 | 15.0 | 45 | 2.3567 | 0.0059 | 11118.0345 | 7706.4343 | 2285.0 | 3270.0 | 0.6988 | 2248.0 | 0.6875 | 1483.0 | 1517.0 | 2026.0 | 0.7488 | 0.7320 | 756.0 | 768.0 | 1231.0 | 0.6239 | 0.6141 |
| 0.0 | 16.0 | 48 | 2.3619 | 0.0059 | 11142.3851 | 7723.3128 | 2282.0 | 3270.0 | 0.6979 | 2248.0 | 0.6875 | 1483.0 | 1517.0 | 2026.0 | 0.7488 | 0.7320 | 756.0 | 765.0 | 1231.0 | 0.6214 | 0.6141 |
| 0.0 | 17.0 | 51 | 2.3645 | 0.0059 | 11154.6211 | 7731.7942 | 2292.0 | 3270.0 | 0.7009 | 2256.0 | 0.6899 | 1489.0 | 1524.0 | 2026.0 | 0.7522 | 0.7349 | 758.0 | 768.0 | 1231.0 | 0.6239 | 0.6158 |
| 0.0 | 18.0 | 54 | 2.3710 | 0.0059 | 11185.5857 | 7753.2572 | 2283.0 | 3270.0 | 0.6982 | 2251.0 | 0.6884 | 1485.0 | 1517.0 | 2026.0 | 0.7488 | 0.7330 | 757.0 | 766.0 | 1231.0 | 0.6223 | 0.6149 |
| 0.0 | 19.0 | 57 | 2.3719 | 0.0059 | 11189.8794 | 7756.2333 | 2285.0 | 3270.0 | 0.6988 | 2252.0 | 0.6887 | 1488.0 | 1520.0 | 2026.0 | 0.7502 | 0.7345 | 755.0 | 765.0 | 1231.0 | 0.6214 | 0.6133 |
| 0.0 | 20.0 | 60 | 2.3739 | 0.0059 | 11199.2181 | 7762.7064 | 2287.0 | 3270.0 | 0.6994 | 2255.0 | 0.6896 | 1489.0 | 1520.0 | 2026.0 | 0.7502 | 0.7349 | 757.0 | 767.0 | 1231.0 | 0.6231 | 0.6149 |
| 0.0 | 21.0 | 63 | 2.3731 | 0.0059 | 11195.3841 | 7760.0489 | 2287.0 | 3270.0 | 0.6994 | 2255.0 | 0.6896 | 1491.0 | 1521.0 | 2026.0 | 0.7507 | 0.7359 | 755.0 | 766.0 | 1231.0 | 0.6223 | 0.6133 |
| 0.0 | 22.0 | 66 | 2.3758 | 0.0059 | 11208.0963 | 7768.8604 | 2285.0 | 3270.0 | 0.6988 | 2258.0 | 0.6905 | 1494.0 | 1522.0 | 2026.0 | 0.7512 | 0.7374 | 756.0 | 763.0 | 1231.0 | 0.6198 | 0.6141 |
| 0.0 | 23.0 | 69 | 2.3778 | 0.0059 | 11217.4939 | 7775.3743 | 2284.0 | 3270.0 | 0.6985 | 2255.0 | 0.6896 | 1493.0 | 1521.0 | 2026.0 | 0.7507 | 0.7369 | 753.0 | 763.0 | 1231.0 | 0.6198 | 0.6117 |
| 0.0 | 24.0 | 72 | 2.3792 | 0.0059 | 11224.3777 | 7780.1458 | 2289.0 | 3270.0 | 0.7 | 2258.0 | 0.6905 | 1491.0 | 1522.0 | 2026.0 | 0.7512 | 0.7359 | 758.0 | 767.0 | 1231.0 | 0.6231 | 0.6158 |
| 0.0 | 25.0 | 75 | 2.3799 | 0.0059 | 11227.6572 | 7782.4189 | 2290.0 | 3270.0 | 0.7003 | 2260.0 | 0.6911 | 1493.0 | 1522.0 | 2026.0 | 0.7512 | 0.7369 | 759.0 | 768.0 | 1231.0 | 0.6239 | 0.6166 |
| 0.0 | 26.0 | 78 | 2.3831 | 0.0059 | 11242.5989 | 7792.7757 | 2283.0 | 3270.0 | 0.6982 | 2251.0 | 0.6884 | 1488.0 | 1520.0 | 2026.0 | 0.7502 | 0.7345 | 754.0 | 763.0 | 1231.0 | 0.6198 | 0.6125 |
| 0.0 | 27.0 | 81 | 2.3824 | 0.0059 | 11239.3411 | 7790.5176 | 2287.0 | 3270.0 | 0.6994 | 2259.0 | 0.6908 | 1492.0 | 1520.0 | 2026.0 | 0.7502 | 0.7364 | 758.0 | 767.0 | 1231.0 | 0.6231 | 0.6158 |
| 0.0 | 28.0 | 84 | 2.3854 | 0.0059 | 11253.1701 | 7800.1031 | 2288.0 | 3270.0 | 0.6997 | 2258.0 | 0.6905 | 1494.0 | 1522.0 | 2026.0 | 0.7512 | 0.7374 | 756.0 | 766.0 | 1231.0 | 0.6223 | 0.6141 |
| 0.0 | 29.0 | 87 | 2.3858 | 0.0059 | 11255.3105 | 7801.5867 | 2292.0 | 3270.0 | 0.7009 | 2261.0 | 0.6914 | 1496.0 | 1526.0 | 2026.0 | 0.7532 | 0.7384 | 756.0 | 766.0 | 1231.0 | 0.6223 | 0.6141 |
| 0.0 | 30.0 | 90 | 2.3892 | 0.0059 | 11271.3917 | 7812.7334 | 2285.0 | 3270.0 | 0.6988 | 2260.0 | 0.6911 | 1494.0 | 1520.0 | 2026.0 | 0.7502 | 0.7374 | 757.0 | 765.0 | 1231.0 | 0.6214 | 0.6149 |
| 0.0 | 31.0 | 93 | 2.3900 | 0.0059 | 11274.9614 | 7815.2077 | 2287.0 | 3270.0 | 0.6994 | 2259.0 | 0.6908 | 1493.0 | 1521.0 | 2026.0 | 0.7507 | 0.7369 | 757.0 | 766.0 | 1231.0 | 0.6223 | 0.6149 |
| 0.0 | 32.0 | 96 | 2.3922 | 0.0059 | 11285.5291 | 7822.5327 | 2285.0 | 3270.0 | 0.6988 | 2256.0 | 0.6899 | 1492.0 | 1520.0 | 2026.0 | 0.7502 | 0.7364 | 755.0 | 765.0 | 1231.0 | 0.6214 | 0.6133 |
| 0.0 | 33.0 | 99 | 2.3909 | 0.0059 | 11279.2140 | 7818.1554 | 2287.0 | 3270.0 | 0.6994 | 2262.0 | 0.6917 | 1493.0 | 1518.0 | 2026.0 | 0.7493 | 0.7369 | 760.0 | 769.0 | 1231.0 | 0.6247 | 0.6174 |
| 0.0 | 34.0 | 102 | 2.3927 | 0.0059 | 11287.6169 | 7823.9798 | 2283.0 | 3270.0 | 0.6982 | 2259.0 | 0.6908 | 1495.0 | 1519.0 | 2026.0 | 0.7498 | 0.7379 | 755.0 | 764.0 | 1231.0 | 0.6206 | 0.6133 |
| 0.0 | 35.0 | 105 | 2.3930 | 0.0059 | 11289.1913 | 7825.0711 | 2285.0 | 3270.0 | 0.6988 | 2258.0 | 0.6905 | 1494.0 | 1523.0 | 2026.0 | 0.7517 | 0.7374 | 755.0 | 762.0 | 1231.0 | 0.6190 | 0.6133 |
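
The table ends at epoch 35 even though `num_epochs` is 100. One plausible reading, given the `EarlyStoppingCallback` settings recorded in `checkpoint-15/trainer_state.json` (`early_stopping_patience` of 30, threshold 0.0) and a best metric equal to the epoch-5 accuracy, is that training stopped after 30 consecutive evaluations without improvement. The sketch below replays the Accuracy column through a simplified patience counter. The `replay` helper is hypothetical and is not the callback's actual implementation.

```python
# Accuracy column from the training results table, indexed by epoch (0 to 35).
ACCURACY = [
    0.6214, 0.5679, 0.6385, 0.6887, 0.6997, 0.7147,
    0.7028, 0.6933, 0.7040, 0.7031, 0.7, 0.7,
    0.7012, 0.7006, 0.6997, 0.6988, 0.6979, 0.7009,
    0.6982, 0.6988, 0.6994, 0.6994, 0.6988, 0.6985,
    0.7, 0.7003, 0.6982, 0.6994, 0.6997, 0.7009,
    0.6988, 0.6994, 0.6988, 0.6994, 0.6982, 0.6988,
]
PATIENCE = 30  # early_stopping_patience from trainer_state.json


def replay(scores, patience):
    """Return (best_epoch, stop_epoch) under a simple patience rule."""
    best_epoch, best_score, waited = None, float("-inf"), 0
    for epoch, score in enumerate(scores):
        if score > best_score:
            # Strict improvement resets the patience counter.
            best_epoch, best_score, waited = epoch, score, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, None


best_epoch, stop_epoch = replay(ACCURACY, PATIENCE)
print(best_epoch, stop_epoch)
```

Under this rule the best epoch is 5 and the stopping point is epoch 35, which matches both the saved `checkpoint-15` (3 steps per epoch) and the last row of the table.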


### Framework versions

- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
36
checkpoint-15/config.json
Normal file
@@ -0,0 +1,36 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pad_token_id": 128004,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.3",
  "use_cache": true,
  "vocab_size": 128256
}
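As a sanity check on this configuration, the parameter count of a Llama-style decoder can be estimated from its fields. The sketch below assumes the standard Llama layout: bias-free query, key, value and output projections, a gated MLP with three weight matrices, two RMSNorm weight vectors per layer, one final norm, and input embeddings shared with the output head because `tie_word_embeddings` is true. The helper name `count_parameters` is hypothetical.

```python
# Fields copied from checkpoint-15/config.json.
CONFIG = {
    "hidden_size": 2048,
    "intermediate_size": 8192,
    "num_hidden_layers": 16,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 64,
    "vocab_size": 128256,
    "tie_word_embeddings": True,
}


def count_parameters(cfg: dict) -> int:
    """Estimate parameters for an assumed standard Llama decoder layout."""
    hidden = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    # Query and output projections plus the smaller key/value projections.
    attention = hidden * q_dim + 2 * hidden * kv_dim + q_dim * hidden
    mlp = 3 * hidden * cfg["intermediate_size"]  # gate, up and down projections
    norms = 2 * hidden                           # two RMSNorm vectors per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * hidden
    if not cfg["tie_word_embeddings"]:
        embeddings *= 2                          # separate output head
    # The trailing `hidden` term is the final RMSNorm weight vector.
    return embeddings + cfg["num_hidden_layers"] * per_layer + hidden


params = count_parameters(CONFIG)
bf16_bytes = 2 * params  # torch_dtype is bfloat16: 2 bytes per parameter
print(params, bf16_bytes)
```

Under these assumptions the estimate is 1,235,814,400 parameters, or 2,471,628,800 bytes in bfloat16. That is 16,808 bytes less than the 2,471,645,608-byte size recorded in the `model.safetensors` pointer in this commit, a gap that would be consistent with file header metadata.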
9
checkpoint-15/generation_config.json
Normal file
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": 128001,
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "4.51.3"
}
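These defaults enable sampling with `temperature` 0.6 and `top_p` 0.9. As a rough illustration of what the two settings do to a next-token distribution, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. The function name and the toy logits are hypothetical, and this is a simplification rather than the transformers implementation.

```python
import math


def nucleus_distribution(logits, temperature=0.6, top_p=0.9):
    """Return {token_index: probability} after temperature and top-p filtering."""
    # A temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for index in sorted(range(len(probs)), key=lambda i: probs[i], reverse=True):
        kept.append(index)
        mass += probs[index]
        if mass >= top_p:
            break

    # Renormalise over the surviving tokens.
    return {index: probs[index] / mass for index in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical scores for four tokens
filtered = nucleus_distribution(toy_logits)
print(filtered)
```

With these toy logits only the two highest-scoring tokens survive the filter. During generation the next token would then be drawn from the filtered distribution, for example with `random.choices`.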
3
checkpoint-15/model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:611518c4b19e36fefbf1d07e47fb013f8be2cc0cb7df8023331ef1456a8114a2
size 2471645608
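The three lines of this file are a Git LFS pointer rather than the weights themselves: the spec version, a SHA-256 object id, and the object size in bytes. A minimal sketch of reading such a pointer follows. The helper name `parse_lfs_pointer` is hypothetical, and the parser only handles the `key value` lines shown here.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:611518c4b19e36fefbf1d07e47fb013f8be2cc0cb7df8023331ef1456a8114a2
size 2471645608
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is a key, a single space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algorithm"], pointer["size_bytes"])
print(f"{pointer['size_bytes'] / 2**30:.2f} GiB")
```

The same helper should apply to the other pointer files in this commit, such as `optimizer.pt` and `training_args.bin`, since they use the same three-line layout.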
3
checkpoint-15/optimizer.pt
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc195f36a2d99c208faf5d4238c7784d6923e86f7fbd434c4de944cefb26382c
size 4943382114
3
checkpoint-15/rng_state.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d2b5691046896d865f067a1958689168fc2411c74d2f82d596bd6a636b2b141b
size 14244
3
checkpoint-15/scheduler.pt
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a1b526f1fa96b4a7f90f808310f90fb63088f580f0b7e9eab308f8e384c685fe
size 1064
23
checkpoint-15/special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|finetune_right_pad_id|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
BIN
checkpoint-15/tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
2063
checkpoint-15/tokenizer_config.json
Normal file
File diff suppressed because it is too large
304
checkpoint-15/trainer_state.json
Normal file
@@ -0,0 +1,304 @@
{
  "best_global_step": 15,
  "best_metric": 0.7146788990825688,
  "best_model_checkpoint": "/root/BoolQ_Llama-3.2-1B/None/26t8ytsb_dazzling-shape-27/checkpoint-15",
  "epoch": 5.0,
  "eval_steps": 500,
  "global_step": 15,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0,
      "eval_accumulated_loss": 2315.0376377105713,
      "eval_accuracy": 0.6214067278287462,
      "eval_accuracy_2822": 0.01949634443541836,
      "eval_accuracy_9642": 0.9911154985192497,
      "eval_correct_gen_preds": 2040.0,
      "eval_correct_gen_preds_2822": 24.0,
      "eval_correct_gen_preds_9642": 2007.0,
      "eval_correct_preds": 2032.0,
      "eval_correct_preds_2822": 24.0,
      "eval_correct_preds_9642": 2008.0,
      "eval_gen_accuracy": 0.6238532110091743,
      "eval_gen_accuracy_2822": 0.01949634443541836,
      "eval_gen_accuracy_9642": 0.9906219151036525,
      "eval_loss": 0.7079626321792603,
      "eval_mdl": 3339.893319396342,
      "eval_model_preparation_time": 0.0059,
      "eval_runtime": 17.2936,
      "eval_samples_per_second": 189.088,
      "eval_steps_per_second": 1.619,
      "eval_total_labels_2822": 1231.0,
      "eval_total_labels_9642": 2026.0,
      "eval_total_preds": 3270.0,
      "step": 0
    },
    {
      "epoch": 0.3333333333333333,
      "grad_norm": 19.625,
      "learning_rate": 0.0,
      "loss": 0.5234,
      "step": 1
    },
    {
      "epoch": 0.6666666666666666,
      "grad_norm": 44.25,
      "learning_rate": 6.666666666666667e-06,
      "loss": 0.7261,
      "step": 2
    },
    {
      "epoch": 1.0,
      "grad_norm": 19.75,
      "learning_rate": 1.3333333333333333e-05,
      "loss": 0.5194,
      "step": 3
    },
    {
      "epoch": 1.0,
      "eval_accumulated_loss": 2543.702173233032,
      "eval_accuracy": 0.5678899082568807,
      "eval_accuracy_2822": 0.8911454102355808,
      "eval_accuracy_9642": 0.3751233958538993,
      "eval_correct_gen_preds": 1701.0,
      "eval_correct_gen_preds_2822": 1022.0,
      "eval_correct_gen_preds_9642": 670.0,
      "eval_correct_preds": 1857.0,
      "eval_correct_preds_2822": 1097.0,
      "eval_correct_preds_9642": 760.0,
      "eval_gen_accuracy": 0.5201834862385321,
      "eval_gen_accuracy_2822": 0.8302193338748984,
      "eval_gen_accuracy_9642": 0.33070088845014806,
      "eval_loss": 0.7778906226158142,
      "eval_mdl": 3669.7865108217748,
      "eval_model_preparation_time": 0.0059,
      "eval_runtime": 17.8318,
      "eval_samples_per_second": 183.381,
      "eval_steps_per_second": 1.57,
      "eval_total_labels_2822": 1231.0,
      "eval_total_labels_9642": 2026.0,
      "eval_total_preds": 3270.0,
      "step": 3
    },
    {
      "epoch": 1.3333333333333333,
      "grad_norm": 83.5,
      "learning_rate": 2e-05,
      "loss": 0.7045,
      "step": 4
    },
    {
      "epoch": 1.6666666666666665,
      "grad_norm": 102.5,
      "learning_rate": 1.9999440560919153e-05,
      "loss": 0.6378,
      "step": 5
    },
    {
      "epoch": 2.0,
      "grad_norm": 58.5,
      "learning_rate": 1.999776230627102e-05,
      "loss": 0.2429,
      "step": 6
    },
    {
      "epoch": 2.0,
      "eval_accumulated_loss": 2881.3052768707275,
      "eval_accuracy": 0.6385321100917432,
      "eval_accuracy_2822": 0.20389926888708368,
      "eval_accuracy_9642": 0.9067127344521224,
      "eval_correct_gen_preds": 1960.0,
      "eval_correct_gen_preds_2822": 228.0,
      "eval_correct_gen_preds_9642": 1723.0,
      "eval_correct_preds": 2088.0,
      "eval_correct_preds_2822": 251.0,
      "eval_correct_preds_9642": 1837.0,
      "eval_gen_accuracy": 0.599388379204893,
      "eval_gen_accuracy_2822": 0.1852152721364744,
      "eval_gen_accuracy_9642": 0.8504442250740375,
      "eval_loss": 0.8811332583427429,
      "eval_mdl": 4156.8448342286,
      "eval_model_preparation_time": 0.0059,
      "eval_runtime": 17.8483,
      "eval_samples_per_second": 183.21,
      "eval_steps_per_second": 1.569,
      "eval_total_labels_2822": 1231.0,
      "eval_total_labels_9642": 2026.0,
      "eval_total_preds": 3270.0,
      "step": 6
    },
    {
      "epoch": 2.3333333333333335,
      "grad_norm": 35.0,
      "learning_rate": 1.9994965423831853e-05,
      "loss": 0.251,
      "step": 7
    },
    {
      "epoch": 2.6666666666666665,
      "grad_norm": 18.5,
      "learning_rate": 1.999105022653872e-05,
      "loss": 0.1897,
      "step": 8
    },
    {
      "epoch": 3.0,
      "grad_norm": 20.125,
      "learning_rate": 1.9986017152454497e-05,
      "loss": 0.0895,
      "step": 9
    },
    {
      "epoch": 3.0,
      "eval_accumulated_loss": 2496.1307315826416,
      "eval_accuracy": 0.6886850152905198,
      "eval_accuracy_2822": 0.5889520714865962,
      "eval_accuracy_9642": 0.7537018756169793,
      "eval_correct_gen_preds": 2036.0,
      "eval_correct_gen_preds_2822": 670.0,
      "eval_correct_gen_preds_9642": 1357.0,
      "eval_correct_preds": 2252.0,
      "eval_correct_preds_2822": 725.0,
      "eval_correct_preds_9642": 1527.0,
      "eval_gen_accuracy": 0.6226299694189602,
      "eval_gen_accuracy_2822": 0.5442729488220959,
      "eval_gen_accuracy_9642": 0.6697926949654491,
      "eval_loss": 0.7633427977561951,
      "eval_mdl": 3601.1554278648173,
      "eval_model_preparation_time": 0.0059,
      "eval_runtime": 18.2462,
      "eval_samples_per_second": 179.216,
      "eval_steps_per_second": 1.535,
      "eval_total_labels_2822": 1231.0,
      "eval_total_labels_9642": 2026.0,
      "eval_total_preds": 3270.0,
      "step": 9
    },
    {
      "epoch": 3.3333333333333335,
      "grad_norm": 9.4375,
      "learning_rate": 1.9979866764718846e-05,
      "loss": 0.0289,
      "step": 10
    },
    {
      "epoch": 3.6666666666666665,
      "grad_norm": 1.1484375,
      "learning_rate": 1.9972599751485225e-05,
      "loss": 0.0049,
      "step": 11
    },
    {
      "epoch": 4.0,
      "grad_norm": 207.0,
      "learning_rate": 1.9964216925843876e-05,
      "loss": 0.3695,
      "step": 12
    },
    {
      "epoch": 4.0,
      "eval_accumulated_loss": 6579.203971862793,
      "eval_accuracy": 0.6996941896024464,
      "eval_accuracy_2822": 0.3013809910641755,
      "eval_accuracy_9642": 0.9461994076999013,
      "eval_correct_gen_preds": 2274.0,
      "eval_correct_gen_preds_2822": 360.0,
      "eval_correct_gen_preds_9642": 1905.0,
      "eval_correct_preds": 2288.0,
      "eval_correct_preds_2822": 371.0,
      "eval_correct_preds_9642": 1917.0,
      "eval_gen_accuracy": 0.6954128440366972,
      "eval_gen_accuracy_2822": 0.2924451665312754,
      "eval_gen_accuracy_9642": 0.9402764067127345,
      "eval_loss": 2.011989116668701,
      "eval_mdl": 9491.784943203424,
      "eval_model_preparation_time": 0.0059,
      "eval_runtime": 18.2678,
      "eval_samples_per_second": 179.003,
      "eval_steps_per_second": 1.533,
      "eval_total_labels_2822": 1231.0,
      "eval_total_labels_9642": 2026.0,
      "eval_total_preds": 3270.0,
      "step": 12
    },
    {
      "epoch": 4.333333333333333,
      "grad_norm": 41.0,
      "learning_rate": 1.9954719225730847e-05,
      "loss": 0.2212,
      "step": 13
    },
    {
      "epoch": 4.666666666666667,
      "grad_norm": 0.033447265625,
      "learning_rate": 1.9944107713823068e-05,
      "loss": 0.0001,
      "step": 14
    },
    {
      "epoch": 5.0,
      "grad_norm": 0.0299072265625,
      "learning_rate": 1.9932383577419432e-05,
      "loss": 0.0001,
      "step": 15
    },
    {
      "epoch": 5.0,
      "eval_accumulated_loss": 5369.499008178711,
      "eval_accuracy": 0.7146788990825688,
      "eval_accuracy_2822": 0.6344435418359058,
      "eval_accuracy_9642": 0.7680157946692991,
      "eval_correct_gen_preds": 2302.0,
      "eval_correct_gen_preds_2822": 768.0,
      "eval_correct_gen_preds_9642": 1525.0,
      "eval_correct_preds": 2337.0,
      "eval_correct_preds_2822": 781.0,
      "eval_correct_preds_9642": 1556.0,
      "eval_gen_accuracy": 0.7039755351681957,
      "eval_gen_accuracy_2822": 0.6238830219333875,
      "eval_gen_accuracy_9642": 0.7527147087857848,
      "eval_loss": 1.6420485973358154,
      "eval_mdl": 7746.5495911576345,
      "eval_model_preparation_time": 0.0059,
      "eval_runtime": 18.1931,
      "eval_samples_per_second": 179.739,
      "eval_steps_per_second": 1.539,
      "eval_total_labels_2822": 1231.0,
      "eval_total_labels_9642": 2026.0,
      "eval_total_preds": 3270.0,
      "step": 15
    }
  ],
  "logging_steps": 1,
  "max_steps": 300,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 100,
  "save_steps": 500,
  "stateful_callbacks": {
    "EarlyStoppingCallback": {
      "args": {
        "early_stopping_patience": 30,
        "early_stopping_threshold": 0.0
      },
      "attributes": {
        "early_stopping_patience_counter": 0
      }
    },
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 735978726752256.0,
  "train_batch_size": 32,
  "trial_name": null,
  "trial_params": null
}
3
checkpoint-15/training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bba2ee650523da0dcc4671708b05f027c5f0c6daae04e8f03b7cf4aa224a821b
size 5432
36
config.json
Normal file
@@ -0,0 +1,36 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 16,
  "num_key_value_heads": 8,
  "pad_token_id": 128004,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.3",
  "use_cache": true,
  "vocab_size": 128256
}
9
generation_config.json
Normal file
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": 128001,
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "4.51.3"
}
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:611518c4b19e36fefbf1d07e47fb013f8be2cc0cb7df8023331ef1456a8114a2
size 2471645608
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|finetune_right_pad_id|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
BIN
tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
2063
tokenizer_config.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bba2ee650523da0dcc4671708b05f027c5f0c6daae04e8f03b7cf4aa224a821b
size 5432