commit 89804c240f21d440ac58575535c1c3dc73f79ca5
Author: ModelHub XC
Date:   Wed Jun 17 14:44:16 2026 +0800

    Initialize project; model provided by the ModelHub XC community

    Model: ruslandev/llama-3-8b-gpt-4o-ru1.0-gguf
    Source: Original Platform

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..39d7a9e
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,39 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+ggml-model-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ggml-model-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ggml-model-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ggml-model-f16.gguf filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c4990b8
--- /dev/null
+++ b/README.md
@@ -0,0 +1,148 @@
+---
+license: llama3
+base_model: meta-llama/Meta-Llama-3-8B-Instruct
+tags:
+- generated_from_trainer
+model-index:
+- name: >-
+    home/ubuntu/llm_training/axolotl/llama3-8b-gpt-4o-ru/output_llama3_8b_gpt_4o_ru
+  results: []
+datasets:
+- ruslandev/tagengo-rus-gpt-4o
+---
+
+# Llama-3 8B GPT-4o-RU1.0
+
+[[Dataset]](https://huggingface.co/datasets/ruslandev/tagengo-rus-gpt-4o)
+
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
+The idea behind this model is to train on a dataset derived from a smaller subset of [tagengo-gpt4](https://huggingface.co/datasets/lightblue/tagengo-gpt4), but with improved data quality.
+I aimed for higher data quality by prompting GPT-4o, OpenAI's latest LLM, which has better multilingual capabilities. Training focuses primarily on Russian (80% of the training examples).
+After training for 1 epoch on 2 NVIDIA A100 GPUs, the model shows promising results on the MT-Bench benchmark: in Russian it surpasses GPT-3.5-turbo (8.12 vs. 7.94) and is on par with [Suzume](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) (8.19),
+even though the latter was trained on an 8x larger and more diverse dataset.
+
+## How to use
+
+The easiest way to run this model on your own computer is to load its GGUF version ([ruslandev/llama-3-8b-gpt-4o-ru1.0-gguf](https://huggingface.co/ruslandev/llama-3-8b-gpt-4o-ru1.0-gguf)) with a program such as [llama.cpp](https://github.com/ggerganov/llama.cpp).
+If you want to use this model directly with the Hugging Face Transformers stack, I recommend using my framework [gptchain](https://github.com/RuslanPeresy/gptchain).
+
+```
+git clone https://github.com/RuslanPeresy/gptchain.git
+cd gptchain
+pip install -r requirements-train.txt
+python gptchain.py chat -m ruslandev/llama-3-8b-gpt-4o-ru1.0 \
+    --chatml true \
+    -q '[{"from": "human", "value": "Из чего состоит нейронная сеть?"}]'
+```
+
+## Evaluation scores
+
+I achieved the following scores on Ru/En MT-Bench:
+|            |meta-llama/Meta-Llama-3-8B-Instruct | ruslandev/llama-3-8b-gpt-4o-ru1.0 | lightblue/suzume-llama-3-8B-multilingual | Nexusflow/Starling-LM-7B-beta | gpt-3.5-turbo |
+|:----------:|:----------------------------------:|:---------------------------------:|:----------------------------------------:|:-----------------------------:|:-------------:|
+| Russian 🇷🇺 | NaN | 8.12 | 8.19 | 8.06 | 7.94 |
+| English 🇺🇸 | 7.98 | 8.01 | 7.73 | 7.92 | 8.26 |
+
+## Training procedure
+
+[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
+See axolotl config
+
+axolotl version: `0.4.1`
+```yaml
+base_model: meta-llama/Meta-Llama-3-8B-Instruct
+model_type: LlamaForCausalLM
+tokenizer_type: AutoTokenizer  # PreTrainedTokenizerFast
+
+load_in_8bit: false
+load_in_4bit: false
+strict: false
+
+datasets:
+  - path: ruslandev/tagengo-rus-gpt-4o
+    type: sharegpt
+    conversation: llama-3
+dataset_prepared_path: /home/ubuntu/llm_training/axolotl/llama3-8b-gpt-4o-ru/prepared_tagengo_rus
+val_set_size: 0.01
+output_dir: /home/ubuntu/llm_training/axolotl/llama3-8b-gpt-4o-ru/output_llama3_8b_gpt_4o_ru
+
+sequence_len: 8192
+sample_packing: true
+pad_to_sequence_len: true
+eval_sample_packing: false
+
+use_wandb: false
+#wandb_project: axolotl
+#wandb_entity: wandb_entity
+#wandb_name: llama_3_8b_gpt_4o_ru
+
+gradient_accumulation_steps: 2
+micro_batch_size: 2
+num_epochs: 1
+optimizer: paged_adamw_8bit
+lr_scheduler: cosine
+learning_rate: 1e-5
+
+train_on_inputs: false
+group_by_length: false
+bf16: auto
+fp16:
+tf32: false
+
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+early_stopping_patience:
+resume_from_checkpoint:
+logging_steps: 1
+xformers_attention:
+flash_attention: true
+
+warmup_steps: 10
+evals_per_epoch: 5
+eval_table_size:
+saves_per_epoch: 1
+debug:
+deepspeed: /home/ubuntu/axolotl/deepspeed_configs/zero2.json
+weight_decay: 0.0
+special_tokens:
+  pad_token: <|end_of_text|>
+
+```
+
+
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 2
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 8
+- total_eval_batch_size: 4
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 10
+- num_epochs: 1
+
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 1.1347        | 0.016 | 1    | 1.1086          |
+| 0.916         | 0.208 | 13   | 0.8883          |
+| 0.8494        | 0.416 | 26   | 0.8072          |
+| 0.8657        | 0.624 | 39   | 0.7814          |
+| 0.8077        | 0.832 | 52   | 0.7702          |
+
+
+### Framework versions
+
+- Transformers 4.41.1
+- Pytorch 2.2.2+cu121
+- Datasets 2.19.1
+- Tokenizers 0.19.1
\ No newline at end of file
diff --git a/ggml-model-Q2_K.gguf b/ggml-model-Q2_K.gguf
new file mode 100644
index 0000000..e6b51c9
--- /dev/null
+++ b/ggml-model-Q2_K.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:286841129059d8d38d1564b940872eeda7c4b4178f5b6b46c7cad807be505e72
+size 3179131200
diff --git a/ggml-model-Q4_K_M.gguf b/ggml-model-Q4_K_M.gguf
new file mode 100644
index 0000000..320e233
--- /dev/null
+++ b/ggml-model-Q4_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:afe3ef1c8eb58901bfc3bdd24512f50116e6b05bf81d29ec3e54dc3976ade2b6
+size 4920734016
diff --git a/ggml-model-Q8_0.gguf b/ggml-model-Q8_0.gguf
new file mode 100644
index 0000000..7060f4f
--- /dev/null
+++ b/ggml-model-Q8_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cc2d6cfa1206c1bc1a55dd74999a9364a91b9b6e444fe005a657b72c3cbd271c
+size 8540770624
diff --git a/ggml-model-f16.gguf b/ggml-model-f16.gguf
new file mode 100644
index 0000000..2f206ae
--- /dev/null
+++ b/ggml-model-f16.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:13cddb518b7bdab8909eda4c9f4684684490a1bc5f06bc30126bab39818c31bb
+size 16068890944
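
The four `.gguf` entries in this commit are Git LFS pointer files, not the model weights themselves: each records a `version` line, an `oid sha256:` digest, and a `size` in bytes. As a minimal sketch of how a downloaded file could be checked against its pointer, the Python below parses one pointer and compares it with a local file. The names `LfsPointer`, `parse_lfs_pointer`, and `matches_pointer` are hypothetical helpers written for this illustration; they are not part of Git LFS or of this repository.

```python
import hashlib
from typing import NamedTuple


class LfsPointer(NamedTuple):
    """Fields of a Git LFS pointer file (hypothetical helper type)."""
    version: str
    algorithm: str
    digest: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<algorithm>:<hex digest>', e.g. 'sha256:2868...'.
    algorithm, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algorithm, digest, int(fields["size"]))


def matches_pointer(path: str, pointer: LfsPointer, chunk_size: int = 1 << 20) -> bool:
    """Hash a local file in chunks and compare its size and digest with the pointer."""
    hasher = hashlib.new(pointer.algorithm)
    total = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer.size and hasher.hexdigest() == pointer.digest


# Pointer for ggml-model-Q2_K.gguf, copied from the diff above.
Q2_K_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:286841129059d8d38d1564b940872eeda7c4b4178f5b6b46c7cad807be505e72
size 3179131200
"""

pointer = parse_lfs_pointer(Q2_K_POINTER)
print(f"{pointer.algorithm} digest, {pointer.size / 2**30:.2f} GiB")
```

Calling `matches_pointer("ggml-model-Q2_K.gguf", pointer)` on a local copy would return `True` only when both the byte count and the digest agree with the pointer. Hashing in chunks keeps memory use flat, which matters for files of several gigabytes.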