commit 5764d62a127ca48cf3d3a9610568be8ad79e2fea
Author: ModelHub XC
Date:   Tue May 5 09:46:28 2026 +0800

    Initialize project; model provided by the ModelHub XC community

    Model: afrideva/japanese-mistral-300m-base-GGUF
    Source: Original Platform

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..42f10ae
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,42 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+japanese-mistral-300m-base.fp16.gguf filter=lfs diff=lfs merge=lfs -text
+japanese-mistral-300m-base.q2_k.gguf filter=lfs diff=lfs merge=lfs -text
+japanese-mistral-300m-base.q3_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+japanese-mistral-300m-base.q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+japanese-mistral-300m-base.q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+japanese-mistral-300m-base.q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+japanese-mistral-300m-base.q8_0.gguf filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..55b6208
--- /dev/null
+++ b/README.md
@@ -0,0 +1,157 @@
+---
+base_model: ce-lery/japanese-mistral-300m-base
+inference: false
+model-index:
+- name: checkpoints-mistral-300M-FA2
+  results: []
+model_creator: ce-lery
+model_name: japanese-mistral-300m-base
+pipeline_tag: text-generation
+quantized_by: afrideva
+tags:
+- generated_from_trainer
+- gguf
+- ggml
+- quantized
+- q2_k
+- q3_k_m
+- q4_k_m
+- q5_k_m
+- q6_k
+- q8_0
+---
+# ce-lery/japanese-mistral-300m-base-GGUF
+
+Quantized GGUF model files for [japanese-mistral-300m-base](https://huggingface.co/ce-lery/japanese-mistral-300m-base) from [ce-lery](https://huggingface.co/ce-lery)
+
+
+| Name | Quant method | Size |
+| ---- | ---- | ---- |
+| [japanese-mistral-300m-base.fp16.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.fp16.gguf) | fp16 | 712.33 MB |
+| [japanese-mistral-300m-base.q2_k.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q2_k.gguf) | q2_k | 176.84 MB |
+| [japanese-mistral-300m-base.q3_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q3_k_m.gguf) | q3_k_m | 195.04 MB |
+| [japanese-mistral-300m-base.q4_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q4_k_m.gguf) | q4_k_m | 234.80 MB |
+|
[japanese-mistral-300m-base.q5_k_m.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q5_k_m.gguf) | q5_k_m | 266.47 MB |
+| [japanese-mistral-300m-base.q6_k.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q6_k.gguf) | q6_k | 307.38 MB |
+| [japanese-mistral-300m-base.q8_0.gguf](https://huggingface.co/afrideva/japanese-mistral-300m-base-GGUF/resolve/main/japanese-mistral-300m-base.q8_0.gguf) | q8_0 | 379.17 MB |
+
+
+
+## Original Model Card:
+
+
+# japanese-mistral-300m-base
+
+## Overview
+
+Welcome to my model card!
+
+This model's features are:
+
+- Suppression of unknown-word generation by using byte fallback in the SentencePiece tokenizer, with conversion to the Hugging Face Tokenizers format
+- Pretrained on the Wikipedia and CC100 datasets
+- Use of [Mistral 300M](https://huggingface.co/ce-lery/japanese-mistral-300m-base/blob/main/config.json)
+
+Yukkuri shite ittene!
+
+## How to use the model
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+import torch
+
+MODEL_NAME = "ce-lery/japanese-mistral-300m-base"
+torch.set_float32_matmul_precision('high')
+
+DEVICE = "cuda"  # default; overridden below if CUDA is unavailable
+if torch.cuda.is_available():
+    print("cuda")
+    DEVICE = "cuda"
+else:
+    print("cpu")
+    DEVICE = "cpu"
+
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
+model = AutoModelForCausalLM.from_pretrained(
+    MODEL_NAME,
+    trust_remote_code=True,
+).to(DEVICE)
+
+# streamer = TextStreamer(tokenizer)
+
+prompt = "大規模言語モデルとは、"  # "A large language model is ..."
+
+inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
+with torch.no_grad():
+
+    outputs = model.generate(
+        inputs["input_ids"],
+        max_new_tokens=256,
+        do_sample=True,
+        early_stopping=False,
+        top_p=0.95,
+        top_k=50,
+        temperature=0.9,
+        # streamer=streamer,
+        no_repeat_ngram_size=2,
+        num_beams=3
+    )
+
+print(outputs.tolist()[0])
+outputs_txt =
tokenizer.decode(outputs[0])
+print(outputs_txt)
+
+```
+
+## Recipe
+
+If you want to reconstruct this model, refer to [this GitHub repository](https://github.com/ce-lery/japanese-mistral-300m-recipe).
+
+I wrote the recipe for constructing this model. For example,
+
+- Preprocessing with SentencePiece
+- Pretraining with FlashAttention-2, torch.compile, and DeepSpeed
+- Fine-tuning with databricks-dolly-15k-ja
+
+If you find any mistakes or errors, please create an issue.
+If you create a pull request, I'll be very happy!
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 0.0006
+- train_batch_size: 4
+- eval_batch_size: 4
+- seed: 42
+- distributed_type: multi-GPU
+- gradient_accumulation_steps: 64
+- total_train_batch_size: 256
+- optimizer: Adam with betas=(0.9,0.95) and epsilon=0.0001
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 1000
+- num_epochs: 1
+- mixed_precision_training: Native AMP
+
+### Training results
+
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 4.2911        | 0.12  | 5000  | 4.2914          |
+| 3.9709        | 0.24  | 10000 | 3.9900          |
+| 3.8229        | 0.36  | 15000 | 3.8388          |
+| 3.7197        | 0.47  | 20000 | 3.7454          |
+| 3.652         | 0.59  | 25000 | 3.6739          |
+| 3.597         | 0.71  | 30000 | 3.6177          |
+| 3.5554        | 0.83  | 35000 | 3.5770          |
+| 3.536         | 0.95  | 40000 | 3.5582          |
+
+
+### Framework versions
+
+- Transformers 4.35.2
+- Pytorch 2.1.1+cu121
+- Datasets 2.14.5
+- Tokenizers 0.14.1
\ No newline at end of file
diff --git a/japanese-mistral-300m-base.fp16.gguf b/japanese-mistral-300m-base.fp16.gguf
new file mode 100644
index 0000000..f2c594a
--- /dev/null
+++ b/japanese-mistral-300m-base.fp16.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d1d0bcf858f6305b461f308b963ec447593d8f8a8cf000057e973456283277e2
+size 712325280
diff --git a/japanese-mistral-300m-base.q2_k.gguf b/japanese-mistral-300m-base.q2_k.gguf
new file
mode 100644
index 0000000..43e1f35
--- /dev/null
+++ b/japanese-mistral-300m-base.q2_k.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:57bf7c00d0cd76f815993cd7c9cd541f925784188bcdff47a2bebe47330b1d1a
+size 176844064
diff --git a/japanese-mistral-300m-base.q3_k_m.gguf b/japanese-mistral-300m-base.q3_k_m.gguf
new file mode 100644
index 0000000..23c434c
--- /dev/null
+++ b/japanese-mistral-300m-base.q3_k_m.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7acedb070e764f0650efa1f59f3a6bb3f1744c29f7fd2782ff861452b3c57b5f
+size 195042816
diff --git a/japanese-mistral-300m-base.q4_k_m.gguf b/japanese-mistral-300m-base.q4_k_m.gguf
new file mode 100644
index 0000000..9630f7e
--- /dev/null
+++ b/japanese-mistral-300m-base.q4_k_m.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:08a63fe24eb4f472ac4d6a3fbe9e38c77f7732c8fc4ad8c0d7815bb4a29e3386
+size 234801408
diff --git a/japanese-mistral-300m-base.q5_k_m.gguf b/japanese-mistral-300m-base.q5_k_m.gguf
new file mode 100644
index 0000000..35cb29f
--- /dev/null
+++ b/japanese-mistral-300m-base.q5_k_m.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aa56215eff3854edaafeb50e3837b506641a08166540ce1d6b13e0d0dc5f5469
+size 266473856
diff --git a/japanese-mistral-300m-base.q6_k.gguf b/japanese-mistral-300m-base.q6_k.gguf
new file mode 100644
index 0000000..04a0ee3
--- /dev/null
+++ b/japanese-mistral-300m-base.q6_k.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d7588c4f982fbc82e932731674247ec989400aa541eb094a99a1f7de95f0f9a7
+size 307383456
diff --git a/japanese-mistral-300m-base.q8_0.gguf b/japanese-mistral-300m-base.q8_0.gguf
new file mode 100644
index 0000000..d09bd07
--- /dev/null
+++ b/japanese-mistral-300m-base.q8_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2cce07660cdd111fcb30e5fd3d182c902533e436e1be674699d8dc1780232d6c
+size 379165024
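
Note on the LFS pointers: each `.gguf` entry in this commit is a three-line Git LFS pointer (`version`, `oid sha256:...`, `size`), not the model weights themselves. As a minimal sketch of how a downloaded file could be checked against its pointer, the Python below parses a pointer and compares size and SHA-256 digest. The function names `parse_lfs_pointer` and `matches_pointer` and the local file path are hypothetical, introduced only for illustration; the pointer text is copied from the q2_k hunk in this diff.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse Git LFS pointer text into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the byte size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    # Cheap check first: a size mismatch means there is no need to hash.
    if path.stat().st_size != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large model files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


# Pointer contents copied from the q2_k hunk of this commit.
POINTER_Q2_K = """\
version https://git-lfs.github.com/spec/v1
oid sha256:57bf7c00d0cd76f815993cd7c9cd541f925784188bcdff47a2bebe47330b1d1a
size 176844064
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_Q2_K)
    # Assumed local download location; adjust to wherever the file was saved.
    local = Path("japanese-mistral-300m-base.q2_k.gguf")
    if local.exists():
        print("OK" if matches_pointer(local, pointer) else "MISMATCH")
    else:
        print(f"{local} not found; pointer expects {pointer['size']} bytes")
```

The size is compared before hashing because it is cheap and catches truncated downloads early. The `size` values are exact byte counts, so the README table's sizes (for example 176.84 MB for q2_k) correspond to decimal megabytes.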