Compare commits
10 commits: 263ed8eb1f ... ccd7b54d9f

| SHA1 |
|---|
| ccd7b54d9f |
| b17df1328a |
| d49be2d4f8 |
| 7c75df30ed |
| b83f9e21f0 |
| f52ce40934 |
| db6a8f3b99 |
| d678c617a0 |
| 6720fca4a6 |
| 29ae7551e1 |
.gitattributes (vendored, 3 changes)
@@ -53,3 +53,6 @@ Turkish-Llama-8b-Instruct-v0.1.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Turkish-Llama-8b-Instruct-v0.1.Q5_K.gguf filter=lfs diff=lfs merge=lfs -text
Turkish-Llama-8b-Instruct-v0.1.Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Turkish-Llama-8b-Instruct-v0.1.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Turkish-Llama-8b-Instruct-v0.1.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Turkish-Llama-8b-Instruct-v0.1.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Turkish-Llama-8b-Instruct-v0.1-F16.gguf filter=lfs diff=lfs merge=lfs -text
README.md (new file, 110 lines)
@@ -0,0 +1,110 @@
---
base_model: ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1
license: llama3
language:
- tr
- en
tags:
- gguf
- ggml
- llama3
- cosmosllama
- turkish llama
---

# CosmosLLaMa GGUFs
## Objective

Due to the need for quantized models in real-time applications, we introduce our GGUF-formatted models. These models are part of the GGML project, in the hope of democratizing the use of large models. Depending on the quantization type, there are 20+ models.

### Features

* All quantization details are listed on the right by Hugging Face.
* All models have been tested in `llama.cpp` environments, with both `llama-cli` and `llama-server`.
* Furthermore, a YouTube video introduces the basics of using `lmstudio` with these models. 👇

[YouTube tutorial](https://www.youtube.com/watch?v=JRID-6sRl7I)
### Code Example

Usage example with `llama-cpp-python`:

```py
from llama_cpp import Llama

# Inference parameters (mirroring the LM Studio preset shipped in this repo;
# only a subset is consumed by the completion call below)
inference_params = {
    "n_threads": 4,
    "n_predict": -1,
    "top_k": 40,
    "min_p": 0.05,
    "top_p": 0.95,
    "temp": 0.8,
    "repeat_penalty": 1.1,
    "input_prefix": "<|start_header_id|>user<|end_header_id|>\n\n",
    "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "antiprompt": [],
    "pre_prompt": "Sen bir yapay zeka asistanısın. Kullanıcı sana bir görev verecek. Amacın görevi olabildiğince sadık bir şekilde tamamlamak.",
    "pre_prompt_suffix": "<|eot_id|>",
    "pre_prompt_prefix": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n",
    "seed": -1,
    "tfs_z": 1,
    "typical_p": 1,
    "repeat_last_n": 64,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "n_keep": 0,
    "logit_bias": {},
    "mirostat": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "memory_f16": True,
    "multiline_input": False,
    "penalize_nl": True
}

# Initialize the Llama model from the Hugging Face repo
llama = Llama.from_pretrained(
    repo_id="ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1-GGUF",
    filename="*Q4_K.gguf",
    verbose=False
)

# Example input ("What is the capital of Türkiye?")
user_input = "Türkiyenin başkenti neresidir?"

# Construct the Llama 3 prompt; the system turn is closed with pre_prompt_suffix
prompt = (
    f"{inference_params['pre_prompt_prefix']}{inference_params['pre_prompt']}"
    f"{inference_params['pre_prompt_suffix']}"
    f"{inference_params['input_prefix']}{user_input}{inference_params['input_suffix']}"
)

# Generate the response, forwarding the sampling parameters the call supports
response = llama(
    prompt,
    max_tokens=inference_params["n_predict"],
    temperature=inference_params["temp"],
    top_p=inference_params["top_p"],
    top_k=inference_params["top_k"],
    min_p=inference_params["min_p"],
    repeat_penalty=inference_params["repeat_penalty"],
)

# Output the response
print(response['choices'][0]['text'])
```
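For reference, the prompt assembled above can be rendered without loading any model. This pure-string sketch (with a shortened system prompt for brevity) shows the resulting Llama 3 chat layout, with `pre_prompt_suffix` closing the system turn:

```py
# Pure string assembly of the Llama 3 chat template; no model needed.
pre_prompt_prefix = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
pre_prompt = "Sen bir yapay zeka asistanısın."  # shortened system prompt for illustration
pre_prompt_suffix = "<|eot_id|>"
input_prefix = "<|start_header_id|>user<|end_header_id|>\n\n"
input_suffix = "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

user_input = "Türkiyenin başkenti neresidir?"  # "What is the capital of Türkiye?"

prompt = f"{pre_prompt_prefix}{pre_prompt}{pre_prompt_suffix}{input_prefix}{user_input}{input_suffix}"

# The prompt opens the document, carries one system and one user turn,
# and ends ready for the assistant's reply.
print(prompt)
```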
The quantization was performed with `llama.cpp`; in our experience, this method tends to give the most stable results.

As expected, we observed better inference quality from the higher-bit models, while inference time tends to be similar across the low-bit models.

Each model's memory footprint can be estimated from the quantization docs of either [Hugging Face](https://huggingface.co/docs/transformers/main/en/quantization/overview) or [llama.cpp](https://github.com/ggerganov/llama.cpp/tree/master/examples/quantize).
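To put the footprint in concrete terms, file size scales roughly as parameter count times bits per weight. The sketch below uses approximate bits-per-weight averages for the k-quants (an assumption for illustration, not official figures) and the 8.03B parameter count of Llama-3-8B; the Q6_K estimate (~6.2 GiB) lands close to the 6,596,006,144-byte Q6_K file added in this diff.

```py
# Rough memory estimate: parameters * bits-per-weight / 8 bytes, in GiB.
# Bits-per-weight values are approximate averages (assumed for illustration).
N_PARAMS = 8.03e9  # Llama-3-8B parameter count

bits_per_weight = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "F16": 16.0,
}

for name, bpw in bits_per_weight.items():
    gib = N_PARAMS * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
```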
# Acknowledgments

- Research supported with Cloud TPUs from [Google's TensorFlow Research Cloud](https://sites.research.google/trc/about/) (TFRC). Thanks for providing access to the TFRC ❤️
- Thanks to the generous support of the Hugging Face team, it is possible to download models from their S3 storage 🤗
# Citation

```bibtex
@inproceedings{kesgin2024optimizing,
  title={Optimizing Large Language Models for Turkish: New Methodologies in Corpus Selection and Training},
  author={Kesgin, H Toprak and Yuce, M Kaan and Dogan, Eren and Uzun, M Egemen and Uz, Atahan and {\.I}nce, Elif and Erdem, Yusuf and Shbib, Osama and Zeer, Ahmed and Amasyali, M Fatih},
  booktitle={2024 Innovations in Intelligent Systems and Applications Conference (ASYU)},
  pages={1--6},
  year={2024},
  organization={IEEE}
}
```
## Contact

COSMOS AI Research Group, Yildiz Technical University, Computer Engineering Department
https://cosmos.yildiz.edu.tr/
cosmos@yildiz.edu.tr
Turkish-Llama-8b-Instruct-v0.1-F16.gguf (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae6366fbcbd5a7a20b05bece7e59c79b8199f74669c5f946f909b579eabf737c
size 16068890880
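The three lines above are a Git LFS pointer: whitespace-separated key/value pairs, one per line, standing in for the real file content. A minimal parser (a sketch, not the `git-lfs` implementation):

```py
# Parse a Git LFS pointer file: each line is "<key> <value>".
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ae6366fbcbd5a7a20b05bece7e59c79b8199f74669c5f946f909b579eabf737c
size 16068890880"""

fields = parse_lfs_pointer(pointer)
print(fields["oid"])        # the sha256 oid of the real file
print(int(fields["size"]))  # 16068890880
```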
Turkish-Llama-8b-Instruct-v0.1.Q6_K.gguf (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3ce9442d5a7fe21eb9f79df7b30c70dbd4e5e7e29ee10ae447a40b5595c7ea9e
size 6596006144
Turkish-Llama-8b-Instruct-v0.1.Q8_0.gguf (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e15b58cb80e3ef24d5acba418441b0c2bf69853d37403d2c4c5ea2b63b34de5
size 8540770560
cosmos_lm_studio.preset.json (new file, 49 lines)
@@ -0,0 +1,49 @@
{
  "name": "cosmos_lm_studio",
  "load_params": {
    "n_ctx": 2048,
    "n_batch": 512,
    "rope_freq_base": 0,
    "rope_freq_scale": 0,
    "n_gpu_layers": 10,
    "use_mlock": true,
    "main_gpu": 0,
    "tensor_split": [
      0
    ],
    "seed": -1,
    "f16_kv": true,
    "use_mmap": true,
    "no_kv_offload": false,
    "num_experts_used": 0
  },
  "inference_params": {
    "n_threads": 4,
    "n_predict": -1,
    "top_k": 40,
    "min_p": 0.05,
    "top_p": 0.95,
    "temp": 0.8,
    "repeat_penalty": 1.1,
    "input_prefix": "<|start_header_id|>user<|end_header_id|>\\n\\n",
    "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n",
    "antiprompt": [],
    "pre_prompt": "Sen bir yapay zeka asistanısın. Kullanıcı sana bir görev verecek. Amacın görevi olabildiğince sadık bir şekilde tamamlamak. Görevi yerine getirirken adım adım düşün ve adımlarını gerekçelendir.",
    "pre_prompt_suffix": "<|eot_id|>",
    "pre_prompt_prefix": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\n",
    "seed": -1,
    "tfs_z": 1,
    "typical_p": 1,
    "repeat_last_n": 64,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "n_keep": 0,
    "logit_bias": {},
    "mirostat": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "memory_f16": true,
    "multiline_input": false,
    "penalize_nl": true
  }
}
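Several of the preset's `load_params` fields share names with `llama-cpp-python`'s `Llama(...)` constructor arguments (`n_ctx`, `n_batch`, `n_gpu_layers`, `use_mlock`, `use_mmap`, `seed`), so the preset can seed those arguments directly. A minimal sketch, assuming only the shared names are carried over:

```py
# Subset of the LM Studio preset above (inlined for a self-contained example).
preset = {
    "load_params": {
        "n_ctx": 2048, "n_batch": 512, "n_gpu_layers": 10,
        "use_mlock": True, "use_mmap": True, "seed": -1,
        "main_gpu": 0,  # not forwarded below
    }
}

# Keys with identical names in llama-cpp-python's Llama(...) constructor.
SHARED_KEYS = ("n_ctx", "n_batch", "n_gpu_layers", "use_mlock", "use_mmap", "seed")

def to_llama_kwargs(preset: dict) -> dict:
    lp = preset["load_params"]
    return {k: lp[k] for k in SHARED_KEYS if k in lp}

kwargs = to_llama_kwargs(preset)
print(kwargs)  # then e.g.: Llama(model_path="...", **kwargs)
```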