---
language:
- el
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- finetuned
inference: true
base_model:
- ilsp/Meltemi-7B-Instruct-v1.5
---
# Meltemi llamafile & gguf

This repo contains `llamafile` and `gguf` file format models for [Meltemi 7B Instruct v1.5](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1.5), the first Greek Large Language Model (LLM),
trained by the Institute for Language and Speech Processing at Athena Research & Innovation Center.

`llamafile` is a file format introduced by Mozilla Ocho on Nov 20th 2023;
it collapses the complexity of an LLM into a single executable file.
This gives you the easiest, fastest way to use Meltemi on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD systems you control, on both AMD64 and ARM64.

It's as simple as this. Download the file and make it executable:

```shell
wget https://huggingface.co/Florents-Tselai/Meltemi-llamafile/resolve/main/Meltemi-7B-Instruct-v1.5-Q8_0.llamafile
chmod +x Meltemi-7B-Instruct-v1.5-Q8_0.llamafile
```

Then run it:

```shell
./Meltemi-7B-Instruct-v1.5-Q8_0.llamafile
```

This will open a tab with a chatbot and completion interface in your browser.
For additional help on how it may be used, pass the `--help` flag.
## API

The server also exposes an OpenAI API-compatible chat completions endpoint.

```shell
# The prompts are in Greek. System: "You are a bright know-it-all";
# user: "Write me a story about a frog that became a lamb".
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "model": "LLaMA_CPP",
    "messages": [
      {
        "role": "system",
        "content": "Είσαι ένας φωτεινός παντογνώστης"
      },
      {
        "role": "user",
        "content": "Γράψε μου μια ιστορία για έναν βάτραχο που έγινε αρνάκι"
      }
    ]
  }'
```
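
For scripting against the endpoint, the request body can be assembled from shell variables instead of being typed by hand. The following is a minimal sketch, not part of llamafile itself: the helper names `json_escape`, `chat_payload`, and `ask`, and the `MELTEMI_URL` variable, are hypothetical, and it assumes single-line prompts and the default port 8080.

```shell
# Hypothetical helpers (not part of llamafile): build a chat request and POST it.

# Escape backslashes and double quotes so text can sit inside a JSON string.
# Assumption: prompts are single-line; newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the JSON body for one system message ($1) and one user message ($2).
chat_payload() {
  printf '{"model":"LLaMA_CPP","messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the request to the local server (default address assumed).
ask() {
  curl -s "${MELTEMI_URL:-http://localhost:8080}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer no-key" \
    -d "$(chat_payload "$1" "$2")"
}
```

With the server running, `ask 'Είσαι ένας φωτεινός παντογνώστης' 'Ποιό είναι το νόημα της ζωής;'` would send a single request and print the raw JSON response.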
## CLI

An advanced CLI mode is provided that's useful for shell scripting.
You can use it by passing the `--cli` flag.

```shell
# Prompt (Greek): "What is the meaning of life?"
./Meltemi-7B-Instruct-v1.5-Q8_0.llamafile -p 'Ποιό είναι το νόημα της ζωής;'
```

To see all available options, pass the `--help` flag:

```shell
./Meltemi-7B-Instruct-v1.5-Q8_0.llamafile --help
```
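
As an illustration of the shell-scripting use case, the sketch below runs a file of prompts through the CLI one line at a time. The `run_prompts` function, the `LLAMAFILE` variable, and the one-prompt-per-line file layout are assumptions made for this example, not something the llamafile project prescribes.

```shell
# Hypothetical batch helper: run one prompt per line through the llamafile CLI.
# LLAMAFILE can be overridden to point at a different executable.
LLAMAFILE="${LLAMAFILE:-./Meltemi-7B-Instruct-v1.5-Q8_0.llamafile}"

run_prompts() {
  # $1: text file with one prompt per line; answers are written to stdout.
  while IFS= read -r prompt; do
    [ -n "$prompt" ] || continue     # skip empty lines
    printf '### %s\n' "$prompt"      # header so each answer can be matched to its prompt
    # stdin is redirected so the model cannot consume the remaining prompts
    "$LLAMAFILE" --cli -p "$prompt" < /dev/null
  done < "$1"
}
```

Calling `run_prompts prompts.txt > answers.txt` would then collect all answers in one file, where `prompts.txt` is a placeholder name.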
## gguf

`gguf` files are also available if you're working with [llama.cpp](https://github.com/ggerganov/llama.cpp).

llama.cpp offers many options; refer to its documentation for the full list.

### Basic Usage

```shell
# Prompt (Greek): "What is the meaning of life?"; -n limits the number of generated tokens
llama-cli -m ./Meltemi-7B-Instruct-v1.5-F16.gguf -p "Ποιό είναι το νόημα της ζωής;" -n 128
```

### Conversation Mode

```shell
llama-cli -m ./Meltemi-7B-Instruct-v1.5-F16.gguf -cnv
```

### Web Server

```shell
llama-server -m ./Meltemi-7B-Instruct-v1.5-F16.gguf --port 8080
```
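
A script that starts `llama-server` and queries it immediately may hit the server before the model has finished loading. The following sketch polls until the server answers. It assumes `curl` is installed and that the server exposes a `/health` endpoint; `wait_for_server` is a hypothetical helper name, and the retry count and one-second delay are arbitrary.

```shell
# Hypothetical helper: poll the server until it reports healthy or we give up.
# $1: base URL (default http://localhost:8080), $2: number of attempts (default 30)
wait_for_server() {
  url="${1:-http://localhost:8080}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors such as 503 while the model loads
    if curl -sf "$url/health" > /dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

One way to use it is to start the server in the background and run `wait_for_server && echo "server is ready"` before sending requests.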
# Model Information

- Vocabulary extension of the Mistral 7B tokenizer with Greek tokens for lower costs and faster inference (**1.52** vs. 6.80 tokens/word for Greek)
- 8192 context length

For more details, please refer to the original model card: [Meltemi 7B Instruct v1.5](https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1.5).
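
To put the tokenizer figures above in perspective, the sketch below estimates how many Greek words fit in the 8192-token context at each tokens-per-word rate. It assumes that 6.80 is the rate of the original, unextended tokenizer and treats both rates as averages, so real texts will vary; `words_in_context` is a hypothetical helper.

```shell
# Hypothetical helper: estimate how many words fit in a context window.
# $1: context length in tokens, $2: average tokens per word
words_in_context() {
  awk -v ctx="$1" -v tpw="$2" 'BEGIN { printf "%d\n", ctx / tpw }'
}

words_in_context 8192 1.52   # extended tokenizer: prints 5389
words_in_context 8192 6.80   # rate assumed for the original tokenizer: prints 1204
```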