初始化项目,由ModelHub XC社区提供模型

Model: pankajmathur/RenCoder-Ministral-8B-Instruct-2410
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-04-27 20:57:41 +08:00
commit 5eb30282d8
17 changed files with 9284 additions and 0 deletions

38
.gitattributes vendored Normal file
View File

@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tekken.json filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
RenCoder.png filter=lfs diff=lfs merge=lfs -text

788
README.md Normal file
View File

@@ -0,0 +1,788 @@
---
library_name: transformers
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: apache-2.0
license_name: mit
inference: false
datasets:
- pankajmathur/OpenThoughts-Agent-v1-SFT-cleaned
- pankajmathur/orca_mini_v8_sharegpt_format
- pankajmathur/orca_mini_v1_dataset
base_model:
- mistralai/Ministral-8B-Instruct-2410
pipeline_tag: text-generation
---
# RenCoder-Ministral-8B-Instruct-2410
<img src="https://huggingface.co/pankajmathur/RenCoder-Ministral-8B-Instruct-2410/resolve/main/RenCoder.png" height="600" width="600" />
RenCoder-Ministral-8B-Instruct-2410 is fine-tuned version of [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410) on muliple agentic coding datasets.
<strong>
"Obsessed with building Open Source AGI, So am I ! Let's create together 🚀 <a href="https://www.linkedin.com/in/pankajam" target="_blank">https://www.linkedin.com/in/pankajam</a>"
</strong>
It achieves the following results on the evaluation set:
- Loss: 0.1539
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Use adamw_bnb_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 106
- training_steps: 1068
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log | 0 | 0 | 0.4748 |
| 0.1752 | 0.9988 | 356 | 0.1681 |
| 0.1518 | 1.9960 | 712 | 0.1557 |
| 0.1284 | 2.9932 | 1068 | 0.1539 |
### Framework versions
- PEFT 0.18.0
- Transformers 4.57.1
- Pytorch 2.9.1+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1
### Basic Instruct Template (V3-Tekken)
```
<s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
```
*For more information about the tokenizer please refer to [mistral-common](https://github.com/mistralai/mistral-common)*
## Ministral 8B Architecture
| Feature | Value |
|:---------------------:|:--------------------:|
| **Architecture** | Dense Transformer |
| **Parameters** | 8,019,808,256 |
| **Layers** | 36 |
| **Heads** | 32 |
| **Dim** | 4096 |
| **KV Heads (GQA)** | 8 |
| **Hidden Dim** | 12288 |
| **Head Dim** | 128 |
| **Vocab Size** | 131,072 |
| **Context Length** | 128k |
| **Attention Pattern** | Ragged (128k,32k,32k,32k) |
## Benchmarks
#### Base Models
<u>Knowledge & Commonsense</u>
| Model | MMLU | AGIEval | Winogrande | Arc-c | TriviaQA |
|:-------------:|:------:|:---------:|:------------:|:-------:|:----------:|
| Mistral 7B Base | 62.5 | 42.5 | 74.2 | 67.9 | 62.5 |
| Llama 3.1 8B Base | 64.7 | 44.4 | 74.6 | 46.0 | 60.2 |
| ***Ministral 8B Base*** | ***<u>65.0</u>*** | ***<u>48.3</u>*** | ***<u>75.3</u>*** | ***<u>71.9</u>*** | ***<u>65.5</u>*** |
| | | | | | |
| Gemma 2 2B Base | 52.4 | 33.8 | 68.7 | 42.6 | 47.8 |
| Llama 3.2 3B Base | 56.2 | 37.4 | 59.6 | 43.1 | 50.7 |
| ***Ministral 3B Base*** | ***<u>60.9</u>*** | ***<u>42.1</u>*** | ***<u>72.7</u>*** | ***<u>64.2</u>*** | ***<u>56.7</u>*** |
<u>Code & Math</u>
| Model | HumanEval pass@1 |GSM8K maj@8 |
|:-------------:|:-------------------:|:---------------:|
| Mistral 7B Base | 26.8 | 32.0 |
| Llama 3.1 8B Base | ***<u>37.8</u>*** | 42.2 |
| ***Ministral 8B Base*** | 34.8 | ***<u>64.5</u>*** |
| | | |
| Gemma 2 2B | 20.1 | 35.5 |
| Llama 3.2 3B | 14.6 | 33.5 |
| ***Ministral 3B*** | ***<u>34.2</u>*** | ***<u>50.9</u>*** |
<u>Multilingual</u>
| Model | French MMLU | German MMLU | Spanish MMLU |
|:-------------:|:-------------:|:-------------:|:-------------:|
| Mistral 7B Base | 50.6 | 49.6 | 51.4 |
| Llama 3.1 8B Base | 50.8 | 52.8 | 54.6 |
| ***Ministral 8B Base*** | ***<u>57.5</u>*** | ***<u>57.4</u>*** | ***<u>59.6</u>*** |
| | | | |
| Gemma 2 2B Base | 41.0 | 40.1 | 41.7 |
| Llama 3.2 3B Base | 42.3 | 42.2 | 43.1 |
| ***Ministral 3B Base*** | ***<u>49.1</u>*** | ***<u>48.3</u>*** | ***<u>49.5</u>*** |
### Instruct Models
<u>Chat/Arena (gpt-4o judge)</u>
| Model | MTBench | Arena Hard | Wild bench |
|:-------------:|:---------:|:------------:|:------------:|
| Mistral 7B Instruct v0.3 | 6.7 | 44.3 | 33.1 |
| Llama 3.1 8B Instruct | 7.5 | 62.4 | 37.0 |
| Gemma 2 9B Instruct | 7.6 | 68.7 | ***<u>43.8</u>*** |
| ***Ministral 8B Instruct*** | ***<u>8.3</u>*** | ***<u>70.9</u>*** | 41.3 |
| | | | |
| Gemma 2 2B Instruct | 7.5 | 51.7 | 32.5 |
| Llama 3.2 3B Instruct | 7.2 | 46.0 | 27.2 |
| ***Ministral 3B Instruct*** | ***<u>8.1</u>*** | ***<u>64.3</u>*** | ***<u>36.3</u>*** |
<u>Code & Math</u>
| Model | MBPP pass@1 | HumanEval pass@1 | Math maj@1 |
|:-------------:|:-------------:|:------------------:|:-------------:|
| Mistral 7B Instruct v0.3 | 50.2 | 38.4 | 13.2 |
| Gemma 2 9B Instruct | 68.5 | 67.7 | 47.4 |
Llama 3.1 8B Instruct | 69.7 | 67.1 | 49.3 |
| ***Ministral 8B Instruct*** | ***<u>70.0</u>*** | ***<u>76.8</u>*** | ***<u>54.5</u>*** |
| | | | |
| Gemma 2 2B Instruct | 54.5 | 42.7 | 22.8 |
| Llama 3.2 3B Instruct | 64.6 | 61.0 | 38.4 |
| ***Ministral 3B* Instruct** | ***<u>67.7</u>*** | ***<u>77.4</u>*** | ***<u>51.7</u>*** |
<u>Function calling</u>
| Model | Internal bench |
|:-------------:|:-----------------:|
| Mistral 7B Instruct v0.3 | 6.9 |
| Llama 3.1 8B Instruct | N/A |
| Gemma 2 9B Instruct | N/A |
| ***Ministral 8B Instruct*** | ***<u>31.6</u>*** |
| | |
| Gemma 2 2B Instruct | N/A |
| Llama 3.2 3B Instruct | N/A |
| ***Ministral 3B Instruct*** | ***<u>28.4</u>*** |
## Usage Examples
### vLLM (recommended)
We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
to implement production-ready inference pipelines.
> [!IMPORTANT]
> Currently vLLM is capped at 32k context size because interleaved attention kernels for paged attention are not yet implemented in vLLM.
> Attention kernels for paged attention are being worked on and as soon as it is fully supported in vLLM, this model card will be updated.
> To take advantage of the full 128k context size we recommend [Mistral Inference](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410#mistral-inference)
**_Installation_**
Make sure you install `vLLM >= v0.6.4`:
```
pip install --upgrade vllm
```
Also make sure you have `mistral_common >= 1.4.4` installed:
```
pip install --upgrade mistral_common
```
You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile).
**_Offline_**
```py
from vllm import LLM
from vllm.sampling_params import SamplingParams
model_name = "mistralai/Ministral-8B-Instruct-2410"
sampling_params = SamplingParams(max_tokens=8192)
# note that running Ministral 8B on a single GPU requires 24 GB of GPU RAM
# If you want to divide the GPU requirement over multiple devices, please add *e.g.* `tensor_parallel=2`
llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")
prompt = "Do we need to think for 10 seconds to find the answer of 1 + 1?"
messages = [
{
"role": "user",
"content": prompt
},
]
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
# You don't need to think for 10 seconds to find the answer to 1 + 1. The answer is 2,
# and you can easily add these two numbers in your mind very quickly without any delay.
```
**_Server_**
You can also use Ministral-8B in a server/client setting.
1. Spin up a server:
```
vllm serve mistralai/Ministral-8B-Instruct-2410 --tokenizer_mode mistral --config_format mistral --load_format mistral
```
**Note:** Running Ministral-8B on a single GPU requires 24 GB of GPU RAM.
If you want to divide the GPU requirement over multiple devices, please add *e.g.* `--tensor_parallel=2`
2. And ping the client:
```
curl --location 'http://<your-node-url>:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer token' \
--data '{
"model": "mistralai/Ministral-8B-Instruct-2410",
"messages": [
{
"role": "user",
"content": "Do we need to think for 10 seconds to find the answer of 1 + 1?"
}
]
}'
```
### Mistral-inference
We recommend using [mistral-inference](https://github.com/mistralai/mistral-inference) to quickly try out / "vibe-check" the model.
**_Install_**
Make sure to have `mistral_inference >= 1.5.0` installed.
```
pip install mistral_inference --upgrade
```
**_Download_**
```py
from huggingface_hub import snapshot_download
from pathlib import Path
mistral_models_path = Path.home().joinpath('mistral_models', '8B-Instruct')
mistral_models_path.mkdir(parents=True, exist_ok=True)
snapshot_download(repo_id="mistralai/Ministral-8B-Instruct-2410", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
```
### Chat
After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using
```
mistral-chat $HOME/mistral_models/8B-Instruct --instruct --max_tokens 256
```
### Passkey detection
> [!IMPORTANT]
> In this example the passkey message has over >100k tokens and mistral-inference
> does not have a chunked pre-fill mechanism. Therefore you will need a lot of
> GPU memory in order to run the below example (80 GB). For a more memory-efficient
> solution we recommend using vLLM.
```py
from mistral_inference.transformer import Transformer
from pathlib import Path
import json
from mistral_inference.generate import generate
from huggingface_hub import hf_hub_download
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
def load_passkey_request() -> ChatCompletionRequest:
passkey_file = hf_hub_download(repo_id="mistralai/Ministral-8B-Instruct-2410", filename="passkey_example.json")
with open(passkey_file, "r") as f:
data = json.load(f)
message_content = data["messages"][0]["content"]
return ChatCompletionRequest(messages=[UserMessage(content=message_content)])
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path, softmax_fp32=False)
completion_request = load_passkey_request()
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result) # The pass key is 13005.
```
### Instruct following
```py
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path)
completion_request = ChatCompletionRequest(messages=[UserMessage(content="How often does the letter r occur in Mistral?")])
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
```
### Function calling
```py
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
tekken = tokenizer.instruct_tokenizer.tokenizer
tekken.special_token_policy = SpecialTokenPolicy.IGNORE
model = Transformer.from_folder(mistral_models_path)
completion_request = ChatCompletionRequest(
tools=[
Tool(
function=Function(
name="get_current_weather",
description="Get the current weather",
parameters={
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
)
)
],
messages=[
UserMessage(content="What's the weather like today in Paris?"),
],
)
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
```
## The Mistral AI Team
Albert Jiang, Alexandre Abou Chahine, Alexandre Sablayrolles, Alexis Tacnet, Alodie Boissonnet, Alok Kothari, Amélie Héliou, Andy Lo, Anna Peronnin, Antoine Meunier, Antoine Roux, Antonin Faure, Aritra Paul, Arthur Darcet, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Avinash Sooriyarachchi, Baptiste Rozière, Barry Conklin, Bastien Bouillon, Blanche Savary de Beauregard, Carole Rambaud, Caroline Feldman, Charles de Freminville, Charline Mauro, Chih-Kuan Yeh, Chris Bamford, Clement Auguy, Corentin Heintz, Cyriaque Dubois, Devendra Singh Chaplot, Diego Las Casas, Diogo Costa, Eléonore Arcelin, Emma Bou Hanna, Etienne Metzger, Fanny Olivier Autran, Francois Lesage, Garance Gourdel, Gaspard Blanchet, Gaspard Donada Vidal, Gianna Maria Lengyel, Guillaume Bour, Guillaume Lample, Gustave Denis, Harizo Rajaona, Himanshu Jaju, Ian Mack, Ian Mathew, Jean-Malo Delignon, Jeremy Facchetti, Jessica Chudnovsky, Joachim Studnia, Justus Murke, Kartik Khandelwal, Kenneth Chiu, Kevin Riera, Leonard Blier, Leonard Suslian, Leonardo Deschaseaux, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Sophia Yang, Margaret Jennings, Marie Pellat, Marie Torelli, Marjorie Janiewicz, Mathis Felardos, Maxime Darrin, Michael Hoff, Mickaël Seznec, Misha Jessel Kenyon, Nayef Derwiche, Nicolas Carmont Zaragoza, Nicolas Faurie, Nicolas Moreau, Nicolas Schuhl, Nikhil Raghuraman, Niklas Muhs, Olivier de Garrigues, Patricia Rozé, Patricia Wang, Patrick von Platen, Paul Jacob, Pauline Buche, Pavankumar Reddy Muddireddy, Perry Savas, Pierre Stock, Pravesh Agrawal, Renaud de Peretti, Romain Sauvestre, Romain Sinthe, Roman Soletskyi, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Soham Ghosh, Sylvain Regnier, Szymon Antoniak, Teven Le Scao, Theophile Gervet, Thibault Schueller, Thibaut Lavril, Thomas Wang, Timothée Lacroix, Valeriia Nemychnikova, Wendy Shang, William El Sayed, William Marshall
### Basic Instruct Template (V3-Tekken)
```
<s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
```
*For more information about the tokenizer please refer to [mistral-common](https://github.com/mistralai/mistral-common)*
## Ministral 8B Architecture
| Feature | Value |
|:---------------------:|:--------------------:|
| **Architecture** | Dense Transformer |
| **Parameters** | 8,019,808,256 |
| **Layers** | 36 |
| **Heads** | 32 |
| **Dim** | 4096 |
| **KV Heads (GQA)** | 8 |
| **Hidden Dim** | 12288 |
| **Head Dim** | 128 |
| **Vocab Size** | 131,072 |
| **Context Length** | 128k |
| **Attention Pattern** | Ragged (128k,32k,32k,32k) |
## Benchmarks
#### Base Models
<u>Knowledge & Commonsense</u>
| Model | MMLU | AGIEval | Winogrande | Arc-c | TriviaQA |
|:-------------:|:------:|:---------:|:------------:|:-------:|:----------:|
| Mistral 7B Base | 62.5 | 42.5 | 74.2 | 67.9 | 62.5 |
| Llama 3.1 8B Base | 64.7 | 44.4 | 74.6 | 46.0 | 60.2 |
| ***Ministral 8B Base*** | ***<u>65.0</u>*** | ***<u>48.3</u>*** | ***<u>75.3</u>*** | ***<u>71.9</u>*** | ***<u>65.5</u>*** |
| | | | | | |
| Gemma 2 2B Base | 52.4 | 33.8 | 68.7 | 42.6 | 47.8 |
| Llama 3.2 3B Base | 56.2 | 37.4 | 59.6 | 43.1 | 50.7 |
| ***Ministral 3B Base*** | ***<u>60.9</u>*** | ***<u>42.1</u>*** | ***<u>72.7</u>*** | ***<u>64.2</u>*** | ***<u>56.7</u>*** |
<u>Code & Math</u>
| Model | HumanEval pass@1 |GSM8K maj@8 |
|:-------------:|:-------------------:|:---------------:|
| Mistral 7B Base | 26.8 | 32.0 |
| Llama 3.1 8B Base | ***<u>37.8</u>*** | 42.2 |
| ***Ministral 8B Base*** | 34.8 | ***<u>64.5</u>*** |
| | | |
| Gemma 2 2B | 20.1 | 35.5 |
| Llama 3.2 3B | 14.6 | 33.5 |
| ***Ministral 3B*** | ***<u>34.2</u>*** | ***<u>50.9</u>*** |
<u>Multilingual</u>
| Model | French MMLU | German MMLU | Spanish MMLU |
|:-------------:|:-------------:|:-------------:|:-------------:|
| Mistral 7B Base | 50.6 | 49.6 | 51.4 |
| Llama 3.1 8B Base | 50.8 | 52.8 | 54.6 |
| ***Ministral 8B Base*** | ***<u>57.5</u>*** | ***<u>57.4</u>*** | ***<u>59.6</u>*** |
| | | | |
| Gemma 2 2B Base | 41.0 | 40.1 | 41.7 |
| Llama 3.2 3B Base | 42.3 | 42.2 | 43.1 |
| ***Ministral 3B Base*** | ***<u>49.1</u>*** | ***<u>48.3</u>*** | ***<u>49.5</u>*** |
### Instruct Models
<u>Chat/Arena (gpt-4o judge)</u>
| Model | MTBench | Arena Hard | Wild bench |
|:-------------:|:---------:|:------------:|:------------:|
| Mistral 7B Instruct v0.3 | 6.7 | 44.3 | 33.1 |
| Llama 3.1 8B Instruct | 7.5 | 62.4 | 37.0 |
| Gemma 2 9B Instruct | 7.6 | 68.7 | ***<u>43.8</u>*** |
| ***Ministral 8B Instruct*** | ***<u>8.3</u>*** | ***<u>70.9</u>*** | 41.3 |
| | | | |
| Gemma 2 2B Instruct | 7.5 | 51.7 | 32.5 |
| Llama 3.2 3B Instruct | 7.2 | 46.0 | 27.2 |
| ***Ministral 3B Instruct*** | ***<u>8.1</u>*** | ***<u>64.3</u>*** | ***<u>36.3</u>*** |
<u>Code & Math</u>
| Model | MBPP pass@1 | HumanEval pass@1 | Math maj@1 |
|:-------------:|:-------------:|:------------------:|:-------------:|
| Mistral 7B Instruct v0.3 | 50.2 | 38.4 | 13.2 |
| Gemma 2 9B Instruct | 68.5 | 67.7 | 47.4 |
Llama 3.1 8B Instruct | 69.7 | 67.1 | 49.3 |
| ***Ministral 8B Instruct*** | ***<u>70.0</u>*** | ***<u>76.8</u>*** | ***<u>54.5</u>*** |
| | | | |
| Gemma 2 2B Instruct | 54.5 | 42.7 | 22.8 |
| Llama 3.2 3B Instruct | 64.6 | 61.0 | 38.4 |
| ***Ministral 3B* Instruct** | ***<u>67.7</u>*** | ***<u>77.4</u>*** | ***<u>51.7</u>*** |
<u>Function calling</u>
| Model | Internal bench |
|:-------------:|:-----------------:|
| Mistral 7B Instruct v0.3 | 6.9 |
| Llama 3.1 8B Instruct | N/A |
| Gemma 2 9B Instruct | N/A |
| ***Ministral 8B Instruct*** | ***<u>31.6</u>*** |
| | |
| Gemma 2 2B Instruct | N/A |
| Llama 3.2 3B Instruct | N/A |
| ***Ministral 3B Instruct*** | ***<u>28.4</u>*** |
## Usage Examples
### vLLM (recommended)
We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
to implement production-ready inference pipelines.
> [!IMPORTANT]
> Currently vLLM is capped at 32k context size because interleaved attention kernels for paged attention are not yet implemented in vLLM.
> Attention kernels for paged attention are being worked on and as soon as it is fully supported in vLLM, this model card will be updated.
> To take advantage of the full 128k context size we recommend [Mistral Inference](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410#mistral-inference)
**_Installation_**
Make sure you install `vLLM >= v0.6.4`:
```
pip install --upgrade vllm
```
Also make sure you have `mistral_common >= 1.4.4` installed:
```
pip install --upgrade mistral_common
```
You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile).
**_Offline_**
```py
from vllm import LLM
from vllm.sampling_params import SamplingParams
model_name = "mistralai/Ministral-8B-Instruct-2410"
sampling_params = SamplingParams(max_tokens=8192)
# note that running Ministral 8B on a single GPU requires 24 GB of GPU RAM
# If you want to divide the GPU requirement over multiple devices, please add *e.g.* `tensor_parallel=2`
llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")
prompt = "Do we need to think for 10 seconds to find the answer of 1 + 1?"
messages = [
{
"role": "user",
"content": prompt
},
]
outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
# You don't need to think for 10 seconds to find the answer to 1 + 1. The answer is 2,
# and you can easily add these two numbers in your mind very quickly without any delay.
```
**_Server_**
You can also use Ministral-8B in a server/client setting.
1. Spin up a server:
```
vllm serve mistralai/Ministral-8B-Instruct-2410 --tokenizer_mode mistral --config_format mistral --load_format mistral
```
**Note:** Running Ministral-8B on a single GPU requires 24 GB of GPU RAM.
If you want to divide the GPU requirement over multiple devices, please add *e.g.* `--tensor_parallel=2`
2. And ping the client:
```
curl --location 'http://<your-node-url>:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer token' \
--data '{
"model": "mistralai/Ministral-8B-Instruct-2410",
"messages": [
{
"role": "user",
"content": "Do we need to think for 10 seconds to find the answer of 1 + 1?"
}
]
}'
```
### Mistral-inference
We recommend using [mistral-inference](https://github.com/mistralai/mistral-inference) to quickly try out / "vibe-check" the model.
**_Install_**
Make sure to have `mistral_inference >= 1.5.0` installed.
```
pip install mistral_inference --upgrade
```
**_Download_**
```py
from huggingface_hub import snapshot_download
from pathlib import Path
mistral_models_path = Path.home().joinpath('mistral_models', '8B-Instruct')
mistral_models_path.mkdir(parents=True, exist_ok=True)
snapshot_download(repo_id="mistralai/Ministral-8B-Instruct-2410", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
```
### Chat
After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using
```
mistral-chat $HOME/mistral_models/8B-Instruct --instruct --max_tokens 256
```
### Passkey detection
> [!IMPORTANT]
> In this example the passkey message has over >100k tokens and mistral-inference
> does not have a chunked pre-fill mechanism. Therefore you will need a lot of
> GPU memory in order to run the below example (80 GB). For a more memory-efficient
> solution we recommend using vLLM.
```py
from mistral_inference.transformer import Transformer
from pathlib import Path
import json
from mistral_inference.generate import generate
from huggingface_hub import hf_hub_download
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
def load_passkey_request() -> ChatCompletionRequest:
passkey_file = hf_hub_download(repo_id="mistralai/Ministral-8B-Instruct-2410", filename="passkey_example.json")
with open(passkey_file, "r") as f:
data = json.load(f)
message_content = data["messages"][0]["content"]
return ChatCompletionRequest(messages=[UserMessage(content=message_content)])
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path, softmax_fp32=False)
completion_request = load_passkey_request()
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result) # The pass key is 13005.
```
### Instruct following
```py
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path)
completion_request = ChatCompletionRequest(messages=[UserMessage(content="How often does the letter r occur in Mistral?")])
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
```
### Function calling
```py
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
tekken = tokenizer.instruct_tokenizer.tokenizer
tekken.special_token_policy = SpecialTokenPolicy.IGNORE
model = Transformer.from_folder(mistral_models_path)
completion_request = ChatCompletionRequest(
tools=[
Tool(
function=Function(
name="get_current_weather",
description="Get the current weather",
parameters={
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
)
)
],
messages=[
UserMessage(content="What's the weather like today in Paris?"),
],
)
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
```
## The Mistral AI Team
Albert Jiang, Alexandre Abou Chahine, Alexandre Sablayrolles, Alexis Tacnet, Alodie Boissonnet, Alok Kothari, Amélie Héliou, Andy Lo, Anna Peronnin, Antoine Meunier, Antoine Roux, Antonin Faure, Aritra Paul, Arthur Darcet, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Avinash Sooriyarachchi, Baptiste Rozière, Barry Conklin, Bastien Bouillon, Blanche Savary de Beauregard, Carole Rambaud, Caroline Feldman, Charles de Freminville, Charline Mauro, Chih-Kuan Yeh, Chris Bamford, Clement Auguy, Corentin Heintz, Cyriaque Dubois, Devendra Singh Chaplot, Diego Las Casas, Diogo Costa, Eléonore Arcelin, Emma Bou Hanna, Etienne Metzger, Fanny Olivier Autran, Francois Lesage, Garance Gourdel, Gaspard Blanchet, Gaspard Donada Vidal, Gianna Maria Lengyel, Guillaume Bour, Guillaume Lample, Gustave Denis, Harizo Rajaona, Himanshu Jaju, Ian Mack, Ian Mathew, Jean-Malo Delignon, Jeremy Facchetti, Jessica Chudnovsky, Joachim Studnia, Justus Murke, Kartik Khandelwal, Kenneth Chiu, Kevin Riera, Leonard Blier, Leonard Suslian, Leonardo Deschaseaux, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Sophia Yang, Margaret Jennings, Marie Pellat, Marie Torelli, Marjorie Janiewicz, Mathis Felardos, Maxime Darrin, Michael Hoff, Mickaël Seznec, Misha Jessel Kenyon, Nayef Derwiche, Nicolas Carmont Zaragoza, Nicolas Faurie, Nicolas Moreau, Nicolas Schuhl, Nikhil Raghuraman, Niklas Muhs, Olivier de Garrigues, Patricia Rozé, Patricia Wang, Patrick von Platen, Paul Jacob, Pauline Buche, Pavankumar Reddy Muddireddy, Perry Savas, Pierre Stock, Pravesh Agrawal, Renaud de Peretti, Romain Sauvestre, Romain Sinthe, Roman Soletskyi, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Soham Ghosh, Sylvain Regnier, Szymon Antoniak, Teven Le Scao, Theophile Gervet, Thibault Schueller, Thibaut Lavril, Thomas Wang, Timothée Lacroix, Valeriia Nemychnikova, Wendy Shang, William El Sayed, William Marshall

3
RenCoder.png Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b00e89952830a031e58f03505588b3db340935c4ffb9b31cb0ae621da7037275
size 5986568

37
config.json Normal file
View File

@@ -0,0 +1,37 @@
{
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 12288,
"layer_types": [
"full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
"full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
"full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
"full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
"full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
"full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
"full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
"full_attention", "sliding_attention", "sliding_attention", "sliding_attention",
"full_attention", "sliding_attention", "sliding_attention", "sliding_attention"
],
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 100000000.0,
"sliding_window": 32768,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.0.dev0",
"use_cache": true,
"vocab_size": 131072
}

3
consolidated.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c0a9601a697d1220b7122e7aa6682411a8140ab5afde39c786b87513c9db862
size 16039651824

6
generation_config.json Normal file
View File

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.46.0.dev0"
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b58927e79754217608f969cc2229860c95d5701bfd7f147116d2299429ab2e0c
size 4983007904

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0bba71b3ca528a442f3c9bdea7e3916ec71837fd772d336ef8e5775bec0d24e0
size 4999836776

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e53b9596b35b0eb1ea6d8a27b4d6b3e97b3a3179c45bf39c354a05a88e72920
size 4983067960

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e75c6183282198d40135032c9e757fb70c2455f8c0f4ded7b0895b49b72bbd5f
size 1073741952

View File

@@ -0,0 +1,334 @@
{
"metadata": {
"total_size": 16039616512
},
"weight_map": {
"lm_head.weight": "model-00004-of-00004.safetensors",
"model.embed_tokens.weight": "model-00001-of-00004.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.norm.weight": "model-00003-of-00004.safetensors"
}
}

13
params.json Normal file
View File

@@ -0,0 +1,13 @@
{
"dim": 4096,
"n_layers": 36,
"head_dim": 128,
"hidden_dim": 12288,
"n_heads": 32,
"n_kv_heads": 8,
"norm_eps": 1e-05,
"vocab_size": 131072,
"rope_theta": 100000000.0,
"sliding_window": [null, 32768, 32768, 32768],
"max_position_embeddings": 131072
}

6
passkey_example.json Normal file

File diff suppressed because one or more lines are too long

23
special_tokens_map.json Normal file
View File

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tekken.json Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eccd1665d2e477697c33cb7f0daa6f6dfefc57a0a6bceb66d4be52952f827516
size 14801223

3
tokenizer.json Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7edbeaf20dd7f571b5dd1c54d9ace4f9b6299127cc7ba2afb14a6d51a4a79a4
size 17078136

8015
tokenizer_config.json Normal file

File diff suppressed because it is too large Load Diff