---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- Mistral-Nemo-Instruct-2407
---

GGUF quantizations of https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407

### Inference Clients/UIs

* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [JanAI](https://github.com/janhq/jan)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [ollama](https://github.com/ollama/ollama)
* [GPT4All](https://github.com/nomic-ai/gpt4all)

---

# From original readme

## Usage

The model can be used with three different frameworks:

- [`mistral_inference`](https://github.com/mistralai/mistral-inference): See [here](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407#mistral-inference)
- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
- [`NeMo`](https://github.com/NVIDIA/NeMo): See [nvidia/Mistral-NeMo-12B-Instruct](https://huggingface.co/nvidia/Mistral-NeMo-12B-Instruct)

### Mistral Inference

#### Install

It is recommended to use `mistralai/Mistral-Nemo-Instruct-2407` with [mistral-inference](https://github.com/mistralai/mistral-inference). For HF transformers code snippets, please keep scrolling.

```
pip install mistral_inference
```

#### Download

```py
from huggingface_hub import snapshot_download
from pathlib import Path

# Local directory that will hold the downloaded model files
mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-Instruct')
mistral_models_path.mkdir(parents=True, exist_ok=True)

# Download only the three files listed in `allow_patterns`
snapshot_download(repo_id="mistralai/Mistral-Nemo-Instruct-2407", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
```
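
Before loading the model, it can help to confirm that the three files requested in the download snippet actually arrived. The following is a minimal sketch using only the standard library; the helper name `missing_model_files` is hypothetical and is not part of `huggingface_hub` or `mistral_inference`.

```py
from pathlib import Path

# File names taken from the `allow_patterns` list in the download snippet
EXPECTED_FILES = ("params.json", "consolidated.safetensors", "tekken.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # Same directory as `mistral_models_path` in the download snippet
    model_dir = Path.home().joinpath("mistral_models", "Nemo-Instruct")
    missing = missing_model_files(model_dir)
    print("all files present" if not missing else f"missing: {missing}")
```

An empty list means every expected file is present; anything else names the files to download again.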

#### Chat

After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using:

```
mistral-chat $HOME/mistral_models/Nemo-Instruct --instruct --max_tokens 256 --temperature 0.35
```

For example, try a prompt like:
```
How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar.
```

#### Instruct following

```py
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# `mistral_models_path` is the directory created in the Download step
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path)

prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."

completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])

# Encode the chat request into token ids
tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

print(result)
```

#### Function calling

```py
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


# `mistral_models_path` is the directory created in the Download step
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path)

# Describe the available tool with a JSON-schema style parameter definition
completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the user's location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

print(result)
```
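
The decoded `result` is plain text, so running the requested function is left to the caller. The following is a minimal sketch of that step. It assumes the decoded text contains a JSON list of objects with `name` and `arguments` keys, possibly preceded by a marker prefix; this output shape is an assumption, so check what your installed `mistral_common` version actually decodes to. The `get_current_weather` stub and the `run_tool_calls` helper are hypothetical.

```py
import json


def get_current_weather(location, format):
    """Hypothetical stub standing in for a real weather lookup."""
    return f"weather lookup for {location} in {format}"


# Map tool names declared in the request to local callables
TOOLS = {"get_current_weather": get_current_weather}


def run_tool_calls(result_text, tools=TOOLS):
    """Parse a JSON list of tool calls from `result_text` and run each one.

    Assumes the text contains something like
    '[{"name": ..., "arguments": {...}}]', optionally after a prefix.
    """
    start = result_text.find("[{")
    if start == -1:
        return []  # no tool call found; the model answered in plain text
    # raw_decode tolerates trailing text after the JSON value
    calls, _ = json.JSONDecoder().raw_decode(result_text, start)
    outputs = []
    for call in calls:
        func = tools.get(call["name"])
        if func is None:
            raise KeyError(f"unknown tool: {call['name']}")
        outputs.append(func(**call["arguments"]))
    return outputs
```

Looking tools up in an explicit dictionary, rather than evaluating names from the model output, keeps the set of callable functions limited to the ones you declared.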

### Transformers

> [!IMPORTANT]
> NOTE: Until a new release has been made, you need to install transformers from source:
> ```sh
> pip install git+https://github.com/huggingface/transformers.git
> ```

To generate text with Hugging Face `transformers`, you can use the `pipeline` API:

```py
from transformers import pipeline

# Chat history in the role/content format expected by chat models
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
chatbot = pipeline("text-generation", model="mistralai/Mistral-Nemo-Instruct-2407")
chatbot(messages)
```
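
When called with a list of chat messages, the pipeline returns the conversation with the model's reply appended. The following is a minimal sketch of extracting that reply. It assumes the output has the shape `[{"generated_text": [...messages...]}]`, which is what recent `transformers` releases return for chat inputs; verify this against your installed version. The helper `last_assistant_reply` is hypothetical, and the sample data is a stand-in, not real model output.

```py
def last_assistant_reply(outputs):
    """Return the content of the final assistant message, or None."""
    conversation = outputs[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    return None


# Stand-in for the value returned by `chatbot(messages)`; not real model output
sample_outputs = [{
    "generated_text": [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "<model reply goes here>"},
    ]
}]
print(last_assistant_reply(sample_outputs))
```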

> [!TIP]
> Unlike previous Mistral models, Mistral Nemo requires lower temperatures. We recommend using a temperature of 0.3.
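
To see what a lower temperature does, recall that temperature sampling divides the logits by the temperature before the softmax, so smaller values concentrate probability on the top-scoring token. The following is a minimal sketch with made-up logits; the numbers are illustrative and are not taken from the model.

```py
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative values only
for t in (1.0, 0.3):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lower temperature the first token takes a much larger share of the probability mass, which makes sampling more deterministic.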