--- license: other language: - en pipeline_tag: text-generation inference: false tags: - transformers - gguf - imatrix - Mistral-Nemo-Instruct-2407 --- Quantizations of https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407 ### Inference Clients/UIs * [llama.cpp](https://github.com/ggerganov/llama.cpp) * [JanAI](https://github.com/janhq/jan) * [KoboldCPP](https://github.com/LostRuins/koboldcpp) * [text-generation-webui](https://github.com/oobabooga/text-generation-webui) * [ollama](https://github.com/ollama/ollama) * [GPT4All](https://github.com/nomic-ai/gpt4all) --- # From original readme ## Usage The model can be used with three different frameworks - [`mistral_inference`](https://github.com/mistralai/mistral-inference): See [here](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407#mistral-inference) - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers) - [`NeMo`](https://github.com/NVIDIA/NeMo): See [nvidia/Mistral-NeMo-12B-Instruct](https://huggingface.co/nvidia/Mistral-NeMo-12B-Instruct) ### Mistral Inference #### Install It is recommended to use `mistralai/Mistral-Nemo-Instruct-2407` with [mistral-inference](https://github.com/mistralai/mistral-inference). For HF transformers code snippets, please keep scrolling. ``` pip install mistral_inference ``` #### Download ```py from huggingface_hub import snapshot_download from pathlib import Path mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-Instruct') mistral_models_path.mkdir(parents=True, exist_ok=True) snapshot_download(repo_id="mistralai/Mistral-Nemo-Instruct-2407", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path) ``` #### Chat After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using ``` mistral-chat $HOME/mistral_models/Nemo-Instruct --instruct --max_tokens 256 --temperature 0.35 ``` *E.g.* Try out something like: ``` How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar. ``` #### Instruct following ```py from mistral_inference.transformer import Transformer from mistral_inference.generate import generate from mistral_common.tokens.tokenizers.mistral import MistralTokenizer from mistral_common.protocol.instruct.messages import UserMessage from mistral_common.protocol.instruct.request import ChatCompletionRequest tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json") model = Transformer.from_folder(mistral_models_path) prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar." completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)]) tokens = tokenizer.encode_chat_completion(completion_request).tokens out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id) result = tokenizer.decode(out_tokens[0]) print(result) ``` #### Function calling ```py from mistral_common.protocol.instruct.tool_calls import Function, Tool from mistral_inference.transformer import Transformer from mistral_inference.generate import generate from mistral_common.tokens.tokenizers.mistral import MistralTokenizer from mistral_common.protocol.instruct.messages import UserMessage from mistral_common.protocol.instruct.request import ChatCompletionRequest tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json") model = Transformer.from_folder(mistral_models_path) completion_request = ChatCompletionRequest( tools=[ Tool( function=Function( name="get_current_weather", description="Get the current weather", parameters={ "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA", }, "format": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": "The temperature unit to use. Infer this from the users location.", }, }, "required": ["location", "format"], }, ) ) ], messages=[ UserMessage(content="What's the weather like today in Paris?"), ], ) tokens = tokenizer.encode_chat_completion(completion_request).tokens out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id) result = tokenizer.decode(out_tokens[0]) print(result) ``` ### Transformers > [!IMPORTANT] > NOTE: Until a new release has been made, you need to install transformers from source: > ```sh > pip install git+https://github.com/huggingface/transformers.git > ``` If you want to use Hugging Face `transformers` to generate text, you can do something like this. ```py from transformers import pipeline messages = [ {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"}, {"role": "user", "content": "Who are you?"}, ] chatbot = pipeline("text-generation", model="mistralai/Mistral-Nemo-Instruct-2407") chatbot(messages) ``` > [!TIP] > Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend to use a temperature of 0.3.