KeyLM-75M-Instruct-GGUF/README.md

---
license: apache-2.0
language:
- en
base_model: Eclipse-Senpai/KeyLM-75M-Instruct
base_model_relation: quantized
pipeline_tag: text-generation
library_name: gguf
tags:
- keylm
- gguf
- llama.cpp
- small-language-model
- instruct
---

# KeyLM-75M-Instruct-GGUF

GGUF builds of [**KeyLM-75M-Instruct**](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct) for `llama.cpp`, LM Studio, Ollama, and other GGUF runtimes.

KeyLM is a 75M-parameter instruction-tuned language model trained from scratch on approximately 18 billion tokens. See the [main model card](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct) for benchmarks, training details, limitations, and the `transformers` (safetensors) version.

## Files

| File | Quant | Size | Notes |
|---|---|---|---|
| `KeyLM-75M-Instruct.F16.gguf` | F16 | ~144 MB | Full precision and recommended. The model is already tiny, so there is little reason to quantize further. |

## Run with llama.cpp

```bash
# straight from the Hub
llama-cli -hf Eclipse-Senpai/KeyLM-75M-Instruct-GGUF -cnv

# or a local file
llama-cli -m KeyLM-75M-Instruct.F16.gguf -cnv
```

The chat template (`User:` / `Assistant:`, assistant turns ending with `</s>`) is embedded in the GGUF, so conversation mode (`-cnv`) applies it automatically.

## LM Studio / Ollama

- **LM Studio:** load the `.gguf`; the embedded chat template is detected automatically.
- **Ollama:** `ollama run hf.co/Eclipse-Senpai/KeyLM-75M-Instruct-GGUF`

## Notes & limitations

KeyLM is a tiny model: good at simple instruction following and short chat, near random chance on knowledge/reasoning benchmarks. It is not a factual assistant. Full numbers and caveats are on the [main model card](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct).

## License

Apache 2.0.
初始化项目，由ModelHub XC社区提供模型 Model: Eclipse-Senpai/KeyLM-75M-Instruct-GGUF Source: Original Platform 2026-06-19 06:15:17 +08:00			`---`
			`license: apache-2.0`
			`language:`
			`- en`
			`base_model: Eclipse-Senpai/KeyLM-75M-Instruct`
			`base_model_relation: quantized`
			`pipeline_tag: text-generation`
			`library_name: gguf`
			`tags:`
			`- keylm`
			`- gguf`
			`- llama.cpp`
			`- small-language-model`
			`- instruct`
			`---`

			`# KeyLM-75M-Instruct-GGUF`

			GGUF builds of [KeyLM-75M-Instruct](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct) for `llama.cpp`, LM Studio, Ollama, and other GGUF runtimes.

			KeyLM is a 75M-parameter instruction-tuned language model trained from scratch on approximately 18 billion tokens. See the [main model card](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct) for benchmarks, training details, limitations, and the `transformers` (safetensors) version.

			`## Files`

			`\| File \| Quant \| Size \| Notes \|`
			`\|---\|---\|---\|---\|`
			\| `KeyLM-75M-Instruct.F16.gguf` \| F16 \| ~144 MB \| Full precision and recommended. The model is already tiny, so there is little reason to quantize further. \|

			`## Run with llama.cpp`

			```bash
			`# straight from the Hub`
			`llama-cli -hf Eclipse-Senpai/KeyLM-75M-Instruct-GGUF -cnv`

			`# or a local file`
			`llama-cli -m KeyLM-75M-Instruct.F16.gguf -cnv`
			```

			The chat template (`User:` / `Assistant:`, assistant turns ending with `</s>`) is embedded in the GGUF, so conversation mode (`-cnv`) applies it automatically.

			`## LM Studio / Ollama`

			- LM Studio: load the `.gguf`; the embedded chat template is detected automatically.
			- Ollama: `ollama run hf.co/Eclipse-Senpai/KeyLM-75M-Instruct-GGUF`

			`## Notes & limitations`

			`KeyLM is a tiny model: good at simple instruction following and short chat, near random chance on knowledge/reasoning benchmarks. It is not a factual assistant. Full numbers and caveats are on the [main model card](https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct).`

			`## License`

			`Apache 2.0.`