---
license: apache-2.0
base_model: veyra-ai/veyra2-30m-base-2b-tokens
tags:
- gguf
- llama.cpp
- text-generation
- veyra
- small-language-model
---

# Veyra2 30M Base 2B Tokens GGUF

GGUF conversions of [`veyra-ai/veyra2-30m-base-2b-tokens`](https://huggingface.co/veyra-ai/veyra2-30m-base-2b-tokens) for llama.cpp-compatible runtimes.

## Files

- `veyra2-30m-base-2b-tokens-f16.gguf`
- `veyra2-30m-base-2b-tokens-Q8_0.gguf`
- `veyra2-30m-base-2b-tokens-Q6_K.gguf`
- `veyra2-30m-base-2b-tokens-Q5_K_M.gguf`
- `veyra2-30m-base-2b-tokens-Q4_K_M.gguf`
- `veyra2-30m-base-2b-tokens-Q3_K_M.gguf`
- `veyra2-30m-base-2b-tokens-Q2_K.gguf`

## Notes

These GGUFs were converted with `tokenizer.ggml.add_eos_token=false`. The original conversion metadata appended EOS to prompts, which made llama.cpp treat the text after a continuation prompt as the start of a new document instead of continuing the prompt.

If you are using something like LM Studio, you can set your chat template to ChatML to talk to Veyra.

Recommended starting points:

- Best quality: `Q8_0` or `f16`
- Balanced: `Q5_K_M` or `Q6_K`
- Smallest experimental: `Q2_K` / `Q3_K_M`

Recommended settings:

- Temperature: 0.8
- Repeat Penalty: 1.2

**Note:** For lower quants, I would recommend raising the temperature rather than the repeat penalty to avoid repetition.
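
As a rough illustration of the recommended settings, the sketch below builds a llama.cpp command line for a plain continuation prompt. It assumes a recent llama.cpp build where the binary is named `llama-cli` and accepts `-m`, `-p`, `-n`, `--temp`, and `--repeat-penalty`. The helper name `build_llama_cli_args`, the 128-token default, and the example prompt are hypothetical and not part of this repository. The script only prints the command; it does not run the model.

```python
import shlex
from typing import List

# Values taken from the "Recommended settings" list in this model card.
RECOMMENDED_TEMPERATURE = 0.8
RECOMMENDED_REPEAT_PENALTY = 1.2


def build_llama_cli_args(
    model_path: str,
    prompt: str,
    n_predict: int = 128,  # hypothetical default, not from the model card
    temperature: float = RECOMMENDED_TEMPERATURE,
    repeat_penalty: float = RECOMMENDED_REPEAT_PENALTY,
) -> List[str]:
    """Return an argument list for llama-cli (assumed binary name)."""
    return [
        "llama-cli",
        "-m", model_path,                # path to one of the GGUF files
        "-p", prompt,                    # text for the base model to continue
        "-n", str(n_predict),            # number of tokens to generate
        "--temp", str(temperature),      # sampling temperature
        "--repeat-penalty", str(repeat_penalty),
    ]


# Example: print a shell-safe command for the Q8_0 file.
args = build_llama_cli_args(
    "veyra2-30m-base-2b-tokens-Q8_0.gguf",
    "Once upon a time",
)
print(" ".join(shlex.quote(a) for a in args))
```

For a lower quant such as `Q2_K`, the note above suggests passing a higher `temperature` to the helper and leaving `repeat_penalty` at its default.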