license: apache-2.0
base_model: veyra-ai/veyra2-30m-base-2b-tokens
tags: gguf, llama.cpp, text-generation, veyra, small-language-model

Veyra2 30M Base 2B Tokens GGUF

GGUF conversions of veyra-ai/veyra2-30m-base-2b-tokens for llama.cpp-compatible runtimes.

Files

  • veyra2-30m-base-2b-tokens-f16.gguf
  • veyra2-30m-base-2b-tokens-Q8_0.gguf
  • veyra2-30m-base-2b-tokens-Q6_K.gguf
  • veyra2-30m-base-2b-tokens-Q5_K_M.gguf
  • veyra2-30m-base-2b-tokens-Q4_K_M.gguf
  • veyra2-30m-base-2b-tokens-Q3_K_M.gguf
  • veyra2-30m-base-2b-tokens-Q2_K.gguf
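
As a rough illustration, one of the files above could be fetched with the `huggingface_hub` Python package. This is only a sketch under assumptions: the repository id is taken from the source line shown on this page and is assumed to be reachable on the Hugging Face Hub, and `gguf_filename` and `download` are hypothetical helper names, not part of any library.

```python
# Assumption: repository id as shown on this page; adjust it if your mirror differs.
REPO_ID = "veyra-ai/veyra2-30m-base-2b-tokens-gguf"

# Quantization labels exactly as they appear in the Files list.
QUANTS = ("f16", "Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M", "Q2_K")


def gguf_filename(quant):
    """Map a quantization label to the file name used in the Files list."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {QUANTS}")
    return f"veyra2-30m-base-2b-tokens-{quant}.gguf"


def download(quant):
    """Fetch one GGUF file into the local Hugging Face cache and return its path."""
    # Imported lazily so gguf_filename works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
```

Calling `download("Q5_K_M")` would then return a local path that can be handed to a llama.cpp-compatible runtime.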

Notes

These GGUFs were converted with tokenizer.ggml.add_eos_token=false. The original conversion metadata caused EOS to be appended to prompts, so in llama.cpp the text generated after a continuation prompt behaved like the start of a new document instead of a continuation.

If you are using a frontend such as LM Studio, you can set the chat template to ChatML to talk to Veyra.
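
To confirm that a downloaded file carries the tokenizer.ggml.add_eos_token setting described above, the GGUF metadata header can be read directly. The following is a minimal sketch, assuming a little-endian GGUF file of version 2 or 3; `read_gguf_metadata` is a hypothetical helper written for this example, not a llama.cpp API.

```python
import io
import struct

# struct formats for the scalar GGUF metadata value types (GGUF v2/v3).
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(stream, fmt):
    """Read and unpack a single fixed-size value."""
    return struct.unpack(fmt, stream.read(struct.calcsize(fmt)))[0]


def _read_string(stream):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(stream, "<Q")
    return stream.read(length).decode("utf-8", errors="replace")


def _read_value(stream, value_type):
    if value_type in _SCALARS:
        return _read(stream, _SCALARS[value_type])
    if value_type == _STRING:
        return _read_string(stream)
    if value_type == _ARRAY:
        # Arrays store their element type, then a uint64 count, then the items.
        item_type = _read(stream, "<I")
        count = _read(stream, "<Q")
        return [_read_value(stream, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def read_gguf_metadata(stream):
    """Return the metadata key/value pairs from a binary GGUF stream."""
    if stream.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(stream, "<I")
    if version < 2:
        raise ValueError("this sketch only handles GGUF version 2 or later")
    _read(stream, "<Q")  # tensor count, not needed here
    kv_count = _read(stream, "<Q")
    metadata = {}
    for _ in range(kv_count):
        key = _read_string(stream)
        value_type = _read(stream, "<I")
        metadata[key] = _read_value(stream, value_type)
    return metadata
```

Opening one of the files in binary mode and passing it to `read_gguf_metadata` should yield a dictionary in which `tokenizer.ggml.add_eos_token` is `False` for these conversions, if the key is stored as described.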

Recommended starting points:

  • Best quality: Q8_0 or f16
  • Balanced: Q5_K_M or Q6_K
  • Smallest (experimental): Q2_K or Q3_K_M

Recommended settings:

  • Temperature: 0.8
  • Repeat Penalty: 1.2

Note: For lower quants, I would recommend raising the temperature rather than the repeat penalty to avoid repetition.
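
The recommended settings could be applied from Python roughly as follows. This is a sketch that assumes the third-party llama-cpp-python binding is installed. `sampling_kwargs` and `complete` are hypothetical helper names, and `max_tokens=64` is an arbitrary illustrative value rather than a recommendation from this card.

```python
# Defaults taken from the recommended settings in this README.
RECOMMENDED = {"temperature": 0.8, "repeat_penalty": 1.2}


def sampling_kwargs(temperature=None, repeat_penalty=None):
    """Return sampling keyword arguments, starting from the recommended values."""
    kwargs = dict(RECOMMENDED)
    if temperature is not None:
        kwargs["temperature"] = temperature
    if repeat_penalty is not None:
        kwargs["repeat_penalty"] = repeat_penalty
    return kwargs


def complete(model_path, prompt, max_tokens=64, **overrides):
    """Continue `prompt` using a local GGUF file via llama-cpp-python."""
    # Imported lazily: llama-cpp-python is an assumed, optional dependency.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, verbose=False)
    result = llm(prompt, max_tokens=max_tokens, **sampling_kwargs(**overrides))
    return result["choices"][0]["text"]
```

For a lower quant, the temperature can be overridden while leaving the repeat penalty at its default, for example `complete(path, prompt, temperature=1.0)`. The value 1.0 here is only a placeholder, since this card does not state a specific higher temperature.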

Description
Model synced from source: veyra-ai/veyra2-30m-base-2b-tokens-gguf