Model: veyra-ai/veyra2-30m-base-2b-tokens-gguf Source: Original Platform
license, base_model, tags
| license | base_model | tags | |||||
|---|---|---|---|---|---|---|---|
| apache-2.0 | veyra-ai/veyra2-30m-base-2b-tokens |
|
Veyra2 30M Base 2B Tokens GGUF
GGUF conversions of veyra-ai/veyra2-30m-base-2b-tokens for llama.cpp-compatible runtimes.
Files
veyra2-30m-base-2b-tokens-f16.ggufveyra2-30m-base-2b-tokens-Q8_0.ggufveyra2-30m-base-2b-tokens-Q6_K.ggufveyra2-30m-base-2b-tokens-Q5_K_M.ggufveyra2-30m-base-2b-tokens-Q4_K_M.ggufveyra2-30m-base-2b-tokens-Q3_K_M.ggufveyra2-30m-base-2b-tokens-Q2_K.gguf
Notes
These GGUFs were converted with tokenizer.ggml.add_eos_token=false, because the original conversion metadata appended EOS to prompts, causing llama.cpp continuation prompts to behave like a new document after the prompt. If you are using something like LM Studio you can set your chat template to ChatML to talk to Veyra.
Recommended starting points:
- Best quality:
Q8_0orf16 - Balanced:
Q5_K_MorQ6_K - Smallest experimental:
Q2_K/Q3_K_M
Recommended settings:
- Temperature: 0.8
- Repeat Penalty: 1.2
Note: For lower quants I would recommend using higher temperature to avoid repetition over repeat penalty.
Description