
---
library_name: transformers
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
  To access Gemma on Hugging Face, you're required to review and agree to
  Google's usage license. To do this, please ensure you're logged in to
  Hugging Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
---

# Gemma-2B GGUF

This is a quantized version of the google/gemma-2b model using llama.cpp.

This model card corresponds to the 2B base version of the Gemma model. You can also visit the model card of the 7B base model, 7B instruct model, and 2B instruct model.

Model Page: Gemma

Terms of Use: Terms

## Quants

  • q2_k: Uses Q4_K for the attention.wv and feed_forward.w2 tensors, Q2_K for the other tensors.
  • q3_k_l: Uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K.
  • q3_k_m: Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K.
  • q3_k_s: Uses Q3_K for all tensors.
  • q4_0: Original quant method, 4-bit.
  • q4_1: Higher accuracy than q4_0 but not as high as q5_0, with quicker inference than the q5 models.
  • q4_k_m: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
  • q4_k_s: Uses Q4_K for all tensors.
  • q5_0: Higher accuracy, higher resource usage, and slower inference.
  • q5_1: Even higher accuracy and resource usage, and slower inference.
  • q5_k_m: Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
  • q5_k_s: Uses Q5_K for all tensors.
  • q6_k: Uses Q8_K for all tensors.
  • q8_0: Almost indistinguishable from float16. High resource use and slow. Not recommended for most users.
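The mixed k-quant schemes above can be summarized as a small lookup table. This is an illustrative sketch only: the table and helper below are not part of llama.cpp, and for q4_k_m/q5_k_m it simplifies "half of the tensors" to all of them.

```python
# Illustrative summary of the per-tensor quantization schemes listed above.
# The mixed k-quant files keep sensitive tensors (attention.wv, attention.wo,
# feed_forward.w2) at a higher-precision type and use a base type elsewhere.
# Note: for q4_k_m and q5_k_m only half of the listed tensors actually get
# the higher type; this sketch applies it to all of them for simplicity.

MIXED_QUANT_SCHEMES = {
    # quant name -> (tensors at higher precision, higher type, base type)
    "q2_k":   ({"attention.wv", "feed_forward.w2"}, "Q4_K", "Q2_K"),
    "q3_k_l": ({"attention.wv", "attention.wo", "feed_forward.w2"}, "Q5_K", "Q3_K"),
    "q3_k_m": ({"attention.wv", "attention.wo", "feed_forward.w2"}, "Q4_K", "Q3_K"),
    "q4_k_m": ({"attention.wv", "feed_forward.w2"}, "Q6_K", "Q4_K"),
    "q5_k_m": ({"attention.wv", "feed_forward.w2"}, "Q6_K", "Q5_K"),
}

def tensor_type(quant: str, tensor: str) -> str:
    """Return the quantization type used for `tensor` in the given quant file."""
    high_tensors, high_type, base_type = MIXED_QUANT_SCHEMES[quant]
    return high_type if tensor in high_tensors else base_type
```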

## 💻 Usage

This model can be used with the latest version of llama.cpp and with LM Studio 0.2.16 or newer.
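A minimal command-line sketch for running one of the quants locally, assuming a working llama.cpp build. The GGUF filename below is an assumption; substitute the actual file you want from the repository's file list.

```shell
# Download one GGUF quant from the repository (filename is an assumption;
# pick the quant you want from the repository's file list).
huggingface-cli download hoangton/gemma-2b-GGUF gemma-2b.Q4_K_M.gguf --local-dir .

# Run a short completion with the llama.cpp CLI (the binary name differs by
# version: older builds ship `main`, newer ones `llama-cli`).
./llama-cli -m gemma-2b.Q4_K_M.gguf -p "The capital of France is" -n 32
```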