---
base_model: meta-llama/Llama-3.2-1B-Instruct
language:
- en
library_name: transformers
license: llama3.2
tags:
- llama-3
- llama
- meta
- facebook
- transformers
---

Quantizing Llama-3.2-1B

Eric Hartford

I am creating several quants of Llama-3.2-1B to test vLLM's Marlin kernels.

- https://huggingface.co/QuixiAI/Llama-3.2-1B
- https://huggingface.co/QuixiAI/Llama-3.2-1B-FP8-Dynamic
- https://huggingface.co/QuixiAI/Llama-3.2-1B-MXFP4
- https://huggingface.co/QuixiAI/Llama-3.2-1B-NVFP4A16
- https://huggingface.co/QuixiAI/Llama-3.2-1B-W4A16-AWQ
- https://huggingface.co/QuixiAI/Llama-3.2-1B-W4A16-GPTQ
- https://huggingface.co/QuixiAI/Llama-3.2-1B-W8A16-GPTQ

The script I used to quant this:
[quant.py](quant.py)
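
If you want to smoke-test the quants listed above in vLLM, something like the following might work. This is a minimal sketch and not part of quant.py: `REPOS` and `smoke_test` are hypothetical names introduced here, the prompt is arbitrary, and whether a given format loads (and whether vLLM picks a Marlin kernel for it) depends on your vLLM version and GPU.

```python
# Hypothetical helper, not taken from quant.py: build the repo IDs listed
# in this README and run a short greedy generation through vLLM.
BASE = "QuixiAI/Llama-3.2-1B"
SUFFIXES = [
    "",
    "-FP8-Dynamic",
    "-MXFP4",
    "-NVFP4A16",
    "-W4A16-AWQ",
    "-W4A16-GPTQ",
    "-W8A16-GPTQ",
]
REPOS = [BASE + suffix for suffix in SUFFIXES]


def smoke_test(repo_id, prompt="Hello, my name is", max_tokens=32):
    """Load one checkpoint in vLLM and return a short completion."""
    # Imported lazily so the repo list can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=repo_id)
    # temperature=0.0 gives greedy decoding, so runs are comparable.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `smoke_test(REPOS[1])` would try the FP8-Dynamic quant. Running one checkpoint per Python process is probably safest, since GPU memory is not reliably released between `LLM` instances.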