AWQ quants

2024-06-06 17:04:30 +00:00
parent 09c61257af
commit 0c267a612a
9 changed files with 413530 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,92 @@
+---
+license: cc-by-nc-4.0
+language:
+- en
+datasets:
+- Gryphe/Opus-WritingPrompts
+- Sao10K/Claude-3-Opus-Instruct-15K
+- Sao10K/Short-Storygen-v2
+- Sao10K/c2-Logs-Filtered
+quantized_by: bartowski
+pipeline_tag: text-generation
+---
+
+## 4-bit GEMM AWQ Quantizations of L3-8B-Stheno-v3.2
+
+Using <a href="https://github.com/casper-hansen/AutoAWQ/">AutoAWQ</a> release <a href="https://github.com/casper-hansen/AutoAWQ/releases/tag/v0.2.5">v0.2.5</a> for quantization.
+
+Original model: https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2
+
+## Prompt format
+
+```
+<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+
+{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
+
+{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+
+```
+
+## AWQ Parameters
+
+ - q_group_size: 128
+ - w_bit: 4
+ - zero_point: True
+ - version: GEMM
+
+## How to run
+
+The example below is adapted from [`examples/generate.py`](https://github.com/casper-hansen/AutoAWQ/blob/main/examples/generate.py) in the AutoAWQ repo.
+
+First, install the `autoawq` package from PyPI:
+
+```
+pip install autoawq
+```
+
+Then run the following:
+
+```
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer, TextStreamer
+
+
+quant_path = "models/L3-8B-Stheno-v3.2-AWQ"
+
+# Load model
+model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
+tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
+streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+
+prompt = "You're standing on the surface of the Earth. "\
+        "You walk one mile south, one mile west and one mile north. "\
+        "You end up exactly where you started. Where are you?"
+
+chat = [
+    {"role": "system", "content": "You are a concise assistant that helps answer questions."},
+    {"role": "user", "content": prompt},
+]
+
+# Llama 3 models end each turn with <|eot_id|>, so stop on it as well as on EOS
+terminators = [
+    tokenizer.eos_token_id,
+    tokenizer.convert_tokens_to_ids("<|eot_id|>")
+]
+
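+# add_generation_prompt=True appends the assistant header shown under "Prompt format"
+tokens = tokenizer.apply_chat_template(
+    chat, add_generation_prompt=True, return_tensors="pt"
+).cuda()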
+
+# Generate output
+generation_output = model.generate(
+    tokens,
+    streamer=streamer,
+    max_new_tokens=64,
+    eos_token_id=terminators
+)
+```
+
+Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
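
The "Prompt format" section of the README gives the template as raw text. As a minimal sketch, assuming a single system turn followed by a single user turn, it could be filled in by hand like this. The helper name `build_llama3_prompt` is hypothetical and not part of AutoAWQ or Transformers; the "How to run" script gets its prompt from `tokenizer.apply_chat_template` instead.

```python
def build_llama3_prompt(system_prompt: str, prompt: str) -> str:
    """Fill the README's prompt template for one system turn and one user turn."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|>"
        # The template ends with an open assistant header so the model
        # continues with the assistant's reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


text = build_llama3_prompt(
    "You are a concise assistant that helps answer questions.",
    "Where are you?",
)
print(text)
```

If you tokenize a string built this way yourself, it is probably worth checking whether your tokenizer adds its own `<|begin_of_text|>`; passing `add_special_tokens=False` is the usual way to avoid a duplicate.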
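
To give a rough feel for what the values under "AWQ Parameters" mean, here is a plain-Python sketch of group-wise 4-bit quantization with a zero point: `w_bit: 4` gives 16 integer levels, `q_group_size: 128` means every 128 consecutive weights share one scale and one zero point, and `zero_point: True` makes the mapping asymmetric. This is only an illustration under those assumptions. It does not perform AWQ's activation-aware scaling and it does not reproduce AutoAWQ's packed GEMM storage; all function names are hypothetical.

```python
def quantize_group(weights, w_bit=4):
    """Asymmetric (zero-point) quantization of one group of floats."""
    qmax = 2 ** w_bit - 1                       # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0             # fall back to 1.0 for a constant group
    zero = min(qmax, max(0, -round(lo / scale)))  # integer level that represents 0.0
    q = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return q, scale, zero


def dequantize_group(q, scale, zero):
    """Map integer levels back to approximate float weights."""
    return [(v - zero) * scale for v in q]


def quantize_row(row, q_group_size=128, w_bit=4):
    """Split a weight row into groups; each group gets its own scale and zero point."""
    return [
        quantize_group(row[start:start + q_group_size], w_bit)
        for start in range(0, len(row), q_group_size)
    ]


# Deterministic fake weights in [-1, 1], two groups of 128.
row = [((i * 37) % 101 - 50) / 50.0 for i in range(256)]
groups = quantize_row(row)
restored = [x for q, s, z in groups for x in dequantize_group(q, s, z)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), max_err)
```

A smaller group size would track the weights more closely at the cost of storing more scales and zero points; 128 is the value this quantization used.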