Initialize project; model provided by the ModelHub XC community

Model: bartowski/ChatQA-1.5-8B-AWQ Source: Original Platform
2026-06-20 02:04:13 +08:00
commit 228f1bb127
11 changed files with 413569 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,93 @@
+---
+license: llama3
+language:
+- en
+pipeline_tag: text-generation
+tags:
+- nvidia
+- chatqa-1.5
+- chatqa
+- llama-3
+- pytorch
+quantized_by: bartowski
+---
+
+## 4-bit GEMM AWQ Quantizations of ChatQA-1.5-8B
+
+Using <a href="https://github.com/casper-hansen/AutoAWQ/">AutoAWQ</a> release <a href="https://github.com/casper-hansen/AutoAWQ/releases/tag/v0.2.4">v0.2.4</a> for quantization.
+
+Original model: https://huggingface.co/nvidia/ChatQA-1.5-8B
+
+## Prompt format
+
+```
+<|begin_of_text|><|start_header_id|>system<|end_header_id|>
+
+{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
+
+{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+
+```
+
+## AWQ Parameters
+
+ - q_group_size: 128
+ - w_bit: 4
+ - zero_point: True
+ - version: GEMM
+
+## How to run
+
+The script below is adapted from the [generate.py example](https://github.com/casper-hansen/AutoAWQ/blob/main/examples/generate.py) in the AutoAWQ repo.
+
+First, install the `autoawq` package from PyPI:
+
+```
+pip install autoawq
+```
+
+Then run the following:
+
+```
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer, TextStreamer
+
+
+quant_path = "models/ChatQA-1.5-8B-AWQ"
+
+# Load model
+model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
+tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
+streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+
+prompt = "You're standing on the surface of the Earth. "\
+        "You walk one mile south, one mile west and one mile north. "\
+        "You end up exactly where you started. Where are you?"
+
+chat = [
+    {"role": "system", "content": "You are a concise assistant that helps answer questions."},
+    {"role": "user", "content": prompt},
+]
+
+# Llama 3 models end each turn with <|eot_id|>, so stop on it as well as on EOS
+terminators = [
+    tokenizer.eos_token_id,
+    tokenizer.convert_tokens_to_ids("<|eot_id|>")
+]
+
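+# Render the chat with the tokenizer's template and move the token ids to the GPU.
+# add_generation_prompt=True appends the assistant header so the model answers
+# instead of continuing the user turn (matches the prompt format above).
+tokens = tokenizer.apply_chat_template(
+    chat,
+    add_generation_prompt=True,
+    return_tensors="pt"
+).cuda()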
+
+# Generate output
+generation_output = model.generate(
+    tokens,
+    streamer=streamer,
+    max_new_tokens=64,
+    eos_token_id=terminators
+)
+```
+
+Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
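
A note on the README's "Prompt format" section: the template can also be filled by hand, which may help when checking what the chat template is expected to produce. The following is a minimal sketch, assuming plain string substitution of `{system_prompt}` and `{prompt}`; the helper name `build_prompt` is hypothetical and is not part of AutoAWQ or transformers.

```python
def build_prompt(system_prompt: str, prompt: str) -> str:
    """Fill the README's prompt template (hypothetical helper, not a library API)."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Example values: the system prompt comes from the README's script,
# the user prompt is a shortened placeholder.
text = build_prompt(
    "You are a concise assistant that helps answer questions.",
    "Where are you?",
)
print(text)
```

Because the string already starts with `<|begin_of_text|>`, passing `add_special_tokens=False` when tokenizing it is probably needed to avoid a duplicated beginning-of-text token.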