Initialize project; model provided by the ModelHub XC community
Model: bartowski/L3-8B-Stheno-v3.2-AWQ Source: Original Platform
---
license: cc-by-nc-4.0
language:
- en
datasets:
- Gryphe/Opus-WritingPrompts
- Sao10K/Claude-3-Opus-Instruct-15K
- Sao10K/Short-Storygen-v2
- Sao10K/c2-Logs-Filtered
quantized_by: bartowski
pipeline_tag: text-generation
---

## 4-bit GEMM AWQ Quantizations of L3-8B-Stheno-v3.2
Using <a href="https://github.com/casper-hansen/AutoAWQ/">AutoAWQ</a> release <a href="https://github.com/casper-hansen/AutoAWQ/releases/tag/v0.2.5">v0.2.5</a> for quantization.

Original model: https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2

## Prompt format

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```
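For reference, the template above can also be assembled by hand. This is a minimal sketch; `build_prompt` is an illustrative helper, not part of the model or tokenizer:

```python
def build_prompt(system_prompt: str, prompt: str) -> str:
    """Assemble a Llama 3 style prompt string matching the template above."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

text = build_prompt("You are a helpful assistant.", "Hello!")
```

In practice `tokenizer.apply_chat_template` produces this string for you, so manual assembly is only needed when bypassing the chat template.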

## AWQ Parameters

- q_group_size: 128
- w_bit: 4
- zero_point: True
- version: GEMM
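These parameters map directly onto AutoAWQ's `quant_config` dictionary. A sketch of how they would look when reproducing a quant like this one (the `from_pretrained`/`quantize` calls are shown in comments because they require the full-precision model and a GPU):

```python
# Quantization settings matching the parameters listed above.
quant_config = {
    "zero_point": True,   # asymmetric quantization with per-group zero points
    "q_group_size": 128,  # weights are quantized in groups of 128
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # GEMM kernel layout
}

# With AutoAWQ, this dict is passed to the quantize step, roughly:
#   model = AutoAWQForCausalLM.from_pretrained("Sao10K/L3-8B-Stheno-v3.2")
#   model.quantize(tokenizer, quant_config=quant_config)
```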

## How to run

Adapted from the AutoAWQ example script [here](https://github.com/casper-hansen/AutoAWQ/blob/main/examples/generate.py).

First, install the autoawq PyPI package:

```
pip install autoawq
```

Then run the following:

```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_path = "models/L3-8B-Stheno-v3.2-AWQ"

# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "You're standing on the surface of the Earth. "\
        "You walk one mile south, one mile west and one mile north. "\
        "You end up exactly where you started. Where are you?"

chat = [
    {"role": "system", "content": "You are a concise assistant that helps answer questions."},
    {"role": "user", "content": prompt},
]

# <|eot_id|> is used as an end-of-turn marker by Llama 3 models
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

tokens = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,  # append the assistant header so the model starts its reply
    return_tensors="pt"
).cuda()

# Generate output
generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=64,
    eos_token_id=terminators
)
```
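As a rough back-of-the-envelope sizing check before running (assumes ~8B parameters at 4 bits per weight; it ignores activations, the KV cache, and per-group scales/zero points, so actual VRAM use will be higher):

```python
params = 8e9         # ~8 billion weights
bits_per_weight = 4  # AWQ 4-bit quantization
weight_bytes = params * bits_per_weight / 8
print(f"~{weight_bytes / 1e9:.0f} GB of weights")  # → ~4 GB of weights
```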
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski