初始化项目,由ModelHub XC社区提供模型
Model: LLM-Research/Meta-Llama-3-8B-Instruct-GPTQ Source: Original Platform
This commit is contained in:
77
README.md
Normal file
77
README.md
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
license_name: llama3
|
||||
tags:
|
||||
- finetuned
|
||||
- quantized
|
||||
- 4-bit
|
||||
- gptq
|
||||
- transformers
|
||||
- safetensors
|
||||
- llama
|
||||
- text-generation
|
||||
- facebook
|
||||
- meta
|
||||
- pytorch
|
||||
- llama-3
|
||||
- conversational
|
||||
- en
|
||||
- license:other
|
||||
- autotrain_compatible
|
||||
- endpoints_compatible
|
||||
- has_space
|
||||
- text-generation-inference
|
||||
- region:us
|
||||
model_name: Meta-Llama-3-8B-Instruct-GPTQ
|
||||
base_model: meta-llama/Meta-Llama-3-8B-Instruct
|
||||
inference: false
|
||||
model_creator: meta-llama
|
||||
pipeline_tag: text-generation
|
||||
quantized_by: MaziyarPanahi
|
||||
---
|
||||
# Description
|
||||
[MaziyarPanahi/Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-8B-Instruct-GPTQ) is a quantized (GPTQ) version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
|
||||
|
||||
## How to use
|
||||
### Install the necessary packages
|
||||
|
||||
```
|
||||
pip install --upgrade accelerate auto-gptq transformers
|
||||
```
|
||||
|
||||
### Example Python code
|
||||
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer, pipeline
|
||||
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
|
||||
import torch
|
||||
|
||||
model_id = "MaziyarPanahi/Meta-Llama-3-8B-Instruct-GPTQ"
|
||||
|
||||
quantize_config = BaseQuantizeConfig(
|
||||
bits=4,
|
||||
group_size=128,
|
||||
desc_act=False
|
||||
)
|
||||
|
||||
model = AutoGPTQForCausalLM.from_quantized(
|
||||
model_id,
|
||||
use_safetensors=True,
|
||||
device="cuda:0",
|
||||
quantize_config=quantize_config)
|
||||
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
||||
|
||||
pipe = pipeline(
|
||||
"text-generation",
|
||||
model=model,
|
||||
tokenizer=tokenizer,
|
||||
max_new_tokens=512,
|
||||
temperature=0.7,
|
||||
top_p=0.95,
|
||||
repetition_penalty=1.1
|
||||
)
|
||||
|
||||
outputs = pipe("What is a large language model?")
|
||||
print(outputs[0]["generated_text"])
|
||||
```
|
||||
Reference in New Issue
Block a user