Initialize project; model provided by the ModelHub XC community

Model: LLM-Research/Meta-Llama-3-8B-Instruct-GPTQ Source: Original Platform
2026-05-31 07:19:13 +08:00
commit 44941a0db5
9 changed files with 412751 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,77 @@
+---
+license_name: llama3
+tags:
+- finetuned
+- quantized
+- 4-bit
+- gptq
+- transformers
+- safetensors
+- llama
+- text-generation
+- facebook
+- meta
+- pytorch
+- llama-3
+- conversational
+- en
+- license:other
+- autotrain_compatible
+- endpoints_compatible
+- has_space
+- text-generation-inference
+- region:us
+model_name: Meta-Llama-3-8B-Instruct-GPTQ
+base_model: meta-llama/Meta-Llama-3-8B-Instruct
+inference: false
+model_creator: meta-llama
+pipeline_tag: text-generation
+quantized_by: MaziyarPanahi
+---
+# Description
+[MaziyarPanahi/Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/MaziyarPanahi/Meta-Llama-3-8B-Instruct-GPTQ) is a 4-bit GPTQ-quantized version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
+
+## How to use
+### Install the necessary packages
+
+```bash
+pip install --upgrade accelerate auto-gptq transformers
+```
+
+### Example Python code
+
+
+```python
+from transformers import AutoTokenizer, pipeline
+from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
+
+model_id = "MaziyarPanahi/Meta-Llama-3-8B-Instruct-GPTQ"
+
+quantize_config = BaseQuantizeConfig(
+    bits=4,  # 4-bit quantized weights
+    group_size=128,
+    desc_act=False,
+)
+
+model = AutoGPTQForCausalLM.from_quantized(
+    model_id,
+    use_safetensors=True,
+    device="cuda:0",  # load onto the first CUDA GPU
+    quantize_config=quantize_config,
+)
+
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+pipe = pipeline(
+    "text-generation",
+    model=model,
+    tokenizer=tokenizer,
+    max_new_tokens=512,
+    do_sample=True,  # sampling must be on for temperature and top_p to take effect
+    temperature=0.7,
+    top_p=0.95,
+    repetition_penalty=1.1,
+)
+
+outputs = pipe("What is a large language model?")
+print(outputs[0]["generated_text"])
+```