diff --git a/README.md b/README.md
index daecaeb..4933d45 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,69 @@
 ---
+library_name: transformers
+tags:
+- 4-bit
+- AWQ
+- text-generation
+- autotrain_compatible
+- endpoints_compatible
+pipeline_tag: text-generation
 inference: false
+quantized_by: Suparious
 ---
 # Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 AWQ
 
-** PROCESSING .... ETA 30mins **
-
 - Model creator: [Orenguteng](https://huggingface.co/Orenguteng)
 - Original model: [Llama-3.1-8B-Lexi-Uncensored-V2](https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2)
 
+
+
+## How to use
+
+### Install the necessary packages
+
+```bash
+pip install --upgrade autoawq autoawq-kernels
+```
+
+### Example Python code
+
+```python
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer, TextStreamer
+
+model_path = "solidrust/Llama-3.1-8B-Lexi-Uncensored-V2-AWQ"
+system_message = "You are Llama-3.1-8B-Lexi-Uncensored-V2, incarnated as a powerful AI. You were created by Orenguteng."
+
+# Load model
+model = AutoAWQForCausalLM.from_quantized(model_path,
+                                          fuse_layers=True)
+tokenizer = AutoTokenizer.from_pretrained(model_path,
+                                          trust_remote_code=True)
+streamer = TextStreamer(tokenizer,
+                        skip_prompt=True,
+                        skip_special_tokens=True)
+
+# Convert prompt to tokens
+prompt_template = """\
+<|im_start|>system
+{system_message}<|im_end|>
+<|im_start|>user
+{prompt}<|im_end|>
+<|im_start|>assistant"""
+
+prompt = "You're standing on the surface of the Earth. "\
+         "You walk one mile south, one mile west and one mile north. "\
+         "You end up exactly where you started. Where are you?"
+
+tokens = tokenizer(prompt_template.format(system_message=system_message,prompt=prompt),
+                   return_tensors='pt').input_ids.cuda()
+
+# Generate output
+generation_output = model.generate(tokens,
+                                   streamer=streamer,
+                                   max_new_tokens=512)
+```
+
 ### About AWQ
 
 AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality compared to the most commonly used GPTQ settings.