Initialize project; model provided by the ModelHub XC community
Model: huihui-ai/Llama-3.1-8B-Fusion-7030 Source: Original Platform
117
README.md
Normal file
@@ -0,0 +1,117 @@
---
license: llama3.1
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
tags:
- Text Generation
- llama3.1
- text-generation-inference
- Inference Endpoints
- Transformers
- Fusion
language:
- en
---
# Llama-3.1-8B-Fusion-7030

## Overview

`Llama-3.1-8B-Fusion-7030` is a mixed model that combines two Llama-based models: [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite) and [mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated](https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated). The weights are blended in a 7:3 ratio: 70% from SuperNova-Lite and 30% from the abliterated Meta-Llama-3.1-8B-Instruct model.

**Although this is a simple mix, the model is usable and has not produced gibberish.**

This is an experiment. I tested the [9:1](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-9010), [8:2](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-8020), [7:3](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-7030), [6:4](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-6040) and [5:5](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-5050) ratios separately to see how much the ratio affects the model.

Evaluation reports for all of these models will be provided subsequently.

## Model Details

- **Base Models:**
  - [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite) (70%)
  - [mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated](https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated) (30%)
- **Model Size:** 8B parameters
- **Architecture:** Llama 3.1
- **Mixing Ratio:** 7:3 (SuperNova-Lite:Meta-Llama-3.1-8B-Instruct-abliterated)

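The card does not say how the two checkpoints were combined, so the following is only a minimal sketch of one plausible reading of a 7:3 mix: an element-wise linear blend of matching parameters. The function name `blend_weights`, the parameter `primary_share`, and the toy parameter names are hypothetical and are not taken from the author's actual procedure.

```python
from typing import Dict, Mapping


def blend_weights(
    primary: Mapping[str, float],
    secondary: Mapping[str, float],
    primary_share: float = 0.7,
) -> Dict[str, float]:
    """Blend two parameter mappings element-wise.

    Assumption: the 7:3 ratio means 0.7 * primary + 0.3 * secondary
    for every parameter. The source does not confirm this.
    """
    if not 0.0 <= primary_share <= 1.0:
        raise ValueError("primary_share must be between 0 and 1")
    if primary.keys() != secondary.keys():
        raise KeyError("both checkpoints must contain the same parameter names")
    secondary_share = 1.0 - primary_share
    return {
        name: primary_share * primary[name] + secondary_share * secondary[name]
        for name in primary
    }


# Toy stand-ins for two checkpoints; real state dicts hold tensors, not floats.
supernova_lite = {"layer.weight": 1.0, "layer.bias": -2.0}
abliterated = {"layer.weight": 0.0, "layer.bias": 2.0}

fusion_7030 = blend_weights(supernova_lite, abliterated, primary_share=0.7)
print(fusion_7030)
```

The same arithmetic should carry over to tensor-valued state dicts, since tensors support scalar multiplication and addition, but loading and saving real 8B checkpoints involves memory and dtype concerns that this sketch ignores.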
## Key Features

- **SuperNova-Lite Contributions (70%):** Llama-3.1-SuperNova-Lite is an 8B parameter model developed by Arcee.ai, based on the Llama-3.1-8B-Instruct architecture.
- **Meta-Llama-3.1-8B-Instruct-abliterated Contributions (30%):** This is an uncensored version of Llama 3.1 8B Instruct created with abliteration.

## Usage

You can use this mixed model in your applications by loading it with Hugging Face's `transformers` library:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import time

mixed_model_name = "huihui-ai/Llama-3.1-8B-Fusion-7030"

# Use CUDA if it is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model and tokenizer
mixed_model = AutoModelForCausalLM.from_pretrained(mixed_model_name, device_map=device, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(mixed_model_name)

# Ensure the tokenizer has pad_token_id set
tokenizer.pad_token_id = tokenizer.eos_token_id

# Input loop
print("Start inputting text for inference (type 'exit' to quit)")
while True:
    prompt = input("Enter your prompt: ")
    if prompt.lower() == "exit":
        print("Exiting inference loop.")
        break

    # Inference phase: generate text using the mixed model
    chat = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]

    # Prepare input data
    input_ids = tokenizer.apply_chat_template(
        chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(device)

    # Use TextStreamer for streaming output
    streamer = TextStreamer(tokenizer, skip_special_tokens=True)

    # Record the start time
    start_time = time.time()

    # Generate text and stream the output as it is produced
    outputs = mixed_model.generate(
        input_ids,
        max_new_tokens=8192,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        streamer=streamer  # Enable streaming output
    )

    # Record the end time
    end_time = time.time()

    # Count the generated tokens (output length minus prompt length)
    generated_tokens = outputs[0][input_ids.shape[-1]:].shape[0]

    # Calculate the total time taken
    total_time = end_time - start_time

    # Calculate tokens generated per second
    tokens_per_second = generated_tokens / total_time

    print(f"\nGenerated {generated_tokens} tokens in total, took {total_time:.2f} seconds, generating {tokens_per_second:.2f} tokens per second.")
```

## Evaluations

The following results have been re-evaluated; each score is the average for that test.

| Benchmark   | SuperNova-Lite | Meta-Llama-3.1-8B-Instruct-abliterated | Llama-3.1-8B-Fusion-9010 | Llama-3.1-8B-Fusion-8020 | Llama-3.1-8B-Fusion-7030 | Llama-3.1-8B-Fusion-6040 | Llama-3.1-8B-Fusion-5050 |
|-------------|----------------|----------------------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|
| IF_Eval     | 82.09          | 76.29                                  | 82.44                    | 82.93                    | **83.10**                | 82.94                    | 82.03                    |
| MMLU Pro    | **35.87**      | 33.1                                   | 35.65                    | 35.32                    | 34.91                    | 34.5                     | 33.96                    |
| TruthfulQA  | **64.35**      | 53.25                                  | 62.67                    | 61.04                    | 59.09                    | 57.8                     | 56.75                    |
| BBH         | **49.48**      | 44.87                                  | 48.86                    | 48.47                    | 48.30                    | 48.19                    | 47.93                    |
| GPQA        | 31.98          | 29.50                                  | 32.25                    | 32.38                    | **32.61**                | 31.14                    | 30.6                     |

The evaluation script is available in this repository at /eval.sh, or [here](https://huggingface.co/huihui-ai/Qwen2.5-7B-Instruct-abliterated/blob/main/eval.sh).
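To make the table easier to compare across ratios, the short sketch below transcribes the scores above and reports the top-scoring model per benchmark. It also computes an unweighted mean across the five benchmarks for each model. That mean is an illustrative summary introduced here, not a metric reported by the author, and the variable names are hypothetical.

```python
# Column order matches the evaluation table.
MODELS = [
    "SuperNova-Lite",
    "Meta-Llama-3.1-8B-Instruct-abliterated",
    "Llama-3.1-8B-Fusion-9010",
    "Llama-3.1-8B-Fusion-8020",
    "Llama-3.1-8B-Fusion-7030",
    "Llama-3.1-8B-Fusion-6040",
    "Llama-3.1-8B-Fusion-5050",
]

# Scores transcribed from the table, one row per benchmark.
SCORES = {
    "IF_Eval": [82.09, 76.29, 82.44, 82.93, 83.10, 82.94, 82.03],
    "MMLU Pro": [35.87, 33.1, 35.65, 35.32, 34.91, 34.5, 33.96],
    "TruthfulQA": [64.35, 53.25, 62.67, 61.04, 59.09, 57.8, 56.75],
    "BBH": [49.48, 44.87, 48.86, 48.47, 48.30, 48.19, 47.93],
    "GPQA": [31.98, 29.50, 32.25, 32.38, 32.61, 31.14, 30.6],
}

# Highest-scoring model for each benchmark.
best_per_benchmark = {
    benchmark: MODELS[max(range(len(MODELS)), key=row.__getitem__)]
    for benchmark, row in SCORES.items()
}

# Unweighted mean over the five benchmarks (an assumption of this sketch).
mean_per_model = {
    model: sum(row[i] for row in SCORES.values()) / len(SCORES)
    for i, model in enumerate(MODELS)
}

for benchmark, model in best_per_benchmark.items():
    print(f"{benchmark}: {model}")
for model, mean in mean_per_model.items():
    print(f"{model}: {mean:.2f}")
```

Because the benchmarks use different scales and difficulty levels, an unweighted mean is only a rough comparison aid.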