Initialize project; model provided by the ModelHub XC community

Model: huihui-ai/Llama-3.1-8B-Fusion-7030 Source: Original Platform
2026-06-03 03:47:15 +08:00
commit bdfa4a89d6
14 changed files with 413134 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,117 @@
+---
+license: llama3.1
+base_model:
+- meta-llama/Meta-Llama-3.1-8B-Instruct
+tags:
+- Text Generation
+- llama3.1
+- text-generation-inference
+- Inference Endpoints
+- Transformers
+- Fusion
+language:
+- en
+---
+# Llama-3.1-8B-Fusion-7030
+
+## Overview
+`Llama-3.1-8B-Fusion-7030` is a mixed model that combines the strengths of two powerful Llama-based models: [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite) and [mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated](https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated). The weights are blended in a 7:3 ratio, with 70% of the weights from SuperNova-Lite and 30% from the abliterated Meta-Llama-3.1-8B-Instruct model.
+**Although this is a simple weight mix, the model is usable and no gibberish output has been observed.**
+This is an experiment: I tested the [9:1](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-9010), [8:2](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-8020), [7:3](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-7030), [6:4](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-6040), and [5:5](https://huggingface.co/huihui-ai/Llama-3.1-8B-Fusion-5050) ratios separately to see how much the mixing ratio affects the model.
+Evaluation results for all five ratios are listed in the Evaluations section below.
+
+## Model Details
+- **Base Models:**
+  - [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite) (70%)
+  - [mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated](https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated) (30%)
+- **Model Size:** 8B parameters
+- **Architecture:** Llama 3.1
+- **Mixing Ratio:** 7:3 (SuperNova-Lite:Meta-Llama-3.1-8B-Instruct-abliterated)
+
+## Key Features
+- **SuperNova-Lite Contributions (70%):** Llama-3.1-SuperNova-Lite is an 8B parameter model developed by Arcee.ai, based on the Llama-3.1-8B-Instruct architecture.
+- **Meta-Llama-3.1-8B-Instruct-abliterated Contributions (30%):** This is an uncensored version of Llama 3.1 8B Instruct created with abliteration.
+
+## Usage
+You can use this mixed model in your applications by loading it with Hugging Face's `transformers` library:
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+import time
+
+mixed_model_name = "huihui-ai/Llama-3.1-8B-Fusion-7030"
+
+# Check if CUDA is available
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+# Load model and tokenizer
+mixed_model = AutoModelForCausalLM.from_pretrained(mixed_model_name, device_map=device, torch_dtype=torch.bfloat16)
+tokenizer = AutoTokenizer.from_pretrained(mixed_model_name)
+
+# Ensure the tokenizer has pad_token_id set
+tokenizer.pad_token_id = tokenizer.eos_token_id
+
+# Input loop
+print("Start inputting text for inference (type 'exit' to quit)")
+while True:
+    prompt = input("Enter your prompt: ")
+    if prompt.lower() == "exit":
+        print("Exiting inference loop.")
+        break
+
+    # Inference phase: build the chat messages for the mixed model
+    chat = [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": prompt}
+    ]
+
+    # Prepare input data
+    input_ids = tokenizer.apply_chat_template(
+        chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
+    ).to(device)
+
+    # Use TextStreamer for streaming output
+    streamer = TextStreamer(tokenizer, skip_special_tokens=True)
+
+    # Record the start time
+    start_time = time.time()
+
+    # Generate text; the streamer prints the output to stdout as it is decoded
+    outputs = mixed_model.generate(
+        input_ids,
+        max_new_tokens=8192,
+        do_sample=True,
+        temperature=0.6,
+        top_p=0.9,
+        streamer=streamer  # Enable streaming output
+    )
+
+    # Record the end time
+    end_time = time.time()
+
+    # Calculate the number of generated tokens
+    generated_tokens = outputs[0][input_ids.shape[-1]:].shape[0]
+
+    # Calculate the total time taken
+    total_time = end_time - start_time
+
+    # Calculate tokens generated per second
+    tokens_per_second = generated_tokens / total_time
+
+    print(f"\nGenerated {generated_tokens} tokens in total, took {total_time:.2f} seconds, generating {tokens_per_second:.2f} tokens per second.")
+
+```
+
+## Evaluations
+
+All scores below were re-evaluated; each value is the average for that test.
+| Benchmark   | SuperNova-Lite | Meta-Llama-3.1-8B-Instruct-abliterated | Llama-3.1-8B-Fusion-9010 | Llama-3.1-8B-Fusion-8020 | Llama-3.1-8B-Fusion-7030 | Llama-3.1-8B-Fusion-6040 | Llama-3.1-8B-Fusion-5050 |
+|-------------|----------------|----------------------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|
+| IF_Eval     | 82.09          | 76.29                                  | 82.44                    | 82.93                    | **83.10**                | 82.94                    | 82.03                    |
+| MMLU Pro    | **35.87**      | 33.1                                   | 35.65                    | 35.32                    | 34.91                    | 34.5                     | 33.96                    |
+| TruthfulQA  | **64.35**      | 53.25                                  | 62.67                    | 61.04                    | 59.09                    | 57.8                     | 56.75                    |
+| BBH         | **49.48**      | 44.87                                  | 48.86                    | 48.47                    | 48.30                    | 48.19                    | 47.93                    |
+| GPQA        | 31.98          | 29.50                                  | 32.25                    | 32.38                    | **32.61**                | 31.14                    | 30.6                     |
+
+The evaluation script is in this repository at `/eval.sh`; it is also available [here](https://huggingface.co/huihui-ai/Qwen2.5-7B-Instruct-abliterated/blob/main/eval.sh).
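
Note on the 7:3 blend: the README's Overview says the weights are mixed 70% from SuperNova-Lite and 30% from the abliterated model, but it does not say how the mix was computed. The sketch below assumes the simplest reading, a per-parameter linear interpolation over two models with identical parameter names. The helper `blend_state_dicts` and the toy parameter names are hypothetical, and scalars stand in for tensors so the example runs without any dependencies; the same arithmetic should also work on `torch` tensors taken from `model.state_dict()`.

```python
from typing import Any, Dict


def blend_state_dicts(
    primary: Dict[str, Any],
    secondary: Dict[str, Any],
    primary_weight: float = 0.7,
) -> Dict[str, Any]:
    """Linearly interpolate two parameter dictionaries that share the same keys.

    Assumption: the fusion is a plain weighted average per parameter.
    The README does not confirm the exact procedure.
    """
    if not 0.0 <= primary_weight <= 1.0:
        raise ValueError("primary_weight must be in [0, 1]")
    if primary.keys() != secondary.keys():
        raise ValueError("both models must expose the same parameter names")

    secondary_weight = 1.0 - primary_weight
    return {
        name: primary_weight * primary[name] + secondary_weight * secondary[name]
        for name in primary
    }


# Toy stand-ins for the two state dicts (hypothetical names and values).
supernova_lite = {"layer.weight": 1.0, "layer.bias": 2.0}
abliterated = {"layer.weight": 0.0, "layer.bias": 4.0}

# 70% SuperNova-Lite, 30% abliterated, matching the 7:3 ratio in the README.
fused = blend_state_dicts(supernova_lite, abliterated, primary_weight=0.7)
print(fused)
```

Changing `primary_weight` to 0.9, 0.8, 0.6, or 0.5 would correspond to the other ratios listed in the README, under the same assumption.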