初始化项目,由ModelHub XC社区提供模型

Model: QuantFactory/Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta-GGUF
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-17 23:38:13 +08:00
commit 01838dd998
17 changed files with 429 additions and 0 deletions

49
.gitattributes vendored Normal file
View File

@@ -0,0 +1,49 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta.Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bf8b3c8b13b6bc5edf622c2d856825e3acb4de8c8719da77cd372deef444314a
size 3179136544

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a8f1353f83ea09b8cb73346f6ccb6960ac461838353e3853000c231446b3e817
size 4321961504

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63eb8ba396fe934886b3f5d710fc0dfa114f35f272cf2283fe8b347e33a2b2ad
size 4018923040

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b8a2524cddf63fb714c8f1e61a42b3b619932926a73d290e7a0c99cc4868b92
size 3664504352

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2ec2301be9d1978625fe1e41269763760337172c86053553683123311eb76406
size 4661216800

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:11d1404eecde481fd918c0fa11430c7f557afbee01d6ea561c93f46ca3bc843b
size 5130257952

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4934b7ddedc71cd95c919a9032ee62d2f4d7a0f26cc52e9d29c234048358363f
size 4920739360

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6a563d9a119f9257aafc8916a710d780ee70fd032d125937a3dacc8d87165b0
size 4692674080

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5ea525b858983393de2301d0f7dc98177983920c84c4dd4c17c1e480277e3169
size 5599299104

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:46bb0e7e2aa50ac4968f246d9eda0d7f677fedcaeefd6261eb4771e0aab15d4e
size 6068340256

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e2624d5900859cec79e4f5a40a2b6b17011bf1601535778849336de0f9a7bc80
size 5732992544

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:43f10c2e15c15fe194b5cc51b557033562364c0fc1f86df44c384ff5820b8f36
size 5599299104

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ac85b35956d0e356ccf7fbefb14373d90020aa2a8df855d74bf96cb4eaeb0c6
size 6596011552

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4d1b17b310e7849ef8e4b08dfdf08c67c50bec2c361916f15981190c5785cce9
size 8540775968

337
README.md Normal file
View File

@@ -0,0 +1,337 @@
---
base_model: unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
datasets:
- argilla/distilabel-intel-orca-kto
---
[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
# QuantFactory/Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta-GGUF
This is quantized version of [EpistemeAI2/Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta](https://huggingface.co/EpistemeAI2/Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta) created using llama.cpp
# Original Model Card
# Research
**Using unsloth new KTO fine tuning** to fine tune distilabel-intel-orca-kto dataset
# Original Model card:
## Model Information
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.
**Model developer**: Meta
**Model Architecture:** Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
<table>
<tr>
<td>
</td>
<td><strong>Training Data</strong>
</td>
<td><strong>Params</strong>
</td>
<td><strong>Input modalities</strong>
</td>
<td><strong>Output modalities</strong>
</td>
<td><strong>Context length</strong>
</td>
<td><strong>GQA</strong>
</td>
<td><strong>Token count</strong>
</td>
<td><strong>Knowledge cutoff</strong>
</td>
</tr>
<tr>
<td rowspan="3" >Llama 3.1 (text only)
</td>
<td rowspan="3" >A new mix of publicly available online data.
</td>
<td>8B
</td>
<td>Multilingual Text
</td>
<td>Multilingual Text and code
</td>
<td>128k
</td>
<td>Yes
</td>
<td rowspan="3" >15T+
</td>
<td rowspan="3" >December 2023
</td>
</tr>
<tr>
<td>70B
</td>
<td>Multilingual Text
</td>
<td>Multilingual Text and code
</td>
<td>128k
</td>
<td>Yes
</td>
</tr>
<tr>
<td>405B
</td>
<td>Multilingual Text
</td>
<td>Multilingual Text and code
</td>
<td>128k
</td>
<td>Yes
</td>
</tr>
</table>
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
**Llama 3.1 family of models**. Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.
**Model Release Date:** July 23, 2024.
**Status:** This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback.
**License:** A custom commercial license, the Llama 3.1 Community License, is available at: [https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
Where to send questions or comments about the model Instructions on how to provide feedback or comments on the model can be found in the model [README](https://github.com/meta-llama/llama3). For more technical information about generation parameters and recipes for how to use Llama 3.1 in applications, please go [here](https://github.com/meta-llama/llama-recipes).
## Intended Use
**Intended Use Cases** Llama 3.1 is intended for commercial and research use in multiple languages. Instruction tuned text only models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks. The Llama 3.1 model collection also supports the ability to leverage the outputs of its models to improve other models including synthetic data generation and distillation. The Llama 3.1 Community License allows for these use cases.
**Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.1 Community License. Use in languages beyond those explicitly referenced as supported in this model card**.
**<span style="text-decoration:underline;">Note</span>: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages. Developers may fine-tune Llama 3.1 models for languages beyond the 8 supported languages provided they comply with the Llama 3.1 Community License and the Acceptable Use Policy and in such cases are responsible for ensuring that any uses of Llama 3.1 in additional languages is done in a safe and responsible manner.
## How to use
This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original `llama` codebase.
### Use with transformers
Starting with `transformers >= 4.43.0` onward, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
Make sure to update your transformers installation via `pip install --upgrade transformers`.
```python
import transformers
import torch
model_id = "EpistemeAI2/Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta"
pipeline = transformers.pipeline(
"text-generation",
model=model_id,
model_kwargs={"torch_dtype": torch.bfloat16},
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
{"role": "user", "content": "Who are you?"},
]
outputs = pipeline(
messages,
max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generations, quantised and more at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes)
### Tool use with transformers
LLaMA-3.1 supports multiple tool use formats. You can see a full guide to prompt formatting [here](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/).
Tool use is also supported through [chat templates](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling) in Transformers.
Here is a quick example showing a single simple tool:
```python
# First, define a tool
def get_current_temperature(location: str) -> float:
"""
Get the current temperature at a location.
Args:
location: The location to get the temperature for, in the format "City, Country"
Returns:
The current temperature at the specified location in the specified units, as a float.
"""
return 22. # A real function should probably actually get the temperature!
# Next, create a chat and apply the chat template
messages = [
{"role": "system", "content": "You are a bot that responds to weather queries."},
{"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]
inputs = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True)
```
You can then generate text from this input as normal. If the model generates a tool call, you should add it to the chat like so:
```python
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
```
and then call the tool and append the result, with the `tool` role, like so:
```python
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```
After that, you can `generate()` again to let the model use the tool result in the chat. Note that this was a very brief introduction to tool calling - for more information,
see the [LLaMA prompt format docs](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/) and the Transformers [tool use documentation](https://huggingface.co/docs/transformers/main/chat_templating#advanced-tool-use--function-calling).
#### vllm
```
# Install vLLM from pip:
pip install vllm
Load and run the model:
# Load and run the model:
vllm serve "EpistemeAI2/Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta"
# Call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "EpistemeAI2/Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta"
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
```
## Hardware and Software
**Training Factors** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, annotation, and evaluation were also performed on production infrastructure.
**Training utilized a cumulative of** 39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware, per the table below. Training time is the total GPU time required for training each model and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency.
**Training Greenhouse Gas Emissions** Estimated total location-based greenhouse gas emissions were **11,390** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy, therefore the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
<table>
<tr>
<td>
</td>
<td><strong>Training Time (GPU hours)</strong>
</td>
<td><strong>Training Power Consumption (W)</strong>
</td>
<td><strong>Training Location-Based Greenhouse Gas Emissions</strong>
<p>
<strong>(tons CO2eq)</strong>
</td>
<td><strong>Training Market-Based Greenhouse Gas Emissions</strong>
<p>
<strong>(tons CO2eq)</strong>
</td>
</tr>
<tr>
<td>Llama 3.1 8B
</td>
<td>1.46M
</td>
<td>700
</td>
<td>420
</td>
<td>0
</td>
</tr>
<tr>
<td>Llama 3.1 70B
</td>
<td>7.0M
</td>
<td>700
</td>
<td>2,040
</td>
<td>0
</td>
</tr>
<tr>
<td>Llama 3.1 405B
</td>
<td>30.84M
</td>
<td>700
</td>
<td>8,930
</td>
<td>0
</td>
</tr>
<tr>
<td>Total
</td>
<td>39.3M
<td>
<ul>
</ul>
</td>
<td>11,390
</td>
<td>0
</td>
</tr>
</table>
The methodology used to determine training energy use and greenhouse gas emissions can be found [here](https://arxiv.org/pdf/2204.05149). Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others.
## Training Data
**Overview:** Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples.
**Data Freshness:** The pretraining data has a cutoff of December 2023.
# Uploaded model
- **Developed by:** EpistemeAI2
- **License:** apache-2.0
- **Finetuned from model :** unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
## 4. Citation
```
@misc{Fireball-Alpaca-Llama-3.1-8B-Instruct-KTO-beta,
title={Fireball series: Reasonable Large Language Model},
author={EpistemeAI,
year={2024},
}
```

1
configuration.json Normal file
View File

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "others", "allow_remote": true}