Initialize project; model provided by the ModelHub XC community

Model: amazon/MistralLite-AWQ
Source: Original Platform
ModelHub XC
2026-05-29 11:57:12 +08:00
commit 7ae094c52d
9 changed files with 357 additions and 0 deletions

50
.gitattributes vendored Normal file

@@ -0,0 +1,50 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
model.safetensors filter=lfs diff=lfs merge=lfs -text

150
README.md Normal file

@@ -0,0 +1,150 @@
---
license: apache-2.0
inference: false
---
# MistralLite-AWQ Model
MistralLite-AWQ is a version of the [MistralLite](https://huggingface.co/amazon/MistralLite) model that was
quantized using the AWQ method developed by [Lin et al. (2023)](https://arxiv.org/abs/2306.00978).
The MistralLite-AWQ models are approximately **70% smaller** than those of MistralLite whilst maintaining comparable performance.
Please refer to the [original MistralLite model card](https://huggingface.co/amazon/MistralLite) for details about the model
preparation and training processes.
## MistralLite-AWQ Variants
| Branch | Approx. Model Size | `q_group_size` | `w_bit` | `version` |
|--------|---:|---------------:|--------:|-----------|
| [main](https://huggingface.co/amazon/MistralLite-AWQ/tree/main) | 3.9 GB | 128 | 4 | GEMM |
| [MistralLite-AWQ-64g-4b-GEMM](https://huggingface.co/amazon/MistralLite-AWQ/tree/MistralLite-AWQ-64g-4b-GEMM) | 4.0 GB | 64 | 4 | GEMM |
| [MistralLite-AWQ-32g-4b-GEMM](https://huggingface.co/amazon/MistralLite-AWQ/tree/MistralLite-AWQ-32g-4b-GEMM) | 4.3 GB | 32 | 4 | GEMM |
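
The sizes in the table can be roughly reproduced from the architecture values in `config.json`. The sketch below is a back-of-the-envelope estimate, not the AutoAWQ implementation. It assumes that every linear layer in the decoder blocks is stored as `w_bit`-bit weights plus one FP16 scale and one `w_bit`-bit zero point per group, and that the embeddings, `lm_head`, and norm weights stay in FP16. The function names are hypothetical.

```python
# Architecture values copied from this repository's config.json.
HIDDEN = 4096
INTERMEDIATE = 14336
LAYERS = 32
HEADS = 32
KV_HEADS = 8
VOCAB = 32003


def count_params():
    """Return (quantizable linear weights, weights assumed to stay in FP16)."""
    head_dim = HIDDEN // HEADS
    kv_dim = KV_HEADS * head_dim
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate_proj, up_proj, down_proj
    linear = LAYERS * (attn + mlp)
    norms = LAYERS * 2 * HIDDEN + HIDDEN              # two norms per layer + final norm
    embeddings = 2 * VOCAB * HIDDEN                   # embed_tokens + untied lm_head
    return linear, norms + embeddings


def fp16_bytes():
    """Estimated size of the unquantized model at 2 bytes per weight."""
    linear, other = count_params()
    return 2 * (linear + other)


def awq_bytes(group_size, w_bit=4):
    """Estimated size under the storage assumptions stated above."""
    linear, other = count_params()
    # Each group of weights shares one FP16 scale and one w_bit zero point.
    bits_per_weight = w_bit + (16 + w_bit) / group_size
    return int(linear * bits_per_weight / 8) + 2 * other


if __name__ == "__main__":
    for g in (128, 64, 32):
        size = awq_bytes(g)
        saving = 1 - size / fp16_bytes()
        print(f"q_group_size={g}: {size / 2**30:.2f} GiB, {saving:.0%} smaller than FP16")
```

Under these assumptions, the estimate for `q_group_size=128` lands within a fraction of a percent of the 4,150,929,384-byte `model.safetensors` recorded in this commit, and the saving relative to FP16 is about 71%, in line with the "approximately 70% smaller" figure. The comparison also suggests that the table reports sizes in binary gigabytes (GiB).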
## Dependencies
- [`autoawq==0.2.5`](https://pypi.org/project/autoawq/0.2.5/): [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) was used to quantize the MistralLite model.
- [`vllm==0.4.2`](https://pypi.org/project/vllm/0.4.2/): [vLLM](https://github.com/vllm-project/vllm) was used to host models for benchmarking.
## Evaluations
### Long Context
The following benchmark results are shown as _accuracy_ (%) values, unless stated otherwise.
#### Topic Retrieval
See https://lmsys.org/blog/2023-06-29-longchat/
| Model Name | n_topics=05 | n_topics=10 | n_topics=15 | n_topics=20 | n_topics=25 |
|:---------------------------------------------------|--------------:|--------------:|--------------:|--------------:|--------------:|
| _n_tokens_ (approx.) = | _3048_ | _5966_ | _8903_ | _11832_ | _14757_ |
| MistralLite | 100 | 100 | 100 | 100 | 98 |
| **MistralLite-AWQ** | **100** | **100** | **100**| **100** | **98** |
| **MistralLite-AWQ-64g-4b-GEMM** | **100** | **100** | **100**| **100** | **98** |
| **MistralLite-AWQ-32g-4b-GEMM** | **100** | **100** | **100**| **100** | **98** |
| Mistral-7B-Instruct-v0.1 | 96 | 52 | 2 | 0 | 0 |
| Mistral-7B-Instruct-v0.2 | 100 | 100 | 100 | 100 | 100 |
| Mixtral-8x7B-v0.1 | 0 | 0 | 0 | 0 | 0 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 100 | 100 | 100 | 100 |
#### [Line Retrieval](https://lmsys.org/blog/2023-06-29-longchat/#longeval-results)
See https://lmsys.org/blog/2023-06-29-longchat/#longeval-results
| Model Name | n_lines=200 | n_lines=300 | n_lines=400 | n_lines=500 | n_lines=600 | n_lines=680 |
|:----------|-------------:|-------------:|------------:|-----------:|-----------:|-----------:|
| _n_tokens_ (approx.) = | _4317_ | _6415_ | _8510_ | _10610_ | _12698_ | _14373_ |
| MistralLite | 100 | 94 | 86 | 82 | 76 | 66 |
| **MistralLite-AWQ** | **96**| **94**| **88** | **80** | **70**| **62** |
| **MistralLite-AWQ-64g-4b-GEMM** | **96**| **96**| **90** | **70** | **72**| **60** |
| **MistralLite-AWQ-32g-4b-GEMM** | **98**| **96**| **84** | **76** | **70**| **62** |
| Mistral-7B-Instruct-v0.1 | 96 | 56 | 38 | 36 | 30 | 30 |
| Mistral-7B-Instruct-v0.2 | 100 | 100 | 96 | 98 | 96 | 84 |
| Mixtral-8x7B-v0.1 | 54 | 38 | 56 | 66 | 62 | 38 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 100 | 100 | 100 | 100 | 100 |
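
To put a single number on how far the quantized variants drift from the unquantized model on this task, the short sketch below averages the line-retrieval accuracies from the table and reports each variant's gap to MistralLite. The unweighted mean over the six context lengths is an assumption made here for illustration; it is not a metric reported by the authors.

```python
from statistics import mean

# Line-retrieval accuracy (%) copied from the table above, n_lines = 200 ... 680.
LINE_RETRIEVAL = {
    "MistralLite": [100, 94, 86, 82, 76, 66],
    "MistralLite-AWQ": [96, 94, 88, 80, 70, 62],
    "MistralLite-AWQ-64g-4b-GEMM": [96, 96, 90, 70, 72, 60],
    "MistralLite-AWQ-32g-4b-GEMM": [98, 96, 84, 76, 70, 62],
}


def mean_gap(baseline, scores=LINE_RETRIEVAL):
    """Mean accuracy of each model minus the baseline's mean, in percentage points."""
    base = mean(scores[baseline])
    return {
        name: round(mean(values) - base, 2)
        for name, values in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, gap in mean_gap("MistralLite").items():
        print(f"{name}: {gap:+.2f} points vs. MistralLite")
```

By this measure all three quantized variants stay within a few percentage points of the unquantized model, although individual context lengths vary more than the average suggests.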
#### Pass Key Retrieval
See https://github.com/epfml/landmark-attention/blob/main/llama/run_test.py#L101
| Model Name | n_garbage=12000 | n_garbage=20000 | n_garbage=31000 | n_garbage=38000 | n_garbage=45000 | n_garbage=60000 |
|:----------|-------------:|-------------:|------------:|-----------:|-----------:|-----------:|
| _n_tokens_ (approx.) = | _3272_ | _5405_ | _8338_ | _10205_ | _12071_ | _16072_ |
| MistralLite | 100 | 100 | 100 | 100 | 100 | 100|
| **MistralLite-AWQ** | **100** | **100**| **100**| **100** | **100**| **100**|
| **MistralLite-AWQ-64g-4b-GEMM** | **100** | **100**| **100**| **100** | **100**| **100**|
| **MistralLite-AWQ-32g-4b-GEMM** | **100** | **100**| **100**| **100** | **100**| **100**|
| Mistral-7B-Instruct-v0.1 | 100 | 50 | 30 | 20 | 10 | 10 |
| Mistral-7B-Instruct-v0.2 | 100 | 100 | 100 | 100 | 100 | 100 |
| Mixtral-8x7B-v0.1 | 100 | 100 | 100 | 100 | 100 | 100 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 100 | 100 | 90 | 100 | 100 |
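
For readers unfamiliar with the task, `n_garbage` is read here as the number of filler characters wrapped around a hidden pass key, which is consistent with the ratio of characters to tokens in the table. The sketch below builds such a prompt. It is modeled loosely on the linked script: the filler sentences, the instruction wording, and the function name are assumptions and may differ from what was used for these results.

```python
import random

# Filler text repeated to pad the prompt (assumed wording).
FILLER = (
    "The grass is green. The sky is blue. The sun is yellow. "
    "Here we go. There and back again. "
)


def make_passkey_prompt(n_garbage, seed=0):
    """Return (prompt, pass_key) with n_garbage filler characters around the key."""
    rng = random.Random(seed)
    pass_key = rng.randint(1, 50000)
    split = rng.randint(0, n_garbage)  # filler characters placed before the key
    filler = FILLER * (n_garbage // len(FILLER) + 1)
    before, after = filler[:split], filler[: n_garbage - split]
    parts = [
        "There is an important piece of information hidden in a lot of "
        "irrelevant text. Find it and memorize it.",
        before,
        f"The pass key is {pass_key}. Remember it. {pass_key} is the pass key.",
        after,
        "What is the pass key? The pass key is",
    ]
    return "\n".join(parts), pass_key


if __name__ == "__main__":
    prompt, key = make_passkey_prompt(12000)
    print(len(prompt), key)
```

A model passes a trial if its completion contains the returned `pass_key`; accuracy is the fraction of trials passed.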
#### QuALITY (Question Answering with Long Input Texts, Yes!)
See https://nyu-mll.github.io/quality/
|Model Name| Test set Accuracy | Hard subset Accuracy|
|:----------|-------------:|-------------:|
| MistralLite | 56.8 | 74.5 |
| **MistralLite-AWQ** | **55.3** | **71.8** |
| **MistralLite-AWQ-64g-4b-GEMM** | **55.2** | **72.9** |
| **MistralLite-AWQ-32g-4b-GEMM** | **56.6** | **72.8** |
| Mistral-7B-Instruct-v0.1 | 45.2 | 58.9 |
| Mistral-7B-Instruct-v0.2 | 55.5 | 74 |
| Mixtral-8x7B-v0.1 | 75 | 74.1 |
| Mixtral-8x7B-Instruct-v0.1 | 68.7 | 83.3 |
## Usage
## Inference via vLLM HTTP Host
### Launch Host
```bash
python -m vllm.entrypoints.openai.api_server \
--model amazon/MistralLite-AWQ \
--quantization awq
```
### Query Host
```bash
curl -X POST http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{ "model": "amazon/MistralLite-AWQ",
"prompt": "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>",
"temperature": 0,
"echo": false
}'
```
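
The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the server launched above is listening on `localhost:8000`. The helper names are hypothetical, and `max_tokens=100` is an assumption borrowed from the offline example rather than part of the `curl` call.

```python
import json
import urllib.request


def format_prompt(question):
    """Wrap a question in the prompt template used throughout this README."""
    return f"<|prompter|>{question}</s><|assistant|>"


def build_request(
    question,
    url="http://localhost:8000/v1/completions",
    model="amazon/MistralLite-AWQ",
    max_tokens=100,  # assumption: not set in the curl example
):
    """Build (but do not send) a POST request for the completions endpoint."""
    payload = {
        "model": model,
        "prompt": format_prompt(question),
        "temperature": 0,
        "max_tokens": max_tokens,
        "echo": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(question):
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(question)) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("What are the main challenges to support a long context for LLM?"))
```

Keeping the request construction separate from the network call makes the prompt template easy to check without a running server.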
## Inference via [vLLM Offline Inference](https://docs.vllm.ai/en/latest/getting_started/examples/offline_inference.html)
```python
from vllm import LLM, SamplingParams

prompts = [
    "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>",
]
sampling_params = SamplingParams(temperature=0, max_tokens=100)
llm = LLM(model="amazon/MistralLite-AWQ")
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
## License
Apache 2.0
## Limitations
Before using the MistralLite-AWQ model, perform your own independent
assessment and take measures to ensure that your use complies with your own
quality control practices and standards, and with the local rules, laws,
regulations, licenses, and terms that apply to you and your content.

34
config.json Normal file

@@ -0,0 +1,34 @@
{
"_name_or_path": "amazon/MistralLite-AWQ",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"quantization_config": {
"bits": 4,
"group_size": 128,
"modules_to_not_convert": null,
"quant_method": "awq",
"version": "gemm",
"zero_point": true
},
"rms_norm_eps": 1e-05,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.40.2",
"use_cache": true,
"vocab_size": 32003
}

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"do_sample": true,
"eos_token_id": 2,
"transformers_version": "4.40.2"
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:95960bb0b19cd967e4ef0a1397ec885e6123bbd63b5934a992f253612e42dc1d
size 4150929384

37
special_tokens_map.json Normal file

@@ -0,0 +1,37 @@
{
"additional_special_tokens": [
"<unk>",
"<s>",
"</s>",
"<|assistant|>",
"<|prompter|>"
],
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "[PAD]",
"lstrip": true,
"normalized": false,
"rstrip": true,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:68b5a0553901b2aadb40aebd0d4bd1e845a2b2040f24a4351d22090c0e33779c
size 1795883

72
tokenizer_config.json Normal file

@@ -0,0 +1,72 @@
{
"add_bos_token": true,
"add_eos_token": false,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"32000": {
"content": "[PAD]",
"lstrip": true,
"normalized": false,
"rstrip": true,
"single_word": false,
"special": true
},
"32001": {
"content": "<|assistant|>",
"lstrip": true,
"normalized": false,
"rstrip": true,
"single_word": false,
"special": true
},
"32002": {
"content": "<|prompter|>",
"lstrip": true,
"normalized": false,
"rstrip": true,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [
"<unk>",
"<s>",
"</s>",
"<|assistant|>",
"<|prompter|>"
],
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": true,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "[PAD]",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": true
}