Initialize project; model provided by the ModelHub XC community
Model: umyunsang/GovOn-EXAONE-AWQ-v2 Source: Original Platform
README.md · 60 lines · new file
@@ -0,0 +1,60 @@
---
language:
- ko
- en
license: apache-2.0
base_model: umyunsang/GovOn-EXAONE-Merged-v2
tags:
- exaone
- civil-complaint
- govon
- korean
- awq
- 4bit
- quantization
- on-device
pipeline_tag: text-generation
---

# GovOn-EXAONE-AWQ-v2

## Introduction

**GovOn-EXAONE-AWQ-v2** is an optimized 4-bit quantized version of [GovOn-EXAONE-Merged-v2](https://huggingface.co/umyunsang/GovOn-EXAONE-Merged-v2), designed for **on-device**, **low-latency** deployment in civil service environments.

By applying **AWQ (Activation-aware Weight Quantization)** with a W4A16, group-size-128 configuration, we reduced the model size by **66.1% (from 14.56 GB to 4.94 GB)** while preserving domain-specific performance. This enables high-quality Korean civil complaint assistance on consumer-grade GPUs with as little as 8 GB of VRAM.

## Quickstart

We recommend using `vLLM` or `AutoAWQ` for optimized inference.

### Using AutoAWQ
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "umyunsang/GovOn-EXAONE-AWQ-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True, trust_remote_code=True)

# Minimal generation sketch; the original card defers to the Merged-v2
# inference code, so the prompt and parameters here are illustrative.
prompt = "민원: 도로 파손 신고는 어떻게 하나요?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
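
### Using vLLM

The card also recommends `vLLM` but does not show it; below is a minimal sketch, assuming vLLM's built-in AWQ support and an illustrative prompt.

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint with vLLM's AWQ support (sketch, not from the original card).
llm = LLM(model="umyunsang/GovOn-EXAONE-AWQ-v2", quantization="awq", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=256)

# Illustrative prompt; real usage should follow the Merged-v2 chat template.
outputs = llm.generate(["민원: 불법 주정차 신고 절차를 알려주세요."], params)
print(outputs[0].outputs[0].text)
```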

## Specifications

### Model Details

- **Source Model**: [umyunsang/GovOn-EXAONE-Merged-v2](https://huggingface.co/umyunsang/GovOn-EXAONE-Merged-v2)
- **Quantization Method**: AWQ (weight-only 4-bit)
- **Config**: W4A16, group size 128, zero point enabled (see the reproduction sketch below)
- **Model Size**: 4.94 GB (BF16 original: 14.56 GB)
- **VRAM Required**: ~6.5 GB (inference)

### Efficiency

- **Compression Ratio**: 2.95x
- **Size Reduction**: 66.1%
- **Calibration**: 512 domain-specific civil complaint samples
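
To make the config concrete, here is a hypothetical reproduction sketch with `AutoAWQ`; the output directory is an assumption, and since the 512 calibration samples are not published, `calib_data` is a placeholder.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_id = "umyunsang/GovOn-EXAONE-Merged-v2"
# W4A16, group size 128, zero point enabled, as stated above.
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)

# Placeholder: the actual 512 domain-specific complaint samples are not included here.
calib_data = ["민원 상담 예시 텍스트 ..."]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized("GovOn-EXAONE-AWQ-v2")  # hypothetical output directory
tokenizer.save_pretrained("GovOn-EXAONE-AWQ-v2")
```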

## Limitation and Usage

1. **Quantization Loss**: While AWQ minimizes performance degradation, slight deviations in CoT (`<thought>`) output or nuanced reasoning may occur compared to the BF16 version.
2. **Infrastructure**: Optimized for NVIDIA GPUs (Ampere architecture or newer recommended).

## License

This model is licensed under the **Apache License 2.0**. However, users must also comply with the [EXAONE AI Model License Agreement](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct/blob/main/LICENSE) of the base model.