Initialize project; model provided by the ModelHub XC community
Model: umyunsang/GovOn-EXAONE-AWQ-v2 Source: Original Platform
README.md · 60 lines · new file
@@ -0,0 +1,60 @@
---
language:
- ko
- en
license: apache-2.0
base_model: umyunsang/GovOn-EXAONE-Merged-v2
tags:
- exaone
- civil-complaint
- govon
- korean
- awq
- 4bit
- quantization
- on-device
pipeline_tag: text-generation
---

# GovOn-EXAONE-AWQ-v2

## Introduction

**GovOn-EXAONE-AWQ-v2** is an optimized 4-bit quantized version of [GovOn-EXAONE-Merged-v2](https://huggingface.co/umyunsang/GovOn-EXAONE-Merged-v2), designed for **on-device**, **low-latency** deployment in civil service environments.

By applying **AWQ (Activation-aware Weight Quantization)** with a W4A16, group-size-128 configuration, we reduced the model size by **66.1% (from 14.56 GB to 4.94 GB)** while preserving domain-specific performance. This enables high-quality Korean civil complaint assistance on consumer-grade GPUs with as little as 8 GB of VRAM.

## Quickstart

We recommend using `vLLM` or `AutoAWQ` for optimized inference.

### Using AutoAWQ
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "umyunsang/GovOn-EXAONE-AWQ-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True, trust_remote_code=True)

# Minimal generation sketch; the original card defers to the Merged-v2
# inference code, so the prompt and parameters here are illustrative.
prompt = "민원: 도로 파손 신고는 어떻게 하나요?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
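
### Using vLLM

The card also recommends `vLLM` but does not show it; below is a minimal sketch, assuming vLLM's built-in AWQ support and an illustrative prompt.

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint with vLLM's AWQ support (sketch, not from the original card).
llm = LLM(model="umyunsang/GovOn-EXAONE-AWQ-v2", quantization="awq", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=256)

# Illustrative prompt; real usage should follow the Merged-v2 chat template.
outputs = llm.generate(["민원: 불법 주정차 신고 절차를 알려주세요."], params)
print(outputs[0].outputs[0].text)
```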

## Specifications

### Model Details

- **Source Model**: [umyunsang/GovOn-EXAONE-Merged-v2](https://huggingface.co/umyunsang/GovOn-EXAONE-Merged-v2)
- **Quantization Method**: AWQ (weight-only 4-bit)
- **Config**: W4A16, group size 128, zero point enabled (see the reproduction sketch below)
- **Model Size**: 4.94 GB (BF16 original: 14.56 GB)
- **VRAM Required**: ~6.5 GB (inference)

### Efficiency

- **Compression Ratio**: 2.95x
- **Size Reduction**: 66.1%
- **Calibration**: 512 domain-specific civil complaint samples
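
To make the config concrete, here is a hypothetical reproduction sketch with `AutoAWQ`; the output directory is an assumption, and since the 512 calibration samples are not published, `calib_data` is a placeholder.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_id = "umyunsang/GovOn-EXAONE-Merged-v2"
# W4A16, group size 128, zero point enabled, as stated above.
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)

# Placeholder: the actual 512 domain-specific complaint samples are not included here.
calib_data = ["민원 상담 예시 텍스트 ..."]

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized("GovOn-EXAONE-AWQ-v2")  # hypothetical output directory
tokenizer.save_pretrained("GovOn-EXAONE-AWQ-v2")
```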

## Limitation and Usage

1. **Quantization Loss**: While AWQ minimizes performance degradation, slight deviations in CoT (`<thought>`) output or nuanced reasoning may occur compared to the BF16 version.
2. **Infrastructure**: Optimized for NVIDIA GPUs (Ampere architecture or newer recommended).

## License

This model is licensed under the **Apache License 2.0**. However, users must also comply with the [EXAONE AI Model License Agreement](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct/blob/main/LICENSE) of the base model.