---
language:
- ko
- en
license: apache-2.0
base_model: umyunsang/GovOn-EXAONE-Merged-v2
tags:
- exaone
- civil-complaint
- govon
- korean
- awq
- 4bit
- quantization
- on-device
pipeline_tag: text-generation
---

# GovOn-EXAONE-AWQ-v2

## Introduction

**GovOn-EXAONE-AWQ-v2** is a 4-bit quantized version of [GovOn-EXAONE-Merged-v2](https://huggingface.co/umyunsang/GovOn-EXAONE-Merged-v2), optimized for **on-device**, **low-latency** deployment in civil service environments. Applying **AWQ (Activation-aware Weight Quantization)** with a W4A16g128 configuration reduces the model size by **66.1% (14.56 GB → 4.94 GB)** while preserving domain-specific performance. This enables high-quality Korean civil-complaint assistance on consumer-grade GPUs with as little as 8 GB of VRAM.

## Quickstart

We recommend using `vLLM` or `AutoAWQ` for optimized inference.

### Using AutoAWQ

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "umyunsang/GovOn-EXAONE-AWQ-v2"

# Load the tokenizer and the pre-quantized AWQ weights
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    fuse_layers=True,
    trust_remote_code=True,
)

# (Inference code is the same as for Merged-v2)
```

## Specifications

### Model Details

- **Source Model**: [umyunsang/GovOn-EXAONE-Merged-v2](https://huggingface.co/umyunsang/GovOn-EXAONE-Merged-v2)
- **Quantization Method**: AWQ (weight-only 4-bit)
- **Config**: W4A16, group size 128, zero point enabled
- **Model Size**: 4.94 GB (BF16 original: 14.56 GB)
- **VRAM Required**: ~6.5 GB (inference)

### Efficiency

- **Compression Ratio**: 2.95x
- **Size Reduction**: 66.1%
- **Calibration**: 512 domain-specific civil complaint samples

## Limitations and Usage

1. **Quantization Loss**: While AWQ minimizes performance degradation, slight deviations in chain-of-thought (CoT) output or nuanced reasoning may occur compared to the BF16 version.
2. **Infrastructure**: Optimized for NVIDIA GPUs (Ampere architecture or newer recommended).
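As a quick sanity check, the reported compression figures follow directly from the model sizes above; the snippet below (a minimal sketch, not part of the model code) reproduces the ratio and reduction percentage:

```python
bf16_gb = 14.56  # original BF16 checkpoint size
awq_gb = 4.94    # AWQ W4A16g128 checkpoint size

compression_ratio = bf16_gb / awq_gb
size_reduction_pct = (bf16_gb - awq_gb) / bf16_gb * 100

print(f"Compression: {compression_ratio:.2f}x")   # ~2.95x
print(f"Reduction:   {size_reduction_pct:.1f}%")  # ~66.1%
```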
## License

This model is licensed under the **Apache License 2.0**. However, users must also comply with the [EXAONE AI Model License Agreement](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct/blob/main/LICENSE) of the base model.