ModelHub XC ff9b045879 Initial project commit; model provided by the ModelHub XC community
Model: umyunsang/GovOn-EXAONE-AWQ-v2
Source: Original Platform
2026-04-10 20:08:11 +08:00

language: ko, en
license: apache-2.0
base_model: umyunsang/GovOn-EXAONE-Merged-v2
tags: exaone, civil-complaint, govon, korean, awq, 4bit, quantization, on-device
pipeline_tag: text-generation

GovOn-EXAONE-AWQ-v2

Introduction

GovOn-EXAONE-AWQ-v2 is a 4-bit quantized version of GovOn-EXAONE-Merged-v2, optimized for on-device, low-latency deployment in civil service environments.

By applying AWQ (Activation-aware Weight Quantization) with a W4A16, group-size-128 configuration, we reduced the model size by 66.1% (from 14.56 GB to 4.94 GB) while preserving domain-specific performance. This enables high-quality Korean civil-complaint assistance on consumer-grade GPUs with as little as 8 GB of VRAM.

Quickstart

We recommend using vLLM or AutoAWQ for optimized inference.

Using AutoAWQ

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "umyunsang/GovOn-EXAONE-AWQ-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True, trust_remote_code=True)

# Inference code is the same as for Merged-v2; a minimal generate() sketch:
prompt = "여권 재발급 절차를 안내해 주세요."  # example civil-complaint query
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
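Since vLLM is the other recommended path, one way to serve the model is through its OpenAI-compatible server. This is a sketch, not from the original card: it assumes a recent vLLM build with AWQ support and a CUDA GPU, and the `--max-model-len` value is an illustrative choice you should adjust to your context needs.

```shell
# Launch an OpenAI-compatible API server backed by the AWQ checkpoint.
vllm serve umyunsang/GovOn-EXAONE-AWQ-v2 \
  --quantization awq \
  --max-model-len 4096 \
  --trust-remote-code
```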

Specifications

Model Details

  • Source Model: umyunsang/GovOn-EXAONE-Merged-v2
  • Quantization Method: AWQ (Weight-only 4-bit)
  • Config: W4A16, Group Size 128, Zero Point True
  • Model Size: 4.94 GB (BF16 Original: 14.56 GB)
  • VRAM Required: ~6.5 GB (Inference)
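As a rough sanity check (my own illustrative estimate, not a figure from the model card), the per-weight storage cost of a W4A16 group-128 scheme can be derived: each group of 128 4-bit weights carries one FP16 scale and one 4-bit zero point, so the quantized linear layers cost slightly more than 4 bits per weight. Embeddings and any unquantized layers remain in 16-bit, which is why the packaged 4.94 GB exceeds the naive estimate below.

```python
# Per-weight cost of W4A16 with group size 128 (quantized linear layers only).
GROUP_SIZE = 128
bits_weight = 4
bits_scale = 16   # one FP16 scale per group
bits_zero = 4     # one 4-bit zero point per group

bits_per_weight = bits_weight + (bits_scale + bits_zero) / GROUP_SIZE
print(f"{bits_per_weight:.3f} bits/weight")  # → 4.156 bits/weight

# The BF16 checkpoint is 14.56 GB at 16 bits per parameter,
# which implies roughly 7.28B parameters.
params_b = 14.56 * 8 / 16
quantized_gb = params_b * bits_per_weight / 8
print(f"~{params_b:.2f}B params, ~{quantized_gb:.2f} GB if fully quantized")
```

The gap between this ~3.78 GB lower bound and the actual 4.94 GB is plausibly the non-quantized tensors kept at higher precision.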

Efficiency

  • Compression Ratio: 2.95x
  • Size Reduction: 66.1%
  • Calibration: 512 domain-specific civil complaint samples
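The two efficiency figures follow directly from the sizes listed under Model Details; a quick arithmetic check:

```python
orig_gb, quant_gb = 14.56, 4.94  # BF16 original vs. AWQ 4-bit

compression = orig_gb / quant_gb            # → 2.95x
reduction = (orig_gb - quant_gb) / orig_gb  # → 66.1%
print(f"{compression:.2f}x, {reduction:.1%}")
```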

Limitations and Usage

  1. Quantization Loss: While AWQ minimizes performance degradation, slight deviations in chain-of-thought (<thought>) output or nuanced reasoning may occur compared to the BF16 version.
  2. Infrastructure: Optimized for NVIDIA GPUs (Ampere architecture or newer recommended).

License

This model is licensed under the Apache License 2.0. However, users must also comply with the EXAONE AI Model License Agreement of the base model.
