ModelHub XC ff9b045879 Initial project commit; model provided by the ModelHub XC community
Model: umyunsang/GovOn-EXAONE-AWQ-v2
Source: Original Platform
2026-04-10 20:08:11 +08:00

language: ko, en
license: apache-2.0
base_model: umyunsang/GovOn-EXAONE-Merged-v2
tags: exaone, civil-complaint, govon, korean, awq, 4bit, quantization, on-device
pipeline_tag: text-generation

GovOn-EXAONE-AWQ-v2

Introduction

GovOn-EXAONE-AWQ-v2 is a 4-bit quantized version of GovOn-EXAONE-Merged-v2, optimized for on-device, low-latency deployment in civil service environments.

By applying AWQ (Activation-aware Weight Quantization) with a W4A16, group-size-128 configuration, we reduced the model size by 66.1% (from 14.56 GB to 4.94 GB) while preserving domain-specific performance. This enables high-quality Korean civil-complaint assistance on consumer-grade GPUs with as little as 8 GB of VRAM.

Quickstart

We recommend using vLLM or AutoAWQ for optimized inference.

Using AutoAWQ

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "umyunsang/GovOn-EXAONE-AWQ-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True, trust_remote_code=True)

# Inference code is the same as for Merged-v2; a minimal generate() sketch:
prompt = "여권 재발급 절차를 안내해 주세요."  # example civil-complaint query
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
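Since vLLM is the other recommended path, one way to serve the model is through its OpenAI-compatible server. This is a sketch, not from the original card: it assumes a recent vLLM build with AWQ support and a CUDA GPU, and the `--max-model-len` value is an illustrative choice you should adjust to your context needs.

```shell
# Launch an OpenAI-compatible API server backed by the AWQ checkpoint.
vllm serve umyunsang/GovOn-EXAONE-AWQ-v2 \
  --quantization awq \
  --max-model-len 4096 \
  --trust-remote-code
```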

Specifications

Model Details

  • Source Model: umyunsang/GovOn-EXAONE-Merged-v2
  • Quantization Method: AWQ (Weight-only 4-bit)
  • Config: W4A16, Group Size 128, Zero Point True
  • Model Size: 4.94 GB (BF16 Original: 14.56 GB)
  • VRAM Required: ~6.5 GB (Inference)
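As a rough sanity check (my own illustrative estimate, not a figure from the model card), the per-weight storage cost of a W4A16 group-128 scheme can be derived: each group of 128 4-bit weights carries one FP16 scale and one 4-bit zero point, so the quantized linear layers cost slightly more than 4 bits per weight. Embeddings and any unquantized layers remain in 16-bit, which is why the packaged 4.94 GB exceeds the naive estimate below.

```python
# Per-weight cost of W4A16 with group size 128 (quantized linear layers only).
GROUP_SIZE = 128
bits_weight = 4
bits_scale = 16   # one FP16 scale per group
bits_zero = 4     # one 4-bit zero point per group

bits_per_weight = bits_weight + (bits_scale + bits_zero) / GROUP_SIZE
print(f"{bits_per_weight:.3f} bits/weight")  # → 4.156 bits/weight

# The BF16 checkpoint is 14.56 GB at 16 bits per parameter,
# which implies roughly 7.28B parameters.
params_b = 14.56 * 8 / 16
quantized_gb = params_b * bits_per_weight / 8
print(f"~{params_b:.2f}B params, ~{quantized_gb:.2f} GB if fully quantized")
```

The gap between this ~3.78 GB lower bound and the actual 4.94 GB is plausibly the non-quantized tensors kept at higher precision.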

Efficiency

  • Compression Ratio: 2.95x
  • Size Reduction: 66.1%
  • Calibration: 512 domain-specific civil complaint samples
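The two efficiency figures follow directly from the sizes listed under Model Details; a quick arithmetic check:

```python
orig_gb, quant_gb = 14.56, 4.94  # BF16 original vs. AWQ 4-bit

compression = orig_gb / quant_gb            # → 2.95x
reduction = (orig_gb - quant_gb) / orig_gb  # → 66.1%
print(f"{compression:.2f}x, {reduction:.1%}")
```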

Limitations and Usage

  1. Quantization Loss: While AWQ minimizes performance degradation, slight deviations in chain-of-thought (<thought>) output or nuanced reasoning may occur compared to the BF16 version.
  2. Infrastructure: Optimized for NVIDIA GPUs (Ampere architecture or newer recommended).

License

This model is licensed under the Apache License 2.0. However, users must also comply with the EXAONE AI Model License Agreement of the base model.
