90 lines
1.7 KiB
Markdown
90 lines
1.7 KiB
Markdown
|
|
---
|
||
|
|
language:
|
||
|
|
- ko
|
||
|
|
- en
|
||
|
|
base_model:
|
||
|
|
- LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
|
||
|
|
pipeline_tag: text-generation
|
||
|
|
tags:
|
||
|
|
- llm
|
||
|
|
- exaone
|
||
|
|
- instruction-tuned
|
||
|
|
- quantized
|
||
|
|
- awq
|
||
|
|
- vllm
|
||
|
|
- medical
|
||
|
|
---
|
||
|
|
|
||
|
|
# Exaone3.5-7.8B_ReST_V0_Quantized
|
||
|
|
|
||
|
|
This model is a fine-tuned and AWQ-quantized version of EXAONE 3.5 7.8B (Instruct), optimized for efficient inference and structured text generation.
|
||
|
|
|
||
|
|
## Overview
|
||
|
|
|
||
|
|
- Base Model: EXAONE 3.5 7.8B (Instruct)
|
||
|
|
- Fine-tuning: Supervised fine-tuning on domain-specific data
|
||
|
|
- Quantization: 4-bit AWQ
|
||
|
|
- Inference: Optimized for vLLM
|
||
|
|
- Context Length: up to 32K tokens
|
||
|
|
|
||
|
|
## Model Details
|
||
|
|
|
||
|
|
- Architecture: ExaoneForCausalLM
|
||
|
|
- Hidden Size: 4096
|
||
|
|
- Layers: 32
|
||
|
|
- Attention Heads: 32
|
||
|
|
- Max Position Embeddings: 32768
|
||
|
|
- Quantization: 4-bit AWQ
|
||
|
|
- Torch dtype: float16
|
||
|
|
|
||
|
|
## Intended Use
|
||
|
|
|
||
|
|
- Instruction-based text generation
|
||
|
|
- Structured output generation (JSON)
|
||
|
|
- LLM-based data pipelines
|
||
|
|
- RAG systems
|
||
|
|
- Efficient inference
|
||
|
|
|
||
|
|
## Example Usage
|
||
|
|
|
||
|
|
```python
|
||
|
|
from vllm import LLM, SamplingParams
|
||
|
|
|
||
|
|
llm = LLM(
|
||
|
|
model="cococoomo/Exaone3.5-7.8B_ReST_V0_Quantized",
|
||
|
|
quantization="AWQ",
|
||
|
|
)
|
||
|
|
|
||
|
|
sampling_params = SamplingParams(
|
||
|
|
temperature=0.2,
|
||
|
|
top_p=0.8,
|
||
|
|
max_tokens=1024,
|
||
|
|
)
|
||
|
|
|
||
|
|
outputs = llm.generate(["Your prompt here"], sampling_params)
|
||
|
|
print(outputs[0].outputs[0].text)
|
||
|
|
```
|
||
|
|
|
||
|
|
## Training
|
||
|
|
|
||
|
|
Fine-tuned using supervised learning on domain-specific data.
|
||
|
|
Dataset is not included due to privacy constraints.
|
||
|
|
|
||
|
|
## Limitations
|
||
|
|
|
||
|
|
- May produce incorrect outputs
|
||
|
|
- Sensitive to prompt quality
|
||
|
|
- Domain bias may exist
|
||
|
|
|
||
|
|
## Safety
|
||
|
|
|
||
|
|
Not intended for critical decision-making without human validation.
|
||
|
|
|
||
|
|
## Evaluation
|
||
|
|
|
||
|
|
- BLEU
|
||
|
|
- ROUGE
|
||
|
|
|
||
|
|
## Deployment
|
||
|
|
|
||
|
|
Optimized for vLLM and GPU-efficient inference.
|