---
library_name: transformers
license: llama3.1
language:
- ko
- vi
- id
- km
- th
metrics:
- bleu
- rouge
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
# QuantFactory/llama-3.1-Asian-Bllossom-8B-Translator-GGUF
This is a quantized version of [MLP-KTLim/llama-3.1-Asian-Bllossom-8B-Translator](https://huggingface.co/MLP-KTLim/llama-3.1-Asian-Bllossom-8B-Translator), created using llama.cpp.
# Original Model Card
# Model Card for llama-3.1-Asian-Bllossom-8B-Translator
This is a multilingual translation model fine-tuned from the Llama 3.1 8B Instruct base model. It translates in either direction between any two of the following languages (Korean plus four Southeast Asian languages):
- Korean
- Vietnamese
- Indonesian
- Cambodian (Khmer)
- Thai
## Acknowledgements
AICA <img src="https://aica-gj.kr/images/logo.png" width="20%" height="20%">
## Model Details
The model is designed for translating short text segments between any pair of the supported languages.
Supported language pairs:
- Korean ↔ Vietnamese
- Korean ↔ Indonesian
- Korean ↔ Cambodian
- Korean ↔ Thai
- Vietnamese ↔ Indonesian
- Vietnamese ↔ Cambodian
- Vietnamese ↔ Thai
- Indonesian ↔ Cambodian
- Indonesian ↔ Thai
- Cambodian ↔ Thai
### Model Description
This model is optimized for translation between Korean and four Southeast Asian languages, and among those four languages, to support communication between these language communities.
The training data comprises 20M examples (1M for each of the 20 translation directions), which gives the model broad coverage of common expressions and basic conversations in these languages.
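The 20M figure follows directly from the language list: five languages give 10 unordered pairs and 20 ordered translation directions. The short sketch below reproduces that arithmetic; the variable names are illustrative and not part of the model's code.

```py
from itertools import combinations, permutations

LANGUAGES = ["Korean", "Vietnamese", "Indonesian", "Cambodian", "Thai"]

# Unordered pairs (A ↔ B), matching the "Supported language pairs" list.
pairs = list(combinations(LANGUAGES, 2))

# Ordered directions (A → B); each pair contributes two directions.
directions = list(permutations(LANGUAGES, 2))

# 1M examples per direction, as stated in the model description.
EXAMPLES_PER_DIRECTION = 1_000_000
total_examples = len(directions) * EXAMPLES_PER_DIRECTION

print(len(pairs), len(directions), total_examples)  # 10 20 20000000
```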
### Model Architecture
Base Model: meta-llama/Llama-3.1-8B-Instruct
## Bias, Risks, and Limitations
- Performance is limited to short sentences and phrases
- May not handle complex or lengthy text effectively
- Translation quality may vary depending on language pair and content complexity
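Because the model works best on short sentences, one possible workaround for longer input is to split it into sentence-sized segments and translate each one separately. The sketch below is only an illustration of that idea, not a method described by the authors: `split_into_segments` and `translate_long_text` are hypothetical helpers, and `translate_segment` stands for any function that translates one short string. The splitter relies on sentence-final punctuation, so it will not segment Thai text well, since Thai does not normally mark sentence ends with punctuation.

```py
import re

# Break after ., ?, ! or the Khmer sentence mark (។) when whitespace follows.
_SENTENCE_BREAK = re.compile(r"(?<=[.?!។])\s+")


def split_into_segments(text):
    """Split text into sentence-sized segments, line by line."""
    segments = []
    for line in text.splitlines():
        segments.extend(s for s in _SENTENCE_BREAK.split(line.strip()) if s)
    return segments


def translate_long_text(text, translate_segment):
    """Translate each segment on its own and join the results with spaces."""
    return " ".join(translate_segment(s) for s in split_into_segments(text))


print(split_into_segments("안녕하세요? 아시아 언어 번역 모델 입니다."))
```

Translating segments independently discards cross-sentence context, so pronouns and terminology may become inconsistent between segments.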
## Evaluation results
| Source Language | Target Language | BLEU Score | ROUGE-1 | ROUGE-L |
|----------------|-----------------|------------|---------|---------|
| Korean | Vietnamese | 56.70 | 81.64 | 76.66 |
| Korean | Cambodian | 71.69 | 89.26 | 88.20 |
| Korean | Indonesian | 58.32 | 80.39 | 76.63 |
| Korean | Thai | 63.26 | 78.88 | 72.29 |
| Vietnamese | Korean | 49.01 | 75.57 | 72.74 |
| Vietnamese | Cambodian | 78.26 | 90.74 | 90.32 |
| Vietnamese | Indonesian | 65.96 | 83.08 | 81.46 |
| Vietnamese | Thai | 65.93 | 81.09 | 76.57 |
| Cambodian | Korean | 49.10 | 72.67 | 69.75 |
| Cambodian | Vietnamese | 63.42 | 81.56 | 79.09 |
| Cambodian | Indonesian | 61.41 | 79.67 | 77.75 |
| Cambodian | Thai | 70.91 | 81.85 | 77.66 |
| Indonesian | Korean | 53.61 | 77.14 | 74.29 |
| Indonesian | Vietnamese | 68.21 | 85.41 | 83.10 |
| Indonesian | Cambodian | 78.84 | 90.81 | 90.35 |
| Indonesian | Thai | 67.12 | 81.54 | 77.19 |
| Thai | Korean | 45.59 | 72.48 | 69.46 |
| Thai | Vietnamese | 61.55 | 81.01 | 78.24 |
| Thai | Cambodian | 78.52 | 91.47 | 91.16 |
| Thai | Indonesian | 58.99 | 78.56 | 76.40 |
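
To make the table easier to scan, the sketch below averages the BLEU scores by target language. The numbers are copied from the table above; the grouping helper is illustrative. BLEU depends on how each language is tokenized, so comparisons across target languages are indicative at best.

```py
from collections import defaultdict

# (source, target, BLEU) rows copied from the evaluation table.
BLEU_ROWS = [
    ("Korean", "Vietnamese", 56.70), ("Korean", "Cambodian", 71.69),
    ("Korean", "Indonesian", 58.32), ("Korean", "Thai", 63.26),
    ("Vietnamese", "Korean", 49.01), ("Vietnamese", "Cambodian", 78.26),
    ("Vietnamese", "Indonesian", 65.96), ("Vietnamese", "Thai", 65.93),
    ("Cambodian", "Korean", 49.10), ("Cambodian", "Vietnamese", 63.42),
    ("Cambodian", "Indonesian", 61.41), ("Cambodian", "Thai", 70.91),
    ("Indonesian", "Korean", 53.61), ("Indonesian", "Vietnamese", 68.21),
    ("Indonesian", "Cambodian", 78.84), ("Indonesian", "Thai", 67.12),
    ("Thai", "Korean", 45.59), ("Thai", "Vietnamese", 61.55),
    ("Thai", "Cambodian", 78.52), ("Thai", "Indonesian", 58.99),
]


def mean_bleu_by(column):
    """Average BLEU grouped by source (column=0) or target (column=1)."""
    groups = defaultdict(list)
    for row in BLEU_ROWS:
        groups[row[column]].append(row[2])
    return {lang: sum(scores) / len(scores) for lang, scores in groups.items()}


by_target = mean_bleu_by(1)
for lang, score in sorted(by_target.items(), key=lambda item: item[1]):
    print(f"into {lang}: {score:.2f}")
```

On these figures, translation into Korean has the lowest average BLEU and translation into Cambodian the highest.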
## Example
```py
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the original (unquantized) model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    "MLP-KTLim/llama-3.1-Asian-Bllossom-8B-Translator",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "MLP-KTLim/llama-3.1-Asian-Bllossom-8B-Translator",
)

input_text = "안녕하세요? 아시아 언어 번역 모델 입니다."


def get_input_ids(source_lang, target_lang, message):
    """Build chat-formatted input IDs for one translation request."""
    assert source_lang in ["Korean", "Vietnamese", "Indonesian", "Thai", "Cambodian"]
    assert target_lang in ["Korean", "Vietnamese", "Indonesian", "Thai", "Cambodian"]
    input_ids = tokenizer.apply_chat_template(
        conversation=[
            {"role": "system", "content": f"You are a useful translation AI. Please translate the sentence given in {source_lang} into {target_lang}."},
            {"role": "user", "content": message},
        ],
        tokenize=True,
        return_tensors="pt",
        add_generation_prompt=True,
    )
    return input_ids


input_ids = get_input_ids(
    source_lang="Korean",
    target_lang="Vietnamese",
    message=input_text,
)
output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][len(input_ids[0]):], skip_special_tokens=True))
```
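The example above loads the original model with transformers. For the GGUF files in this repository, a minimal sketch using the `llama-cpp-python` package might look like the following. It assumes that `llama-cpp-python` and `huggingface_hub` are installed, that the repository contains a file matching the `*Q4_K_M.gguf` pattern (an assumption; check the file list and adjust), and that the chat template stored in the GGUF metadata matches the original model. `build_messages` and `translate_gguf` are hypothetical helper names.

```py
REPO_ID = "QuantFactory/llama-3.1-Asian-Bllossom-8B-Translator-GGUF"
SUPPORTED = ["Korean", "Vietnamese", "Indonesian", "Thai", "Cambodian"]


def build_messages(source_lang, target_lang, message):
    """Reuse the system prompt from the transformers example above."""
    assert source_lang in SUPPORTED and target_lang in SUPPORTED
    return [
        {"role": "system", "content": f"You are a useful translation AI. Please translate the sentence given in {source_lang} into {target_lang}."},
        {"role": "user", "content": message},
    ]


def translate_gguf(source_lang, target_lang, message, filename="*Q4_K_M.gguf"):
    """Translate one message with a GGUF quantization (filename is an assumption)."""
    # Imported lazily so build_messages works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=filename,  # glob pattern; adjust to a file that exists in the repo
        n_ctx=2048,
        verbose=False,
    )
    result = llm.create_chat_completion(
        messages=build_messages(source_lang, target_lang, message),
        max_tokens=128,
    )
    return result["choices"][0]["message"]["content"]
```

Calling `translate_gguf("Korean", "Vietnamese", "안녕하세요?")` downloads the matching file on first use. Output quality may differ from the unquantized model depending on the quantization level chosen.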
## Contributors
- 원인호 (wih1226@seoultech.ac.kr)
- 김민준 (mjkmain@seoultech.ac.kr)