---
license: apache-2.0
language:
- en
- fa
tags:
- translation
- english-to-persian
- persian-to-english
- bilingual
- farsi
- persian
model_type: llama
base_model: meta-llama/Llama-3.2-1B
pipeline_tag: text-generation
library_name: transformers
widget:
- text: |
    ### English:
    The children were playing in the park.
    ### Persian:
- text: |
    ### Persian:
    من به مدرسه می‌روم.
    ### English:
---
[![Space](https://img.shields.io/badge/🤗%20Spaces-Try%20it%20here-blue)](https://huggingface.co/spaces/Sheikhaei/llama-3.2-1b-english-persian-translator)
# LLaMA 3.2 1B English ↔ Persian Translator
This model is a fine-tuned version of [`meta-llama/Llama-3.2-1B`](https://huggingface.co/meta-llama/Llama-3.2-1B), trained for **bidirectional translation** between **English and Persian**. It supports both directions:
- 🇬🇧 English → 🇮🇷 Persian
- 🇮🇷 Persian → 🇬🇧 English
---
## Format
The model expects prompts in the following format:
```
### English:
The children were playing in the park.
### Persian:
```
or
```
### Persian:
کودکان در پارک بازی می‌کردند.
### English:
```
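
The two templates differ only in which language header comes first, so a prompt can be assembled by a small helper. The following is a minimal sketch: `build_prompt`, `HEADERS`, and the `source_lang` parameter are hypothetical names introduced for illustration and are not part of the model repository or the `transformers` API.

```python
# Hypothetical helper (not part of the model repo): builds a prompt in the
# "### <Source>:\n<text>\n### <Target>:\n" layout shown above.
HEADERS = {"en": "### English:", "fa": "### Persian:"}


def build_prompt(text: str, source_lang: str = "en") -> str:
    """Return a translation prompt for the given source language ("en" or "fa")."""
    if source_lang not in HEADERS:
        raise ValueError(f"unsupported language code: {source_lang!r}")
    target_lang = "fa" if source_lang == "en" else "en"
    # The prompt ends with the target header and a newline, so the model's
    # continuation is expected to be the translation itself.
    return f"{HEADERS[source_lang]}\n{text.strip()}\n{HEADERS[target_lang]}\n"


print(build_prompt("The children were playing in the park."))
```

Passing `source_lang="fa"` produces the Persian → English variant of the same template.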
---
## Usage
```python
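from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" requires the `accelerate` package to be installed.
model_id = "Sheikhaei/llama-3.2-1b-en-fa-translator"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The prompt ends with the target-language header; the model continues from there.
prompt = """### English:
The children were playing in the park.
### Persian:
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding (do_sample=False) keeps the output deterministic.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
# outputs[0] holds the prompt tokens followed by the generated translation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))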
```
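
Because `outputs[0]` contains the prompt tokens followed by the generated ones, the decoded string repeats the prompt before the translation. Below is a minimal sketch of one way to isolate the translation with plain string handling. `extract_translation` is a hypothetical helper, and cutting the text at the next `###` header is an assumption about how the model might continue past the translation, not documented behavior.

```python
def extract_translation(decoded: str, target_header: str = "### Persian:") -> str:
    """Return the text that follows the first target header in a decoded generation."""
    # Split at the first occurrence of the target header (the one from the prompt).
    _, sep, tail = decoded.partition(target_header)
    if not sep:
        raise ValueError(f"header {target_header!r} not found in decoded text")
    # Assumption: if generation runs on, a new "###" block may start;
    # discard everything from that point.
    tail = tail.split("###", 1)[0]
    return tail.strip()


# Example with a decoded string in the documented format.
decoded = (
    "### English:\n"
    "The children were playing in the park.\n"
    "### Persian:\n"
    "کودکان در پارک بازی می‌کردند.\n"
)
print(extract_translation(decoded))
```

For the Persian → English direction, pass `target_header="### English:"`.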
## Training Data
This model was fine-tuned on a custom English-Persian parallel dataset containing ~640,000 sentence pairs. The source data was collected from Tatoeba and then translated and expanded using the Gemma-3-12B model.
## Evaluation
| Direction | BLEU | COMET |
|---------------|------|------|
| English → Persian | 0.47 | 0.89 |
| Persian → English | 0.58 | 0.91 |
## License
Apache 2.0