Initialize project; model provided by the ModelHub XC community
Model: diabolic6045/Sanskrit-qwen-7B-Translate-v2 Source: Original Platform
273
README.md
Normal file
@@ -0,0 +1,273 @@
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- sanskrit
- translation
- transliteration
- qwen
- axolotl
- iast
- devanagari
- bilingual
datasets:
- diabolic6045/Sanskrit-transliteration-chat-dataset
model-index:
- name: Sanskrit-qwen-7B-Translate-v2
  results: []
---

# Sanskrit-qwen-7B-Translate-v2

<div align="center">

<img src="https://huggingface.co/diabolic6045/Sanskrit-qwen-7B-Translate-v2/resolve/main/images/poster.png" alt="Sanskrit AI Poster" width="600" style="margin-bottom: 20px;">

**A specialized Sanskrit language model for translation and transliteration tasks**

</div>

## 🌟 Model Description

This is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) optimized for Sanskrit language processing. The model was trained with LoRA (Low-Rank Adaptation) on `diabolic6045/Sanskrit-transliteration-chat-dataset` and targets three tasks:

1. **Sanskrit to IAST Transliteration** - Converting Devanagari script to IAST format
2. **Sanskrit to English Translation** - Translating Sanskrit text to English
3. **English to Sanskrit Translation** - Translating English text to Sanskrit

## 🚀 Key Features

### ✨ **Multi-Task Sanskrit Processing**
- **IAST Transliteration**: Conversion from Devanagari to IAST
- **Bidirectional Translation**: Sanskrit ↔ English translation
- **Context-Aware**: Aims to preserve meaning and cultural context
- **Chat-Optimized**: Uses conversation format for natural interactions

### 🔧 **Technical Changes From the Previous Model**
- **Base Model**: Switched from Qwen2.5-7B-Instruct-1M to Qwen2.5-7B-Instruct
- **Specialized Dataset**: Trained on `Sanskrit-transliteration-chat-dataset` (vs. previous `Sanskrit-llama`)
- **Chat Template Format**: Uses structured conversation format instead of Alpaca-style prompts
- **LoRA Configuration**: Revised LoRA settings (rank 16, alpha 32) and target modules
- **Memory Efficient**: Trained with flash attention and gradient checkpointing

## 📊 Model Specifications

| Parameter | Value |
|-----------|-------|
| **Base Model** | Qwen/Qwen2.5-7B-Instruct |
| **Fine-tuning Method** | LoRA (Low-Rank Adaptation) |
| **LoRA Rank** | 16 |
| **LoRA Alpha** | 32 |
| **Sequence Length** | 512 tokens |
| **Training Epochs** | 3 |
| **Learning Rate** | 2e-05 |
| **Batch Size** | 2 (micro) × 4 (gradient accumulation) |
| **Optimizer** | AdamW 8-bit |
| **Precision** | bfloat16 |
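
A few of the numbers in the table combine into derived quantities that are easier to see in code. The sketch below is only arithmetic on the listed values, assuming the standard LoRA formulation in which the low-rank update is scaled by `alpha / rank`. The function names, the single-GPU default, and the 4096 × 4096 layer shape are illustrative assumptions, not details taken from the actual training run.

```python
# Values copied from the specification table.
LORA_RANK = 16
LORA_ALPHA = 32
MICRO_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one adapter on a (d_out x d_in) linear layer.

    The adapter consists of A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def effective_batch_size(micro: int, accum: int, num_gpus: int = 1) -> int:
    """Sequences contributing to each optimizer step."""
    return micro * accum * num_gpus


print(lora_scaling(LORA_ALPHA, LORA_RANK))                       # 2.0
print(effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 8
# Hypothetical 4096 x 4096 projection, chosen only for illustration.
print(lora_trainable_params(4096, 4096, LORA_RANK))              # 131072
```

The effective batch size of 8 is per GPU; since the number of GPUs used for training is not stated here, the global batch size is left as a parameter.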

## 🎯 Intended Uses

### ✅ **Recommended Use Cases**
- **Academic Research**: Sanskrit text analysis and translation
- **Educational Tools**: Learning Sanskrit through translation
- **Cultural Preservation**: Transliterating and translating digitized Sanskrit manuscripts
- **Linguistic Studies**: Comparative language analysis
- **Content Creation**: Sanskrit-English bilingual content

### ⚠️ **Limitations**
- **Experimental Model**: Still in development, results may vary
- **Context Sensitivity**: Performance depends on text complexity
- **Domain Specific**: Optimized for classical Sanskrit texts
- **Verification Required**: Important translations should be cross-checked

## 🛠️ Usage Examples

### 1. Sanskrit to IAST Transliteration

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "diabolic6045/Sanskrit-qwen-7B-Translate-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare the conversation
messages = [
    {
        "role": "system",
        "content": "You are a Sanskrit transliteration expert. Convert the given Sanskrit text from Devanagari script to IAST (International Alphabet of Sanskrit Transliteration) format."
    },
    {
        "role": "user",
        "content": "Transliterate this Sanskrit text to IAST: बुद्धिश्चार्थात्परो लोभः सन्तोषः परमं सुखम् ।"
    }
]

# Apply the chat template and tokenize the resulting prompt
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)

# Drop the prompt tokens and decode only the newly generated part
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)

print(response)
# Example output (sampling is on, so results may vary):
# buddhiścārthātparo lobhaḥ santoṣaḥ paramaṃ sukham |
```

### 2. Sanskrit to English Translation

```python
# Reuses `tokenizer` and `model` loaded in example 1
messages = [
    {
        "role": "system",
        "content": "You are a Sanskrit to English translation expert. Translate the given Sanskrit text accurately while preserving the meaning and context."
    },
    {
        "role": "user",
        "content": "Translate this Sanskrit text to English: यद॒ग्नौ सूर्ये॑ वि॒षं पृ॑थि॒व्यामोष॑धीषु॒ यत् ।"
    }
]

# Generate translation
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)

print(response)
# Example output (may vary): The poison that is in the sun, in the earth and in the herbs...
```

### 3. English to Sanskrit Translation

```python
# Reuses `tokenizer` and `model` loaded in example 1
messages = [
    {
        "role": "system",
        "content": "You are an English to Sanskrit translation expert. Translate the given English text accurately into Sanskrit while preserving the meaning and context."
    },
    {
        "role": "user",
        "content": "Translate this English text to Sanskrit: May the divine powers protect us and grant us wisdom."
    }
]

# Generate Sanskrit translation
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)

print(response)
# Example output (may vary): देवाः अस्मान् रक्षन्तु बुद्धिं च प्रयच्छन्तु ।
```
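
The three examples differ only in their system prompt and in the instruction prefixed to the user text, so the message construction can be factored into a small helper. The following is a minimal sketch: the prompt strings are copied from the examples, while the task keys (`transliterate`, `sa_to_en`, `en_to_sa`) and the function name `build_messages` are hypothetical names introduced here for illustration.

```python
# System prompts copied from the three usage examples.
SYSTEM_PROMPTS = {
    "transliterate": (
        "You are a Sanskrit transliteration expert. Convert the given Sanskrit "
        "text from Devanagari script to IAST (International Alphabet of "
        "Sanskrit Transliteration) format."
    ),
    "sa_to_en": (
        "You are a Sanskrit to English translation expert. Translate the given "
        "Sanskrit text accurately while preserving the meaning and context."
    ),
    "en_to_sa": (
        "You are an English to Sanskrit translation expert. Translate the given "
        "English text accurately into Sanskrit while preserving the meaning "
        "and context."
    ),
}

# Instruction prefixes used in the user turn of each example.
USER_PREFIXES = {
    "transliterate": "Transliterate this Sanskrit text to IAST: ",
    "sa_to_en": "Translate this Sanskrit text to English: ",
    "en_to_sa": "Translate this English text to Sanskrit: ",
}


def build_messages(task: str, text: str) -> list[dict[str, str]]:
    """Return a chat-format message list for one of the three tasks."""
    if task not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(SYSTEM_PROMPTS)}")
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[task]},
        {"role": "user", "content": USER_PREFIXES[task] + text.strip()},
    ]


messages = build_messages(
    "en_to_sa", "May the divine powers protect us and grant us wisdom."
)
print(messages[1]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` exactly as in the examples. Keeping the prompts identical to the ones shown is a deliberate choice, on the assumption that the fine-tuned model responds best to the wording it saw during training.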

## 🎮 Interactive Demo

Try the model with our Gradio interface:

### Run the interactive [demo](https://huggingface.co/spaces/diabolic6045/Sanskrit-qwen-7B-Translate-v2)

The demo provides:
- **Mode Selection**: Choose between transliteration and translation modes
- **Real-time Processing**: Instant results with adjustable parameters
- **Example Library**: Pre-loaded examples for each mode
- **Parameter Tuning**: Adjust temperature and max length

## 📈 Training Details

### Dataset Information
- **Source**: `diabolic6045/Sanskrit-transliteration-chat-dataset`
- **Format**: Chat template with structured conversations
- **Size**: Comprehensive Sanskrit corpus with multiple translation pairs
- **Validation Split**: 10% for evaluation
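
To make the 10% validation split concrete, here is a minimal standard-library sketch of a seeded random hold-out. How the split was actually produced for this model is not described in this card, so the function name, the seed, and the toy records are all assumptions made for illustration.

```python
import random


def split_train_validation(examples, validation_fraction=0.1, seed=42):
    """Shuffle indices with a fixed seed and hold out a fraction for evaluation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    n_val = round(len(examples) * validation_fraction)
    val_indices = set(indices[:n_val])
    train = [ex for i, ex in enumerate(examples) if i not in val_indices]
    validation = [ex for i, ex in enumerate(examples) if i in val_indices]
    return train, validation


# Toy records standing in for chat-formatted dataset rows.
toy_examples = [{"id": i} for i in range(50)]
train, validation = split_train_validation(toy_examples)
print(len(train), len(validation))  # 45 5
```

Fixing the seed keeps the held-out set stable across runs, which matters when evaluation numbers from different training runs are compared.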

### Training Configuration
```yaml
# Key training parameters
base_model: Qwen/Qwen2.5-7B-Instruct
adapter: lora
lora_r: 16
lora_alpha: 32
sequence_len: 512
num_epochs: 3
learning_rate: 0.00002
optimizer: adamw_8bit
lr_scheduler: cosine
bf16: auto
flash_attention: true
gradient_checkpointing: true
```
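
The configuration pairs `learning_rate: 0.00002` with `lr_scheduler: cosine`. As a rough illustration of what that schedule does, the sketch below evaluates the textbook cosine decay from the peak rate down to zero. It assumes no warmup phase and a final rate of zero, neither of which is stated in the excerpt above, and `TOTAL_STEPS` is a hypothetical value since the real step count depends on dataset size.

```python
import math

PEAK_LR = 2e-5  # learning_rate from the configuration


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under cosine decay without warmup."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not taken from the actual run
for step in (0, 250, 500, 750, 1000):
    print(f"step {step:4d}: lr = {cosine_lr(step, TOTAL_STEPS):.2e}")
```

The rate starts at the peak, passes through half the peak at the midpoint, and reaches the minimum at the final step.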

### Hardware Requirements
- **Training**: Multi-GPU setup with 24GB+ VRAM per GPU
- **Inference**: 8GB+ VRAM; unquantized bfloat16 weights of a 7B-class model take roughly 15 GB, so this figure presumably assumes quantized or partially offloaded weights
- **CPU**: Compatible with CPU inference (slower)
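
A quick back-of-the-envelope estimate helps when matching these requirements to a specific GPU. The sketch below computes the memory taken by the model weights alone at several precisions. The parameter count of about 7.6 billion is an assumption based on the size class of the base model, and the estimate ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory occupied by the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


NUM_PARAMS = 7.6e9  # assumed parameter count for a 7B-class model
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(NUM_PARAMS, dtype):5.1f} GiB")
```

Under these assumptions the bfloat16 weights come to about 14 GiB, while a 4-bit quantized copy fits in under 4 GiB, which is the kind of setup an 8 GB card would need.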

## 🔄 Comparison with Previous Model

| Feature | Previous Model | Current Model |
|---------|---------------|---------------|
| **Base Model** | Qwen2.5-7B-Instruct-1M | Qwen2.5-7B-Instruct |
| **Dataset** | Sanskrit-llama (Alpaca) | Sanskrit-transliteration-chat-dataset |
| **Format** | Alpaca format | Chat template format |
| **Capabilities** | Basic translation | Multi-task (transliteration + translation) |
| **LoRA Rank** | 32 | 16 |
| **Sequence Length** | 1024 | 512 |
| **Training Epochs** | 1 | 3 |
| **Specialization** | General Sanskrit | Specialized for transliteration |

## 🛡️ Ethical Considerations

- **Cultural Sensitivity**: Respect for Sanskrit's cultural and religious significance
- **Accuracy Disclaimer**: Model outputs should be verified for important translations
- **Educational Use**: Primarily intended for educational and research purposes
- **Bias Awareness**: May reflect biases present in training data

## 📚 Citation

If you use this model in your research, please cite:

```bibtex
@misc{sanskrit-qwen-chat-lora,
  title={Sanskrit-qwen-7B-Translate-v2: A Specialized Sanskrit Translation and Transliteration Model},
  author={Divax Shah (diabolic6045)},
  year={2024},
  url={https://huggingface.co/diabolic6045/Sanskrit-qwen-7B-Translate-v2}
}
```

## 🤝 Contributing

We welcome contributions to improve this model:

1. **Dataset Contributions**: High-quality Sanskrit translation pairs
2. **Evaluation**: Benchmarking and performance analysis
3. **Bug Reports**: Issues and improvement suggestions
4. **Documentation**: Usage examples and tutorials

## 📄 License

This model is released under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **Qwen Team**: For the excellent base model
- **Axolotl Framework**: For the training infrastructure
- **Sanskrit Community**: For linguistic guidance and feedback
- **Open Source Community**: For tools and resources

---

<div align="center">

**Built with ❤️ for Sanskrit language preservation and education**

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

</div>