---
base_model: thangvip/qwen3-4b-legal-pretrain-synthetic-8k
library_name: transformers
model_name: thangvip/qwen3-4b-vietnamese-legal-grpo
tags:
- grpo
- vietnamese
- legal
- reasoning
- syllogism
- trl
- generated_from_trainer
license: apache-2.0
language:
- vi
datasets:
- legal-qa-vietnamese
pipeline_tag: text-generation
widget:
- example_title: "Legal Question Example"
  text: "Câu hỏi: Một công ty có nghĩa vụ gì khi sa thải nhân viên do tái cơ cấu?"
---

# Vietnamese Legal Reasoning Model - GRPO Fine-tuned

## 🏛️ Model Description

This model is a **Vietnamese legal reasoning specialist** fine-tuned with **Group Relative Policy Optimization (GRPO)** on Vietnamese legal question-answering data. It is designed to perform **syllogistic reasoning** on Vietnamese legal scenarios.

### 🎯 Base Model
- **Base**: [thangvip/qwen3-4b-legal-pretrain-synthetic-8k](https://huggingface.co/thangvip/qwen3-4b-legal-pretrain-synthetic-8k)
- **Architecture**: Qwen 3 (4B parameters)
- **Language**: Vietnamese
- **Specialization**: Legal reasoning and syllogism

### 🔥 Key Features

- ✅ **Syllogistic Reasoning**: Structured legal arguments (Major Premise → Minor Premise → Conclusion)
- ✅ **Vietnamese Legal Domain**: Trained on Vietnamese legal texts and Q&A
- ✅ **GRPO Optimization**: Policy optimization against a multi-component reward covering correctness, format, citations, and reasoning quality
- ✅ **Citation Support**: Generates responses with legal citations
- ✅ **Structured Output**: Uses XML-like tags for organized responses

## 📊 Model Architecture

- **Parameters**: ~4B
- **Vocabulary Size**: 151936
- **Hidden Size**: 2560
- **Layers**: 36
- **Attention Heads**: 32

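As a rough sizing aid, the figures above allow a back-of-the-envelope estimate of the token embedding table and of the memory needed to hold the weights in bfloat16. This is only a sketch under stated assumptions: the total is taken as exactly 4e9 parameters (the card only says ~4B), and activations, KV cache, and framework overhead are ignored. The function names are illustrative.

```python
# Back-of-the-envelope sizing from the figures listed above.
VOCAB_SIZE = 151_936
HIDDEN_SIZE = 2_560
TOTAL_PARAMS = 4_000_000_000  # assumption: the card only states "~4B"
BYTES_PER_PARAM_BF16 = 2      # bfloat16 stores each weight in 2 bytes


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in a vocab_size x hidden_size token embedding table."""
    return vocab_size * hidden_size


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GiB (no activations or KV cache)."""
    return num_params * bytes_per_param / 1024**3


emb = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
mem = weight_memory_gib(TOTAL_PARAMS, BYTES_PER_PARAM_BF16)
print(f"Embedding table: {emb:,} parameters")
print(f"bf16 weights: ~{mem:.1f} GiB")
```

Actual GPU memory use during generation will be higher than the weight-only figure, since it also depends on sequence length and batch size.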
## 🚀 Quick Start

### Installation

```bash
# accelerate is needed for device_map="auto" in the examples below
pip install transformers torch accelerate
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "thangvip/qwen3-4b-vietnamese-legal-grpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Format your legal question.
# The system prompt is kept in Vietnamese, the model's working language. In English:
# "You are a legal expert. Answer using syllogistic reasoning. First think inside
# <think></think>, then answer with <major_premise> (general legal rule),
# <minor_premise> (specific facts of the question) and <conclusion> (rule applied
# to the facts). Make sure to cite the relevant articles of law accurately."
system_prompt = """Bạn là một chuyên gia pháp lý. Hãy trả lời câu hỏi bằng cách sử dụng phương pháp lập luận tam đoạn luận (syllogism).

Trước tiên, hãy suy nghĩ về vấn đề trong thẻ <think></think>.

Sau đó, trả lời theo định dạng sau:
<answer>
<major_premise>[Quy định pháp luật chung]</major_premise>
<minor_premise>[Sự kiện cụ thể trong câu hỏi]</minor_premise>
<conclusion>[Áp dụng quy định vào sự kiện để đưa ra kết luận]</conclusion>
</answer>

Hãy đảm bảo trích dẫn chính xác các điều luật liên quan."""

# "What obligations does a company have when dismissing employees due to restructuring?"
question = "Một công ty có nghĩa vụ gì khi sa thải nhân viên do tái cơ cấu?"

# Create conversation
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]

# Generate response
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```

### Pipeline Usage

```python
import torch
from transformers import pipeline

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="thangvip/qwen3-4b-vietnamese-legal-grpo",
    tokenizer="thangvip/qwen3-4b-vietnamese-legal-grpo",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Generate legal reasoning
# "Question: What are a tenant's rights and obligations when the lease expires?"
prompt = "Câu hỏi: Quyền và nghĩa vụ của người thuê nhà khi hợp đồng thuê hết hạn?"
# do_sample=True is required for temperature to take effect
result = generator(prompt, max_new_tokens=512, temperature=0.7, do_sample=True)
print(result[0]['generated_text'])
```

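The pipeline example above sends a bare prompt, whereas the Basic Usage example wraps the question in a system message and a user message. A minimal sketch of a helper that builds that message list follows. `build_messages` and its parameter names are hypothetical and not part of the model repository, and the one-line system prompt is a placeholder for the full prompt from Basic Usage.

```python
from typing import Dict, List


def build_messages(question: str, system_prompt: str) -> List[Dict[str, str]]:
    """Return a chat-style message list: one system turn, then one user turn."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


# Placeholder: substitute the full system prompt from Basic Usage.
SYSTEM_PROMPT = "Bạn là một chuyên gia pháp lý."  # "You are a legal expert."

messages = build_messages(
    "Quyền và nghĩa vụ của người thuê nhà khi hợp đồng thuê hết hạn?",
    SYSTEM_PROMPT,
)
print(messages)
```

The resulting list can be passed to `tokenizer.apply_chat_template` as in Basic Usage. Recent transformers releases also accept such a list directly in a text-generation pipeline call, but that depends on the installed version and is worth checking before relying on it.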
## 🎯 Training Details

### Training Procedure
- **Method**: Group Relative Policy Optimization (GRPO)
- **Base Model**: thangvip/qwen3-4b-legal-pretrain-synthetic-8k
- **Training Steps**: N/A
- **Learning Rate**: N/A
- **Batch Size**: N/A

### Training Data
- **Domain**: Vietnamese legal question-answering
- **Format**: Syllogistic reasoning pairs
- **Structure**: Question → Structured legal reasoning response

### Reward System
The model was trained with a reward made up of six weighted components:
- **Correctness** (35%): Factual accuracy against reference answers
- **Format Compliance** (20%): Proper use of syllogistic structure
- **Citation Accuracy** (15%): Relevant and accurate legal citations
- **Reasoning Quality** (15%): Quality of legal reasoning process
- **Hallucination Penalty** (10%): Penalty for unsupported claims
- **Length Penalty** (5%): Penalty for exceeding maximum token length

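To make the weighting concrete, here is one way the six components could be combined into a single scalar. This is an illustrative sketch, not the training code: the card gives only the percentages, so the [0, 1] score range, the treatment of penalties as `1 - penalty`, and all names (`REWARD_WEIGHTS`, `combined_reward`) are assumptions.

```python
# Weights taken from the list above; they sum to 1.0.
REWARD_WEIGHTS = {
    "correctness": 0.35,
    "format_compliance": 0.20,
    "citation_accuracy": 0.15,
    "reasoning_quality": 0.15,
    "hallucination": 0.10,
    "length": 0.05,
}
# Assumption: these two inputs are penalty magnitudes (0 = no violation).
PENALTY_COMPONENTS = {"hallucination", "length"}


def combined_reward(scores: dict) -> float:
    """Weighted sum of component scores, each expected in [0, 1].

    Penalty components contribute as (1 - penalty), so a response with
    no violations and perfect scores receives a reward of 1.0.
    """
    total = 0.0
    for name, weight in REWARD_WEIGHTS.items():
        value = scores[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        if name in PENALTY_COMPONENTS:
            value = 1.0 - value
        total += weight * value
    return total


# Hypothetical component scores for one sampled response.
example = {
    "correctness": 1.0,
    "format_compliance": 1.0,
    "citation_accuracy": 0.5,
    "reasoning_quality": 1.0,
    "hallucination": 0.0,
    "length": 0.0,
}
print(round(combined_reward(example), 3))
```

In GRPO, a scalar like this would be computed for every completion in a sampled group and then compared within the group. How each component score is produced (reference matching, citation lookup, and so on) is not described in this card.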
## 📝 Expected Output Format

The model generates structured responses in this format:

```xml
<think>
[Internal reasoning about the legal question]
</think>

<answer>
<major_premise>
[General legal rule or principle applicable to the situation]
</major_premise>

<minor_premise>
[Specific facts from the question that relate to the legal rule]
</minor_premise>

<conclusion>
[Legal conclusion that follows logically from applying the rule to the facts]
</conclusion>
</answer>
```

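Because the output uses fixed tags, downstream code can pull the parts apart with a few regular expressions. The sketch below is one possible approach under the assumption that each tag appears at most once and is properly closed. `parse_response` and `is_well_formed` are hypothetical helper names, and real generations may deviate from the format.

```python
import re
from typing import Dict, Optional

TAGS = ("think", "major_premise", "minor_premise", "conclusion")


def parse_response(text: str) -> Dict[str, Optional[str]]:
    """Extract the content of each expected tag; missing tags map to None."""
    parsed: Dict[str, Optional[str]] = {}
    for tag in TAGS:
        # DOTALL lets the tag content span multiple lines.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        parsed[tag] = match.group(1).strip() if match else None
    return parsed


def is_well_formed(text: str) -> bool:
    """True if an <answer> block exists and every expected tag was found."""
    has_answer = re.search(r"<answer>.*?</answer>", text, flags=re.DOTALL) is not None
    return has_answer and all(v is not None for v in parse_response(text).values())


sample = """<think>reasoning</think>
<answer>
<major_premise>rule</major_premise>
<minor_premise>facts</minor_premise>
<conclusion>result</conclusion>
</answer>"""

print(parse_response(sample))
print(is_well_formed(sample))
```

A check like `is_well_formed` can be used to filter or retry generations before the conclusion is shown to a reader.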
## 🎯 Use Cases

- **Legal Education**: Teaching legal reasoning methodology
- **Legal Research**: Preliminary analysis of legal questions
- **Document Drafting**: Structured legal argument generation
- **Legal Consultation**: Initial legal guidance (with human review)

## ⚠️ Limitations

- **Domain Specific**: Optimized for the Vietnamese legal context; behavior on other jurisdictions or languages is not covered here
- **Educational Purpose**: Should not replace professional legal advice
- **Fact Checking Required**: Always verify legal citations and conclusions against the source legislation
- **Context Window**: Limited by the base model's context length

## 📄 Citation

If you use this model, please cite:

```bibtex
@misc{vietnamese-legal-grpo-2024,
  title={Vietnamese Legal Reasoning Model with GRPO},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/thangvip/qwen3-4b-vietnamese-legal-grpo}
}
```

## 🤝 Contributing

Contributions are welcome! Please see our [contributing guidelines](CONTRIBUTING.md).

## 📜 License

This model is released under the Apache 2.0 License.

## 🙏 Acknowledgments

- **TRL Team**: For the GRPO implementation
- **Qwen Team**: For the excellent base model
- **Hugging Face**: For the transformers library and model hosting

---

**Note**: This model is for educational and research purposes. Always consult qualified legal professionals for actual legal advice.