Initialize project; model provided by the ModelHub XC community
Model: thangvip/qwen3-4b-vietnamese-legal-grpo Source: Original Platform

---
base_model: thangvip/qwen3-4b-legal-pretrain-synthetic-8k
library_name: transformers
model_name: thangvip/qwen3-4b-vietnamese-legal-grpo
tags:
- grpo
- vietnamese
- legal
- reasoning
- syllogism
- trl
- generated_from_trainer
license: apache-2.0
language:
- vi
datasets:
- legal-qa-vietnamese
pipeline_tag: text-generation
widget:
- example_title: "Legal Question Example"
  text: "Câu hỏi: Một công ty có nghĩa vụ gì khi sa thải nhân viên do tái cơ cấu?"
---

# Vietnamese Legal Reasoning Model - GRPO Fine-tuned

## 🏛️ Model Description

This model is a **Vietnamese legal reasoning specialist** fine-tuned with **Group Relative Policy Optimization (GRPO)** on Vietnamese legal question-answering data. It is designed to answer Vietnamese legal questions with **syllogistic reasoning**: stating the applicable legal rule (major premise), relating it to the facts of the question (minor premise), and drawing a conclusion.

### 🎯 Base Model

- **Base**: [thangvip/qwen3-4b-legal-pretrain-synthetic-8k](https://huggingface.co/thangvip/qwen3-4b-legal-pretrain-synthetic-8k)
- **Architecture**: Qwen3 (~4B parameters)
- **Language**: Vietnamese
- **Specialization**: Legal reasoning and syllogism

### 🔥 Key Features

- ✅ **Syllogistic Reasoning**: Structured legal arguments (Major Premise → Minor Premise → Conclusion)
- ✅ **Vietnamese Legal Domain**: Trained on Vietnamese legal texts and Q&A
- ✅ **GRPO Optimization**: Reinforcement-learning fine-tuning that rewards correct, well-structured, and well-cited answers
- ✅ **Citation Support**: Generates responses with legal citations
- ✅ **Structured Output**: Uses XML-like tags (`<think>`, `<answer>`, `<major_premise>`, `<minor_premise>`, `<conclusion>`) to organize responses

## 📊 Model Architecture

- **Parameters**: ~4B
- **Vocabulary Size**: 151936
- **Hidden Size**: 2560
- **Layers**: 36
- **Attention Heads**: 32

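As a rough sizing aid, the figures above can be turned into a weights-only memory estimate. The sketch below assumes exactly 4 × 10^9 parameters (the card only says ~4B) and counts weights alone, ignoring activations, the KV cache, and framework overhead. The helper names are hypothetical.

```python
# Bytes needed to store one parameter in common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weights-only memory in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameters in a single vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size


if __name__ == "__main__":
    # Assumption: exactly 4e9 parameters; the model card only states ~4B.
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: ~{weight_memory_gb(4e9, dtype):.0f} GB of weights")
    emb = embedding_params(151936, 2560)
    print(f"one token embedding matrix: {emb:,} parameters")
```

With the `torch_dtype=torch.bfloat16` setting used in Quick Start, this suggests roughly 8 GB for the weights alone, and a single 151936 x 2560 embedding matrix holds about 0.39B parameters.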
## 🚀 Quick Start

### Installation

```bash
pip install "transformers>=4.51.0" torch accelerate
```

`accelerate` is needed for `device_map="auto"`, and Qwen3-based models require `transformers` 4.51.0 or newer.

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "thangvip/qwen3-4b-vietnamese-legal-grpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"  # requires the accelerate package
)

# Format your legal question.
# English gist of the Vietnamese system prompt: "You are a legal expert. Answer the
# question using syllogistic reasoning. First, think about the problem inside
# <think></think>. Then answer in the format below (major premise: the general legal
# rule; minor premise: the specific facts in the question; conclusion: apply the rule
# to the facts). Make sure to cite the relevant legal provisions accurately."
system_prompt = """Bạn là một chuyên gia pháp lý. Hãy trả lời câu hỏi bằng cách sử dụng phương pháp lập luận tam đoạn luận (syllogism).

Trước tiên, hãy suy nghĩ về vấn đề trong thẻ <think></think>.

Sau đó, trả lời theo định dạng sau:
<answer>
<major_premise>[Quy định pháp luật chung]</major_premise>
<minor_premise>[Sự kiện cụ thể trong câu hỏi]</minor_premise>
<conclusion>[Áp dụng quy định vào sự kiện để đưa ra kết luận]</conclusion>
</answer>

Hãy đảm bảo trích dẫn chính xác các điều luật liên quan."""

# "What obligations does a company have when dismissing employees due to restructuring?"
question = "Một công ty có nghĩa vụ gì khi sa thải nhân viên do tái cơ cấu?"

# Create conversation
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]

# Generate response
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```

### Pipeline Usage

```python
import torch
from transformers import pipeline

# Create text generation pipeline
generator = pipeline(
    "text-generation",
    model="thangvip/qwen3-4b-vietnamese-legal-grpo",
    tokenizer="thangvip/qwen3-4b-vietnamese-legal-grpo",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Generate legal reasoning.
# "Question: What are the rights and obligations of a tenant when the lease expires?"
prompt = "Câu hỏi: Quyền và nghĩa vụ của người thuê nhà khi hợp đồng thuê hết hạn?"
result = generator(prompt, max_new_tokens=512, do_sample=True, temperature=0.7)
print(result[0]['generated_text'])
```

This passes the question as raw text, so neither the chat template nor the system prompt from Basic Usage is applied, and the output may not follow the structured format.

## 🎯 Training Details

### Training Procedure

- **Method**: Group Relative Policy Optimization (GRPO)
- **Base Model**: thangvip/qwen3-4b-legal-pretrain-synthetic-8k
- **Training Steps**: N/A
- **Learning Rate**: N/A
- **Batch Size**: N/A

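GRPO samples a group of completions for each prompt and scores each completion relative to the rest of its group, so no separate value (critic) model is needed. A minimal sketch of that normalization, assuming the common formulation A_i = (r_i - mean(r)) / (std(r) + eps) with a population standard deviation. The function name and eps value are illustrative, and this is not necessarily the exact variant used to train this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize one prompt's group of rewards: (r - mean) / (std + eps).

    Completions scoring above the group mean get a positive advantage
    (reinforced); those below the mean get a negative one (discouraged).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for 4 sampled answers to the same legal question.
    group = [0.9, 0.4, 0.4, 0.9]
    print([round(a, 3) for a in group_relative_advantages(group)])
```

The eps term keeps the division defined when every completion in a group receives the same reward; in that case all advantages are zero and the group contributes no learning signal.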
### Training Data

- **Domain**: Vietnamese legal question-answering
- **Format**: Syllogistic reasoning pairs
- **Structure**: Question → Structured legal reasoning response

### Reward System

The model was trained with a reward made up of six weighted components (the weights sum to 100%):

- **Correctness** (35%): Factual accuracy against reference answers
- **Format Compliance** (20%): Proper use of the syllogistic structure
- **Citation Accuracy** (15%): Relevant and accurate legal citations
- **Reasoning Quality** (15%): Quality of the legal reasoning process
- **Hallucination Penalty** (10%): Penalty for unsupported claims
- **Length Penalty** (5%): Penalty for exceeding the maximum token length

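Since the percentages add up to 100%, one natural reading is a weighted sum of per-component scores. A minimal sketch under stated assumptions: each component is already scored in [0, 1], and the two penalty components are expressed so that 1.0 means "no penalty". The component keys and this scoring convention are hypothetical; the card does not describe how each score is computed.

```python
# Weights taken from the reward list (they sum to 1.0).
REWARD_WEIGHTS = {
    "correctness": 0.35,
    "format": 0.20,
    "citation": 0.15,
    "reasoning": 0.15,
    "no_hallucination": 0.10,  # assumption: 1.0 = no unsupported claims
    "within_length": 0.05,     # assumption: 1.0 = under the token limit
}


def combined_reward(scores: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = REWARD_WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(w * scores[name] for name, w in REWARD_WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical completion: correct and well formatted, but with poor
    # citations and over the length limit.
    example = {
        "correctness": 1.0,
        "format": 1.0,
        "citation": 0.0,
        "reasoning": 0.8,
        "no_hallucination": 1.0,
        "within_length": 0.0,
    }
    print(round(combined_reward(example), 2))
```

Raising on a missing component, rather than silently treating it as zero, makes a misconfigured scorer fail loudly instead of quietly lowering every reward.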
## 📝 Expected Output Format

The model generates structured responses in this format:

```xml
<think>
[Internal reasoning about the legal question]
</think>

<answer>
<major_premise>
[General legal rule or principle applicable to the situation]
</major_premise>

<minor_premise>
[Specific facts from the question that relate to the legal rule]
</minor_premise>

<conclusion>
[Legal conclusion that follows logically from applying the rule to the facts]
</conclusion>
</answer>
```

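Because each section is delimited by fixed tags, a response can be split into its parts with a few regular expressions. A minimal sketch using Python's standard `re` module; it assumes each tag appears at most once and returns `None` when the `<answer>` block or any syllogism section is missing, so the same function could double as a simple format check. The function names are hypothetical.

```python
import re
from typing import Optional

# Section tags defined by the output format.
SECTION_TAGS = ("major_premise", "minor_premise", "conclusion")


def _extract(tag: str, text: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag>, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_syllogism(response: str) -> Optional[dict]:
    """Split a model response into think / premises / conclusion.

    Returns None if the <answer> block or any syllogism section is missing.
    The <think> section is optional and maps to None when absent.
    """
    answer = _extract("answer", response)
    if answer is None:
        return None
    sections = {tag: _extract(tag, answer) for tag in SECTION_TAGS}
    if any(value is None for value in sections.values()):
        return None
    sections["think"] = _extract("think", response)
    return sections


if __name__ == "__main__":
    sample = """<think>Check the relevant rule first.</think>
<answer>
<major_premise>Rule R applies to situation S.</major_premise>
<minor_premise>The question describes situation S.</minor_premise>
<conclusion>Therefore rule R applies.</conclusion>
</answer>"""
    print(parse_syllogism(sample)["conclusion"])
```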
## 🎯 Use Cases

- **Legal Education**: Teaching legal reasoning methodology
- **Legal Research**: Preliminary analysis of legal questions
- **Document Drafting**: Structured legal argument generation
- **Legal Consultation**: Initial legal guidance (with human review)

## ⚠️ Limitations

- **Domain Specific**: Optimized for the Vietnamese legal context
- **Educational Purpose**: Should not replace professional legal advice
- **Fact Checking Required**: Always verify legal citations and conclusions
- **Context Window**: Limited by the base model's context length

## 📄 Citation

If you use this model, please cite:

```bibtex
@misc{vietnamese-legal-grpo-2024,
  title={Vietnamese Legal Reasoning Model with GRPO},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/thangvip/qwen3-4b-vietnamese-legal-grpo}
}
```

## 🤝 Contributing

Contributions are welcome! Please see our [contributing guidelines](CONTRIBUTING.md).

## 📜 License

This model is released under the Apache 2.0 License.

## 🙏 Acknowledgments

- **TRL Team**: For the GRPO implementation
- **Qwen Team**: For the excellent base model
- **Hugging Face**: For the transformers library and model hosting

---

**Note**: This model is for educational and research purposes. Always consult qualified legal professionals for actual legal advice.