Initialize project; model provided by the ModelHub XC community

Model: thangvip/qwen3-4b-vietnamese-legal-grpo Source: Original Platform
2026-05-24 05:48:17 +08:00
commit 6feac61ef6
15 changed files with 152530 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,225 @@
+---
+base_model: thangvip/qwen3-4b-legal-pretrain-synthetic-8k
+library_name: transformers
+model_name: thangvip/qwen3-4b-vietnamese-legal-grpo
+tags:
+- grpo
+- vietnamese
+- legal
+- reasoning
+- syllogism
+- trl
+- generated_from_trainer
+license: apache-2.0
+language:
+- vi
+datasets:
+- legal-qa-vietnamese
+pipeline_tag: text-generation
+widget:
+- example_title: "Legal Question Example"
+  text: "Câu hỏi: Một công ty có nghĩa vụ gì khi sa thải nhân viên do tái cơ cấu?"
+---
+
+# Vietnamese Legal Reasoning Model - GRPO Fine-tuned
+
+## 🏛️ Model Description
+
+This model is a **Vietnamese legal reasoning specialist** fine-tuned using **Group Relative Policy Optimization (GRPO)** on Vietnamese legal question-answering data. It's specifically designed to perform **syllogistic reasoning** for Vietnamese legal scenarios.
+
+### 🎯 Base Model
+- **Base**: [thangvip/qwen3-4b-legal-pretrain-synthetic-8k](https://huggingface.co/thangvip/qwen3-4b-legal-pretrain-synthetic-8k)
+- **Architecture**: Qwen 3 (4B parameters)
+- **Language**: Vietnamese
+- **Specialization**: Legal reasoning and syllogism
+
+### 🔥 Key Features
+
+✅ **Syllogistic Reasoning**: Structured legal arguments (Major Premise → Minor Premise → Conclusion)  
+✅ **Vietnamese Legal Domain**: Trained on Vietnamese legal texts and Q&A  
+✅ **GRPO Optimization**: Fine-tuned with Group Relative Policy Optimization against a multi-component reward  
+✅ **Citation Support**: Generates responses with legal citations  
+✅ **Structured Output**: Uses XML-like tags for organized responses  
+
+## 📊 Model Architecture
+
+- **Parameters**: ~4B
+- **Vocabulary Size**: 151936
+- **Hidden Size**: 2560
+- **Layers**: 36
+- **Attention Heads**: 32
+
+## 🚀 Quick Start
+
+### Installation
+
+```bash
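+# accelerate is required for device_map="auto"; Qwen3 needs transformers 4.51.0 or newer
+pip install "transformers>=4.51.0" torch accelerate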
+```
+
+### Basic Usage
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+
+# Load model and tokenizer
+model_name = "thangvip/qwen3-4b-vietnamese-legal-grpo"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+
+# Format your legal question
+system_prompt = """Bạn là một chuyên gia pháp lý. Hãy trả lời câu hỏi bằng cách sử dụng phương pháp lập luận tam đoạn luận (syllogism).
+
+Trước tiên, hãy suy nghĩ về vấn đề trong thẻ <think></think>.
+
+Sau đó, trả lời theo định dạng sau:
+<answer>
+<major_premise>[Quy định pháp luật chung]</major_premise>
+<minor_premise>[Sự kiện cụ thể trong câu hỏi]</minor_premise>
+<conclusion>[Áp dụng quy định vào sự kiện để đưa ra kết luận]</conclusion>
+</answer>
+
+Hãy đảm bảo trích dẫn chính xác các điều luật liên quan."""
+
+question = "Một công ty có nghĩa vụ gì khi sa thải nhân viên do tái cơ cấu?"
+
+# Create conversation
+messages = [
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": question}
+]
+
+# Generate response
+input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=1024,
+        temperature=0.7,
+        do_sample=True,
+        pad_token_id=tokenizer.eos_token_id
+    )
+
+response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
+print(response)
+```
+
+### Pipeline Usage
+
+```python
+import torch
+from transformers import pipeline
+
+# Create text generation pipeline
+generator = pipeline(
+    "text-generation",
+    model="thangvip/qwen3-4b-vietnamese-legal-grpo",
+    tokenizer="thangvip/qwen3-4b-vietnamese-legal-grpo",
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+
+# Generate legal reasoning
+prompt = "Câu hỏi: Quyền và nghĩa vụ của người thuê nhà khi hợp đồng thuê hết hạn?"
+result = generator(prompt, max_new_tokens=512, do_sample=True, temperature=0.7)
+# generated_text contains the prompt followed by the completion
+print(result[0]['generated_text'])
+```
+
+## 🎯 Training Details
+
+### Training Procedure
+- **Method**: Group Relative Policy Optimization (GRPO)
+- **Base Model**: thangvip/qwen3-4b-legal-pretrain-synthetic-8k
+- **Training Steps**: N/A
+- **Learning Rate**: N/A
+- **Batch Size**: N/A
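+
+GRPO samples several completions for the same prompt and scores each one relative to the rest of its group. The sketch below illustrates that normalization step only. The function name, the `eps` term, the use of the population standard deviation, and the example rewards are assumptions for illustration; the exact variant used to train this model is not documented here.
+
+```python
+from statistics import mean, pstdev
+
+
+def group_relative_advantages(rewards, eps=1e-8):
+    """Normalize rewards within one group of completions for a single prompt."""
+    mu = mean(rewards)
+    sigma = pstdev(rewards)  # population std; an assumption of this sketch
+    # eps avoids division by zero when every completion gets the same reward
+    return [(r - mu) / (sigma + eps) for r in rewards]
+
+
+# Hypothetical rewards for four sampled answers to one legal question
+advantages = group_relative_advantages([0.9, 0.4, 0.4, 0.1])
+print(advantages)
+```
+
+Completions that score above their group's mean receive a positive advantage and are reinforced; those below the mean are discouraged.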
+
+### Training Data
+- **Domain**: Vietnamese legal question-answering
+- **Format**: Syllogistic reasoning pairs
+- **Structure**: Question → Structured legal reasoning response
+
+### Reward System
+The training reward combines six weighted components:
+- **Correctness** (35%): Factual accuracy against reference answers
+- **Format Compliance** (20%): Proper use of syllogistic structure
+- **Citation Accuracy** (15%): Relevant and accurate legal citations  
+- **Reasoning Quality** (15%): Quality of legal reasoning process
+- **Hallucination Penalty** (10%): Penalty for unsupported claims
+- **Length Penalty** (5%): Penalty for exceeding maximum token length
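+
+One plausible way to combine these components is a weighted sum, sketched below. The weights are copied from the list above; the function name, the dictionary keys, the `[0, 1]` score range, and the convention that a penalty score of `1.0` means "no penalty" are all assumptions of this sketch, not the documented training code.
+
+```python
+# Weights taken from the list above; they sum to 1.0
+REWARD_WEIGHTS = {
+    "correctness": 0.35,
+    "format_compliance": 0.20,
+    "citation_accuracy": 0.15,
+    "reasoning_quality": 0.15,
+    "hallucination": 0.10,
+    "length": 0.05,
+}
+
+
+def combine_rewards(scores):
+    """Weighted sum of component scores, each assumed to lie in [0, 1].
+
+    For the two penalty components, 1.0 is taken to mean "no penalty"
+    (an assumption of this sketch).
+    """
+    missing = REWARD_WEIGHTS.keys() - scores.keys()
+    if missing:
+        raise KeyError(f"missing component scores: {sorted(missing)}")
+    return sum(REWARD_WEIGHTS[name] * scores[name] for name in REWARD_WEIGHTS)
+
+
+# Hypothetical component scores for a single completion
+example_scores = {
+    "correctness": 0.8,
+    "format_compliance": 1.0,
+    "citation_accuracy": 0.5,
+    "reasoning_quality": 0.7,
+    "hallucination": 1.0,
+    "length": 1.0,
+}
+print(combine_rewards(example_scores))
+```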
+
+## 📝 Expected Output Format
+
+The model generates structured responses in this format:
+
+```xml
+<think>
+[Internal reasoning about the legal question]
+</think>
+
+<answer>
+<major_premise>
+[General legal rule or principle applicable to the situation]
+</major_premise>
+
+<minor_premise>
+[Specific facts from the question that relate to the legal rule]
+</minor_premise>
+
+<conclusion>
+[Legal conclusion that follows logically from applying the rule to the facts]
+</conclusion>
+</answer>
+```
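+
+Downstream code usually needs the individual parts of this response. The sketch below pulls them out with regular expressions. The helper name `parse_syllogism` and the sample text are hypothetical, and the sketch assumes all four tags appear as literal text in the decoded output; it returns `None` otherwise, so callers can detect malformed generations.
+
+```python
+import re
+
+TAGS = ("think", "major_premise", "minor_premise", "conclusion")
+
+
+def parse_syllogism(response):
+    """Return a dict mapping tag name to text, or None if a tag is missing."""
+    parts = {}
+    for tag in TAGS:
+        # DOTALL lets the tag body span multiple lines
+        match = re.search(rf"<{tag}>(.*?)</{tag}>", response, flags=re.DOTALL)
+        if match is None:
+            return None
+        parts[tag] = match.group(1).strip()
+    return parts
+
+
+# Hypothetical response in the documented format
+sample = (
+    "<think>Identify the applicable rule.</think>\n"
+    "<answer>\n"
+    "<major_premise>General rule</major_premise>\n"
+    "<minor_premise>Specific facts</minor_premise>\n"
+    "<conclusion>Rule applied to facts</conclusion>\n"
+    "</answer>"
+)
+parsed = parse_syllogism(sample)
+print(parsed["conclusion"] if parsed else "malformed response")
+```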
+
+## 🎯 Use Cases
+
+- **Legal Education**: Teaching legal reasoning methodology
+- **Legal Research**: Preliminary analysis of legal questions
+- **Document Drafting**: Structured legal argument generation
+- **Legal Consultation**: Initial legal guidance (with human review)
+
+## ⚠️ Limitations
+
+- **Domain Specific**: Optimized for Vietnamese legal context
+- **Educational Purpose**: Should not replace professional legal advice
+- **Fact Checking Required**: Always verify legal citations and conclusions
+- **Context Window**: Limited by base model's context length
+
+## 📄 Citation
+
+If you use this model, please cite:
+
+```bibtex
+@misc{vietnamese-legal-grpo-2024,
+  title={Vietnamese Legal Reasoning Model with GRPO},
+  author={Your Name},
+  year={2024},
+  publisher={Hugging Face},
+  url={https://huggingface.co/thangvip/qwen3-4b-vietnamese-legal-grpo}
+}
+```
+
+## 🤝 Contributing
+
+Contributions are welcome! Please see our [contributing guidelines](CONTRIBUTING.md).
+
+## 📜 License
+
+This model is released under the Apache 2.0 License.
+
+## 🙏 Acknowledgments
+
+- **TRL Team**: For the GRPO implementation
+- **Qwen Team**: For the excellent base model
+- **Hugging Face**: For the transformers library and model hosting
+
+---
+
+**Note**: This model is for educational and research purposes. Always consult qualified legal professionals for actual legal advice.