# qwen2.5-3b-vivu-travel-vn

| language | pipeline_tag | tags | base_model |
|---|---|---|---|
| | text-generation | | Qwen/Qwen2.5-3B-Instruct |
## Overview
qwen2.5-3b-vivu-travel-vn is a 3B-parameter Small Language Model (SLM) fine-tuned for the Vietnamese Tourism Domain. Built on Qwen2.5-3B-Instruct using Unsloth (PEFT/LoRA), it acts as ViVu, an intelligent travel assistant optimized for Advanced RAG pipelines.
## Key Features
- Strict Anti-Hallucination: Zero-tolerance for fabrication; strictly grounds answers in the retrieved context and politely declines out-of-scope queries.
- RAG-Optimized: Synthesizes retrieved Vector DB chunks into clean, structured Vietnamese (Markdown supported).
- Resource Efficient: Deployable on consumer-grade GPUs (e.g., RTX 3060, T4) with low VRAM footprint.
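The grounding behaviour above assumes the caller assembles retrieved Vector DB chunks into a single context block before prompting. A minimal sketch of that assembly step, assuming the chat format shown in the Quickstart (the chunk numbering scheme and the example chunk texts are illustrative, not part of the released card):

```python
# Hypothetical helper: join retrieved chunks into one grounded prompt for ViVu.
# The "[n]" chunk-numbering convention is an assumption for illustration.

def build_rag_messages(chunks: list[str], question: str) -> list[dict]:
    """Join retrieved chunks into a context block and wrap it in chat messages."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    prompt = f"Dựa vào thông tin sau:\n{context}\n\nHãy trả lời câu hỏi: {question}"
    return [
        {
            "role": "system",
            "content": (
                "Bạn là ViVu, trợ lý du lịch Việt Nam. "
                "Chỉ trả lời dựa trên ngữ cảnh được cung cấp."
            ),
        },
        {"role": "user", "content": prompt},
    ]

messages = build_rag_messages(
    ["Hồ Xuân Hương nằm ở trung tâm Đà Lạt.", "Đà Lạt có khí hậu ôn đới quanh năm."],
    "Hồ Xuân Hương ở đâu?",
)
print(messages[1]["content"])
```

The resulting `messages` list drops straight into `tokenizer.apply_chat_template` as in the Quickstart.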
## Model Details
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Architecture: Causal LM, 32k context length.
- Training Method: LoRA Instruction-tuning via Unsloth.
- Language: Vietnamese, English.
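Even with the 32k context window, long retrieval lists need a token budget. A rough sketch of trimming ranked chunks to fit; the 3.5 chars-per-token ratio and the reserve size are assumptions, so use the model's own tokenizer when exact counts matter:

```python
# Crude context-budget sketch for a 32k-token window.
# CHARS_PER_TOKEN is a heuristic assumption, not a tokenizer.

MAX_CONTEXT_TOKENS = 32_000
RESERVED_TOKENS = 1_024   # room for system prompt, question, and the reply
CHARS_PER_TOKEN = 3.5

def estimate_tokens(text: str) -> int:
    """Rough token estimate from character length."""
    return int(len(text) / CHARS_PER_TOKEN) + 1

def fit_chunks(chunks: list[str],
               budget: int = MAX_CONTEXT_TOKENS - RESERVED_TOKENS) -> list[str]:
    """Keep retrieved chunks, in ranked order, until the budget is spent."""
    kept, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

# 30 chunks of ~2001 estimated tokens each against a 10k budget keeps 4.
print(len(fit_chunks(["x" * 7000] * 30, budget=10_000)))
```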
## Quickstart
```bash
pip install transformers vllm accelerate
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thanhdo881/qwen2.5-3b-vivu-travel-vn"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 1. Prepare RAG context & query
context = "Đà Lạt nằm trên cao nguyên Lâm Viên, nổi tiếng với khí hậu ôn đới và Hồ Xuân Hương."
question = "Đà Lạt có những đặc điểm gì nổi bật?"
prompt = f"Dựa vào thông tin sau:\n{context}\n\nHãy trả lời câu hỏi: {question}"

# 2. Build messages
messages = [
    {"role": "system", "content": "Bạn là ViVu, trợ lý du lịch Việt Nam. Chỉ trả lời dựa trên ngữ cảnh được cung cấp."},
    {"role": "user", "content": prompt},
]

# 3. Generate (do_sample=True so temperature actually takes effect)
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3, repetition_penalty=1.1)
response = tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(response)
```