---
language:
- vi
- en
pipeline_tag: text-generation
tags:
- qwen
- qwen2.5
- slm
- RAG
- travel
- vietnamese
- unsloth
- anti-hallucination
base_model: Qwen/Qwen2.5-3B-Instruct
---
# qwen2.5-3b-vivu-travel-vn
## Overview
`qwen2.5-3b-vivu-travel-vn` is a 3B-parameter Small Language Model (SLM) fine-tuned for the **Vietnamese Tourism Domain**. Built on `Qwen2.5-3B-Instruct` using Unsloth (PEFT/LoRA), it acts as **ViVu**, an intelligent travel assistant optimized for **Advanced RAG** pipelines.
### Key Features
* **Strict Anti-Hallucination:** Zero tolerance for fabrication; answers are strictly grounded in the retrieved context, and out-of-scope queries are politely declined.
* **RAG-Optimized:** Synthesizes retrieved Vector DB chunks into clean, structured Vietnamese (Markdown supported).
* **Resource Efficient:** Deployable on consumer-grade GPUs (e.g., RTX 3060, T4) with a low VRAM footprint; see the loading sketch below.
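
To exercise the low-VRAM claim, the model can be loaded in 4-bit. A minimal sketch, assuming the `bitsandbytes` backend is installed; the quantization settings below are illustrative, not an official recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "thanhdo881/qwen2.5-3b-vivu-travel-vn"

# Illustrative 4-bit NF4 config; adjust to your hardware (assumption, not part of the release).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```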
## Model Details
* **Base Model:** Qwen/Qwen2.5-3B-Instruct
* **Architecture:** Causal LM, 32k context length.
* **Training Method:** LoRA instruction tuning via Unsloth (see the sketch after this list).
* **Language:** Vietnamese, English.
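
For reference, a minimal sketch of the kind of Unsloth LoRA setup described above. The rank, alpha, target modules, and sequence length are assumptions for illustration, not the actual training configuration:

```python
from unsloth import FastLanguageModel

# Load the base model through Unsloth (4-bit loading here is an assumption for illustration).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-3B-Instruct",
    max_seq_length=4096,   # training cutoff; the base model supports up to 32k context
    load_in_4bit=True,
)

# Attach LoRA adapters; rank/alpha/target modules are illustrative, not the released config.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# The wrapped model can then be instruction-tuned with a standard SFT training loop.
```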
## Quickstart
```bash
pip install transformers vllm accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thanhdo881/qwen2.5-3b-vivu-travel-vn"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 1. Prepare RAG context & query
# context: "Da Lat lies on the Lam Vien plateau, famous for its temperate climate and Xuan Huong Lake."
context = "Đà Lạt nằm trên cao nguyên Lâm Viên, nổi tiếng với khí hậu ôn đới và Hồ Xuân Hương."
# question: "What are Da Lat's notable features?"
question = "Đà Lạt có những đặc điểm gì nổi bật?"
# prompt: "Based on the following information: ... Answer the question: ..."
prompt = f"Dựa vào thông tin sau:\n{context}\n\nHãy trả lời câu hỏi: {question}"

# 2. Build chat messages (the system prompt pins the model to the provided context:
#    "You are ViVu, a Vietnam travel assistant. Only answer based on the provided context.")
messages = [
    {"role": "system", "content": "Bạn là ViVu, trợ lý du lịch Việt Nam. Chỉ trả lời dựa trên ngữ cảnh được cung cấp."},
    {"role": "user", "content": prompt},
]

# 3. Generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,          # required for temperature to take effect
    temperature=0.3,
    repetition_penalty=1.1,
)
response = tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(response)
```
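
The Quickstart installs `vllm` but only uses `transformers`. For throughput-oriented inference, a hedged sketch of the offline vLLM API follows; the sampling settings mirror the Quickstart and the engine arguments are assumptions, not an official serving recipe:

```python
from vllm import LLM, SamplingParams

# Offline batched generation with vLLM; engine arguments are left at defaults here (assumption).
llm = LLM(model="thanhdo881/qwen2.5-3b-vivu-travel-vn")
params = SamplingParams(temperature=0.3, repetition_penalty=1.1, max_tokens=512)

# Same RAG-grounded conversation as the Quickstart above.
messages = [
    {"role": "system", "content": "Bạn là ViVu, trợ lý du lịch Việt Nam. Chỉ trả lời dựa trên ngữ cảnh được cung cấp."},
    {"role": "user", "content": "Dựa vào thông tin sau:\nĐà Lạt nằm trên cao nguyên Lâm Viên, nổi tiếng với khí hậu ôn đới và Hồ Xuân Hương.\n\nHãy trả lời câu hỏi: Đà Lạt có những đặc điểm gì nổi bật?"},
]

# LLM.chat applies the model's chat template before generating (requires a recent vLLM release).
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```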