---
language:
- vi
- en
pipeline_tag: text-generation
tags:
- qwen
- qwen2.5
- slm
- RAG
- travel
- vietnamese
- unsloth
- anti-hallucination
base_model: Qwen/Qwen2.5-3B-Instruct
---

# qwen2.5-3b-vivu-travel-vn

## Overview

`qwen2.5-3b-vivu-travel-vn` is a 3B-parameter Small Language Model (SLM) fine-tuned for the **Vietnamese tourism domain**. Built on `Qwen2.5-3B-Instruct` using Unsloth (PEFT/LoRA), it acts as **ViVu**, an intelligent travel assistant optimized for **Advanced RAG** pipelines.

### Key Features

* **Strict Anti-Hallucination:** Zero tolerance for fabrication; answers are grounded strictly in the retrieved context, and out-of-scope queries are politely declined.
* **RAG-Optimized:** Synthesizes Vector DB chunks into clean, structured Vietnamese answers (Markdown supported).
* **Resource-Efficient:** Deployable on consumer-grade GPUs (e.g., RTX 3060, T4) with a low VRAM footprint.

## Model Details

* **Base Model:** Qwen/Qwen2.5-3B-Instruct
* **Architecture:** Causal LM, 32k context length.
* **Training Method:** LoRA instruction-tuning via Unsloth.
* **Languages:** Vietnamese, English.

## Quickstart

Install the dependencies (`vllm` is optional and only needed for the serving sketch further below):

```bash
pip install transformers vllm accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thanhdo881/qwen2.5-3b-vivu-travel-vn"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 1. Prepare RAG context & query
# Context: "Da Lat lies on the Lam Vien plateau, famous for its temperate climate and Xuan Huong Lake."
context = "Đà Lạt nằm trên cao nguyên Lâm Viên, nổi tiếng với khí hậu ôn đới và Hồ Xuân Hương."
# Question: "What are Da Lat's most notable features?"
question = "Đà Lạt có những đặc điểm gì nổi bật?"
# Template: "Based on the following information: {context} ... Answer the question: {question}"
prompt = f"Dựa vào thông tin sau:\n{context}\n\nHãy trả lời câu hỏi: {question}"

# 2. Build messages
# System prompt: "You are ViVu, a Vietnam travel assistant. Answer only from the provided context."
messages = [
    {"role": "system", "content": "Bạn là ViVu, trợ lý du lịch Việt Nam. Chỉ trả lời dựa trên ngữ cảnh được cung cấp."},
    {"role": "user", "content": prompt},
]

# 3. Generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect in transformers
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3, repetition_penalty=1.1)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(response)
```
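
### Assembling Retrieved Context

In a real Advanced RAG pipeline, `context` comes from several Vector DB chunks rather than a single string. Below is a minimal sketch of how they might be stitched into the Quickstart's prompt template; `build_rag_prompt` and `retrieved_chunks` are hypothetical names for illustration, not part of this model's API.

```python
# Hypothetical helper: join retriever output into the Quickstart's prompt template.
# `retrieved_chunks` stands in for your Vector DB's top-k results.
def build_rag_prompt(retrieved_chunks, question):
    # Number each chunk so the model can ground its answer in specific sources
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks))
    # Template from the Quickstart ("Based on the following information ... answer the question ...")
    return f"Dựa vào thông tin sau:\n{context}\n\nHãy trả lời câu hỏi: {question}"

retrieved_chunks = [
    "Đà Lạt nằm trên cao nguyên Lâm Viên, nổi tiếng với khí hậu ôn đới.",  # "Da Lat lies on the Lam Vien plateau..."
    "Hồ Xuân Hương là hồ nằm ở trung tâm thành phố Đà Lạt.",  # "Xuan Huong Lake sits in central Da Lat."
]
prompt = build_rag_prompt(retrieved_chunks, "Đà Lạt có những đặc điểm gì nổi bật?")
```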
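
### Checking Grounding Behavior

To exercise the anti-hallucination behavior described above, reuse the Quickstart setup with a question the context cannot answer; per this card, the expected response is a polite decline rather than a fabricated answer. The question string here is an illustrative example, not from the training data.

```python
# Continuing from the Quickstart (model, tokenizer, and context already defined).
# Question: "How much is a flight ticket to Da Lat?" - not answerable from `context`.
off_topic_question = "Giá vé máy bay đến Đà Lạt là bao nhiêu?"
prompt = f"Dựa vào thông tin sau:\n{context}\n\nHãy trả lời câu hỏi: {off_topic_question}"

messages = [
    {"role": "system", "content": "Bạn là ViVu, trợ lý du lịch Việt Nam. Chỉ trả lời dựa trên ngữ cảnh được cung cấp."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.3, repetition_penalty=1.1)
# Expected (per this card's anti-hallucination training): a polite decline, not an invented price
print(tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```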
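
### Serving with vLLM (Optional)

Since the install line pulls in `vllm`, here is a minimal offline-inference sketch. It assumes a recent vLLM release in which `LLM.chat` applies the model's chat template automatically; the sampling values mirror the Quickstart and are not an official recommendation.

```python
# Minimal vLLM sketch; assumes a recent vLLM release where LLM.chat()
# applies the chat template for you. Sampling mirrors the Quickstart.
from vllm import LLM, SamplingParams

llm = LLM(model="thanhdo881/qwen2.5-3b-vivu-travel-vn")
sampling = SamplingParams(temperature=0.3, repetition_penalty=1.1, max_tokens=512)

messages = [
    {"role": "system", "content": "Bạn là ViVu, trợ lý du lịch Việt Nam. Chỉ trả lời dựa trên ngữ cảnh được cung cấp."},
    {"role": "user", "content": "Dựa vào thông tin sau:\nĐà Lạt nằm trên cao nguyên Lâm Viên, nổi tiếng với khí hậu ôn đới và Hồ Xuân Hương.\n\nHãy trả lời câu hỏi: Đà Lạt có những đặc điểm gì nổi bật?"},
]

outputs = llm.chat(messages, sampling)
print(outputs[0].outputs[0].text)
```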