---
language:
- vi
- en
pipeline_tag: text-generation
tags:
- qwen
- qwen2.5
- slm
- RAG
- travel
- vietnamese
- unsloth
- anti-hallucination
base_model: Qwen/Qwen2.5-3B-Instruct
---

# qwen2.5-3b-vivu-travel-vn

## Overview

qwen2.5-3b-vivu-travel-vn is a 3B-parameter Small Language Model (SLM) fine-tuned for the Vietnamese tourism domain. Built on Qwen2.5-3B-Instruct using Unsloth (PEFT/LoRA), it acts as ViVu, an intelligent travel assistant optimized for advanced RAG pipelines.

## Key Features

- **Strict Anti-Hallucination**: Zero tolerance for fabrication; answers are grounded strictly in the retrieved context, and out-of-scope queries are politely declined (see the refusal probe at the end of the Quickstart).
- **RAG-Optimized**: Synthesizes retrieved Vector DB chunks into clean, structured Vietnamese (Markdown supported).
- **Resource Efficient**: Deployable on consumer-grade GPUs (e.g., RTX 3060, T4) with a low VRAM footprint, as in the quantized-loading sketch after this list.
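
For tight VRAM budgets, the model can be loaded in 4-bit via bitsandbytes. This is a minimal sketch under assumptions, not a setup documented by the model author; the NF4/bf16 choices are illustrative defaults.

```python
# Minimal sketch: 4-bit quantized load for consumer GPUs (e.g., RTX 3060, T4).
# Assumes `pip install bitsandbytes`; quantization settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "thanhdo881/qwen2.5-3b-vivu-travel-vn"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # store weights as 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",  # compute in bf16 for output quality
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```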

## Model Details

- **Base Model**: Qwen/Qwen2.5-3B-Instruct
- **Architecture**: Causal LM, 32k context length.
- **Training Method**: LoRA instruction tuning via Unsloth (an inference-side loading sketch follows this list).
- **Languages**: Vietnamese, English.
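
Because the model was tuned with Unsloth, Unsloth's fast-inference loader is a natural fit. A sketch under assumptions: the `max_seq_length` and 4-bit flag below are illustrative choices, not settings published by the author.

```python
# Sketch: load via Unsloth's FastLanguageModel for faster generation.
# Assumes `pip install unsloth`; flags below are illustrative.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="thanhdo881/qwen2.5-3b-vivu-travel-vn",
    max_seq_length=32768,  # matches the 32k context noted above
    load_in_4bit=True,     # optional; trades some quality for VRAM
)
FastLanguageModel.for_inference(model)  # enable Unsloth's inference mode
```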

## Quickstart

```bash
pip install transformers vllm accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thanhdo881/qwen2.5-3b-vivu-travel-vn"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 1. Prepare RAG context & query
# context: "Da Lat sits on the Lam Vien plateau, famous for its temperate
#           climate and Xuan Huong Lake."
# question: "What are Da Lat's most notable features?"
context = "Đà Lạt nằm trên cao nguyên Lâm Viên, nổi tiếng với khí hậu ôn đới và Hồ Xuân Hương."
question = "Đà Lạt có những đặc điểm gì nổi bật?"
# prompt: "Based on the following information: {context} Answer the question: {question}"
prompt = f"Dựa vào thông tin sau:\n{context}\n\nHãy trả lời câu hỏi: {question}"

# 2. Build messages
# system: "You are ViVu, a Vietnamese travel assistant. Answer only from the provided context."
messages = [
    {"role": "system", "content": "Bạn là ViVu, trợ lý du lịch Việt Nam. Chỉ trả lời dựa trên ngữ cảnh được cung cấp."},
    {"role": "user", "content": prompt},
]

# 3. Generate (do_sample=True so the temperature setting takes effect)
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.3,
    repetition_penalty=1.1,
)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]

print(response)
```
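
To exercise the anti-hallucination behavior described above, reuse the loaded model with a question the context cannot answer. A sketch under assumptions: `oos_question` is a hypothetical probe, and the polite refusal is the card's claimed behavior, not a guaranteed output.

```python
# Sketch: probe the out-of-scope refusal behavior.
# The context above only covers Da Lat, so a question about Ha Long Bay
# should be declined rather than answered from fabricated knowledge.
oos_question = "Vịnh Hạ Long có bao nhiêu hòn đảo?"  # "How many islands does Ha Long Bay have?"
oos_prompt = f"Dựa vào thông tin sau:\n{context}\n\nHãy trả lời câu hỏi: {oos_question}"

messages = [
    {"role": "system", "content": "Bạn là ViVu, trợ lý du lịch Việt Nam. Chỉ trả lời dựa trên ngữ cảnh được cung cấp."},
    {"role": "user", "content": oos_prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.3)
print(tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```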