yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged


Model: yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged
Source: Original Platform

2026-05-26 15:22:55 +08:00


Tourism Assistant — Qwen2.5-1.5B Fine-Tuned

A fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct trained on a custom tourism Q&A dataset generated through a RAG-grounded pipeline. Built as a personal end-to-end learning project covering data collection, dataset engineering, supervised fine-tuning, and deployment.

⚠️ This model is not production-ready. It is a learning project. Outputs can be incorrect, incomplete, or inconsistent — especially on topics not well-represented in the training data. Do not rely on this model for real travel decisions.

Model Details

Model Description

This model was fine-tuned using FP16 LoRA (Low-Rank Adaptation) on a small custom dataset of ~150–200 tourism-focused question-answer pairs. The training examples were generated through a RAG pipeline: Reddit travel posts were scraped, embedded, and stored in a Qdrant vector store, then a locally-hosted Qwen2.5-7B (via Ollama) was used to generate grounded Q&A pairs from retrieved context chunks. The resulting dataset was formatted in ChatML and used to fine-tune this model.

The goal of fine-tuning was to adjust the model's behavioural style (more focused, concise, and consistently helpful answers to travel queries) rather than to inject new factual knowledge. Factual grounding at inference time is handled by a RAG pipeline backed by Qdrant Cloud.

Developed by: Younes
Model type: Causal Language Model — fine-tuned for instruction following
Language: English
License: MIT
Base model: Qwen/Qwen2.5-1.5B-Instruct
Fine-tuning method: FP16 LoRA (merged into base weights)
Training dataset: yowww1094/tourism-llm-fine-tuning-dataset

Model Sources

Repository: GitHub
Demo: Demo

Uses

Direct Use

This model can be used as a conversational assistant for general tourism and travel questions — destination information, logistics, visa guidance, packing advice, and similar topics. It works best when paired with a retrieval pipeline that provides relevant context at inference time.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful tourism assistant."},
    {"role": "user", "content": "What are the must-visit places in Marrakech?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)

Downstream Use

This model is designed to be used as the generation component of a RAG pipeline. The recommended usage is:

Embed the user query with all-MiniLM-L6-v2
Retrieve top-k relevant chunks from a Qdrant vector store
Inject retrieved context into the prompt before generation
Pass the full prompt to this model

Pairing the model with retrieval significantly improves factual accuracy on specific travel queries compared to using the model standalone.
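The retrieve-then-inject steps listed above can be illustrated without any external service. This is a minimal sketch, not the project's code: it assumes embeddings are already available as plain lists of floats (in the real pipeline they would come from all-MiniLM-L6-v2 and be ranked by Qdrant), and the names `cosine_similarity`, `top_k_chunks`, `build_context`, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, indexed_chunks, k=3):
    """Return the k chunk texts most similar to the query vector.

    indexed_chunks is a list of (vector, text) pairs. A vector store such as
    Qdrant would do this ranking server-side; this loop only illustrates it.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda pair: cosine_similarity(query_vec, pair[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]


def build_context(chunks):
    """Join retrieved chunks into one context string for the prompt."""
    return "\n\n".join(chunks)


# Toy 3-dimensional vectors stand in for real 384-dimensional embeddings.
toy_index = [
    ([1.0, 0.0, 0.0], "Chunk about Marrakech souks."),
    ([0.0, 1.0, 0.0], "Chunk about visa paperwork."),
    ([0.9, 0.1, 0.0], "Chunk about Jemaa el-Fnaa square."),
]
query_vector = [1.0, 0.05, 0.0]

retrieved_context = build_context(top_k_chunks(query_vector, toy_index, k=2))
print(retrieved_context)
```

The resulting `retrieved_context` string is what gets placed in the user prompt before generation.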

Out-of-Scope Use

Real-time travel information: The model has no access to live data. Flight prices, visa requirements, and safety conditions change frequently — do not rely on this model for current information.
Medical or legal travel advice: The model is not equipped to give reliable guidance on health requirements, legal restrictions, or emergency situations.
Non-English queries: The model was trained exclusively on English data and is not reliable for other languages.
High-stakes decisions: This is a learning project. Outputs should not be used to make actual travel bookings, visa applications, or safety assessments.

Bias, Risks, and Limitations

Dataset limitations — primary source of inaccuracy: The training dataset contains only ~150–200 examples sourced from Reddit travel communities. This introduces several compounding problems:

Reddit's user base skews toward English-speaking, Western travellers — advice reflects this demographic and may not generalise to other travel styles or origins
The dataset covers only a narrow slice of the tourism domain; large topic areas have no representation
Source Reddit posts were not independently fact-checked; incorrect or outdated community advice may appear in training examples
Only ~40 examples (roughly 20–27% of the ~150–200 total) were manually reviewed for quality before training

Model behaviour limitations:

The model hallucinates specific facts (prices, distances, operating hours, visa fees) when retrieved context does not provide explicit grounding
It does not reliably abstain from answering when it does not know — it tends to produce a confident-sounding response regardless
Response consistency is variable; the same query may produce meaningfully different answers across runs
The model may reflect biases present in Reddit data, including opinions presented as facts

Recommendations

Always pair this model with a retrieval pipeline for factual queries
Display a disclaimer to end users that outputs may be inaccurate
Do not deploy in contexts where incorrect travel information could cause harm
Treat all specific factual claims (prices, hours, requirements) as unverified until confirmed from an authoritative source

How to Get Started with the Model

Basic inference (no RAG)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful tourism assistant. Answer travel questions clearly and concisely."},
    {"role": "user", "content": "What is the best time of year to visit Morocco?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.1
    )

response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)

Recommended inference (with RAG context)

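# retrieved_context: the top-k chunks from your Qdrant vector store, joined into one string.
# The two values below are placeholders so the snippet runs on its own; replace them with real data.
retrieved_context = "<chunk 1 text>\n\n<chunk 2 text>"
user_query = "What is the best time of year to visit Morocco?"

system_prompt = (
    "You are a helpful tourism assistant. Use only the provided context to answer. "
    "If the context does not contain enough information, say so."
)

user_prompt = f"""Context:
{retrieved_context}

Question: {user_query}"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Then apply the chat template and generate exactly as in the basic inference example.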

Training Details

Training Data

The model was fine-tuned on yowww1094/tourism-llm-fine-tuning-dataset, a custom dataset generated through the following pipeline:

Reddit travel posts and comments were scraped from subreddits including r/travel, r/Morocco, r/solotravel, and r/backpacking using PRAW
Raw text was cleaned (length filter, deduplication, HTML stripping, encoding fix) and chunked
Chunks were embedded with all-MiniLM-L6-v2 and indexed in a Qdrant vector store
A locally-hosted Qwen2.5-7B (via Ollama) was prompted to generate grounded question-answer pairs from retrieved context chunks
Outputs were post-processed into ChatML format and manually sampled for quality (~40 examples reviewed)
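The cleaning and chunking step in the pipeline above could look roughly like the following. This is a sketch under assumptions, not the code used for the dataset: the 40-character minimum, the word-based chunk size and overlap, and all function names are hypothetical, and the encoding-fix step is left out.

```python
import html
import re


def clean_text(raw):
    """Unescape HTML entities, strip tags, and collapse whitespace."""
    text = html.unescape(raw)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(posts, min_chars=40):
    """Clean each post, drop short ones, and remove exact duplicates."""
    seen = set()
    kept = []
    for raw in posts:
        text = clean_text(raw)
        key = text.lower()  # case-insensitive exact-match deduplication
        if len(text) < min_chars or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


def chunk_words(text, chunk_size=120, overlap=20):
    """Split text into overlapping windows of whitespace-separated words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


posts = [
    "<p>Short</p>",
    "<p>The medina in Fes is best explored on foot with a local guide.</p>",
    "The medina in Fes is best explored on foot with a local guide.",
]
for post in clean_corpus(posts):
    print(chunk_words(post, chunk_size=8, overlap=2))
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.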

Dataset size: ~150–200 question-answer pairs
Format: JSON Lines
Split: 90% train / 10% validation
Quality note: Only ~65% of manually reviewed examples were rated acceptable. No automated quality filter was applied to the full dataset.
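A 90/10 split of a JSON Lines file can be reproduced with a seeded shuffle. The sketch below is illustrative only: the seed, the `messages` field name, and the helper names are assumptions, since the card does not state how the split was drawn.

```python
import json
import random


def parse_jsonl(text):
    """Parse JSON Lines text into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def train_val_split(records, val_fraction=0.10, seed=42):
    """Shuffle deterministically and hold out val_fraction for validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical records; the real dataset stores ChatML-style conversations.
jsonl_text = "\n".join(
    json.dumps({"id": i, "messages": [{"role": "user", "content": f"question {i}"}]})
    for i in range(180)
)
records = parse_jsonl(jsonl_text)
train, val = train_val_split(records)
print(len(train), len(val))
```

With 150 to 200 records this yields 15 to 20 validation examples, which matches the size of the held-out split reported in the Evaluation section.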

Training Procedure

Preprocessing

Each example was formatted using the Qwen2.5-Instruct chat template via tokenizer.apply_chat_template(). Sequences were truncated to a maximum length of 512 tokens. No data augmentation was applied.
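To make the ChatML layout concrete, the sketch below renders a conversation by hand and truncates a token list to 512 entries. It is an approximation for illustration: in practice `tokenizer.apply_chat_template()` produces the text and the real template covers more cases (for example tool calls and a default system prompt). The function names here are hypothetical.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts in ChatML layout."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Opens the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def truncate_tokens(token_ids, max_length=512):
    """Keep only the first max_length token ids (right-side truncation)."""
    return token_ids[:max_length]


example = [
    {"role": "system", "content": "You are a helpful tourism assistant."},
    {"role": "user", "content": "Hi"},
]
print(render_chatml(example, add_generation_prompt=True))
print(len(truncate_tokens(list(range(600)))))
```

For training examples the assistant answer is included as a third message and no generation prompt is added; at inference time the generation prompt is appended instead.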

Training Hyperparameters

| Parameter | Value |
| --- | --- |
| Training regime | FP16 mixed precision |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, v_proj |
| Epochs | 3 |
| Batch size | 4 |
| Gradient accumulation steps | 4 (effective batch size: 16) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine annealing |
| Warmup ratio | 0.03 |
| Max sequence length | 512 tokens |
| Optimizer | AdamW (default via transformers) |
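The batch settings above imply a very small number of optimizer updates. The following back-of-the-envelope sketch works that out, assuming a single GPU, a 90% training split of 150 to 200 examples, and ceiling rounding for both steps per epoch and warmup steps (the exact rounding depends on the trainer version). The function name is hypothetical.

```python
import math


def schedule_summary(n_train, per_device_batch=4, grad_accum=4,
                     epochs=3, warmup_ratio=0.03):
    """Derive optimizer-step counts from the training hyperparameters."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(n_train / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# 90% of 150 and of 200 examples respectively.
for n_train in (135, 180):
    print(n_train, schedule_summary(n_train))
```

Under these assumptions the whole run is about 27 to 36 optimizer steps, so a warmup ratio of 0.03 covers only one or two updates before the cosine decay begins.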

Hardware and Duration

Hardware: NVIDIA T4 GPU (15 GB VRAM) — Kaggle free tier
Training time: ~45–60 minutes for 3 epochs on ~150 examples
Framework: transformers 4.x + peft + trl (SFTTrainer) + accelerate

Evaluation

Testing Data

Evaluation was performed on a held-out validation split of ~15–20 examples (10% of the total dataset). Due to the small size of this split, quantitative metrics should be interpreted with significant caution — they are not statistically reliable estimates of generalisation performance.

In addition to automatic metrics, a qualitative evaluation was performed by manually inspecting model outputs on ~20 held-out queries not present in the training data.

Metrics

| Metric | Value | Notes |
| --- | --- | --- |
| Training loss (epoch 1) | ~1.62 | |
| Training loss (epoch 3) | ~0.85 | Consistent decrease across epochs |
| Validation loss (epoch 1) | ~1.74 | |
| Validation loss (epoch 3) | ~1.29 | Slight divergence from train loss; mild overfitting |
| Validation perplexity | ~2.3 | On 15–20 examples only; not a reliable generalisation estimate |
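For readers who want to relate the loss and perplexity figures: if the reported losses are mean per-token cross-entropy in nats (an assumption; the card does not say how they were computed), perplexity is simply `exp(loss)`. The short sketch below applies that conversion to the values in the table.

```python
import math

# Loss values copied from the metrics table above.
reported_losses = {
    "train_epoch_1": 1.62,
    "train_epoch_3": 0.85,
    "val_epoch_1": 1.74,
    "val_epoch_3": 1.29,
}

# Perplexity under the assumption that each loss is mean cross-entropy in nats.
perplexities = {name: math.exp(loss) for name, loss in reported_losses.items()}

for name, loss in reported_losses.items():
    print(f"{name}: loss={loss:.2f} perplexity={perplexities[name]:.2f}")
```

Under that assumption, the epoch-3 validation loss of ~1.29 corresponds to a perplexity of about 3.6, while a perplexity of ~2.3 corresponds to a loss of about 0.83, which is close to the epoch-3 training loss. The reported perplexity may have been computed differently or on a different checkpoint; the source does not specify.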

Results

What improved after fine-tuning:

Response style and tone became more focused and consistently helpful compared to the base model
The model more readily uses tourism-relevant vocabulary and response structure
For in-distribution queries (topics well-represented in the dataset), the combination of fine-tuning + RAG outperforms either mechanism alone

What did not improve:

Factual accuracy on out-of-distribution queries is comparable to the base model — fine-tuning at this scale does not inject meaningful new factual knowledge
Hallucination rate on specific facts (prices, dates, requirements) is unchanged without retrieval grounding
The model occasionally produces responses that closely paraphrase training examples, suggesting partial memorisation

Summary

Fine-tuning on this small dataset improved behavioural style but not factual coverage. The model is more useful when paired with a retrieval pipeline than when used standalone. A dataset 10–20× larger with verified factual content would be needed to produce a genuinely reliable tourism assistant.

Environmental Impact

Carbon emissions were not formally measured. Estimated figures based on training setup:

Hardware type: NVIDIA T4 GPU (Kaggle free tier)
Hours used: ~1 hour total training time
Cloud provider: Google (Kaggle infrastructure)
Compute region: Unknown
Carbon emitted: Estimated < 0.05 kg CO₂eq (based on ML Impact Calculator)
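The order of magnitude of that estimate can be sanity-checked with simple arithmetic: energy is GPU power times hours, and emissions are energy times grid carbon intensity. The sketch below assumes the T4 runs at its 70 W board power for the full hour and uses a range of hypothetical carbon intensities, since the compute region is unknown. It ignores CPU, memory, and datacentre overhead, so it is a rough lower-bound style check rather than a measurement.

```python
T4_POWER_KW = 0.070   # NVIDIA T4 board power (70 W)
TRAINING_HOURS = 1.0  # "~1 hour total training time" from the list above

# Hypothetical grid intensities in kg CO2eq per kWh (region is unknown).
ASSUMED_INTENSITIES = (0.1, 0.4, 0.7)


def estimate_emissions_kg(power_kw, hours, intensity_kg_per_kwh):
    """Emissions = energy used (kWh) x grid carbon intensity."""
    return power_kw * hours * intensity_kg_per_kwh


estimates = [
    estimate_emissions_kg(T4_POWER_KW, TRAINING_HOURS, intensity)
    for intensity in ASSUMED_INTENSITIES
]
for intensity, estimate in zip(ASSUMED_INTENSITIES, estimates):
    print(f"{intensity:.1f} kg/kWh -> {estimate:.3f} kg CO2eq")
```

Across this assumed range the GPU-only figure stays below 0.05 kg CO₂eq, which is consistent with the estimate given above.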

Technical Specifications

Model Architecture

Architecture: Qwen2.5 decoder-only transformer (causal LM)
Parameters: 1.5 billion (base model)
Context window: 32,768 tokens (base model capability; fine-tuning used max 512 tokens)
Fine-tuning method: LoRA adapters applied to q_proj and v_proj attention matrices, then merged into base weights before upload
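To give a sense of how small the adapter is, the sketch below counts the LoRA parameters for rank 16 on q_proj and v_proj. The layer dimensions (hidden size 1536, 28 layers, 2 key/value heads of dimension 128) are assumptions taken from my understanding of the base model's published configuration and should be checked against its config.json; the helper name is hypothetical.

```python
# Assumed base-model dimensions (verify against config.json).
HIDDEN_SIZE = 1536
NUM_LAYERS = 28
NUM_KV_HEADS = 2
HEAD_DIM = 128

# LoRA settings from the hyperparameter table.
RANK = 16
ALPHA = 32


def lora_param_count(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


q_proj = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
# With grouped-query attention, v_proj projects to kv_heads * head_dim.
v_proj = lora_param_count(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, RANK)

trainable = (q_proj + v_proj) * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"q_proj adapter params per layer: {q_proj:,}")
print(f"v_proj adapter params per layer: {v_proj:,}")
print(f"total trainable params: {trainable:,}")
print(f"share of 1.5B base params: {trainable / 1.5e9:.2%}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter holds roughly 2.2 million trainable parameters, about 0.15% of the base model, which is consistent with fine-tuning changing style more than factual knowledge.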

Compute Infrastructure

Training: Kaggle free-tier notebook, T4 GPU, 15 GB VRAM
Inference: Hugging Face Inference Endpoints (free tier)
Vector store: Qdrant Cloud (free tier, ~1,200 indexed chunks)
Embedding: sentence-transformers/all-MiniLM-L6-v2, CPU inference

Software

transformers>=4.40.0
peft>=0.10.0
trl>=0.8.0
accelerate>=0.28.0
sentence-transformers>=2.6.0
qdrant-client>=1.9.0
torch>=2.1.0

Citation

If you reference this project, please cite it as:

@misc{younes2026tourismllm,
  author    = {Younes},
  title     = {End-to-End LLM Pipeline for Tourism Assistant},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged}
}

More Information

Full technical report (covering pipeline design, training decisions, limitations, and results in detail): docs/technical_report.pdf

GitHub repository with full pipeline code: GitHub

Training dataset: yowww1094/tourism-llm-fine-tuning-dataset

Model Card Author

Younes AIT SI ABBOU — personal learning project, April 2026
