Model: yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged

library_name: transformers
license: mit
datasets: yowww1094/tourism-llm-fine-tuning-dataset
language: en
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags: tourism, travel, question-answering, lora, fine-tuned, rag

Tourism Assistant — Qwen2.5-1.5B Fine-Tuned

A fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct trained on a custom tourism Q&A dataset generated through a RAG-grounded pipeline. Built as a personal end-to-end learning project covering data collection, dataset engineering, supervised fine-tuning, and deployment.

⚠️ This model is not production-ready. It is a learning project. Outputs can be incorrect, incomplete, or inconsistent — especially on topics not well-represented in the training data. Do not rely on this model for real travel decisions.


Model Details

Model Description

This model was fine-tuned using FP16 LoRA (Low-Rank Adaptation) on a small custom dataset of ~150–200 tourism-focused question-answer pairs. The training examples were generated through a RAG pipeline: Reddit travel posts were scraped, embedded, and stored in a Qdrant vector store, then a locally hosted Qwen2.5-7B (via Ollama) was used to generate grounded Q&A pairs from retrieved context chunks. The resulting dataset was formatted in ChatML and used to fine-tune this model.

The goal of fine-tuning was to adjust the model's behavioral style — making it more focused, concise, and consistently helpful for travel queries — rather than to inject new factual knowledge. Factual grounding at inference time is handled by a RAG pipeline backed by Qdrant Cloud.

Model Sources


Uses

Direct Use

This model can be used as a conversational assistant for general tourism and travel questions — destination information, logistics, visa guidance, packing advice, and similar topics. It works best when paired with a retrieval pipeline that provides relevant context at inference time.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo ID as given in this card's citation URL
model_id = "yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful tourism assistant."},
    {"role": "user", "content": "What are the must-visit places in Marrakech?"}
]

# Render the conversation with the model's chat template and open the assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sampled decoding: repeated runs can give different answers
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
# Drop the prompt tokens so that only the newly generated answer is decoded
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)

Downstream Use

This model is designed to be used as the generation component of a RAG pipeline. The recommended usage is:

  1. Embed the user query with all-MiniLM-L6-v2
  2. Retrieve top-k relevant chunks from a Qdrant vector store
  3. Inject retrieved context into the prompt before generation
  4. Pass the full prompt to this model

Pairing the model with retrieval significantly improves factual accuracy on specific travel queries compared to using the model standalone.
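
The retrieve-then-inject steps above can be sketched without any external services. This is a minimal illustration, not the project's actual pipeline: the toy 3-dimensional vectors stand in for all-MiniLM-L6-v2 embeddings, the in-memory list stands in for the Qdrant collection, the chunk texts are placeholders, and the names `top_k_chunks` and `build_messages` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, index, k=2):
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_messages(user_query, chunks):
    """Inject retrieved chunks into the user turn ahead of the question."""
    context = "\n\n".join(chunks)
    system_prompt = (
        "You are a helpful tourism assistant. Use only the provided context to answer. "
        "If the context does not contain enough information, say so."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_query}"},
    ]

# Toy index of (embedding, chunk text) pairs; a real index would hold 384-dim vectors.
index = [
    ([0.9, 0.1, 0.0], "Placeholder chunk about the Marrakech medina."),
    ([0.1, 0.9, 0.0], "Placeholder chunk about visa paperwork."),
    ([0.8, 0.2, 0.1], "Placeholder chunk about Jemaa el-Fnaa square."),
]
query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedded user query

chunks = top_k_chunks(query_vec, index, k=2)
messages = build_messages("What should I see in Marrakech?", chunks)
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template()` in the same way as in the Direct Use example.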

Out-of-Scope Use

  • Real-time travel information: The model has no access to live data. Flight prices, visa requirements, and safety conditions change frequently — do not rely on this model for current information.
  • Medical or legal travel advice: The model is not equipped to give reliable guidance on health requirements, legal restrictions, or emergency situations.
  • Non-English queries: The model was trained exclusively on English data and is not reliable for other languages.
  • High-stakes decisions: This is a learning project. Outputs should not be used to make actual travel bookings, visa applications, or safety assessments.

Bias, Risks, and Limitations

Dataset limitations (primary source of inaccuracy): The training dataset contains only ~150–200 examples sourced from Reddit travel communities. This introduces several compounding problems:

  • Reddit's user base skews toward English-speaking, Western travellers — advice reflects this demographic and may not generalise to other travel styles or origins
  • The dataset covers only a narrow slice of the tourism domain; large topic areas have no representation
  • Source Reddit posts were not independently fact-checked; incorrect or outdated community advice may appear in training examples
  • Only ~40 examples (~20%) were manually reviewed for quality before training

Model behaviour limitations:

  • The model hallucinates specific facts (prices, distances, operating hours, visa fees) when retrieved context does not provide explicit grounding
  • It does not reliably abstain from answering when it does not know — it tends to produce a confident-sounding response regardless
  • Response consistency is variable; the same query may produce meaningfully different answers across runs
  • The model may reflect biases present in Reddit data, including opinions presented as facts

Recommendations

  • Always pair this model with a retrieval pipeline for factual queries
  • Display a disclaimer to end users that outputs may be inaccurate
  • Do not deploy in contexts where incorrect travel information could cause harm
  • Treat all specific factual claims (prices, hours, requirements) as unverified until confirmed from an authoritative source

How to Get Started with the Model

Basic inference (no RAG)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged"  # repo ID from this card's citation URL

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful tourism assistant. Answer travel questions clearly and concisely."},
    {"role": "user", "content": "What is the best time of year to visit Morocco?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.1
    )

response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
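
Inference with retrieved context (RAG)

# retrieved_context: the top-k chunks from your Qdrant vector store, joined into one string.
# Both values below are placeholders; replace them with real retrieval output and the real query.
retrieved_context = "<retrieved chunks go here>"
user_query = "What is the best time of year to visit Morocco?"

system_prompt = "You are a helpful tourism assistant. Use only the provided context to answer. If the context does not contain enough information, say so."

user_prompt = f"""Context:
{retrieved_context}

Question: {user_query}"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Then apply the chat template and generate exactly as in the basic example above.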

Training Details

Training Data

The model was fine-tuned on yowww1094/tourism-llm-fine-tuning-dataset, a custom dataset generated through the following pipeline:

  1. Reddit travel posts and comments were scraped from subreddits including r/travel, r/Morocco, r/solotravel, and r/backpacking using PRAW
  2. Raw text was cleaned (length filter, deduplication, HTML stripping, encoding fix) and chunked
  3. Chunks were embedded with all-MiniLM-L6-v2 and indexed in a Qdrant vector store
  4. A locally-hosted Qwen2.5-7B (via Ollama) was prompted to generate grounded question-answer pairs from retrieved context chunks
  5. Outputs were post-processed into ChatML format and manually sampled for quality (~40 examples reviewed)
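
Step 2 of the pipeline above (cleaning and chunking) might look roughly like the following. This is a sketch under stated assumptions rather than the project's code: the thresholds (`min_chars`, `size`, `overlap`), the word-based chunking strategy, and the use of Unicode NFKC normalisation as the "encoding fix" are all illustrative choices that the card does not specify.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")

def clean_text(raw):
    """Strip HTML tags, decode entities, normalise Unicode and whitespace."""
    text = TAG_RE.sub(" ", raw)                 # HTML stripping
    text = html.unescape(text)                  # decode entities such as &amp;
    text = unicodedata.normalize("NFKC", text)  # assumed stand-in for the encoding fix
    return WS_RE.sub(" ", text).strip()

def clean_corpus(raw_posts, min_chars=20):
    """Apply cleaning, a length filter, and case-insensitive exact deduplication."""
    seen = set()
    cleaned = []
    for raw in raw_posts:
        text = clean_text(raw)
        key = text.lower()
        if len(text) < min_chars or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned

def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

posts = [
    "<p>Sample post about Fez &amp; Marrakech travel.</p>",
    "sample post about fez & marrakech travel.",  # duplicate after cleaning
    "ok",                                         # too short
]
cleaned = clean_corpus(posts)
chunks = chunk_words(" ".join(str(i) for i in range(120)))
```

Overlapping windows are a common way to avoid cutting a relevant sentence in half at a chunk boundary; whether the original pipeline used overlap is not stated.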

Dataset size: ~150–200 question-answer pairs
Format: JSON Lines
Split: 90% train / 10% validation
Quality note: Only ~65% of manually reviewed examples were rated acceptable. No automated quality filter was applied to the full dataset.
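
As an illustration of the format and split described above, the sketch below builds chat-style records, serialises them as JSON Lines, and performs a seeded 90/10 split. The `messages` field name, the system prompt text, and the seed are assumptions made for the example; the actual dataset schema may differ.

```python
import json
import random

def to_record(question, answer, system="You are a helpful tourism assistant."):
    """One training example as a list of chat messages (assumed schema)."""
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

def split_records(records, val_fraction=0.1, seed=42):
    """Shuffle deterministically, then hold out val_fraction for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def to_jsonl(records):
    """Serialise records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

records = [to_record(f"question {i}", f"answer {i}") for i in range(200)]
train, val = split_records(records)
jsonl_text = to_jsonl(train)
```

With 200 records this yields 180 training and 20 validation examples, in line with the 10% validation split reported in this card.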

Training Procedure

Preprocessing

Each example was formatted using the Qwen2.5-Instruct chat template via tokenizer.apply_chat_template(). Sequences were truncated to a maximum length of 512 tokens. No data augmentation was applied.
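
For readers unfamiliar with ChatML, the following sketch approximates what the chat template renders and how truncation is applied. It is a simplified stand-in written for illustration: the real Qwen2.5 template is applied by `tokenizer.apply_chat_template()` and handles additional cases (for example a default system message and tool calls) that are omitted here, and `to_chatml` and `truncate` are hypothetical helper names.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render chat messages in ChatML: each turn is wrapped in im_start/im_end markers."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so that generation continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

def truncate(token_ids, max_length=512):
    """Keep at most max_length tokens, dropping anything beyond that."""
    return token_ids[:max_length]

example = to_chatml(
    [
        {"role": "system", "content": "You are a helpful tourism assistant."},
        {"role": "user", "content": "Do I need cash in the medina?"},
    ],
    add_generation_prompt=True,
)
```

During training the assistant answer is part of the rendered sequence, so `add_generation_prompt` would be left off; it is only needed at inference time.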

Training Hyperparameters

| Parameter | Value |
|---|---|
| Training regime | FP16 mixed precision |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, v_proj |
| Epochs | 3 |
| Batch size | 4 |
| Gradient accumulation steps | 4 (effective batch size: 16) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine annealing |
| Warmup ratio | 0.03 |
| Max sequence length | 512 tokens |
| Optimizer | AdamW (default via transformers) |
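
To give a sense of how small the adapter is, the sketch below counts the parameters that LoRA adds for rank 16 on q_proj and v_proj. The architecture numbers (hidden size 1536, 28 layers, a 256-wide v_proj output from 2 key-value heads of dimension 128) are assumptions about the Qwen2.5-1.5B configuration and should be checked against the base model's config.json; `lora_params` is a hypothetical helper.

```python
def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) per adapted matrix."""
    return r * (in_features + out_features)

# Hyperparameters from this card
R, ALPHA = 16, 32

# Assumed base-model dimensions (verify against config.json)
HIDDEN, LAYERS, KV_DIM = 1536, 28, 256

per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)
total_adapter_params = per_layer * LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update

print(f"adapter parameters: {total_adapter_params:,}")
print(f"share of 1.5B base parameters: {total_adapter_params / 1.5e9:.2%}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter holds roughly two million trainable parameters, a small fraction of a percent of the base model. The same values map onto the `r`, `lora_alpha`, `lora_dropout`, and `target_modules` arguments of peft's `LoraConfig`.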

Hardware and Duration

  • Hardware: NVIDIA T4 GPU (15 GB VRAM) — Kaggle free tier
  • Training time: ~45–60 minutes for 3 epochs on ~150 examples
  • Framework: transformers 4.x + peft + trl (SFTTrainer) + accelerate
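
With a dataset this small, the whole run amounts to only a few dozen optimizer steps. The back-of-the-envelope calculation below uses the hyperparameters listed in this card; it is an approximation, since the exact rounding of partial accumulation steps and warmup steps depends on the trainer version, and `training_steps` is a hypothetical helper.

```python
import math

def training_steps(n_examples, train_fraction=0.9, batch_size=4,
                   grad_accum=4, epochs=3, warmup_ratio=0.03):
    """Approximate optimizer-step counts for a run."""
    n_train = round(n_examples * train_fraction)
    effective_batch = batch_size * grad_accum
    steps_per_epoch = math.ceil(n_train / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "n_train": n_train,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

low = training_steps(150)   # lower end of the reported dataset size
high = training_steps(200)  # upper end of the reported dataset size
print(low)
print(high)
```

At this scale the warmup phase covers only one or two steps, and each epoch sees around ten weight updates.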

Evaluation

Testing Data

Evaluation was performed on a held-out validation split of ~15–20 examples (10% of the total dataset). Due to the small size of this split, quantitative metrics should be interpreted with significant caution; they are not statistically reliable estimates of generalisation performance.

In addition to automatic metrics, a qualitative evaluation was performed by manually inspecting model outputs on ~20 held-out queries not present in the training data.

Metrics

| Metric | Value | Notes |
|---|---|---|
| Training loss (epoch 1) | ~1.62 | |
| Training loss (epoch 3) | ~0.85 | Consistent decrease across epochs |
| Validation loss (epoch 1) | ~1.74 | |
| Validation loss (epoch 3) | ~1.29 | Slight divergence from train loss; mild overfitting |
| Validation perplexity | ~2.3 | On 15–20 examples only; not a reliable generalisation estimate |
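
Perplexity is conventionally the exponential of the mean token-level cross-entropy loss. The sketch below applies that relation to the epoch-3 losses reported above. It assumes those losses are mean cross-entropy values in nats, which this card does not state explicitly, and `perplexity` is an illustrative helper name.

```python
import math

def perplexity(mean_cross_entropy_loss):
    """Perplexity as exp of the mean per-token cross-entropy (natural log base)."""
    return math.exp(mean_cross_entropy_loss)

val_ppl = perplexity(1.29)    # epoch-3 validation loss
train_ppl = perplexity(0.85)  # epoch-3 training loss

print(f"exp(1.29) = {val_ppl:.2f}")
print(f"exp(0.85) = {train_ppl:.2f}")
```

Under that assumption, a validation loss of ~1.29 corresponds to a perplexity of about 3.6, while ~0.85 corresponds to about 2.3. The reported ~2.3 may therefore have been computed from a different loss value or normalisation than the epoch-3 validation loss.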

Results

What improved after fine-tuning:

  • Response style and tone became more focused and consistently helpful compared to the base model
  • The model more readily uses tourism-relevant vocabulary and response structure
  • For in-distribution queries (topics well-represented in the dataset), the combination of fine-tuning + RAG outperforms either mechanism alone

What did not improve:

  • Factual accuracy on out-of-distribution queries is comparable to the base model — fine-tuning at this scale does not inject meaningful new factual knowledge
  • Hallucination rate on specific facts (prices, dates, requirements) is unchanged without retrieval grounding
  • The model occasionally produces responses that closely paraphrase training examples, suggesting partial memorisation

Summary

Fine-tuning on this small dataset improved behavioral style but not factual coverage. The model is more useful when paired with a retrieval pipeline than when used standalone. A dataset 10–20× larger with verified factual content would be needed to produce a genuinely reliable tourism assistant.


Environmental Impact

Carbon emissions were not formally measured. Estimated figures based on training setup:

  • Hardware type: NVIDIA T4 GPU (Kaggle free tier)
  • Hours used: ~1 hour total training time
  • Cloud provider: Google (Kaggle infrastructure)
  • Compute region: Unknown
  • Carbon emitted: Estimated < 0.05 kg CO₂eq (based on ML Impact Calculator)
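
A rough sanity check of that estimate can be done by hand. The sketch below assumes the GPU draws its full rated board power of 70 W for the whole hour and uses an assumed grid carbon intensity of 0.4 kg CO₂eq per kWh; the real compute region is unknown, so the intensity figure is purely illustrative, and host CPU and datacentre overhead are ignored.

```python
def co2_kg(power_watts, hours, kg_co2_per_kwh):
    """Energy in kWh multiplied by grid carbon intensity."""
    energy_kwh = power_watts / 1000 * hours
    return energy_kwh * kg_co2_per_kwh

# 70 W: NVIDIA T4 rated board power; 0.4 kg/kWh: assumed grid intensity
estimate = co2_kg(power_watts=70, hours=1.0, kg_co2_per_kwh=0.4)
print(f"estimated emissions: {estimate:.3f} kg CO2eq")
```

With these inputs the result stays below the 0.05 kg CO₂eq bound quoted above.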

Technical Specifications

Model Architecture

  • Architecture: Qwen2.5 decoder-only transformer (causal LM)
  • Parameters: 1.5 billion (base model)
  • Context window: 32,768 tokens (base model capability; fine-tuning used max 512 tokens)
  • Fine-tuning method: LoRA adapters applied to q_proj and v_proj attention matrices, then merged into base weights before upload
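
Merging a LoRA adapter means folding the low-rank update into the base weight matrix, W' = W + (alpha / r) · B · A, so that inference needs no adapter code. The toy example below demonstrates the arithmetic on 2×2 matrices in plain Python. The matrices are made up for illustration; in peft the equivalent operation is typically performed with `merge_and_unload()`, though this card does not say which method was used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

W = [[1.0, 0.0], [0.0, 1.0]]  # base weight, out=2, in=2
A = [[0.5, -0.5]]             # LoRA A, r=1, in=2
B = [[1.0], [2.0]]            # LoRA B, out=2, r=1
ALPHA, R = 2, 1

merged = merge_lora(W, A, B, ALPHA, R)

# The merged matrix gives the same output as base path + scaled adapter path.
x = [[3.0], [4.0]]
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
two_path = [[b[0] + (ALPHA / R) * a[0]] for b, a in zip(base_out, adapter_out)]
merged_out = matmul(merged, x)
```

Because the merged weights have the same shape as the originals, the uploaded checkpoint loads with plain `AutoModelForCausalLM` and adds no inference overhead.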

Compute Infrastructure

  • Training: Kaggle free-tier notebook, T4 GPU, 15 GB VRAM
  • Inference: Hugging Face Inference Endpoints (free tier)
  • Vector store: Qdrant Cloud (free tier, ~1,200 indexed chunks)
  • Embedding: sentence-transformers/all-MiniLM-L6-v2, CPU inference

Software

transformers>=4.40.0
peft>=0.10.0
trl>=0.8.0
accelerate>=0.28.0
sentence-transformers>=2.6.0
qdrant-client>=1.9.0
torch>=2.1.0

Citation

If you reference this project, please cite it as:

@misc{younes2026tourismllm,
  author    = {Younes},
  title     = {End-to-End LLM Pipeline for Tourism Assistant},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged}
}

More Information

Full technical report (covering pipeline design, training decisions, limitations, and results in detail): docs/technical_report.pdf

GitHub repository with full pipeline code: Github

Training dataset: yowww1094/tourism-llm-fine-tuning-dataset


Model Card Author

Younes AIT SI ABBOU — personal learning project, April 2026