---
library_name: transformers
license: mit
datasets:
- yowww1094/tourism-llm-fine-tuning-dataset
language:
- en
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
tags:
- tourism
- travel
- question-answering
- lora
- fine-tuned
- rag
---

# Tourism Assistant — Qwen2.5-1.5B Fine-Tuned

A fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) trained on a custom tourism Q&A dataset generated through a RAG-grounded pipeline. Built as a personal end-to-end learning project covering data collection, dataset engineering, supervised fine-tuning, and deployment.

> ⚠️ This model is not production-ready. It is a learning project. Outputs can be incorrect, incomplete, or inconsistent — especially on topics not well-represented in the training data. Do not rely on this model for real travel decisions.

---

## Model Details

### Model Description

This model was fine-tuned using FP16 LoRA (Low-Rank Adaptation) on a small custom dataset of ~150–200 tourism-focused question-answer pairs. The training examples were generated through a RAG pipeline: Reddit travel posts were scraped, embedded, and stored in a Qdrant vector store, then a locally-hosted Qwen2.5-7B (via Ollama) was used to generate grounded Q&A pairs from retrieved context chunks. The resulting dataset was formatted in ChatML and used to fine-tune this model.

The goal of fine-tuning was to adjust the model's **behavioural style**, making it more focused, concise, and consistently helpful for travel queries, rather than to inject new factual knowledge. Factual grounding at inference time is handled by a RAG pipeline backed by Qdrant Cloud.

- **Developed by:** Younes
- **Model type:** Causal Language Model — fine-tuned for instruction following
- **Language:** English
- **License:** MIT
- **Base model:** [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)
- **Fine-tuning method:** FP16 LoRA (merged into base weights)
- **Training dataset:** [yowww1094/tourism-llm-fine-tuning-dataset](https://huggingface.co/datasets/yowww1094/tourism-llm-fine-tuning-dataset)

### Model Sources

- **Repository:** [GitHub](https://github.com/yowww1094/llm-tourism-assistant)
- **Demo:** [Hugging Face Space](https://huggingface.co/spaces/yowww1094/AI-tourism-chatbot)

---

## Uses

### Direct Use

This model can be used as a conversational assistant for general tourism and travel questions — destination information, logistics, visa guidance, packing advice, and similar topics. It works best when paired with a retrieval pipeline that provides relevant context at inference time.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful tourism assistant."},
    {"role": "user", "content": "What are the must-visit places in Marrakech?"}
]

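# Render the conversation with the model's chat template and append the assistant prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sample a response, then decode only the newly generated tokens (the prompt is skipped).
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)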
```

### Downstream Use

This model is designed to be used as the generation component of a RAG pipeline. The recommended usage is:

1. Embed the user query with `all-MiniLM-L6-v2`
2. Retrieve top-k relevant chunks from a Qdrant vector store
3. Inject retrieved context into the prompt before generation
4. Pass the full prompt to this model

In the qualitative evaluation for this project, pairing the model with retrieval improved factual accuracy on specific travel queries compared with using the model standalone.
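
The four steps above can be sketched end to end without any external service. This is a minimal illustration under stated assumptions, not the project's actual code: `embed` is a toy bag-of-words stand-in for `all-MiniLM-L6-v2`, the in-memory `CHUNKS` list stands in for the Qdrant collection, and every function name and sample chunk is hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample chunks; the real store holds Reddit-derived text in Qdrant.
CHUNKS = [
    "Jemaa el-Fnaa square in Marrakech is busiest after sunset.",
    "Pack layers for the Atlas Mountains because nights get cold.",
    "Trains connect Casablanca and Marrakech several times a day.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words vector (stand-in for a sentence-embedding model)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Steps 1-2: embed the query and return the k most similar chunks."""
    q = embed(query)
    return sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_messages(query: str, k: int = 2) -> list[dict]:
    """Step 3: inject the retrieved context into the prompt."""
    context = "\n".join(retrieve(query, k))
    return [
        {"role": "system", "content": "You are a helpful tourism assistant. Use only the provided context to answer."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]

# Step 4: pass these messages to tokenizer.apply_chat_template and model.generate.
messages = build_messages("How do I get from Casablanca to Marrakech?")
```

In a real deployment the two stand-ins would be replaced by a sentence-embedding model and a vector-store query; the prompt assembly in `build_messages` stays the same.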

### Out-of-Scope Use

- **Real-time travel information:** The model has no access to live data. Flight prices, visa requirements, and safety conditions change frequently — do not rely on this model for current information.
- **Medical or legal travel advice:** The model is not equipped to give reliable guidance on health requirements, legal restrictions, or emergency situations.
- **Non-English queries:** The model was trained exclusively on English data and is not reliable for other languages.
- **High-stakes decisions:** This is a learning project. Outputs should not be used to make actual travel bookings, visa applications, or safety assessments.

---

## Bias, Risks, and Limitations

**Dataset limitations — primary source of inaccuracy:**
The training dataset contains only ~150–200 examples sourced from Reddit travel communities. This introduces several compounding problems:

- Reddit's user base skews toward English-speaking, Western travellers — advice reflects this demographic and may not generalise to other travel styles or origins
- The dataset covers only a narrow slice of the tourism domain; large topic areas have no representation
- Source Reddit posts were not independently fact-checked; incorrect or outdated community advice may appear in training examples
- Only ~40 examples (roughly 20–27% of the dataset) were manually reviewed for quality before training

**Model behaviour limitations:**
- The model hallucinates specific facts (prices, distances, operating hours, visa fees) when retrieved context does not provide explicit grounding
- It does not reliably abstain from answering when it does not know — it tends to produce a confident-sounding response regardless
- Response consistency is variable; the same query may produce meaningfully different answers across runs
- The model may reflect biases present in Reddit data, including opinions presented as facts

### Recommendations

- Always pair this model with a retrieval pipeline for factual queries
- Display a disclaimer to end users that outputs may be inaccurate
- Do not deploy in contexts where incorrect travel information could cause harm
- Treat all specific factual claims (prices, hours, requirements) as unverified until confirmed from an authoritative source

---

## How to Get Started with the Model

### Basic inference (no RAG)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful tourism assistant. Answer travel questions clearly and concisely."},
    {"role": "user", "content": "What is the best time of year to visit Morocco?"}
]

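# Build the prompt string with the chat template, then tokenize and move it to the model's device.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Disable gradient tracking for inference and sample a response.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.1
    )

# Decode only the tokens generated after the prompt.
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)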
```

### Recommended inference (with RAG context)

```python
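# Placeholder inputs: in practice, retrieved_context holds the top-k chunks
# returned by your Qdrant vector store for the user's query.
user_query = "What is the best time of year to visit Morocco?"
retrieved_context = "\n\n".join([
    "<retrieved chunk 1>",
    "<retrieved chunk 2>",
])

system_prompt = (
    "You are a helpful tourism assistant. Use only the provided context to answer. "
    "If the context does not contain enough information, say so."
)

user_prompt = f"""Context:
{retrieved_context}

Question: {user_query}"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Then apply the chat template and generate exactly as in the basic example above.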
```

---

## Training Details

### Training Data

The model was fine-tuned on [yowww1094/tourism-llm-fine-tuning-dataset](https://huggingface.co/datasets/yowww1094/tourism-llm-fine-tuning-dataset), a custom dataset generated through the following pipeline:

1. Reddit travel posts and comments were scraped from subreddits including r/travel, r/Morocco, r/solotravel, and r/backpacking using PRAW
2. Raw text was cleaned (length filter, deduplication, HTML stripping, encoding fix) and chunked
3. Chunks were embedded with `all-MiniLM-L6-v2` and indexed in a Qdrant vector store
4. A locally-hosted Qwen2.5-7B (via Ollama) was prompted to generate grounded question-answer pairs from retrieved context chunks
5. Outputs were post-processed into ChatML format and manually sampled for quality (~40 examples reviewed)
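
The cleaning and chunking in step 2 can be illustrated with a standard-library sketch. The thresholds (`min_chars`, `size`, `overlap`) and function names are hypothetical, the encoding fix is omitted, and the project's actual cleaning code may differ.

```python
import html
import re

def clean_text(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    decoded = html.unescape(no_tags)
    return re.sub(r"\s+", " ", decoded).strip()

def clean_corpus(posts: list[str], min_chars: int = 40) -> list[str]:
    """Clean every post, drop short ones, and remove exact duplicates."""
    seen, kept = set(), []
    for post in posts:
        text = clean_text(post)
        key = text.lower()
        if len(text) >= min_chars and key not in seen:
            seen.add(key)
            kept.append(text)
    return kept

def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.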

- **Dataset size:** ~150–200 question-answer pairs
- **Format:** JSON Lines
- **Split:** 90% train / 10% validation
- **Quality note:** Only ~65% of manually reviewed examples were rated acceptable. No automated quality filter was applied to the full dataset.
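
As a rough illustration of the format and split described above, the sketch below wraps Q&A pairs as chat records, holds out 10% for validation, and serialises to JSON Lines. The `messages` schema, the system prompt, the seed, and the synthetic pairs are assumptions for illustration; the published dataset's actual field names may differ.

```python
import json
import random

def to_record(question: str, answer: str) -> dict:
    """Wrap one Q&A pair as a chat-style record (system prompt is illustrative)."""
    return {"messages": [
        {"role": "system", "content": "You are a helpful tourism assistant."},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}

def split_records(records: list[dict], val_fraction: float = 0.1, seed: int = 42):
    """Shuffle deterministically and hold out val_fraction for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def to_jsonl(records: list[dict]) -> str:
    """Serialise records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Synthetic placeholder pairs, sized like the lower bound of the dataset.
pairs = [(f"Question {i}?", f"Answer {i}.") for i in range(150)]
train, val = split_records([to_record(q, a) for q, a in pairs])
print(len(train), len(val))  # 135 15
```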

### Training Procedure

#### Preprocessing

Each example was formatted using the Qwen2.5-Instruct chat template via `tokenizer.apply_chat_template()`. Sequences were truncated to a maximum length of 512 tokens. No data augmentation was applied.
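
For readers unfamiliar with ChatML, the sketch below shows roughly what the rendered text looks like and what truncation to 512 tokens means. It is a hand-written approximation: the template shipped with the tokenizer is authoritative (it may, for example, insert a default system prompt), and `render_chatml` and `truncate` are hypothetical helper names.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render messages in ChatML: each turn is wrapped in im_start/im_end markers."""
    text = "".join(f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages)
    if add_generation_prompt:
        # At inference time the prompt ends with an open assistant turn.
        text += f"{IM_START}assistant\n"
    return text

def truncate(token_ids: list[int], max_length: int = 512) -> list[int]:
    """Keep only the first max_length token ids, as right-side truncation does."""
    return token_ids[:max_length]

example = [
    {"role": "user", "content": "Is tap water safe to drink in Marrakech?"},
    {"role": "assistant", "content": "Many travellers prefer bottled water."},
]
print(render_chatml(example))
```

During training the full conversation, including the assistant turn, is rendered; `add_generation_prompt=True` is only used when asking the model to produce the next turn.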

#### Training Hyperparameters

| Parameter | Value |
|---|---|
| Training regime | FP16 mixed precision |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, v_proj |
| Epochs | 3 |
| Batch size | 4 |
| Gradient accumulation steps | 4 (effective batch size: 16) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine annealing |
| Warmup ratio | 0.03 |
| Max sequence length | 512 tokens |
| Optimizer | AdamW (default via transformers) |
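
To make the scale of this run concrete, the sketch below derives approximate optimizer-step counts from the values in the table. The figure of 135 training examples is an assumption (90% of a 150-example dataset), and the exact step count reported by a trainer can differ slightly depending on how it handles the last partial accumulation window.

```python
import math

def schedule(num_train_examples: int, per_device_batch: int = 4, grad_accum: int = 4,
             epochs: int = 3, warmup_ratio: float = 0.03) -> dict:
    """Derive approximate optimizer-step counts from the table's hyperparameters."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
    }

# Standard LoRA scaling: the adapter update is multiplied by alpha / r.
lora_scaling = 32 / 16

print(schedule(135))
print(lora_scaling)
```

With so few optimizer steps, the warmup phase covers roughly a single step and the cosine schedule decays over only a few dozen updates, which is worth keeping in mind when reading the loss figures.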

#### Hardware and Duration

- **Hardware:** NVIDIA T4 GPU (15 GB VRAM) — Kaggle free tier
- **Training time:** ~45–60 minutes for 3 epochs on ~150 examples
- **Framework:** transformers 4.x + peft + trl (SFTTrainer) + accelerate

---

## Evaluation

### Testing Data

Evaluation was performed on a held-out validation split of ~15–20 examples (10% of the total dataset). Due to the small size of this split, quantitative metrics should be interpreted with significant caution — they are not statistically reliable estimates of generalisation performance.

In addition to automatic metrics, a qualitative evaluation was performed by manually inspecting model outputs on ~20 held-out queries not present in the training data.

### Metrics

| Metric | Value | Notes |
|---|---|---|
| Training loss (epoch 1) | ~1.62 | |
| Training loss (epoch 3) | ~0.85 | Consistent decrease across epochs |
| Validation loss (epoch 1) | ~1.74 | |
| Validation loss (epoch 3) | ~1.29 | Slight divergence from train loss — mild overfitting |
| Validation perplexity | ~3.6 | exp(1.29), from the epoch-3 validation loss; based on 15–20 examples only, so not a reliable generalisation estimate |
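
Perplexity is conventionally the exponential of the mean per-token cross-entropy loss, so it can be derived directly from the loss values in the table. A minimal sketch of that conversion, assuming the reported losses are mean cross-entropy in nats:

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)

for name, loss in [("train, epoch 3", 0.85), ("validation, epoch 3", 1.29)]:
    print(f"{name}: loss={loss:.2f} -> perplexity={perplexity(loss):.2f}")
```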

### Results

**What improved after fine-tuning:**
- Response style and tone became more focused and consistently helpful compared to the base model
- The model more readily uses tourism-relevant vocabulary and response structure
- For in-distribution queries (topics well-represented in the dataset), the combination of fine-tuning + RAG outperforms either mechanism alone

**What did not improve:**
- Factual accuracy on out-of-distribution queries is comparable to the base model — fine-tuning at this scale does not inject meaningful new factual knowledge
- Hallucination rate on specific facts (prices, dates, requirements) is unchanged without retrieval grounding
- The model occasionally produces responses that closely paraphrase training examples, suggesting partial memorisation

#### Summary

Fine-tuning on this small dataset improved behavioural style but not factual coverage. The model is more useful when paired with a retrieval pipeline than when used standalone. A dataset 10–20× larger with verified factual content would likely be needed to produce a genuinely reliable tourism assistant.

---

## Environmental Impact

Carbon emissions were not formally measured. Estimated figures based on training setup:

- **Hardware type:** NVIDIA T4 GPU (Kaggle free tier)
- **Hours used:** ~1 hour total training time
- **Cloud provider:** Google (Kaggle infrastructure)
- **Compute region:** Unknown
- **Carbon emitted:** Estimated < 0.05 kg CO₂eq (based on [ML Impact Calculator](https://mlco2.github.io/impact#compute))
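
A back-of-the-envelope version of this estimate is sketched below. The inputs are assumptions, not measurements: 70 W is the T4's rated board power (an upper bound on draw), and the grid intensity of 0.4 kg CO₂eq/kWh is purely illustrative because the compute region is unknown.

```python
def co2_kg(power_watts: float, hours: float, kg_co2_per_kwh: float, pue: float = 1.0) -> float:
    """Energy (kWh) x data-centre overhead (PUE) x grid carbon intensity."""
    energy_kwh = power_watts / 1000 * hours
    return energy_kwh * pue * kg_co2_per_kwh

# Assumed inputs: 70 W board power, 1 hour of training,
# and an illustrative grid intensity of 0.4 kg CO2eq/kWh.
estimate = co2_kg(power_watts=70, hours=1.0, kg_co2_per_kwh=0.4)
print(f"{estimate:.3f} kg CO2eq")
```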

---

## Technical Specifications

### Model Architecture

- **Architecture:** Qwen2.5 decoder-only transformer (causal LM)
- **Parameters:** 1.5 billion (base model)
- **Context window:** 32,768 tokens (base model capability; fine-tuning used max 512 tokens)
- **Fine-tuning method:** LoRA adapters applied to `q_proj` and `v_proj` attention matrices, then merged into base weights before upload
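
Merging a LoRA adapter means folding the low-rank update into the base weight, W' = W + (alpha / r) · B · A, so that inference needs no extra adapter layers. The toy sketch below demonstrates this on tiny matrices in plain Python. The shapes and values are illustrative only, and in practice the merge is performed by the fine-tuning library rather than by hand.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy shapes: W is 2x3 (out x in), A is r x in, B is out x r, with r = 1.
# alpha = 2 gives the same scale factor (2.0) as alpha = 32 with r = 16.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B = [[1.0], [-1.0]]
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

After merging, applying `merged` to an input gives the same result as applying `W` and the scaled adapter path separately, which is why the uploaded checkpoint loads as an ordinary causal LM.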

### Compute Infrastructure

- **Training:** Kaggle free-tier notebook, T4 GPU, 15 GB VRAM
- **Inference:** Hugging Face Inference Endpoints (free tier)
- **Vector store:** Qdrant Cloud (free tier, ~1,200 indexed chunks)
- **Embedding:** `sentence-transformers/all-MiniLM-L6-v2`, CPU inference

### Software

```
transformers>=4.40.0
peft>=0.10.0
trl>=0.8.0
accelerate>=0.28.0
sentence-transformers>=2.6.0
qdrant-client>=1.9.0
torch>=2.1.0
```

---

## Citation

If you reference this project, please cite it as:

```bibtex
@misc{younes2026tourismllm,
  author    = {Younes},
  title     = {End-to-End LLM Pipeline for Tourism Assistant},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/yowww1094/tourism-llm-fine-tuned-qwen2-1.5b-lora-merged}
}
```

---

## More Information

Full technical report (covering pipeline design, training decisions, limitations, and results in detail):
[docs/technical_report.pdf](https://github.com/yowww1094/llm-tourism-assistant/blob/master/docs/technical_report.pdf)

GitHub repository with full pipeline code:
[GitHub](https://github.com/yowww1094/llm-tourism-assistant)

Training dataset:
[yowww1094/tourism-llm-fine-tuning-dataset](https://huggingface.co/datasets/yowww1094/tourism-llm-fine-tuning-dataset)

---

## Model Card Author

Younes AIT SI ABBOU — personal learning project, April 2026