chatOP/README.md

---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
tags:
- llm
- fine-tuned
- lora
- sft
- text-generation
- student-project
datasets:
- HuggingFaceTB/smoltalk
pipeline_tag: text-generation
---

# chatOP — SmolLM2-135M Fine-tuned

A fine-tuned version of [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) trained to act as a friendly study assistant for CS and ML concepts.

## Model Details

| | |
|---|---|
| **Base model** | HuggingFaceTB/SmolLM2-135M-Instruct |
| **Model type** | Causal Language Model |
| **Fine-tuning method** | SFT + LoRA |
| **Language** | English |
| **License** | Apache 2.0 |

## Training Details

### Dataset
- **Name:** [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
- **Split used:** `all`
- **Format:** Conversational chat messages (system, user, assistant)

### Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 2e-4 |
| Training steps | 200 |
| Batch size | 2 |
| Packing | True |
| Optimizer | AdamW |
| LR scheduler | Linear |
| Precision | fp16 |

### LoRA Config

| Parameter | Value |
|---|---|
| Rank (`r`) | 16 |
| Alpha (`lora_alpha`) | 16 |
| Dropout | 0.05 |
| Target modules | all-linear |
| Bias | none |
| Trainable parameters | ~1.7M / 135M (1.26%) |

### Hardware
- **GPU:** NVIDIA Tesla T4 (Google Colab)
- **Training time:** ~15–20 minutes

## Example Outputs

**Prompt:** What is machine learning?

> Machine learning is a field of AI where systems learn from data to make predictions or decisions without being explicitly programmed for each task.

**Prompt:** Explain transformers in simple terms.

> A Transformer is a neural network architecture that uses attention mechanisms to understand relationships between all words in a sentence simultaneously, rather than reading them one by one like older models did.

**Prompt:** What is the difference between LoRA and full fine-tuning?

> Full fine-tuning updates all parameters of a model which requires a lot of memory. LoRA instead freezes the base model and adds small trainable matrices to each layer, training only ~1% of parameters while achieving similar results.

## Limitations

- Small model (135M parameters) — responses may be less detailed than larger models
- Trained for only 200 steps — longer training would improve quality
- Best for short explanations — may struggle with very complex multi-step reasoning
- Not suitable for factual lookups — may hallucinate specific facts or numbers

## Training Framework

Built as a learning project while studying the [HuggingFace LLM Course](https://huggingface.co/learn/llm-course) — specifically Chapter 11 (Fine-tuning LLMs).

**Libraries used:**
- 🤗 Transformers
- 🤗 PEFT
- TRL (SFTTrainer)
- 🤗 Datasets
- Accelerate

## Author

Made by [puravky](https://huggingface.co/puravky) — undergrad student exploring ML and AI.