101 lines
2.9 KiB
Markdown
101 lines
2.9 KiB
Markdown
|
|
---
|
|||
|
|
language:
|
|||
|
|
- en
|
|||
|
|
license: apache-2.0
|
|||
|
|
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
|
|||
|
|
tags:
|
|||
|
|
- llm
|
|||
|
|
- fine-tuned
|
|||
|
|
- lora
|
|||
|
|
- sft
|
|||
|
|
- text-generation
|
|||
|
|
- student-project
|
|||
|
|
datasets:
|
|||
|
|
- HuggingFaceTB/smoltalk
|
|||
|
|
pipeline_tag: text-generation
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
# chatOP — SmolLM2-135M Fine-tuned
|
|||
|
|
|
|||
|
|
A fine-tuned version of [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) trained to act as a friendly study assistant for CS and ML concepts.
|
|||
|
|
|
|||
|
|
## Model Details
|
|||
|
|
|
|||
|
|
| | |
|
|||
|
|
|---|---|
|
|||
|
|
| **Base model** | HuggingFaceTB/SmolLM2-135M-Instruct |
|
|||
|
|
| **Model type** | Causal Language Model |
|
|||
|
|
| **Fine-tuning method** | SFT + LoRA |
|
|||
|
|
| **Language** | English |
|
|||
|
|
| **License** | Apache 2.0 |
|
|||
|
|
|
|||
|
|
## Training Details
|
|||
|
|
|
|||
|
|
### Dataset
|
|||
|
|
- **Name:** [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
|
|||
|
|
- **Split used:** `all`
|
|||
|
|
- **Format:** Conversational chat messages (system, user, assistant)
|
|||
|
|
|
|||
|
|
### Hyperparameters
|
|||
|
|
|
|||
|
|
| Parameter | Value |
|
|||
|
|
|---|---|
|
|||
|
|
| Learning rate | 2e-4 |
|
|||
|
|
| Training steps | 200 |
|
|||
|
|
| Batch size | 2 |
|
|||
|
|
| Packing | True |
|
|||
|
|
| Optimizer | AdamW |
|
|||
|
|
| LR scheduler | Linear |
|
|||
|
|
| Precision | fp16 |
|
|||
|
|
|
|||
|
|
### LoRA Config
|
|||
|
|
|
|||
|
|
| Parameter | Value |
|
|||
|
|
|---|---|
|
|||
|
|
| Rank (`r`) | 16 |
|
|||
|
|
| Alpha (`lora_alpha`) | 16 |
|
|||
|
|
| Dropout | 0.05 |
|
|||
|
|
| Target modules | all-linear |
|
|||
|
|
| Bias | none |
|
|||
|
|
| Trainable parameters | ~1.7M / 135M (1.26%) |
|
|||
|
|
|
|||
|
|
### Hardware
|
|||
|
|
- **GPU:** NVIDIA Tesla T4 (Google Colab)
|
|||
|
|
- **Training time:** ~15–20 minutes
|
|||
|
|
|
|||
|
|
## Example Outputs
|
|||
|
|
|
|||
|
|
**Prompt:** What is machine learning?
|
|||
|
|
|
|||
|
|
> Machine learning is a field of AI where systems learn from data to make predictions or decisions without being explicitly programmed for each task.
|
|||
|
|
|
|||
|
|
**Prompt:** Explain transformers in simple terms.
|
|||
|
|
|
|||
|
|
> A Transformer is a neural network architecture that uses attention mechanisms to understand relationships between all words in a sentence simultaneously, rather than reading them one by one like older models did.
|
|||
|
|
|
|||
|
|
**Prompt:** What is the difference between LoRA and full fine-tuning?
|
|||
|
|
|
|||
|
|
> Full fine-tuning updates all parameters of a model which requires a lot of memory. LoRA instead freezes the base model and adds small trainable matrices to each layer, training only ~1% of parameters while achieving similar results.
|
|||
|
|
|
|||
|
|
## Limitations
|
|||
|
|
|
|||
|
|
- Small model (135M parameters) — responses may be less detailed than larger models
|
|||
|
|
- Trained for only 200 steps — longer training would improve quality
|
|||
|
|
- Best for short explanations — may struggle with very complex multi-step reasoning
|
|||
|
|
- Not suitable for factual lookups — may hallucinate specific facts or numbers
|
|||
|
|
|
|||
|
|
## Training Framework
|
|||
|
|
|
|||
|
|
Built as a learning project while studying the [HuggingFace LLM Course](https://huggingface.co/learn/llm-course) — specifically Chapter 11 (Fine-tuning LLMs).
|
|||
|
|
|
|||
|
|
**Libraries used:**
|
|||
|
|
- 🤗 Transformers
|
|||
|
|
- 🤗 PEFT
|
|||
|
|
- TRL (SFTTrainer)
|
|||
|
|
- 🤗 Datasets
|
|||
|
|
- Accelerate
|
|||
|
|
|
|||
|
|
## Author
|
|||
|
|
|
|||
|
|
Made by [puravky](https://huggingface.co/puravky) — undergrad student exploring ML and AI.
|