Files
chatOP/README.md

101 lines
2.9 KiB
Markdown
Raw Permalink Normal View History

---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
tags:
- llm
- fine-tuned
- lora
- sft
- text-generation
- student-project
datasets:
- HuggingFaceTB/smoltalk
pipeline_tag: text-generation
---
# chatOP — SmolLM2-135M Fine-tuned
A fine-tuned version of [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) trained to act as a friendly study assistant for CS and ML concepts.
## Model Details
| | |
|---|---|
| **Base model** | HuggingFaceTB/SmolLM2-135M-Instruct |
| **Model type** | Causal Language Model |
| **Fine-tuning method** | SFT + LoRA |
| **Language** | English |
| **License** | Apache 2.0 |
## Training Details
### Dataset
- **Name:** [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
- **Split used:** `all`
- **Format:** Conversational chat messages (system, user, assistant)
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 2e-4 |
| Training steps | 200 |
| Batch size | 2 |
| Packing | True |
| Optimizer | AdamW |
| LR scheduler | Linear |
| Precision | fp16 |
### LoRA Config
| Parameter | Value |
|---|---|
| Rank (`r`) | 16 |
| Alpha (`lora_alpha`) | 16 |
| Dropout | 0.05 |
| Target modules | all-linear |
| Bias | none |
| Trainable parameters | ~1.7M / 135M (1.26%) |
### Hardware
- **GPU:** NVIDIA Tesla T4 (Google Colab)
- **Training time:** ~1520 minutes
## Example Outputs
**Prompt:** What is machine learning?
> Machine learning is a field of AI where systems learn from data to make predictions or decisions without being explicitly programmed for each task.
**Prompt:** Explain transformers in simple terms.
> A Transformer is a neural network architecture that uses attention mechanisms to understand relationships between all words in a sentence simultaneously, rather than reading them one by one like older models did.
**Prompt:** What is the difference between LoRA and full fine-tuning?
> Full fine-tuning updates all parameters of a model which requires a lot of memory. LoRA instead freezes the base model and adds small trainable matrices to each layer, training only ~1% of parameters while achieving similar results.
## Limitations
- Small model (135M parameters) — responses may be less detailed than larger models
- Trained for only 200 steps — longer training would improve quality
- Best for short explanations — may struggle with very complex multi-step reasoning
- Not suitable for factual lookups — may hallucinate specific facts or numbers
## Training Framework
Built as a learning project while studying the [HuggingFace LLM Course](https://huggingface.co/learn/llm-course) — specifically Chapter 11 (Fine-tuning LLMs).
**Libraries used:**
- 🤗 Transformers
- 🤗 PEFT
- TRL (SFTTrainer)
- 🤗 Datasets
- Accelerate
## Author
Made by [puravky](https://huggingface.co/puravky) — undergrad student exploring ML and AI.