Files
chatOP/README.md
ModelHub XC 171910c623 初始化项目,由ModelHub XC社区提供模型
Model: puravky/chatOP
Source: Original Platform
2026-06-08 14:30:37 +08:00

101 lines
2.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
tags:
- llm
- fine-tuned
- lora
- sft
- text-generation
- student-project
datasets:
- HuggingFaceTB/smoltalk
pipeline_tag: text-generation
---
# chatOP — SmolLM2-135M Fine-tuned
A fine-tuned version of [SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) trained to act as a friendly study assistant for CS and ML concepts.
## Model Details
| | |
|---|---|
| **Base model** | HuggingFaceTB/SmolLM2-135M-Instruct |
| **Model type** | Causal Language Model |
| **Fine-tuning method** | SFT + LoRA |
| **Language** | English |
| **License** | Apache 2.0 |
## Training Details
### Dataset
- **Name:** [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
- **Split used:** `all`
- **Format:** Conversational chat messages (system, user, assistant)
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 2e-4 |
| Training steps | 200 |
| Batch size | 2 |
| Packing | True |
| Optimizer | AdamW |
| LR scheduler | Linear |
| Precision | fp16 |
### LoRA Config
| Parameter | Value |
|---|---|
| Rank (`r`) | 16 |
| Alpha (`lora_alpha`) | 16 |
| Dropout | 0.05 |
| Target modules | all-linear |
| Bias | none |
| Trainable parameters | ~1.7M / 135M (1.26%) |
### Hardware
- **GPU:** NVIDIA Tesla T4 (Google Colab)
- **Training time:** ~1520 minutes
## Example Outputs
**Prompt:** What is machine learning?
> Machine learning is a field of AI where systems learn from data to make predictions or decisions without being explicitly programmed for each task.
**Prompt:** Explain transformers in simple terms.
> A Transformer is a neural network architecture that uses attention mechanisms to understand relationships between all words in a sentence simultaneously, rather than reading them one by one like older models did.
**Prompt:** What is the difference between LoRA and full fine-tuning?
> Full fine-tuning updates all parameters of a model which requires a lot of memory. LoRA instead freezes the base model and adds small trainable matrices to each layer, training only ~1% of parameters while achieving similar results.
## Limitations
- Small model (135M parameters) — responses may be less detailed than larger models
- Trained for only 200 steps — longer training would improve quality
- Best for short explanations — may struggle with very complex multi-step reasoning
- Not suitable for factual lookups — may hallucinate specific facts or numbers
## Training Framework
Built as a learning project while studying the [HuggingFace LLM Course](https://huggingface.co/learn/llm-course) — specifically Chapter 11 (Fine-tuning LLMs).
**Libraries used:**
- 🤗 Transformers
- 🤗 PEFT
- TRL (SFTTrainer)
- 🤗 Datasets
- Accelerate
## Author
Made by [puravky](https://huggingface.co/puravky) — undergrad student exploring ML and AI.