---
library_name: transformers
tags:
- trl
- grpo
---

# SmolGRPO-135M
This model is a fine-tune of [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct), trained with GRPO on [mlabonne/smoltldr](https://huggingface.co/datasets/mlabonne/smoltldr) (2k samples).
It is designed to summarize Reddit posts in about 50 characters.

You can reproduce this training using this [Colab notebook](https://colab.research.google.com/drive/13mRqgRIvMGGgkQfJL4CS0lzcL4Vl9xUN?usp=sharing). Training the model takes about 40 minutes.

**Takeaways from these experiments**:
* Adding a system prompt like "Summarize the following text concisely" doesn't help.
* You can get faster convergence with a higher learning rate and fewer samples, but this setup is prone to overshooting the target.
* I tried many reward functions to experiment with reward shaping, but none of them seemed to help.
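
To make the reward discussion concrete, here is a minimal sketch of a length-based reward aimed at the ~50-character target mentioned above. The names `IDEAL_LENGTH`, `reward_len`, and `reward_len_clipped`, the `tolerance` parameter, and the exact scoring formulas are assumptions for illustration and are not taken from the notebook. The `(completions, **kwargs)` signature returning one float per completion follows the custom reward function convention of TRL's `GRPOTrainer`, assuming plain-string completions.

```python
# Assumed target length in characters (the card only says "~50 characters").
IDEAL_LENGTH = 50


def reward_len(completions, **kwargs):
    """Score each completion by its distance from IDEAL_LENGTH.

    Returns one float per completion; 0.0 is the best possible score and
    the score drops by 1 for every character above or below the target.
    """
    return [float(-abs(IDEAL_LENGTH - len(c))) for c in completions]


def reward_len_clipped(completions, tolerance=10, **kwargs):
    """Hypothetical shaped variant: no penalty within `tolerance` characters.

    This is only an example of what "reward shaping" could look like,
    not one of the reward functions the author actually tried.
    """
    return [
        float(-max(0, abs(IDEAL_LENGTH - len(c)) - tolerance))
        for c in completions
    ]


if __name__ == "__main__":
    samples = ["x" * 50, "x" * 20, "x" * 165]
    print(reward_len(samples))          # [0.0, -30.0, -115.0]
    print(reward_len_clipped(samples))  # [0.0, -20.0, -105.0]
```

Either function could be passed to a GRPO trainer through its reward function argument. The clipped variant is included only to show how a shaped reward differs from the plain distance penalty.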
## Example

Input:
> SUBREDDIT: r/Advice
>
> TITLE: I have big dreams and goals, but they are kind of cloudy.[m20]
>
> POST: I live at home with my family right now and I don't go to school. I went to college for a year and decided to stop. My best friend convinced me that I don't need college to do what I want to do. Besides, I hated taking classes I wasn't interested in. The things I want to do in life (I know it seems like too much) included producing music, making a cartoon, making comics, designing clothes and shoes, and other smaller things related to that. I grew up with a good family with a father that had similar dreams. He niw has a job he's been working for 20+ years that he doesn't like. I too am afraid of falling into that path. I've been pretty down and frustrated and feeling things are quite impossible although I know there's always hope. I don't really want to do anything else, but I'm stuck. I'll be pretty down for a week, then by the next week I'm in high spirits with a game plan that always fails. I've been doing this for quite a while now. My parents are starting to get on my case now and when they ask what I'm gonna do in life I don't know how to respond. Maybe I should try looking for lessons in nyc to get me out of the house? I practice drawing and making music a lot, but I can never feel satisfied and feel like I'm moving in the right direction. Everything seems like a scary cycle of ups and downs. I have faith I can turn it around, but I just don't know how.
>
> TL;DR:

Output:
> I have big dreams and goals, but they are kind of cloudy. I'm going to try to figure out a plan to get out of that house, but I'm not sure what that plan will be.
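
A minimal sketch of how such a prompt might be built and sent to the model with the `transformers` text-generation pipeline. The helpers `build_prompt` and `summarize` are hypothetical names introduced here, the prompt layout is inferred from the example input above, and `max_new_tokens=64` is an arbitrary choice. This card does not state the model's repository id, so `model_id` is left as a required argument, and it is also an assumption that the model accepts the raw prompt without a chat template.

```python
def build_prompt(subreddit: str, title: str, post: str) -> str:
    """Format a post like the example input above, ending with 'TL;DR:'.

    The field layout is inferred from the example, not from the training code.
    """
    return (
        f"SUBREDDIT: r/{subreddit}\n\n"
        f"TITLE: {title}\n\n"
        f"POST: {post}\n\n"
        "TL;DR:"
    )


def summarize(prompt: str, model_id: str, max_new_tokens: int = 64) -> str:
    """Generate a summary for `prompt` with the model stored at `model_id`."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    # return_full_text=False keeps only the newly generated continuation.
    result = generator(
        prompt, max_new_tokens=max_new_tokens, return_full_text=False
    )
    return result[0]["generated_text"].strip()


if __name__ == "__main__":
    prompt = build_prompt(
        subreddit="Advice",
        title="I have big dreams and goals, but they are kind of cloudy.",
        post="I live at home with my family right now...",
    )
    print(prompt)
    # To generate, pass the repository id or local path of this model:
    # print(summarize(prompt, model_id="<path-or-repo-id-of-this-model>"))
```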