ModelHub XC 872bb83c15 Initialize project; model provided by the ModelHub XC community
Model: prithivMLmods/SmolLM2-360M-Grpo-r999
Source: Original Platform
2026-05-26 09:36:14 +08:00

---
license: apache-2.0
language:
- en
- zh
base_model:
- HuggingFaceTB/SmolLM2-360M-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- Grpo
- text-generation-inference
- Llama
- trl
---
![d9-mAgyravvwWXZGi3sK5.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/jTUNV5nFY_tyhYQM-zeXl.png)
# **SmolLM2-360M-Grpo-r999**
SmolLM2-360M-Grpo-r999 is fine-tuned from **SmolLM2-360M-Instruct**. According to the SmolLM2 authors, SmolLM2 shows significant advances over its predecessor, SmolLM1, particularly in instruction following, knowledge, and reasoning. The **360M** base model was trained on **2 trillion tokens** drawn from a diverse mix of datasets (**FineWeb-Edu, DCLM, The Stack**), along with new filtered datasets that the SmolLM2 team curated and planned to release. The team then built the instruct version through **supervised fine-tuning (SFT)** on a combination of public datasets and its own curated datasets.
### **How to Use**
#### Transformers
```bash
pip install transformers torch
```
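```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/SmolLM2-360M-Grpo-r999"
# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# For multiple GPUs, install accelerate and load with:
# model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is gravity?"}]
# add_generation_prompt=True appends the assistant header so the model
# answers the question instead of continuing the user turn
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(input_text)

inputs = tokenizer(input_text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
# Decode only the newly generated tokens, dropping the prompt and special tokens
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```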
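The `generate` call samples with `temperature=0.2` and `top_p=0.9`. As a rough illustration of what those two settings do, the sketch below applies temperature scaling and nucleus (top-p) filtering to a toy list of logits in plain Python. It is a simplified sketch for intuition, not the `transformers` implementation; the logits and the helper names (`softmax`, `top_p_filter`, `sample`) are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalize. Returns {token_index: prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=0.2, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling plus top-p filtering."""
    candidates = top_p_filter(softmax(logits, temperature), top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


# Hypothetical logits for a 4-token vocabulary
logits = [2.0, 1.5, 0.5, -1.0]
print(softmax(logits, temperature=1.0))  # flatter distribution
print(softmax(logits, temperature=0.2))  # sharply peaked on token 0
print(top_p_filter(softmax(logits, temperature=0.2), top_p=0.9))
print(sample(logits, rng=random.Random(0)))
```

In this toy example, a low temperature concentrates almost all probability on the top token, so the top-p filter keeps only that token; at `temperature=1.0` the same filter keeps three candidates. This is why a low temperature tends to give fairly repeatable answers even with `do_sample=True`.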
### **Limitations of SmolLM2-360M-Grpo-r999**
1. **Model Size**: With only **360M parameters**, the model is less capable than larger models at highly complex reasoning tasks and long-range dependencies.
2. **Bias and Inaccuracy**: Despite fine-tuning on diverse datasets, the model may generate biased or factually incorrect responses, particularly on niche topics or in specialized knowledge areas.
3. **Context Length**: The model may struggle with very long conversations or extended prompts, which can lead to truncated input or loss of contextual coherence.
4. **Fine-Tuning Specificity**: Strong performance in specialized domains may require additional fine-tuning on domain-specific datasets.
5. **Generalization**: The model may generalize less effectively than larger models to **rare queries** or **unseen tasks**, sometimes giving generic or incomplete answers.
6. **Limited Multi-Turn Conversations**: Although it supports multi-turn interactions, its ability to retain and use context over extended conversations is **not as strong as that of larger models**.
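Since long prompts and long conversations are listed as weak points, one common mitigation is to resend only the most recent turns that fit within a fixed token budget. The sketch below assumes a hypothetical `count_tokens` helper (here a crude whitespace word count) and an arbitrary budget; in practice you would count tokens with the model's own tokenizer (for instance `len(tokenizer.encode(text))`) and choose a budget below the model's context length.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(
    messages: List[Message],
    max_tokens: int,
    counter: Callable[[str], int] = count_tokens,
) -> List[Message]:
    """Keep any leading system message plus the most recent turns that fit in max_tokens."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    budget = max_tokens - sum(counter(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk backwards from the newest message and stop once the budget is exhausted
    for message in reversed(messages[len(system):]):
        cost = counter(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are a concise tutor."},
    {"role": "user", "content": "What is gravity?"},
    {"role": "assistant", "content": "Gravity is the attraction between masses."},
    {"role": "user", "content": "Who described it mathematically?"},
]
trimmed = trim_history(history, max_tokens=12)
for m in trimmed:
    print(m["role"], "->", m["content"])
```

The trimmed list can then be passed to `tokenizer.apply_chat_template` in place of the full history. Dropping older turns keeps the prompt short, at the cost of the model no longer seeing that earlier context.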
### **Intended Use of SmolLM2-360M-Grpo-r999**
1. **General-purpose Conversational AI**: Suitable for chatbots, virtual assistants, and interactive applications that require basic reasoning and knowledge retrieval.
2. **Education & Tutoring**: Can answer educational queries, explain concepts, and support learning across multiple domains.
3. **Content Generation**: Can generate short-form text, summaries, and brainstorming ideas for writing assistants or creativity tools.
4. **Code Assistance**: Fine-tuned on programming datasets, making it useful for debugging, explaining code, and assisting developers.
5. **Instruction Following**: Optimized for following structured commands, making it suitable for task-based applications.
6. **Prototyping & Experimentation**: A lightweight model for **fast deployment** in new AI applications, balancing performance with efficiency.
7. **Low-Resource Environments**: Small enough to run on **edge devices, mobile apps, and local servers** where larger models are infeasible.
8. **Research & Development**: Can serve as a base model for **further fine-tuning** or model optimization.
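To judge whether the model fits a given device, a back-of-envelope estimate of weight memory is parameter count times bytes per parameter. The sketch below uses the nominal 360M figure from the model name (the exact count differs slightly); it covers weights only and ignores activations, the KV cache, and framework overhead, so real memory use will be higher. The `int8` and `int4` rows assume the weights have been quantized, which this card does not cover.

```python
# Bytes needed to store one parameter at common precisions
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 360e6  # nominal size from the model name, not an exact count

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
```

At `bfloat16` the weights come to roughly 0.72 GB, and about 1.44 GB at `float32`, which is why a model of this size is a reasonable candidate for laptops and edge hardware.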