初始化项目，由ModelHub XC社区提供模型

Model: Shekswess/trlm-stage-1-sft-final-2 Source: Original Platform
2026-05-27 07:08:12 +08:00
commit 1ecb1f6c2b
13 changed files with 49311 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,116 @@
+---
+library_name: transformers
+license: apache-2.0
+base_model: HuggingFaceTB/SmolLM2-135M-Instruct
+tags:
+- trl
+- sft
+- generated_from_trainer
+model-index:
+- name: trlm-stage-1-sft-final-2
+  results: []
+---
+
+
+![image/png](https://github.com/user-attachments/assets/5f453496-8180-4cf4-94da-26ebbe1159d4)
+
+# 🧠 trlm-stage-1-sft-final-2
+
+`trlm-stage-1-sft-final-2` is the **Stage 1** post-training model for the **Tiny Reasoning Language Model (trlm)** project.  
+This stage focuses on **everyday conversations** and **general instruction following**, fine-tuned on a curated dataset of **58,000 entries**.
+
+---
+
+## 📖 Model Description
+
+- **Base Model**: [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct)  
+- **Type**: Causal Language Model (decoder-only transformer)  
+- **Stage**: Post-training **Stage 1 (SFT)**  
+- **Objective**: Build a solid foundation in **instruction-following** and **dialogue coherence** before advancing to reasoning-specific training.  
+
+This stage teaches the model to **follow instructions, rewrite, summarize, and hold conversations** without reasoning tokens.
+
+---
+
+## 🎯 Intended Uses & Limitations
+
+### Intended Uses
+- Everyday conversation assistants  
+- Instruction-following tasks (summarization, rewriting, simple dialogue)  
+- Precursor foundation for reasoning post-training (Stage 2+)  
+
+### Limitations
+- Not optimized for reasoning (handled in later stages)  
+- May struggle with multi-step logical or mathematical problems  
+- Trained only on English datasets  
+
+---
+
+## 📊 Training Data
+
+This model was trained on the dataset:  
+👉 [**Shekswess/trlm-sft-stage-1-final**](https://huggingface.co/datasets/Shekswess/trlm-sft-stage-1-final)
+
+**Dataset summary**:
+- **Entries**: 58,000  
+- **Sources**: 7 HuggingFaceTB/smoltalk2 subsets  
+- **Focus**: Non-reasoning conversations and instruction-following  
+
+| Source Dataset | Entries | Percentage % |
+|----------------|---------|---|
+| smoltalk_smollm3_smol_magpie_ultra_no_think | 33,500 | 57.8% |
+| smoltalk_smollm3_smol_summarize_no_think | 7,500 | 12.9% |
+| smoltalk_smollm3_smol_rewrite_no_think | 7,500 | 12.9% |
+| smoltalk_smollm3_systemchats_30k_no_think | 2,500 | 4.3% |
+| smoltalk_smollm3_explore_instruct_rewriting_no_think | 2,500 | 4.3% |
+| tulu_3_sft_personas_instruction_following_no_think | 2,500 | 4.3% |
+| smoltalk_smollm3_everyday_conversations_no_think | 2,000 | 3.4% |
+
+---
+
+## ⚙️ Training Procedure
+
+### Training Hyperparameters
+- **Learning rate**: 3e-4  
+- **Train batch size**: 32  
+- **Eval batch size**: 8  
+- **Gradient accumulation steps**: 4  
+- **Total effective batch size**: 128  
+- **Optimizer**: AdamW (betas=(0.9, 0.99), eps=1e-08)  
+- **LR Scheduler**: Cosine with warmup ratio 0.1  
+- **Epochs**: 2  
+- **Seed**: 42  
+
+### Framework Versions
+- **Transformers**: 4.56.2  
+- **PyTorch**: 2.7.1+rocm7.0.0.git698b58a9  
+- **Datasets**: 4.0.0  
+- **Tokenizers**: 0.22.1  
+
+---
+
+## 🚀 Usage
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+model_name = "Shekswess/trlm-stage-1-sft-final-2"
+
+# Load tokenizer & model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
+
+# Example inference
+inputs = tokenizer("Write a short daily affirmation:", return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=50)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+## 📌 Next Steps
+
+- **Stage 2**: Supervised fine-tuning with reasoning-focused data
+- **Stage 3**: DPO / preference optimization for reasoning stability
+
+---
+
+Part of the Tiny Reasoning Language Model (trlm) post-training pipeline.