初始化项目，由ModelHub XC社区提供模型

Model: suayptalha/ThinkerLlama-8B-v1 Source: Original Platform
2026-05-22 04:18:12 +08:00
commit f76c04ceff
13 changed files with 2550 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,50 @@
+---
+license: apache-2.0
+tags:
+- unsloth
+- trl
+- grpo
+- llama
+language:
+- en
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
+pipeline_tag: text-generation
+library_name: transformers
+datasets:
+- microsoft/orca-math-word-problems-200k
+---
+
+# Model Details
+
+Reasoning Llama model series fine-tuned on microsoft/orca-math-word-problems-200k using GRPO(Group Relative Policy Optimization) reinforcement learning technique.
+
+Base model: meta-llama/Llama-3.1-8B-Instruct
+
+# Parameters
+
+- learning_rate = 5e-6,
+- adam_beta1 = 0.9,
+- adam_beta2 = 0.99,
+- weight_decay = 0.1,
+- warmup_ratio = 0.1,
+- lr_scheduler_type = "cosine",
+- optim = "paged_adamw_8bit",
+
+# Suggested system prompt for reasoning
+
+```
+Respond in the following format:
+<reasoning>
+...
+</reasoning>
+<answer>
+...
+</answer>
+Do not forget <reasoning></reasoning><answer></answer> tags.
+```
+
+# Support:
+
+If you find this work useful, you can support me!
+<a href="https://www.buymeacoffee.com/suayptalha" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>