初始化项目,由ModelHub XC社区提供模型
Model: OpenPipe/Deductive-Reasoning-Qwen-14B Source: Original Platform
This commit is contained in:
39
README.md
Normal file
39
README.md
Normal file
@@ -0,0 +1,39 @@
|
||||
---
|
||||
license: mit
|
||||
license_link: https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-14B/blob/main/LICENSE
|
||||
language:
|
||||
- zho
|
||||
- eng
|
||||
- fra
|
||||
- spa
|
||||
- por
|
||||
- deu
|
||||
- ita
|
||||
- rus
|
||||
- jpn
|
||||
- kor
|
||||
- vie
|
||||
- tha
|
||||
- ara
|
||||
pipeline_tag: text-generation
|
||||
base_model:
|
||||
- Qwen/Qwen2.5-14B-Instruct
|
||||
tags:
|
||||
- chat
|
||||
library_name: transformers
|
||||
---
|
||||
|
||||
# Deductive-Reasoning-Qwen-14B
|
||||
|
||||

|
||||
|
||||
Deductive Reasoning Qwen 14B is a reinforcement fine-tune of [Qwen 2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) to solve challenging deduction problems from the [Temporal Clue](https://github.com/bradhilton/temporal-clue) dataset, trained by [OpenPipe](https://openpipe.ai)!
|
||||
|
||||
Here are some additional resources to check out:
|
||||
|
||||
- [Blog Post](https://openpipe.ai/blog/using-grpo-to-beat-o1-o3-mini-and-r1-on-temporal-clue)
|
||||
- [Training Recipe](https://github.com/openpipe/deductive-reasoning)
|
||||
- [RL Experiments](https://github.com/openpipe/rl-experiments)
|
||||
- [Deductive Reasoning Qwen 32B](https://huggingface.co/OpenPipe/Deductive-Reasoning-Qwen-32B)
|
||||
|
||||
If you're interested in training your own models with reinforcement learning or just chatting, feel free to [reach out](https://openpipe.ai/contact) or email Kyle directly at kyle@openpipe.ai!
|
||||
Reference in New Issue
Block a user