MinCoder-4B-Expert/README.md

---
base_model:
- beyoru/EvolLLM
tags:
- text-generation-inference
- transformers
- qwen3
- code
- tool
- agent
- evolution
- merge
- RL
- grpo
license: apache-2.0
language:
- en
---

This model is a Qwen model fine-tuned with a custom reinforcement learning (RL) framework that rewards it for producing solutions that pass automated test cases, similar to how programming tasks are evaluated on LeetCode.

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/65905af887944e494e37e09a/s4drmYGEYWZyt2ZUkxIpI.png" width="300">
</p>


Instead of relying on labeled ground-truth answers, the model learns from test-case-based rewards, an approach intended to promote generalization and reasoning ability in algorithmic problem-solving.
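
The card's tags mention `grpo`. As a rough sketch of how test-case rewards could feed such a method, the snippet below normalizes the rewards of several sampled solutions to the same problem into group-relative advantages. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` value are assumptions for illustration. The actual training configuration is not described in this card.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize per-sample rewards within one group of sampled solutions.

    Assumed form: (reward - group mean) / (group std + eps). Implementations
    differ on details such as sample vs. population standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four sampled solutions to one problem, scored by test-case pass rate.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Solutions that pass more test cases than the group average receive a positive advantage and those that pass fewer receive a negative one, so no labeled answer is needed to rank them.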