---
base_model:
- beyoru/EvolLLM
tags:
- text-generation-inference
- transformers
- qwen3
- code
- tool
- agent
- evolution
- merge
- RL
- grpo
license: apache-2.0
language:
- en
---
This model is a Qwen model fine-tuned with a custom reinforcement learning (RL) framework that rewards it for producing solutions that pass automated test cases, similar to how programming submissions are evaluated on LeetCode.

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65905af887944e494e37e09a/s4drmYGEYWZyt2ZUkxIpI.png" width="300">
</p>

Instead of relying on labeled ground-truth answers, the model learns from test-case-based rewards, which is intended to promote generalization and reasoning ability in algorithmic problem-solving.
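To make the idea of a test-case-based reward concrete, here is a minimal sketch in Python. It is not the framework used to train this model, which is not published in this card. The function name `reward_from_tests`, the `solve` entry point, and the choice of "fraction of tests passed" as the reward are all assumptions made for illustration.

```python
from typing import Any, Sequence, Tuple

# Hypothetical test-case format: (positional arguments, expected return value)
Case = Tuple[tuple, Any]


def reward_from_tests(candidate_source: str,
                      tests: Sequence[Case],
                      entry_point: str = "solve") -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    if not tests:
        return 0.0

    namespace: dict = {}
    try:
        # NOTE: exec runs arbitrary code in this process. This is acceptable
        # for a sketch only; a real pipeline would isolate it in a sandbox
        # with time and memory limits.
        exec(candidate_source, namespace)
    except Exception:
        # Code that does not even load (e.g. a syntax error) earns no reward.
        return 0.0

    func = namespace.get(entry_point)
    if not callable(func):
        return 0.0

    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            # A runtime error counts as a failed test, not a crash.
            pass
    return passed / len(tests)


# Example: two candidate completions scored against the same tests.
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a - b\n"

print(reward_from_tests(good, tests))  # 1.0
print(reward_from_tests(bad, tests))   # one of the three tests passes
```

No reference solution appears anywhere in the sketch: the reward depends only on observed behaviour against the tests, which is the property the paragraph above describes. A graded reward (fraction passed) is one possible design; an all-or-nothing reward is an equally plausible alternative.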