ModelHub XC 913020eb1c Initialize project; model provided by the ModelHub XC community
Model: Lambent/Qwen3-4B-Base-Continued-GRPO-Merge
Source: Original Platform
2026-06-02 03:31:25 +08:00

base_model: Lambent/Qwen3-4B-Base-Continued-GRPO-B, Lambent/Qwen3-4B-Base-Continued-GRPO
library_name: transformers
tags: mergekit, merge
license: apache-2.0

Credit to virtuous7373 for posting the CABS implementation used here.

A CABS-sparsified version of the original GRPO-trained model, Lambent/Qwen3-4B-Base-Continued-GRPO, merged with TIES into the Lambent/Qwen3-4B-Base-Continued-GRPO-B model.

| Task | Metric | Base | Trained | Delta |
|---|---|---|---|---|
| arc_easy | acc | 0.7891 | 0.7870 | -0.27% |
| arc_easy | acc_norm | 0.7609 | 0.7605 | -0.05% |
| lambada_openai | acc | 0.6912 | 0.6984 | +1.04% |
| lambada_openai | perplexity | 4.2433 | 4.0490 | -4.6% ↓ |
| openbookqa | acc | 0.3160 | 0.3180 | +0.63% |
| openbookqa | acc_norm | 0.4100 | 0.4120 | +0.49% |
| piqa | acc | 0.7797 | 0.7807 | +0.13% |
| piqa | acc_norm | 0.7807 | 0.7807 | +0.00% |
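
The Delta column is consistent with a relative change from Base to Trained, expressed in percent. That reading is inferred from the numbers, not stated by the author. A minimal sketch under that assumption follows; `relative_delta_pct` is a hypothetical helper name, and only three rows from the table are reproduced.

```python
def relative_delta_pct(base, trained):
    """Relative change from base to trained, in percent (assumed definition of Delta)."""
    return (trained - base) / base * 100.0


# (task, metric, base, trained) copied from the table above.
rows = [
    ("arc_easy", "acc", 0.7891, 0.7870),
    ("lambada_openai", "acc", 0.6912, 0.6984),
    ("lambada_openai", "perplexity", 4.2433, 4.0490),
]

for task, metric, base, trained in rows:
    print(f"{task} {metric}: {relative_delta_pct(base, trained):+.2f}%")
# arc_easy acc: -0.27%
# lambada_openai acc: +1.04%
# lambada_openai perplexity: -4.58%
```

For perplexity a negative delta is an improvement, which is presumably what the ↓ marker in the table signals.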

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged with the TIES merge method, using ./merged_models/llm-judge-merged-fixed as the base.

Models Merged

The following models were included in the merge:

  • ./merged_models/grpo-cabs

Configuration

The following YAML configuration was used to produce this model:

# TIES merge: Judge as base, inject sparse GRPO knowledge
merge_method: ties
base_model: ./merged_models/llm-judge-merged-fixed
models:
  - model: ./merged_models/grpo-cabs
    parameters:
      density: 0.5
      weight: 0.4
dtype: bfloat16
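
To illustrate what `density: 0.5` and `weight: 0.4` mean in the configuration above, here is a toy sketch of the TIES steps (task vector, magnitude trim, sign election, disjoint merge) on flat Python lists. It is a simplified illustration, not mergekit's implementation: the function names are hypothetical, the tensors are made up, and it assumes the weighted deltas are summed without normalization.

```python
def trim_by_magnitude(delta, density):
    """Keep the top `density` fraction of entries by absolute value; zero the rest."""
    keep = max(1, round(len(delta) * density))
    # Indices of the `keep` largest-magnitude entries.
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    kept = set(order[:keep])
    return [d if i in kept else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, models, density, weights):
    """Toy TIES merge over flat lists of floats (one list per fine-tuned model)."""
    # 1. Task vectors: difference between each fine-tuned model and the base.
    deltas = [[m - b for m, b in zip(model, base)] for model in models]
    # 2. Trim each task vector by magnitude, then scale it by its weight.
    trimmed = [
        [w * d for d in trim_by_magnitude(delta, density)]
        for delta, w in zip(deltas, weights)
    ]
    merged = []
    for i, b in enumerate(base):
        column = [t[i] for t in trimmed]
        # 3. Elect a sign per parameter: the sign of the summed weighted deltas.
        total = sum(column)
        sign = (total > 0) - (total < 0)
        # 4. Disjoint merge: add only the deltas that agree with the elected sign.
        agreed = sum(v for v in column if v * sign > 0)
        merged.append(b + agreed)
    return merged


# Made-up values, with the density and weight from the configuration above.
base = [1.0, 1.0, 1.0, 1.0]
grpo_cabs = [1.5, 0.9, 1.0, 0.0]
merged = ties_merge(base, [grpo_cabs], density=0.5, weights=[0.4])
print([round(x, 3) for x in merged])  # [1.2, 1.0, 1.0, 0.6]
```

With a single non-base model, as here, sign election never finds a conflict, so the sketch reduces to adding 0.4 times the half-sparsified difference to the base.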
