---
tags:
- merge
- mergekit
- qwen2.5
license: apache-2.0
pipeline_tag: text-generation
base_model:
- Xiaojian9992024/Qwen2.5-Dyanka-7B-Preview
- Xiaojian9992024/Qwen2.5-THREADRIPPER-Small
- suayptalha/Clarus-7B-v0.3
- gz987/qwen2.5-7b-cabs-v0.3
---

7B Linear Merge (Qwen2.5)

A linear merge of four Qwen2.5-7B fine-tunes, with mixing weights found by random search over the probability simplex (30 Dirichlet samples), each candidate scored against a small held-out eval set.

This was a learning project to build an end-to-end merge + evaluation pipeline. The numbers below are honest results — the merge is competent but not state-of-the-art for 7B Qwen2.5 fine-tunes.

Source models

  • Xiaojian9992024/Qwen2.5-Dyanka-7B-Preview
  • Xiaojian9992024/Qwen2.5-THREADRIPPER-Small
  • suayptalha/Clarus-7B-v0.3
  • gz987/qwen2.5-7b-cabs-v0.3
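The merge itself is a weighted average of matching parameter tensors across these checkpoints. A minimal sketch of the operation, using NumPy arrays as stand-ins for the real 7B tensors (the weights shown are illustrative, not the ones actually selected):

```python
import numpy as np

def linear_merge(state_dicts, weights):
    """Weighted average of matching parameter tensors across checkpoints."""
    assert abs(sum(weights) - 1.0) < 1e-8, "mixing weights must sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Toy two-by-two "checkpoints" standing in for the four source models.
models = [{"layer.w": np.full((2, 2), float(i))} for i in range(4)]
merged = linear_merge(models, [0.4, 0.3, 0.2, 0.1])
```

In practice mergekit performs this averaging shard by shard; the sketch only shows the arithmetic being applied per parameter.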

Method

Linear merge via mergekit. Mixing weights were selected by sampling 30 weight vectors from a Dirichlet prior, evaluating each merged candidate on a 20-example proxy eval (mixed MMLU + IFEval-style instruction following), and keeping the best-scoring weights. The proxy eval was small and the search procedure was random sampling rather than a true evolutionary algorithm — limitations worth noting for anyone building on this.
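The weight search described above can be sketched as follows. `score_fn` is a hypothetical stand-in for the expensive step (merge at the candidate weights with mergekit, then run the 20-example proxy eval); the sampling and selection logic is the actual procedure used:

```python
import numpy as np

def random_search_weights(score_fn, n_models=4, n_samples=30, alpha=1.0, seed=0):
    """Sample mixing weights from a flat Dirichlet prior and keep the
    best-scoring vector. score_fn(weights) -> float is assumed to merge
    the models at those weights and return the proxy-eval score."""
    rng = np.random.default_rng(seed)
    best_w, best_score = None, -np.inf
    for _ in range(n_samples):
        w = rng.dirichlet(alpha * np.ones(n_models))  # lies on the simplex
        s = score_fn(w)
        if s > best_score:
            best_w, best_score = w, s
    return best_w, best_score
```

With alpha=1.0 the Dirichlet is uniform over the simplex, so this is pure random search rather than anything adaptive, which matches the limitation noted above.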

Evaluation

Evaluated with lm-evaluation-harness on the Open LLM Leaderboard v2 task suite, single H100, vLLM backend, bf16.

Benchmark            Metric                     Score
IFEval               prompt_level_strict_acc    38.63
IFEval               inst_level_strict_acc      52.76
BBH                  acc_norm                   55.55
MATH-Lvl-5 (hard)    exact_match                36.93
GPQA                 acc_norm                   32.30
MuSR                 acc_norm                   44.58
MMLU-Pro             acc                        44.92

Observations

  • Strong: MATH-Hard (36.9, with algebra-hard at 63.5%) — likely inherited from Clarus and qwen2.5-7b-cabs.
  • Weak: IFEval at 38.6 prompt-level strict is below what individual strong Qwen2.5-7B fine-tunes achieve. Linear merging appears to dilute instruction-following behavior when the source models disagree on response formatting.
  • Average: BBH, MMLU-Pro, GPQA, MuSR all land in the typical mid-range for 7B models.

Reproduce

lm_eval \
  --model vllm \
  --model_args pretrained=Jagan666/7B-merge-champion,dtype=bfloat16,gpu_memory_utilization=0.9,max_model_len=4096 \
  --tasks leaderboard \
  --batch_size auto \
  --output_path ./eval_results \
  --log_samples

Limitations

  • Linear merge: simple, but can dilute task-specific behaviors (especially instruction following).
  • Search was random Dirichlet sampling on a small proxy eval — likely overfits to the proxy.
  • No safety / alignment evaluation was performed beyond the leaderboard tasks.

License

Apache 2.0, inherited from the source models.
