Qwen3-4B-PlumEsper/README.md

---
base_model:
- ValiantLabs/Qwen3-4B-ShiningValiant3
- ValiantLabs/Qwen3-4B-Esper3
- Qwen/Qwen3-4B
library_name: transformers
tags:
- mergekit
- merge
- qwen
- qwen-3
- qwen-3-4b
- 4b
- reasoning
- code
- code-reasoning
- code-instruct
- python
- javascript
- dev-ops
- jenkins
- terraform
- scripting
- powershell
- azure
- aws
- gcp
- cloud
- science
- science-reasoning
- physics
- biology
- chemistry
- earth-science
- astronomy
- machine-learning
- artificial-intelligence
- compsci
- computer-science
- information-theory
- ML-Ops
- math
- cuda
- deep-learning
- transformers
- agentic
- LLM
- neuromorphic
- self-improvement
- complex-systems
- cognition
- linguistics
- philosophy
- logic
- epistemology
- simulation
- game-theory
- knowledge-management
- creativity
- problem-solving
- architect
- engineer
- developer
- creative
- analytical
- expert
- rationality
- conversational
- chat
- instruct
datasets:
- sequelbox/Celestia3-DeepSeek-R1-0528
- sequelbox/Mitakihara-DeepSeek-R1-0528
- sequelbox/Titanium2.1-DeepSeek-R1
- sequelbox/Tachibana2-DeepSeek-R1
- sequelbox/Raiden-DeepSeek-R1

---
# PlumEsper

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit), combining the specialty and general reasoning skills of Esper 3 4b and Shining Valiant 3 4b.

## Merge Details
### Merge Method

This model was merged using the [DELLA](https://arxiv.org/abs/2406.11617) merge method using [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) as a base.

### Models Merged

The following models were included in the merge:
* [ValiantLabs/Qwen3-4B-ShiningValiant3](https://huggingface.co/ValiantLabs/Qwen3-4B-ShiningValiant3)
* [ValiantLabs/Qwen3-4B-Esper3](https://huggingface.co/ValiantLabs/Qwen3-4B-Esper3)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: della
dtype: bfloat16
parameters:
  normalize: true
models:
  - model: ValiantLabs/Qwen3-4B-Esper3
    parameters:
      density: 0.5
      weight: 0.3
  - model: ValiantLabs/Qwen3-4B-ShiningValiant3
    parameters:
      density: 0.5
      weight: 0.3
base_model: Qwen/Qwen3-4B

```
初始化项目，由ModelHub XC社区提供模型 Model: sequelbox/Qwen3-4B-PlumEsper Source: Original Platform 2026-06-03 10:58:13 +08:00			`---`
			`base_model:`
			`- ValiantLabs/Qwen3-4B-ShiningValiant3`
			`- ValiantLabs/Qwen3-4B-Esper3`
			`- Qwen/Qwen3-4B`
			`library_name: transformers`
			`tags:`
			`- mergekit`
			`- merge`
			`- qwen`
			`- qwen-3`
			`- qwen-3-4b`
			`- 4b`
			`- reasoning`
			`- code`
			`- code-reasoning`
			`- code-instruct`
			`- python`
			`- javascript`
			`- dev-ops`
			`- jenkins`
			`- terraform`
			`- scripting`
			`- powershell`
			`- azure`
			`- aws`
			`- gcp`
			`- cloud`
			`- science`
			`- science-reasoning`
			`- physics`
			`- biology`
			`- chemistry`
			`- earth-science`
			`- astronomy`
			`- machine-learning`
			`- artificial-intelligence`
			`- compsci`
			`- computer-science`
			`- information-theory`
			`- ML-Ops`
			`- math`
			`- cuda`
			`- deep-learning`
			`- transformers`
			`- agentic`
			`- LLM`
			`- neuromorphic`
			`- self-improvement`
			`- complex-systems`
			`- cognition`
			`- linguistics`
			`- philosophy`
			`- logic`
			`- epistemology`
			`- simulation`
			`- game-theory`
			`- knowledge-management`
			`- creativity`
			`- problem-solving`
			`- architect`
			`- engineer`
			`- developer`
			`- creative`
			`- analytical`
			`- expert`
			`- rationality`
			`- conversational`
			`- chat`
			`- instruct`
			`datasets:`
			`- sequelbox/Celestia3-DeepSeek-R1-0528`
			`- sequelbox/Mitakihara-DeepSeek-R1-0528`
			`- sequelbox/Titanium2.1-DeepSeek-R1`
			`- sequelbox/Tachibana2-DeepSeek-R1`
			`- sequelbox/Raiden-DeepSeek-R1`

			`---`
			`# PlumEsper`

			`This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit), combining the specialty and general reasoning skills of Esper 3 4b and Shining Valiant 3 4b.`

			`## Merge Details`
			`### Merge Method`

			`This model was merged using the [DELLA](https://arxiv.org/abs/2406.11617) merge method using [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) as a base.`

			`### Models Merged`

			`The following models were included in the merge:`
			`* [ValiantLabs/Qwen3-4B-ShiningValiant3](https://huggingface.co/ValiantLabs/Qwen3-4B-ShiningValiant3)`
			`* [ValiantLabs/Qwen3-4B-Esper3](https://huggingface.co/ValiantLabs/Qwen3-4B-Esper3)`

			`### Configuration`

			`The following YAML configuration was used to produce this model:`

			```yaml
			`merge_method: della`
			`dtype: bfloat16`
			`parameters:`
			`normalize: true`
			`models:`
			`- model: ValiantLabs/Qwen3-4B-Esper3`
			`parameters:`
			`density: 0.5`
			`weight: 0.3`
			`- model: ValiantLabs/Qwen3-4B-ShiningValiant3`
			`parameters:`
			`density: 0.5`
			`weight: 0.3`
			`base_model: Qwen/Qwen3-4B`

			```