---
base_model:
- mergekit-community/Qwen2.5-7B-della
- mergekit-community/Qwen2.5-7B-ties
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-7B-Instruct-1M
- mergekit-community/Qwen2.5-7B-ties-1M
- Qwen/Qwen2.5-7B
- mergekit-community/Qwen2.5-7B-della-1M
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
---
# Achieve the Optimal Merged Model by Using One Base Model and Two Fine-tuned Models!

*What is the best way to merge **one base model** and **two fine-tuned models**?*

## This might be the best answer at the present stage!

[Qwen2.5-7B-YOYO-super](https://huggingface.co/YOYO-AI/Qwen2.5-7B-YOYO-super)

[Qwen2.5-14B-YOYO-super](https://huggingface.co/YOYO-AI/Qwen2.5-14B-YOYO-super)

*This is not a release made on a whim, but the best result of countless merging experiments!*

*Here is the formula used for the **previous generation**:*

```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
```

*It was widely used in the merging process of the **previous generation of models**. However, it has some **deficiencies**:*

*1. Relatively little of the base model's knowledge is retained.*

*2. Mathematical and coding abilities decline.*

## And here is the formula for this generation:

*This time, each of the two instruct models is first merged onto the base model with **della** and with **ties**, producing four intermediate models:*

```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della
```

```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-della-1M
```

```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties
```

```yaml
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
    parameters:
      density: 1
      weight: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  density: 1
  weight: 1
  normalize: true
  int8_mask: true
dtype: float16
tokenizer_source: base
name: Qwen2.5-7B-ties-1M
```

*Finally, the four intermediate models and the two instruct models are merged back onto the base model with **model_stock**:*

```yaml
merge_method: model_stock
base_model: Qwen/Qwen2.5-7B
models:
  - model: mergekit-community/Qwen2.5-7B-della
  - model: mergekit-community/Qwen2.5-7B-della-1M
  - model: mergekit-community/Qwen2.5-7B-ties
  - model: mergekit-community/Qwen2.5-7B-ties-1M
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Qwen/Qwen2.5-7B-Instruct
tokenizer_source: base
int8_mask: true
normalize: true
dtype: float16
```

*(A sketch of how to run these configs with mergekit appears at the end of this card.)*

*Except for a slight decrease in instruction following, this recipe achieves **significant improvements** in every other aspect.*

*This formula will also be used in the development of **the next generation of YOYO models**.*

***YOYO-AI** not only releases merged models with excellent performance, but also publishes the **complete, high-quality model merging formulas** behind them, hoping to advance model merging technology in the open-source community!*

### If you use this formula when merging your own models, it will be the greatest support for YOYO-AI!
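
*If you want to reproduce the recipe, the configs above can be run with [mergekit](https://github.com/arcee-ai/mergekit). Below is a minimal sketch; the `.yaml` file names and output directories are illustrative and not part of the original formula. In the second stage you can either let mergekit download the published `mergekit-community` repos or point the `models:` entries at your own stage-one outputs.*

```bash
# Minimal sketch: run the two-stage recipe with the mergekit CLI.
# File names and output paths below are illustrative.
pip install mergekit

# Stage 1: build the four intermediate merges (della / ties, standard and 1M instruct).
mergekit-yaml qwen2.5-7b-della.yaml    ./Qwen2.5-7B-della
mergekit-yaml qwen2.5-7b-della-1m.yaml ./Qwen2.5-7B-della-1M
mergekit-yaml qwen2.5-7b-ties.yaml     ./Qwen2.5-7B-ties
mergekit-yaml qwen2.5-7b-ties-1m.yaml  ./Qwen2.5-7B-ties-1M

# Stage 2: combine the intermediates and the two instruct models with model_stock.
# (Add --cuda to any command to run the merge on GPU.)
mergekit-yaml qwen2.5-7b-yoyo-super.yaml ./Qwen2.5-7B-YOYO-super
```

*Each `mergekit-yaml` call reads one of the YAML documents above (saved to its own file) and writes a complete model directory that can be loaded with `transformers`.*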