初始化项目,由ModelHub XC社区提供模型

Model: harshitv804/MetaMath-Mistral-2x7B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-05 21:13:07 +08:00
commit a7d4e0e54c
18 changed files with 91448 additions and 0 deletions

35
.gitattributes vendored Normal file
View File

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

126
README.md Normal file
View File

@@ -0,0 +1,126 @@
---
base_model:
- meta-math/MetaMath-Mistral-7B
tags:
- mergekit
- merge
- meta-math/MetaMath-Mistral-7B
- Mixture of Experts
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/63060761cb5492c9859b64ea/BfR-Giwmh_3R-ymdeiI5k.png)
This is MetaMath-Mistral-2x7B Mixture of Experts (MOE) model created using [mergekit](https://github.com/cg123/mergekit) for experimental and learning purpose of MOE.
## Merge Details
### Merge Method
This model was merged using the SLERP merge method using [meta-math/MetaMath-Mistral-7B](https://huggingface.co/meta-math/MetaMath-Mistral-7B) as the base model.
### Models Merged
The following models were included in the merge:
* [meta-math/MetaMath-Mistral-7B](https://huggingface.co/meta-math/MetaMath-Mistral-7B) x 2
### Configuration
The following YAML configuration was used to produce this model:
```yaml
slices:
- sources:
- model: meta-math/MetaMath-Mistral-7B
layer_range: [0, 32]
- model: meta-math/MetaMath-Mistral-7B
layer_range: [0, 32]
merge_method: slerp
base_model: meta-math/MetaMath-Mistral-7B
parameters:
t:
- filter: self_attn
value: [0, 0.5, 0.3, 0.7, 1]
- filter: mlp
value: [1, 0.5, 0.7, 0.3, 0]
- value: 0.5
dtype: bfloat16
```
## Inference Code
```python
## install dependencies
## !pip install -q -U git+https://github.com/huggingface/transformers.git
## !pip install -q -U git+https://github.com/huggingface/accelerate.git
## !pip install -q -U sentencepiece
## load model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
model_name = "harshitv804/MetaMath-Mistral-2x7B"
# load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
model_name,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
model_name,
trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token
## inference
query = "Maximoff's monthly bill is $60 per month. His monthly bill increased by thirty percent when he started working at home. How much is his total monthly bill working from home?"
prompt =f"""
Below is an instruction that describes a task. Write a response that appropriately completes the request.\n
### Instruction:\n
{query}\n
### Response: Let's think step by step.
"""
# tokenize the input string
inputs = tokenizer(
prompt,
return_tensors="pt",
return_attention_mask=False
)
# generate text using the model
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**inputs, max_length=2048, streamer=streamer)
# decode and print the output
text = tokenizer.batch_decode(outputs)[0]
```
## Citation
```bibtex
@article{yu2023metamath,
title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
journal={arXiv preprint arXiv:2309.12284},
year={2023}
}
```
```bibtex
@article{jiang2023mistral,
title={Mistral 7B},
author={Jiang, Albert Q and Sablayrolles, Alexandre and Mensch, Arthur and Bamford, Chris and Chaplot, Devendra Singh and Casas, Diego de las and Bressand, Florian and Lengyel, Gianna and Lample, Guillaume and Saulnier, Lucile and others},
journal={arXiv preprint arXiv:2310.06825},
year={2023}
}
```

3
added_tokens.json Normal file
View File

@@ -0,0 +1,3 @@
{
"[PAD]": 32000
}

26
config.json Normal file
View File

@@ -0,0 +1,26 @@
{
"_name_or_path": "meta-math/MetaMath-Mistral-7B",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.38.2",
"use_cache": false,
"vocab_size": 32001
}

17
mergekit_config.yml Normal file
View File

@@ -0,0 +1,17 @@
slices:
- sources:
- model: meta-math/MetaMath-Mistral-7B
layer_range: [0, 32]
- model: meta-math/MetaMath-Mistral-7B
layer_range: [0, 32]
merge_method: slerp
base_model: meta-math/MetaMath-Mistral-7B
parameters:
t:
- filter: self_attn
value: [0, 0.5, 0.3, 0.7, 1]
- filter: mlp
value: [1, 0.5, 0.7, 0.3, 0]
- value: 0.5
dtype: bfloat16

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8dd31eaa452113ea2b40cdfcddc6a5e860c560d27d5c1fbba1521bb3838b8fa3
size 1912681584

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b03a6a36b40063377a1b569a276f765754ed92e237d97976da2f34f937902db9
size 1979781456

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d5ae07f0adad7dfeee8f8bcd09089736b0046ce800ab8b77cdc677e4a42ee7d
size 1946243968

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:29f6059614ca3484b604b2c1c3d4d052adf8bccd727aa8dffb554b2d61cd8f95
size 1979781416

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e356cd755e7805d318b2391f41dfab8c274f6b49473710c8468bd280555d9f3e
size 1862349080

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63e061b6c8f5a1506021a69153617c2629dfdcc875825ec671382e4f0e498909
size 1916882896

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:df28bd63dec7d46fcaa1fe23451f3cc59c9dc2c85a7409c5070eeb0bfb51aa88
size 1979781456

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c8bc1a24d1e1a409d63e760560ff960c43e892ccaa40626efadf3d9fc6bf9241
size 906012472

File diff suppressed because one or more lines are too long

30
special_tokens_map.json Normal file
View File

@@ -0,0 +1,30 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "[PAD]",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

91131
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

BIN
tokenizer.model (Stored with Git LFS) Normal file

Binary file not shown.

52
tokenizer_config.json Normal file
View File

@@ -0,0 +1,52 @@
{
"add_bos_token": true,
"add_eos_token": false,
"add_prefix_space": true,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"32000": {
"content": "[PAD]",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [],
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": true,
"model_max_length": 1024,
"pad_token": "[PAD]",
"padding_side": "right",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": true
}