初始化项目,由ModelHub XC社区提供模型
Model: MediaTek-Research/Breeze-7B-32k-Base-v1_0 Source: Original Platform
This commit is contained in:
36
.gitattributes
vendored
Normal file
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
|
||||
*.7z filter=lfs diff=lfs merge=lfs -text
|
||||
*.arrow filter=lfs diff=lfs merge=lfs -text
|
||||
*.bin filter=lfs diff=lfs merge=lfs -text
|
||||
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
||||
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
||||
*.ftz filter=lfs diff=lfs merge=lfs -text
|
||||
*.gz filter=lfs diff=lfs merge=lfs -text
|
||||
*.h5 filter=lfs diff=lfs merge=lfs -text
|
||||
*.joblib filter=lfs diff=lfs merge=lfs -text
|
||||
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
||||
*.model filter=lfs diff=lfs merge=lfs -text
|
||||
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
||||
*.npy filter=lfs diff=lfs merge=lfs -text
|
||||
*.npz filter=lfs diff=lfs merge=lfs -text
|
||||
*.onnx filter=lfs diff=lfs merge=lfs -text
|
||||
*.ot filter=lfs diff=lfs merge=lfs -text
|
||||
*.parquet filter=lfs diff=lfs merge=lfs -text
|
||||
*.pb filter=lfs diff=lfs merge=lfs -text
|
||||
*.pickle filter=lfs diff=lfs merge=lfs -text
|
||||
*.pkl filter=lfs diff=lfs merge=lfs -text
|
||||
*.pt filter=lfs diff=lfs merge=lfs -text
|
||||
*.pth filter=lfs diff=lfs merge=lfs -text
|
||||
*.rar filter=lfs diff=lfs merge=lfs -text
|
||||
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar filter=lfs diff=lfs merge=lfs -text
|
||||
*.tflite filter=lfs diff=lfs merge=lfs -text
|
||||
*.tgz filter=lfs diff=lfs merge=lfs -text
|
||||
*.wasm filter=lfs diff=lfs merge=lfs -text
|
||||
*.xz filter=lfs diff=lfs merge=lfs -text
|
||||
*.zip filter=lfs diff=lfs merge=lfs -text
|
||||
*.zst filter=lfs diff=lfs merge=lfs -text
|
||||
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
||||
model.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||
121
README.md
Normal file
121
README.md
Normal file
@@ -0,0 +1,121 @@
|
||||
---
|
||||
pipeline_tag: text-generation
|
||||
license: apache-2.0
|
||||
language:
|
||||
- zh
|
||||
- en
|
||||
---
|
||||
|
||||
# Model Card for MediaTek Research Breeze-7B-32k-Base-v1_0
|
||||
|
||||
MediaTek Research Breeze-7B (hereinafter referred to as Breeze-7B) is a language model family that builds on top of [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1), specifically intended for Traditional Chinese use.
|
||||
|
||||
[Breeze-7B-Base](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v1_0) is the base model for the Breeze-7B series.
|
||||
It is suitable for use if you have substantial fine-tuning data to tune it for your specific use case.
|
||||
|
||||
[Breeze-7B-Instruct](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v1_0) derives from the base model Breeze-7B-Base, making the resulting model amenable to be used as-is for commonly seen tasks.
|
||||
|
||||
[Breeze-7B-32k-Base](https://huggingface.co/MediaTek-Research/Breeze-7B-32k-Base-v1_0) is extended from the base model with more data, base change, and the disabling of the sliding window.
|
||||
Roughly speaking, that is equivalent to 44k Traditional Chinese characters.
|
||||
|
||||
[Breeze-7B-32k-Instruct](https://huggingface.co/MediaTek-Research/Breeze-7B-32k-Instruct-v1_0) derives from the base model Breeze-7B-32k-Base, making the resulting model amenable to be used as-is for commonly seen tasks.
|
||||
|
||||
|
||||
|
||||
Practicality-wise:
|
||||
- Breeze-7B-Base expands the original vocabulary with additional 30,000 Traditional Chinese tokens. With the expanded vocabulary, everything else being equal, Breeze-7B operates at twice the inference speed for Traditional Chinese to Mistral-7B and Llama 7B. [See [Inference Performance](#inference-performance).]
|
||||
- Breeze-7B-Instruct can be used as is for common tasks such as Q&A, RAG, multi-round chat, and summarization.
|
||||
- Breeze-7B-32k-Instruct can perform tasks at a document level (For Chinese, 20 ~ 40 pages).
|
||||
|
||||
*A project by the members (in alphabetical order): Chan-Jan Hsu 許湛然, Feng-Ting Liao 廖峰挺, Po-Chun Hsu 許博竣, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.*
|
||||
|
||||
## Features
|
||||
|
||||
- Breeze-7B-32k-Base-v1_0
|
||||
- Expanding the vocabulary dictionary size from 32k to 62k to better support Traditional Chinese
|
||||
- 32k-token context length
|
||||
|
||||
- Breeze-7B-32k-Instruct-v1_0
|
||||
- Expanding the vocabulary dictionary size from 32k to 62k to better support Traditional Chinese
|
||||
- 32k-token context length
|
||||
- Multi-turn dialogue (without special handling for harmfulness)
|
||||
|
||||
## Model Details
|
||||
|
||||
- Breeze-7B-32k-Base-v1_0
|
||||
- Pretrained from: [Breeze-7B-Base](https://huggingface.co/MediaTek-Research/Breeze-7B-Base-v1_0)
|
||||
- Model type: Causal decoder-only transformer language model
|
||||
- Language: English and Traditional Chinese (zh-tw)
|
||||
- Breeze-7B-32k-Instruct-v1_0
|
||||
- Finetuned from: [Breeze-7B-32k-Base](https://huggingface.co/MediaTek-Research/Breeze-7B-32k-Base-v1_0)
|
||||
- Model type: Causal decoder-only transformer language model
|
||||
- Language: English and Traditional Chinese (zh-tw)
|
||||
|
||||
## Long-context Performance
|
||||
|
||||
#### Needle-in-a-haystack Performance
|
||||
|
||||
We use the passkey retrieval task to test the model's ability to attend to different various depths in a given sequence.
|
||||
A key in placed within a long context distracting document for the model to retrieve.
|
||||
The key position is binned into 16 bins, and there are 20 testcases for each bin.
|
||||
Breeze-7B-32k-Base clears the tasks with 90+% accuracy, shown in the figure below.
|
||||

|
||||
|
||||
#### Long-DRCD Performance
|
||||
|
||||
| **Model/Performance(EM)** | **DRCD** | **DRCD-16k** | **DRCD-32k** |
|
||||
|---------------------------|----------|--------------|--------------|
|
||||
| **Breeze-7B-32k-Instruct-v1\_0** | 76.9 | 54.82 | 44.26 |
|
||||
| **Breeze-7B-32k-Base-v1\_0** | 79.73 | 69.68 | 61.55 |
|
||||
| **Breeze-7B-Base-v1\_0** | 80.61 | 21.79 | 15.29 |
|
||||
|
||||
#### Short-Benchmark Performance
|
||||
|
||||
| **Model/Performance(EM)** | **TMMLU+** | **MMLU** | **TABLE** | **MT-Bench-tw** | **MT-Bench** |
|
||||
|---------------------------|----------|--------------|--------------|-----|-----|
|
||||
| **Breeze-7B-32k-Instruct-v1\_0** | 41.37 | 61.34 | 34 | 5.8 | 7.4 |
|
||||
| **Breeze-7B-Instruct-v1\_0** | 42.67 | 62.73 | 39.58 | 6.0 | 7.4 |
|
||||
|
||||
## Use in Transformers
|
||||
|
||||
First, install direct dependencies:
|
||||
```
|
||||
pip install transformers torch accelerate
|
||||
```
|
||||
<p style="color:red;">Flash-attention2 is strongly recommended for long context scenarios.</p>
|
||||
|
||||
```bash
|
||||
pip install packaging ninja
|
||||
pip install flash-attn
|
||||
```
|
||||
Then load the model in transformers:
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
import torch
|
||||
|
||||
model = AutoModelForCausalLM.from_pretrained(
|
||||
"MediaTek-Research/Breeze-7B-32k-Base-v1_0",
|
||||
device_map="auto",
|
||||
torch_dtype=torch.bfloat16,
|
||||
attn_implementation="flash_attention_2" # optional but highly recommended
|
||||
)
|
||||
from transformers import AutoTokenizer
|
||||
tokenizer = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-32k-Base-v1_0")
|
||||
tokenizer.tokenize("你好,我可以幫助您解決各種問題、提供資訊和協助您完成許多不同的任務。例如:回答技術問題、提供建議、翻譯文字、尋找資料或協助您安排行程等。請告訴我如何能幫助您。")
|
||||
# Tokenized results
|
||||
# ['▁', '你好', ',', '我', '可以', '幫助', '您', '解決', '各種', '問題', '、', '提供', '資訊', '和', '協助', '您', '完成', '許多', '不同', '的', '任務', '。', '例如', ':', '回答', '技術', '問題', '、', '提供', '建議', '、', '翻譯', '文字', '、', '尋找', '資料', '或', '協助', '您', '安排', '行程', '等', '。', '請', '告訴', '我', '如何', '能', '幫助', '您', '。']
|
||||
```
|
||||
|
||||
|
||||
## Citation
|
||||
|
||||
```
|
||||
@article{MediaTek-Research2024breeze7b,
|
||||
title={Breeze-7B Technical Report},
|
||||
author={Chan-Jan Hsu and Chang-Le Liu and Feng-Ting Liao and Po-Chun Hsu and Yi-Chang Chen and Da-Shan Shiu},
|
||||
year={2024},
|
||||
eprint={2403.02712},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.CL}
|
||||
}
|
||||
```
|
||||
4
added_tokens.json
Normal file
4
added_tokens.json
Normal file
@@ -0,0 +1,4 @@
|
||||
{
|
||||
"<EOD>": 61873,
|
||||
"<PAD>": 61874
|
||||
}
|
||||
26
config.json
Normal file
26
config.json
Normal file
@@ -0,0 +1,26 @@
|
||||
{
|
||||
"architectures": [
|
||||
"MistralForCausalLM"
|
||||
],
|
||||
"attention_dropout": 0.0,
|
||||
"bos_token_id": 1,
|
||||
"eos_token_id": 2,
|
||||
"hidden_act": "silu",
|
||||
"hidden_size": 4096,
|
||||
"initializer_range": 0.02,
|
||||
"intermediate_size": 14336,
|
||||
"max_position_embeddings": 32768,
|
||||
"model_type": "mistral",
|
||||
"num_attention_heads": 32,
|
||||
"num_hidden_layers": 32,
|
||||
"num_key_value_heads": 8,
|
||||
"pretraining_tp": 1,
|
||||
"rms_norm_eps": 1e-05,
|
||||
"rope_theta": 1000000.0,
|
||||
"sliding_window": 65536,
|
||||
"tie_word_embeddings": false,
|
||||
"torch_dtype": "bfloat16",
|
||||
"transformers_version": "4.37.2",
|
||||
"use_cache": true,
|
||||
"vocab_size": 62464
|
||||
}
|
||||
1
configuration.json
Normal file
1
configuration.json
Normal file
@@ -0,0 +1 @@
|
||||
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
|
||||
6
generation_config.json
Normal file
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
|
||||
{
|
||||
"_from_model_config": true,
|
||||
"bos_token_id": 1,
|
||||
"eos_token_id": 2,
|
||||
"transformers_version": "4.37.2"
|
||||
}
|
||||
3
model.safetensors
Normal file
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:7664752c98a6b0fa96298e32b638b3d2232fbb5c5311a592631098082f62a4b2
|
||||
size 14982620440
|
||||
BIN
needle-in-a-haystack-performance.png
Normal file
BIN
needle-in-a-haystack-performance.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 57 KiB |
24
special_tokens_map.json
Normal file
24
special_tokens_map.json
Normal file
@@ -0,0 +1,24 @@
|
||||
{
|
||||
"bos_token": {
|
||||
"content": "<s>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false
|
||||
},
|
||||
"eos_token": {
|
||||
"content": "</s>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false
|
||||
},
|
||||
"pad_token": "</s>",
|
||||
"unk_token": {
|
||||
"content": "<unk>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false
|
||||
}
|
||||
}
|
||||
140658
tokenizer.json
Normal file
140658
tokenizer.json
Normal file
File diff suppressed because it is too large
Load Diff
3
tokenizer.model
Normal file
3
tokenizer.model
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:9298e56c094f0d30431b0e52ad53287f0cadc99ac8ca17cc2144b0eb4753f130
|
||||
size 911034
|
||||
57
tokenizer_config.json
Normal file
57
tokenizer_config.json
Normal file
@@ -0,0 +1,57 @@
|
||||
{
|
||||
"add_bos_token": true,
|
||||
"add_eos_token": false,
|
||||
"added_tokens_decoder": {
|
||||
"0": {
|
||||
"content": "<unk>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"1": {
|
||||
"content": "<s>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"2": {
|
||||
"content": "</s>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"61873": {
|
||||
"content": "<EOD>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"61874": {
|
||||
"content": "<PAD>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
}
|
||||
},
|
||||
"bos_token": "<s>",
|
||||
"clean_up_tokenization_spaces": false,
|
||||
"eos_token": "</s>",
|
||||
"legacy": true,
|
||||
"model_max_length": 1000000000000000019884624838656,
|
||||
"pad_token": "</s>",
|
||||
"sp_model_kwargs": {},
|
||||
"spaces_between_special_tokens": false,
|
||||
"tokenizer_class": "LlamaTokenizer",
|
||||
"unk_token": "<unk>",
|
||||
"use_default_system_prompt": false
|
||||
}
|
||||
Reference in New Issue
Block a user