Initialize project; model provided by the ModelHub XC community
Model: NousResearch/Nous-Hermes-llama-2-7b Source: Original Platform
37
.gitattributes
vendored
Normal file
@@ -0,0 +1,37 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model.safetensors filter=lfs diff=lfs merge=lfs -text
pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
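The patterns above route matching files through Git LFS. As a rough illustration, the sketch below checks paths against a subset of these patterns with Python's `fnmatch` module. It only approximates gitattributes matching (Git's real rules for `**` and directory patterns are richer), and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns from the .gitattributes shown above.
LFS_PATTERNS = [
    "*.bin",
    "*.safetensors",
    "*.tar.*",
    "*tfevents*",
    "saved_model/**/*",
    "model.safetensors",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does `path` match any LFS pattern?

    Assumption: patterns without a slash are matched against the file
    name only; patterns with a slash are matched against the full path.
    This is a simplification of Git's actual matching rules.
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


for candidate in ["pytorch_model.bin", "config.json", "archive.tar.gz"]:
    print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports which filter Git actually applies.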
158
README.md
Normal file
@@ -0,0 +1,158 @@
---
language:
- en
tags:
- llama-2
- self-instruct
- distillation
- synthetic instruction
license:
- mit
new_version: NousResearch/Hermes-3-Llama-3.1-8B
---

# Model Card: Nous-Hermes-Llama2-7b

Compute provided by our project sponsor Redmond AI, thank you! Follow RedmondAI on Twitter @RedmondAI.

## Model Description

Nous-Hermes-Llama2-7b is a state-of-the-art language model fine-tuned on over 300,000 instructions. This model was fine-tuned by Nous Research, with Teknium leading the fine-tuning process and dataset curation, Redmond AI sponsoring the compute, and several other contributors.

This Hermes model uses the exact same dataset as Hermes on Llama-1. This keeps the new Hermes consistent with the old one, for anyone who wants a model as similar to the original as possible, just more capable.

This model stands out for its long responses, lower hallucination rate, and absence of OpenAI censorship mechanisms. The fine-tuning process was performed with a 4096 sequence length on an 8x A100 80GB DGX machine.


## Model Training

The model was trained almost entirely on synthetic GPT-4 outputs. Curating high quality GPT-4 datasets enables incredibly high quality in knowledge, task completion, and style.

This includes data from diverse sources such as GPTeacher (the general, roleplay v1 & v2, and code instruct datasets), Nous Instruct & PDACTL (unpublished), and several others, detailed further below.

## Collaborators
The model fine-tuning and the datasets were a collaboration of efforts and resources between Teknium, Karan4D, Emozilla, Huemin Art and Redmond AI.

Special mention goes to @winglian for assisting with some of the training issues.

A huge shoutout and acknowledgement goes to all the dataset creators who generously share their datasets openly.

Among the contributors of datasets:
- GPTeacher was made available by Teknium
- Wizard LM by nlpxucan
- Nous Research Instruct Dataset was provided by Karan4D and HueminArt
- GPT4-LLM and Unnatural Instructions were provided by Microsoft
- Airoboros dataset by jondurbin
- Camel-AI's domain expert datasets are from Camel-AI
- CodeAlpaca dataset by Sahil 2801

If anyone was left out, please open a thread in the community tab.

## Prompt Format

The model follows the Alpaca prompt format:
```
### Instruction:
<prompt>

### Response:
<leave a newline blank for model to respond>

```

or

```
### Instruction:
<prompt>

### Input:
<additional context>

### Response:
<leave a newline blank for model to respond>
```
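As a minimal sketch of how the two templates above could be assembled in code: the function name `build_alpaca_prompt` is hypothetical, and the exact whitespace (one blank line between sections, a single newline after `### Response:`) is an assumption read off the templates rather than something the model card specifies.

```python
from typing import Optional


def build_alpaca_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Assemble an Alpaca-style prompt.

    When `context` is given, the "### Input:" section is included,
    matching the second template; otherwise the first template is used.
    """
    parts = [f"### Instruction:\n{instruction}\n"]
    if context:
        parts.append(f"### Input:\n{context}\n")
    # End with a newline so the model starts its answer on a fresh line.
    parts.append("### Response:\n")
    return "\n".join(parts)


print(build_alpaca_prompt("Summarize the text.", "Hermes is a fine-tuned Llama-2 model."))
```

Keeping the template in one helper makes it easier to use the same format for every request, which matters because the model was fine-tuned on this structure.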

GPT4All:
```
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.4735|±  |0.0146|
|             |       |acc_norm|0.5017|±  |0.0146|
|arc_easy     |      0|acc     |0.7946|±  |0.0083|
|             |       |acc_norm|0.7605|±  |0.0088|
|boolq        |      1|acc     |0.8000|±  |0.0070|
|hellaswag    |      0|acc     |0.5924|±  |0.0049|
|             |       |acc_norm|0.7774|±  |0.0042|
|openbookqa   |      0|acc     |0.3600|±  |0.0215|
|             |       |acc_norm|0.4660|±  |0.0223|
|piqa         |      0|acc     |0.7889|±  |0.0095|
|             |       |acc_norm|0.7976|±  |0.0094|
|winogrande   |      0|acc     |0.6993|±  |0.0129|
Average: 0.686
```

BigBench:
```
|                      Task                      |Version|       Metric        |Value |   |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5579|±  |0.0361|
|bigbench_date_understanding                     |      0|multiple_choice_grade|0.6233|±  |0.0253|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3062|±  |0.0288|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.2006|±  |0.0212|
|                                                |       |exact_str_match      |0.0000|±  |0.0000|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2540|±  |0.0195|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.1657|±  |0.0141|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.4067|±  |0.0284|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.2780|±  |0.0201|
|bigbench_navigate                               |      0|multiple_choice_grade|0.5000|±  |0.0158|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.4405|±  |0.0111|
|bigbench_ruin_names                             |      0|multiple_choice_grade|0.2701|±  |0.0210|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2034|±  |0.0127|
|bigbench_snarks                                 |      0|multiple_choice_grade|0.5028|±  |0.0373|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.6136|±  |0.0155|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.2720|±  |0.0141|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.1944|±  |0.0112|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1497|±  |0.0085|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.4067|±  |0.0284|
Average: 0.3525
```

AGIEval:
```
|             Task             |Version| Metric |Value |   |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |0.2520|±  |0.0273|
|                              |       |acc_norm|0.2402|±  |0.0269|
|agieval_logiqa_en             |      0|acc     |0.2796|±  |0.0176|
|                              |       |acc_norm|0.3241|±  |0.0184|
|agieval_lsat_ar               |      0|acc     |0.2478|±  |0.0285|
|                              |       |acc_norm|0.2348|±  |0.0280|
|agieval_lsat_lr               |      0|acc     |0.2843|±  |0.0200|
|                              |       |acc_norm|0.2765|±  |0.0198|
|agieval_lsat_rc               |      0|acc     |0.3271|±  |0.0287|
|                              |       |acc_norm|0.3011|±  |0.0280|
|agieval_sat_en                |      0|acc     |0.4660|±  |0.0348|
|                              |       |acc_norm|0.4223|±  |0.0345|
|agieval_sat_en_without_passage|      0|acc     |0.3738|±  |0.0338|
|                              |       |acc_norm|0.3447|±  |0.0332|
|agieval_sat_math              |      0|acc     |0.2500|±  |0.0293|
|                              |       |acc_norm|0.2364|±  |0.0287|
Average: 0.2975
```
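The card does not say how the `Average` lines are computed. The reported figures are, however, consistent with an unweighted mean over one score per task: `acc_norm` where it is listed, `acc` otherwise, and `multiple_choice_grade` for BigBench. The sketch below reproduces the three averages under that assumption; the variable names are illustrative.

```python
from statistics import mean

# One score per task, copied from the tables above
# (acc_norm when present, otherwise acc).
gpt4all = [0.5017, 0.7605, 0.8000, 0.7774, 0.4660, 0.7976, 0.6993]

# multiple_choice_grade for each BigBench task.
bigbench = [
    0.5579, 0.6233, 0.3062, 0.2006, 0.2540, 0.1657,
    0.4067, 0.2780, 0.5000, 0.4405, 0.2701, 0.2034,
    0.5028, 0.6136, 0.2720, 0.1944, 0.1497, 0.4067,
]

# acc_norm for each AGIEval task.
agieval = [0.2402, 0.3241, 0.2348, 0.2765, 0.3011, 0.4223, 0.3447, 0.2364]

for name, scores in [("GPT4All", gpt4all), ("BigBench", bigbench), ("AGIEval", agieval)]:
    print(f"{name}: {mean(scores):.4f}")
```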

## Benchmark Results

## Resources for Applied Use Cases:
For an example of a back-and-forth chatbot using Hugging Face Transformers and Discord, check out: https://github.com/teknium1/alpaca-discord
For an example of a role-playing Discord chatbot, check out: https://github.com/teknium1/alpaca-roleplay-discordbot

LM Studio is a good choice for a chat interface that supports GGML versions (to come).

## Future Plans
We plan to keep iterating on both higher-quality data and new data-filtering techniques to eliminate lower-quality data.

## Model Usage
The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.
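A minimal loading sketch, assuming the `transformers`, `torch`, and `accelerate` packages are installed and that the model is published under the ID shown at the top of this commit. The wrapper name `generate` and the `max_new_tokens` default are illustrative choices, not taken from the model card; the sampling values mirror `generation_config.json` in this repository.

```python
MODEL_ID = "NousResearch/Nous-Hermes-llama-2-7b"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and return the completion for an Alpaca-formatted prompt."""
    # Imported lazily: these are heavy, optional dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches "torch_dtype" in config.json
        device_map="auto",           # requires the accelerate package
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # temperature/top_p only apply when sampling
        temperature=0.9,
        top_p=0.6,
    )
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("### Instruction:\nWrite a haiku about autumn.\n\n### Response:\n")` downloads roughly 13 GB of weights on first use, so it is best tried on a machine with enough disk space and memory.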
3
added_tokens.json
Normal file
@@ -0,0 +1,3 @@
{
  "<pad>": 32000
}
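One thing worth checking when loading these files: `added_tokens.json` assigns `<pad>` the ID 32000, while `config.json` in this commit declares `vocab_size` as 32000, which covers IDs 0 through 31999. The sketch below is a simple consistency check; the helper name `out_of_range_tokens` is hypothetical. Whether the mismatch matters in practice depends on whether that token is ever fed to the model; `config.json` itself sets `pad_token_id` to 0.

```python
def out_of_range_tokens(added_tokens: dict, vocab_size: int) -> dict:
    """Return the added tokens whose IDs fall outside [0, vocab_size)."""
    return {
        token: token_id
        for token, token_id in added_tokens.items()
        if not 0 <= token_id < vocab_size
    }


added_tokens = {"<pad>": 32000}  # from added_tokens.json
vocab_size = 32000               # from config.json

print(out_of_range_tokens(added_tokens, vocab_size))
```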
26
config.json
Normal file
@@ -0,0 +1,26 @@
{
  "_name_or_path": "output/hermes-llama2-4k/checkpoint-2259",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.32.0.dev0",
  "use_cache": false,
  "vocab_size": 32000
}
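The fields above are enough to estimate the parameter count. The sketch below assumes the standard Llama layout (bias-free linear layers, a gated MLP with three projections, two RMSNorm weights per layer plus a final norm); the function name `llama_param_count` is hypothetical. At 2 bytes per `bfloat16` weight, the estimate comes within 34,000 bytes of the `model.safetensors` size recorded in this commit (13476865232), with the remainder presumably being the file header.

```python
def llama_param_count(cfg: dict) -> int:
    """Estimate the parameter count of a Llama-style model from its config."""
    hidden = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    vocab = cfg["vocab_size"]
    layers = cfg["num_hidden_layers"]
    head_dim = hidden // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v
    mlp = 3 * hidden * inter                               # gate, up, down
    norms = 2 * hidden                                     # two RMSNorms per layer
    per_layer = attention + mlp + norms

    embeddings = vocab * hidden
    lm_head = 0 if cfg["tie_word_embeddings"] else vocab * hidden
    final_norm = hidden
    return layers * per_layer + embeddings + lm_head + final_norm


config = {
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "vocab_size": 32000,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 32,
    "tie_word_embeddings": False,
}

params = llama_param_count(config)
print(f"{params:,} parameters, {params * 2:,} bytes in bfloat16")
```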
1
configuration.json
Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
9
generation_config.json
Normal file
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "temperature": 0.9,
  "top_p": 0.6,
  "transformers_version": "4.32.0.dev0"
}
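To illustrate what `temperature` and `top_p` do during sampling, here is a small pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is a simplified illustration, not the Transformers implementation, and the function name `top_p_filter` is hypothetical. In Transformers these two settings only take effect when sampling is enabled (`do_sample=True`).

```python
import math


def top_p_filter(logits, temperature=0.9, top_p=0.6):
    """Return {token_index: probability} for the tokens kept by top-p sampling."""
    # 1. Temperature: dividing logits by T < 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]

    # 2. Softmax (subtracting the max for numerical stability).
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 3. Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break

    # 4. Renormalize over the kept tokens.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


print(top_p_filter([2.0, 1.0, 0.0, -1.0]))
```

With a fairly low `top_p` such as 0.6, a single confident token can exhaust the probability budget on its own, so sampling behaves close to greedy decoding whenever the model is sure of the next token.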
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:23d88a682b353e48ccedfcb63b9f58ffa634267d2e4311079fe1fa3e53161219
size 13476865232
3
pytorch_model.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fbf5845c2a382e91ccb844d39463d118b1edbb749dfcf87beee8c20e8cd49d3b
size 13476978469
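The two weight files are committed as Git LFS pointers: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes of the real file. A minimal parsing sketch follows; the function name `parse_lfs_pointer` and the returned field names are illustrative choices.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size."""
    # Each line is "<key> <value>", separated by a single space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content for pytorch_model.bin, copied from the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fbf5845c2a382e91ccb844d39463d118b1edbb749dfcf87beee8c20e8cd49d3b
size 13476978469
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size"])
```

After downloading the actual file, its SHA-256 digest and byte size can be compared against these values to confirm the download is complete and intact.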
24
special_tokens_map.json
Normal file
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<unk>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
93400
tokenizer.json
Normal file
File diff suppressed because it is too large
BIN
tokenizer.model
(Stored with Git LFS)
Normal file
Binary file not shown.
32
tokenizer_config.json
Normal file
@@ -0,0 +1,32 @@
{
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
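The `model_max_length` value above equals `int(1e30)`, which the Transformers tokenizer code uses as a "no limit configured" sentinel rather than a real context length. A caller who truncates inputs may therefore want to fall back to `max_position_embeddings` from `config.json`. The sketch below shows one way to do that; the helper name `effective_max_length` is hypothetical.

```python
NO_LIMIT_SENTINEL = int(1e30)  # value written when no tokenizer limit is set


def effective_max_length(tokenizer_max_length: int, max_position_embeddings: int) -> int:
    """Pick a usable context length from tokenizer and model settings."""
    if tokenizer_max_length >= NO_LIMIT_SENTINEL:
        # The tokenizer has no real limit; use the model's position limit.
        return max_position_embeddings
    return min(tokenizer_max_length, max_position_embeddings)


limit = effective_max_length(
    tokenizer_max_length=1000000000000000019884624838656,  # tokenizer_config.json
    max_position_embeddings=4096,                          # config.json
)
print(limit)
```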