Initialize project; model provided by the ModelHub XC community
Model: mayacinka/chatty-djinn-14B Source: Original Platform
---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- openchat/openchat-3.5-0106
- teknium/OpenHermes-2.5-Mistral-7B
base_model:
- openchat/openchat-3.5-0106
- teknium/OpenHermes-2.5-Mistral-7B
model-index:
- name: chatty-djinn-14B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 70.39
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 86.45
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.4
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 67.57
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 83.11
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 60.58
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mayacinka/chatty-djinn-14B
      name: Open LLM Leaderboard
---

# djinn

djinn is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [openchat/openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106)
* [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B)
* bardsai/jaskier-7b-dpo-v6.1
* senseable/WestLake-7B-v2
* NousResearch/Nous-Hermes-2-Mistral-7B-DPO
* paulml/OGNO-7B
* paulml/DPOB-INMTOB-7B
* mlabonne/AlphaMonarch-7B

# 🏆 Benchmarks
Nous benchmarks, find more [details here](https://gist.github.com/majacinka/3f2a797c8872ca9bfdaa2bbf3369edb5)

|                                Model                                |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|---------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[chatty-djinn-14B](https://huggingface.co/mayacinka/chatty-djinn-14B)|  38.43|  76.29|     68.02|    47.6|  57.59|

### AGIEval
|             Task             |Version| Metric |Value|   |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |23.62|±  |  2.67|
|                              |       |acc_norm|21.65|±  |  2.59|
|agieval_logiqa_en             |      0|acc     |32.26|±  |  1.83|
|                              |       |acc_norm|33.79|±  |  1.86|
|agieval_lsat_ar               |      0|acc     |23.04|±  |  2.78|
|                              |       |acc_norm|23.04|±  |  2.78|
|agieval_lsat_lr               |      0|acc     |38.82|±  |  2.16|
|                              |       |acc_norm|39.22|±  |  2.16|
|agieval_lsat_rc               |      0|acc     |59.48|±  |  3.00|
|                              |       |acc_norm|54.65|±  |  3.04|
|agieval_sat_en                |      0|acc     |75.73|±  |  2.99|
|                              |       |acc_norm|74.27|±  |  3.05|
|agieval_sat_en_without_passage|      0|acc     |35.92|±  |  3.35|
|                              |       |acc_norm|34.47|±  |  3.32|
|agieval_sat_math              |      0|acc     |31.36|±  |  3.14|
|                              |       |acc_norm|26.36|±  |  2.98|

Average: 38.43%

### GPT4All
|    Task     |Version| Metric |Value|   |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge|      0|acc     |62.12|±  |  1.42|
|             |       |acc_norm|65.44|±  |  1.39|
|arc_easy     |      0|acc     |83.88|±  |  0.75|
|             |       |acc_norm|78.58|±  |  0.84|
|boolq        |      1|acc     |88.07|±  |  0.57|
|hellaswag    |      0|acc     |65.18|±  |  0.48|
|             |       |acc_norm|86.45|±  |  0.34|
|openbookqa   |      0|acc     |39.60|±  |  2.19|
|             |       |acc_norm|48.60|±  |  2.24|
|piqa         |      0|acc     |82.26|±  |  0.89|
|             |       |acc_norm|83.62|±  |  0.86|
|winogrande   |      0|acc     |83.27|±  |  1.05|

Average: 76.29%

### TruthfulQA
|    Task     |Version|Metric|Value|   |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc|      1|mc1   |50.55|±  |  1.75|
|             |       |mc2   |68.02|±  |  1.52|

Average: 68.02%

### Bigbench
|                      Task                      |Version|       Metric        |Value|   |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|57.89|±  |  3.59|
|bigbench_date_understanding                     |      0|multiple_choice_grade|64.50|±  |  2.49|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|32.56|±  |  2.92|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|26.18|±  |  2.32|
|                                                |       |exact_str_match      | 1.11|±  |  0.55|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|30.80|±  |  2.07|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|22.86|±  |  1.59|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|57.67|±  |  2.86|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|62.00|±  |  2.17|
|bigbench_navigate                               |      0|multiple_choice_grade|56.20|±  |  1.57|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|65.65|±  |  1.06|
|bigbench_ruin_names                             |      0|multiple_choice_grade|64.73|±  |  2.26|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|17.33|±  |  1.20|
|bigbench_snarks                                 |      0|multiple_choice_grade|76.24|±  |  3.17|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|75.15|±  |  1.38|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|48.90|±  |  1.58|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|22.32|±  |  1.18|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|18.17|±  |  0.92|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|57.67|±  |  2.86|

Average: 47.6%

Average score: 57.59%
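
The per-suite averages can be recomputed from the task tables. The sketch below assumes each suite average is the mean of `acc_norm` where a task reports it and `acc` otherwise, with `mc2` for TruthfulQA and `multiple_choice_grade` for Bigbench. That selection rule is an assumption inferred from the numbers, not something the card states, and the variable names are illustrative.

```python
from statistics import mean

# Scores copied from the tables in this section.
# Assumed rule: acc_norm where reported, otherwise acc (boolq, winogrande).
suites = {
    "AGIEval": [21.65, 33.79, 23.04, 39.22, 54.65, 74.27, 34.47, 26.36],
    "GPT4All": [65.44, 78.58, 88.07, 86.45, 48.60, 83.62, 83.27],
    "TruthfulQA": [68.02],  # mc2
    # multiple_choice_grade only; exact_str_match is left out
    "Bigbench": [57.89, 64.50, 32.56, 26.18, 30.80, 22.86, 57.67, 62.00, 56.20,
                 65.65, 64.73, 17.33, 76.24, 75.15, 48.90, 22.32, 18.17, 57.67],
}

# Mean per suite, then the mean of the four suite means.
suite_means = {name: mean(scores) for name, scores in suites.items()}
overall = mean(list(suite_means.values()))

for name, value in suite_means.items():
    print(f"{name}: {value:.2f}")
print(f"Average: {overall:.2f}")
```

Under that assumption, the recomputed values agree with the reported averages to two decimal places.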
## 🧩 Configuration
Inspired by [theprofessor's config](https://huggingface.co/abacusai/TheProfessor-155b)

```yaml
merge_method: linear # use linear so we can include multiple models, albeit at a zero weight
parameters:
  weight: 1.0 # weight everything as 1 unless specified otherwise - linear with one model weighted at 1 is a no-op like passthrough
slices:
  - sources:
      - model: openchat/openchat-3.5-0106
        layer_range: [0, 1]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 1]
        parameters:
          weight: 0
  - sources:
      - model: bardsai/jaskier-7b-dpo-v6.1
        layer_range: [1, 10]
  - sources:
      - model: senseable/WestLake-7B-v2
        layer_range: [10, 20]
  - sources:
      - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
        layer_range: [20, 30]
  - sources:
      - model: paulml/OGNO-7B
        layer_range: [15, 25]
  - sources:
      - model: paulml/DPOB-INMTOB-7B
        layer_range: [22, 32]
  - sources:
      - model: mlabonne/AlphaMonarch-7B
        layer_range: [5, 15]
  - sources:
      - model: openchat/openchat-3.5-0106
        layer_range: [31, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [31, 32]
        parameters:
          weight: 0
dtype: float16
tokenizer_source: model:openchat/openchat-3.5-0106
```
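
Two properties of this config can be checked with a few lines of Python: how many layers the stacked slices produce, and why a linear merge with weights 1 and 0 behaves like a passthrough. This is a simplified sketch, not mergekit's implementation. It assumes each `layer_range` is half-open (`[start, end)`) and that the linear method is a weight-normalized average. `linear_merge` and the tensor values are made up for illustration.

```python
# Slices copied from the config above as (model, start, end).
# Assumption: each layer_range is half-open, i.e. [start, end).
slices = [
    ("openchat/openchat-3.5-0106", 0, 1),    # OpenHermes co-listed at weight 0
    ("bardsai/jaskier-7b-dpo-v6.1", 1, 10),
    ("senseable/WestLake-7B-v2", 10, 20),
    ("NousResearch/Nous-Hermes-2-Mistral-7B-DPO", 20, 30),
    ("paulml/OGNO-7B", 15, 25),
    ("paulml/DPOB-INMTOB-7B", 22, 32),
    ("mlabonne/AlphaMonarch-7B", 5, 15),
    ("openchat/openchat-3.5-0106", 31, 32),  # OpenHermes co-listed at weight 0
]

# Slices are stacked in order, so the merged depth is the sum of their lengths.
total_layers = sum(end - start for _, start, end in slices)
print(total_layers)  # 61


def linear_merge(tensors, weights):
    """Weight-normalized average of equally sized flat tensors (plain lists)."""
    total = sum(weights)
    size = len(tensors[0])
    return [
        sum(w * t[i] for t, w in zip(tensors, weights)) / total
        for i in range(size)
    ]


# Made-up parameter values standing in for one layer of each model.
openchat_layer = [0.25, -1.5, 3.0]
hermes_layer = [9.0, 9.0, 9.0]

# Weight 1 for the first model and 0 for the second returns the first unchanged.
merged = linear_merge([openchat_layer, hermes_layer], [1.0, 0.0])
print(merged == openchat_layer)  # True
```

If each source model has 32 decoder layers, as the `[31, 32]` range suggests, the 61 stacked layers are roughly consistent with the "14B" in the model name.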

## An example

*User:* Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?
Think step by step and the result MUST BE A NUMBER

*djinn:*
In this family structure, we know there are 4 aka '1 + 3-brothers' (1 being initially for Sally) existing individuals who are
also the siblings of the 3-brother-group. As each of the 3-brothers has 2-sisters each, that sums up to 6 total sister- roles. Since
we already accounted for one of the sisters (assuming to be the oldest or in alphabetical order, as names were not given) to be part
of the 3-brother group, we subtract one from the 6-sisters total. So, considering the provided information, Sally would have 6 - 1 = 5
remaining younger or alphabetically-wise- placed, biological-sisters. However, note that this problem statement might not perfectly align
with real-life complexities, such as same-sex twins, adoption, etc., but within the given context, the solution is 5.

Remember, in more intricate or unclear family tree scenarios, it'd be best to directly ask for clarifications or consider additional data
points. In short, with the given constraints, Sally has 5 known, direct, biological, and likely-in-age-order, younger/alphabetically-placed,
sister-siblings.(To reiterate the core result in a concise manner: 5).

(Please excuse the verbosity, aiming for thoroughness and clarity.)

This response maintains the initial request's numerical focus while providing a detailed explanation to ensure understanding.
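
For reference, the riddle can be checked by brute force. The sketch below is not part of the original card, and the function name and search bound are hypothetical. It enumerates how many girls the family could contain and keeps the cases where every brother has exactly two sisters. The result it finds differs from the sample response, which is quoted verbatim above.

```python
def sallys_sisters(brothers=3, sisters_per_brother=2, max_girls=10):
    """Return every number of sisters Sally could have under the constraints."""
    answers = []
    for girls in range(1, max_girls + 1):  # girls in the family, Sally included
        # A brother's sisters are all the girls in the family.
        if brothers > 0 and girls == sisters_per_brother:
            # Sally's sisters are the girls other than herself.
            answers.append(girls - 1)
    return answers


print(sallys_sisters())  # [1]
```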

## 💻 Usage

```python
# Notebook cell syntax; in a plain shell, run: pip install -qU transformers accelerate
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mayacinka/chatty-djinn-14B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the tokenizer's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in float16 and let accelerate place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mayacinka__chatty-djinn-14B)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |72.08|
|AI2 Reasoning Challenge (25-Shot)|70.39|
|HellaSwag (10-Shot)              |86.45|
|MMLU (5-Shot)                    |64.40|
|TruthfulQA (0-shot)              |67.57|
|Winogrande (5-shot)              |83.11|
|GSM8k (5-shot)                   |60.58|
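
The `Avg.` row is consistent with the unweighted mean of the six benchmark scores. A minimal check, with the scores copied from the table and illustrative variable names:

```python
# Scores copied from the leaderboard table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 70.39,
    "HellaSwag (10-Shot)": 86.45,
    "MMLU (5-Shot)": 64.40,
    "TruthfulQA (0-shot)": 67.57,
    "Winogrande (5-shot)": 83.11,
    "GSM8k (5-shot)": 60.58,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 72.08
```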