Initialize project; model provided by the ModelHub XC community
Model: Vezora/Narwhal-7b-v3 Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
161
README.md
Normal file
@@ -0,0 +1,161 @@
---
license: apache-2.0
---
This is a merged model created with the TIES merge method.
It was built from openchat 3.5 and una-cybertron-7b-v2-bf16.

Instruction template:
```python
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("openchat/openchat_3.5")

# Single-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Multi-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Coding Mode
tokens = tokenizer("Code User: Implement quicksort using C++<|end_of_turn|>Code Assistant:").input_ids
assert tokens == [1, 7596, 1247, 28747, 26256, 2936, 7653, 1413, 334, 1680, 32000, 7596, 21631, 28747]
```
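
The prompt strings in the template above can also be assembled programmatically. This is a minimal sketch, assuming conversations are given as `(role, text)` pairs; `build_prompt` and `END_OF_TURN` are hypothetical names for illustration and are not part of openchat or transformers. It only reproduces the strings shown in the examples above.

```python
# Hypothetical helper for illustration; not part of any library.
END_OF_TURN = "<|end_of_turn|>"


def build_prompt(messages, mode="GPT4 Correct"):
    """Join (role, text) pairs into the prompt format shown above.

    role is "User" or "Assistant"; mode is "GPT4 Correct" or "Code".
    The prompt ends with an open assistant turn for the model to complete.
    """
    turns = [f"{mode} {role}: {text}{END_OF_TURN}" for role, text in messages]
    return "".join(turns) + f"{mode} Assistant:"


single_turn = build_prompt([("User", "Hello")])
multi_turn = build_prompt(
    [("User", "Hello"), ("Assistant", "Hi"), ("User", "How are you today?")]
)
coding = build_prompt([("User", "Implement quicksort using C++")], mode="Code")
```

The resulting strings can be passed to the tokenizer in place of the hand-written prompts.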

This model is exceptionally good at labeling data, bringing the cost of labeling down to the cost of running a server. Hurray! Here is an example:

User:
This is raw text, i want you to label it. in a json format.
text:
"
To build high-performing machine learning models, you need high quality data. Getting ahold of this data can be costly, complex, and inefficient. Labels created by people are needed for the majority of models to help them generate the correct predictions. To help streamline and automate this process, you can apply a machine learning model to label the data directly.

Firstly, a machine learning model is trained on a subset of raw training data that has already been labeled by humans. A model with a track record of producing precise outcomes from the information that it has learned thus far, can add labels to unlabeled data automatically. A less accurate model requires human annotators to add labels. Labels created by humans enable the model to learn and enhance its capacity to categorize new data.

Eventually, the model is able to label an increasing amount of data automatically and speed up the creation of training datasets. Of course, implementing quality control in such models is also a necessity, as with time it might drift and start producing less accurate results. In this case, human annotators can step in again.

Internal labeling (in-house), synthetic labeling (generating new data from previous datasets), programmatic labeling (using scripts), outsourcing (or freelancing) constitute a variety of data labeling methods. However, our favorite is obviously crowdsourcing — a great way to outsource data labeling and get around the drawn-out and expensive management processes. Check out our data labeling platform to learn more!"

Narwhal-7b-v3:
Here is the given text with added JSON formatted labels. Each sentence has been assigned a label indicating whether it expresses a positive, negative, or neutral sentiment:

{
  "text": [
    {
      "content": "To build high-performing machine learning models, you need high quality data.",
      "label": "neutral"
    },
    {
      "content": "Getting ahold of this data can be costly, complex, and inefficient.",
      "label": "negative"
    },
    {
      "content": "Labels created by people are needed for the majority of models to help them generate the correct predictions.",
      "label": "neutral"
    },
    {
      "content": "To help streamline and automate this process, you can apply a machine learning model to label the data directly.",
      "label": "positive"
    },
    {
      "content": "Firstly, a machine learning model is trained on a subset of raw training data that has already been labeled by humans.",
      "label": "neutral"
    },
    {
      "content": "A model with a track record of producing precise outcomes from the information that it has learned thus far, can add labels to unlabeled data automatically.",
      "label": "positive"
    },
    {
      "content": "A less accurate model requires human annotators to add labels.",
      "label": "neutral"
    },
    {
      "content": "Labels created by humans enable the model to learn and enhance its capacity to categorize new data.",
      "label": "positive"
    },
    {
      "content": "Eventually, the model is able to label an increasing amount of data automatically and speed up the creation of training datasets.",
      "label": "positive"
    },
    {
      "content": "Of course, implementing quality control in such models is also a necessity, as with time it might drift and start producing less accurate results.",
      "label": "neutral"
    },
    {
      "content": "In this case, human annotators can step in again.",
      "label": "neutral"
    },
    {
      "content": "Internal labeling (in-house), synthetic labeling (generating new data from previous datasets), programmatic labeling (using scripts), outsourcing (or freelancing) constitute a variety of data labeling methods.",
      "label": "neutral"
    },
    {
      "content": "However, our favorite is obviously crowdsourcing — a great way to outsource data labeling and get around the drawn-out and expensive management processes.",
      "label": "positive"
    },
    {
      "content": "Check out our data labeling platform to learn more!",
      "label": "neutral"
    }
  ]
}

This JSON object now represents the original text with labeled sentiments for each sentence. You can further utilize this structured data to train sentiment analysis models or perform various analyses.
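
Output in this shape can be consumed directly by downstream code. This is a minimal sketch, assuming the model's reply has already been reduced to the JSON object alone (the surrounding sentences stripped) and using a shortened two-sentence sample of the output above; `count_labels` is a hypothetical helper name.

```python
import json
from collections import Counter

# Shortened sample of the labeled output shown above (first two sentences only).
reply = """
{
  "text": [
    {"content": "To build high-performing machine learning models, you need high quality data.", "label": "neutral"},
    {"content": "Getting ahold of this data can be costly, complex, and inefficient.", "label": "negative"}
  ]
}
"""


def count_labels(raw):
    """Parse the labeled JSON and count how often each label occurs."""
    data = json.loads(raw)
    return Counter(item["label"] for item in data["text"])


counts = count_labels(reply)
print(dict(counts))
```

In practice the reply may include extra prose around the JSON, so a real pipeline would need to extract the object before calling `json.loads`.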
25
config.json
Normal file
@@ -0,0 +1,25 @@
{
  "_name_or_path": "C:\\Users\\PC\\Documents\\text-generation-webui\\models\\Mistral-7B-v0.1",
  "architectures": [
    "MistralForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.35.0",
  "use_cache": true,
  "vocab_size": 32000
}
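
A few derived quantities follow from the fields in this config. This is a minimal sketch, assuming the usual Mistral/Llama conventions (head size is `hidden_size` divided by `num_attention_heads`, and keys and values are cached per layer in `torch_dtype`); the values are copied from the config above and the variable names are illustrative only.

```python
# Values copied from config.json above; variable names are illustrative.
config = {
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "num_hidden_layers": 32,
    "torch_dtype": "bfloat16",
}

BYTES_PER_ELEMENT = {"bfloat16": 2, "float16": 2, "float32": 4}

# Size of each attention head (assumes hidden_size is split evenly across heads).
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Grouped-query attention: number of query heads that share one key/value head.
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Bytes of key/value cache per token: K and V, for every layer and KV head.
kv_cache_bytes_per_token = (
    2
    * config["num_hidden_layers"]
    * config["num_key_value_heads"]
    * head_dim
    * BYTES_PER_ELEMENT[config["torch_dtype"]]
)

print(head_dim, queries_per_kv_head, kv_cache_bytes_per_token)
```

Under these assumptions each key/value head serves four query heads, which is what keeps the cache smaller than it would be with 32 key/value heads.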
3
model-00001-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb142e153f1b6508a262f17c02f170ab1668d434621848971174cd9508affa1d
size 9942981696
3
model-00002-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:124ccedefad9c5ca9f8047adc04d020069a2b004e3219e24ae15d8e3df7d2051
size 4540516344
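
The two pointer files above stand in for the actual weight shards, which Git LFS stores separately. This is a minimal sketch of reading them, assuming the three-line `key value` layout shown here; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the two shard files in this commit.
shard_1 = """version https://git-lfs.github.com/spec/v1
oid sha256:cb142e153f1b6508a262f17c02f170ab1668d434621848971174cd9508affa1d
size 9942981696
"""
shard_2 = """version https://git-lfs.github.com/spec/v1
oid sha256:124ccedefad9c5ca9f8047adc04d020069a2b004e3219e24ae15d8e3df7d2051
size 4540516344
"""

pointers = [parse_lfs_pointer(shard_1), parse_lfs_pointer(shard_2)]
total_bytes = sum(p["size"] for p in pointers)
print(total_bytes)  # 14483498040
```

The `size` field gives the download size of each shard, and the `digest` can be compared against a SHA-256 of the downloaded file to check integrity.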
1
model.safetensors.index.json
Normal file
File diff suppressed because one or more lines are too long
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
91122
tokenizer.json
Normal file
File diff suppressed because it is too large
40
tokenizer_config.json
Normal file
@@ -0,0 +1,40 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": true
}