初始化项目,由ModelHub XC社区提供模型

Model: Vezora/Narwhal-7b-v3
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-14 06:42:17 +08:00
commit 55c8884c68
9 changed files with 91413 additions and 0 deletions

35
.gitattributes vendored Normal file
View File

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

161
README.md Normal file
View File

@@ -0,0 +1,161 @@
---
license: apache-2.0
---
This is a merge model using Tie merge method.
Created using openchat 3.5 and una-cybertron-7b-v2-bf16.
Instruction template:
```python
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("openchat/openchat_3.5")
# Single-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
# Multi-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
# Coding Mode
tokens = tokenizer("Code User: Implement quicksort using C++<|end_of_turn|>Code Assistant:").input_ids
assert tokens == [1, 7596, 1247, 28747, 26256, 2936, 7653, 1413, 334, 1680, 32000, 7596, 21631, 28747]
```
This model is exceptionally well at labeling data, bringing down labeling cost to server cost. Hurray! Here is an example
User:
This is raw text, i want you to label it. in a json format.
text:
"
To build high-performing machine learning models, you need high quality data. Getting ahold of this data can be costly, complex, and inefficient. Labels created by people are needed for the majority of models to help them generate the correct predictions. To help streamline and automate this process, you can apply a machine learning model to label the data directly.
Firstly, a machine learning model is trained on a subset of raw training data that has already been labeled by humans. A model with a track record of producing precise outcomes from the information that it has learned thus far, can add labels to unlabeled data automatically. A less accurate model requires human annotators to add labels. Labels created by humans enable the model to learn and enhance its capacity to categorize new data.
Eventually, the model is able to label an increasing amount of data automatically and speed up the creation of training datasets. Of course, implementing quality control in such models is also a necessity, as with time it might drift and start producing less accurate results. In this case, human annotators can step in again.
Internal labeling (in-house), synthetic labeling (generating new data from previous datasets), programmatic labeling (using scripts), outsourcing (or freelancing) constitute a variety of data labeling methods. However, our favorite is obviously crowdsourcing — a great way to outsource data labeling and get around the drawn-out and expensive management processes. Check out our data labeling platform to learn more!"
Narwhal-7b-v3:
Here is the given text with added JSON formatted labels. Each sentence has been assigned a label indicating whether it expresses a positive, negative, or neutral sentiment:
{
"text": [
{
"content": "To build high-performing machine learning models, you need high quality data.",
"label": "neutral"
},
{
"content": "Getting ahold of this data can be costly, complex, and inefficient.",
"label": "negative"
},
{
"content": "Labels created by people are needed for the majority of models to help them generate the correct predictions.",
"label": "neutral"
},
{
"content": "To help streamline and automate this process, you can apply a machine learning model to label the data directly.",
"label": "positive"
},
{
"content": "Firstly, a machine learning model is trained on a subset of raw training data that has already been labeled by humans.",
"label": "neutral"
},
{
"content": "A model with a track record of producing precise outcomes from the information that it has learned thus far, can add labels to unlabeled data automatically.",
"label": "positive"
},
{
"content": "A less accurate model requires human annotators to add labels.",
"label": "neutral"
},
{
"content": "Labels created by humans enable the model to learn and enhance its capacity to categorize new data.",
"label": "positive"
},
{
"content": "Eventually, the model is able to label an increasing amount of data automatically and speed up the creation of training datasets.",
"label": "positive"
},
{
"content": "Of course, implementing quality control in such models is also a necessity, as with time it might drift and start producing less accurate results.",
"label": "neutral"
},
{
"content": "In this case, human annotators can step in again.",
"label": "neutral"
},
{
"content": "Internal labeling (in-house), synthetic labeling (generating new data from previous datasets), programmatic labeling (using scripts), outsourcing (or freelancing) constitute a variety of data labeling methods.",
"label": "neutral"
},
{
"content": "However, our favorite is obviously crowdsourcing — a great way to outsource data labeling and get around the drawn-out and expensive management processes.",
"label": "positive"
},
{
"content": "Check out our data labeling platform to learn more!",
"label": "neutral"
}
]
}
This JSON object now represents the original text with labeled sentiments for each sentence. You can further utilize this structured data to train sentiment analysis models or perform various analyses.

25
config.json Normal file
View File

@@ -0,0 +1,25 @@
{
"_name_or_path": "C:\\Users\\PC\\Documents\\text-generation-webui\\models\\Mistral-7B-v0.1",
"architectures": [
"MistralForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 10000.0,
"sliding_window": 4096,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.35.0",
"use_cache": true,
"vocab_size": 32000
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb142e153f1b6508a262f17c02f170ab1668d434621848971174cd9508affa1d
size 9942981696

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:124ccedefad9c5ca9f8047adc04d020069a2b004e3219e24ae15d8e3df7d2051
size 4540516344

File diff suppressed because one or more lines are too long

23
special_tokens_map.json Normal file
View File

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

91122
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

40
tokenizer_config.json Normal file
View File

@@ -0,0 +1,40 @@
{
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [],
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"legacy": true,
"model_max_length": 1000000000000000019884624838656,
"pad_token": null,
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": true
}