Initialize project; model provided by the ModelHub XC community
Model: nomadicsynth/neon-360-0.1 Source: Original Platform
**.gitattributes** (new file, 35 lines, vendored)

```text
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
```
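The following sketch checks which files in this commit fall under one of the LFS rules above. It approximates Git's matching by applying Python's `fnmatch.fnmatchcase` to each file's basename, which only stands in for Git's own wildmatch rules. Patterns containing a slash, such as `saved_model/**/*`, are skipped. The pattern list is copied from the file above; the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes above (every rule uses filter=lfs).
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*", "*.tar.*",
    "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst",
    "*tfevents*",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does any slash-free pattern match the file's basename?"""
    name = PurePosixPath(path).name
    for pattern in LFS_PATTERNS:
        if "/" in pattern:
            # Path-anchored patterns need Git's full wildmatch semantics; skipped here.
            continue
        if fnmatchcase(name, pattern):
            return True
    return False


for f in ["model.safetensors", "training_args.bin", "tokenizer.json", "config.json"]:
    print(f, is_lfs_tracked(f))
```

This agrees with the rest of the commit. `model.safetensors` and `training_args.bin` are committed as LFS pointer files, while the JSON files are stored directly.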
**README.md** (new file, 80 lines)

---
license: openrail
datasets:
- teknium/OpenHermes-2.5
- wikimedia/wikipedia
library_name: transformers
language:
- en
base_model:
- >-
  neoncortex/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast
---
# Neon-360 v0.1

**Note:** This model is not fully trained and will be replaced soon-ish with either a fine-tuned version or a new model. Don't expect anything useful from it right now; only download it if you're curious.

I'm working on retraining it to find out what can be achieved on consumer hardware, namely my RTX 4090. Hopefully I can make a tiny agentic model, ideally a fast one.

Self-improvement? Can I teach it to make itself better?

**Suggestions wanted!**

What tasks would you want a tiny model to handle? Let me know in the [Community Tab](https://huggingface.co/nomadicsynth/neon-360-0.1/discussions).

This is currently a copy of the model below:

# mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast

This repository contains the **mini-mistral-360M** model, a 360-million-parameter model built on the Mistral architecture and trained for a single epoch on a mix of Wikipedia articles and the OpenHermes dataset. The model is still at an early stage and not particularly useful yet. It serves as an experimental showcase of integrating the Grokfast algorithm into the training process.

## Model Details

- **Architecture**: Mistral
- **Parameters**: 360 million
- **Training Duration**: 1 epoch
- **Training Dataset**: Wikipedia articles and OpenHermes dataset
- **Training Method**: Transformers Trainer with grokfast-adamw as the optimiser
- **Training Hardware**: 2 × Nvidia RTX 3060 (12 GB each)

## Purpose

The primary goal of this experiment was to observe how the Grokfast algorithm affects the training dynamics of a 360M-parameter Mistral model. During training, the evaluation loss closely tracked the training loss. This is an intriguing behavior that warrants further investigation.
## Usage

To use this model, load it with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# AutoModelForCausalLM loads MistralForCausalLM, which includes the language-modelling
# head needed for text generation; plain AutoModel returns only hidden states.
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example usage: let the model continue a short prompt
input_text = "Hello, world!"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Insights

This experiment was inspired by the paper ["Grokfast: Accelerated Grokking by Amplifying Slow Gradients" by Jaerin Lee, Bong Gyun Kang, Kihoon Kim, and Kyoung Mu Lee](https://arxiv.org/abs/2405.20233). The paper aims to accelerate generalization under the grokking phenomenon by amplifying the slow-varying components of the gradients.
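Here is a minimal sketch of the paper's EMA variant (Grokfast-EMA). It keeps an exponential moving average of each parameter's gradient, which acts as a low-pass filter. It then adds an amplified copy of that average back to the raw gradient before the optimizer step. The sketch makes several assumptions. It uses plain Python lists instead of tensors. `alpha=0.98` and `lamb=2.0` are illustrative values. It is not necessarily how the grokfast-adamw optimiser used for this model implements the idea. The function name `grokfast_ema` is hypothetical.

```python
from typing import List, Optional, Tuple


def grokfast_ema(
    grad: List[float],
    ema: Optional[List[float]],
    alpha: float = 0.98,
    lamb: float = 2.0,
) -> Tuple[List[float], List[float]]:
    """Return (filtered_grad, new_ema) for one parameter vector.

    ema_t      = alpha * ema_{t-1} + (1 - alpha) * g_t   (slow component of the gradient)
    filtered_t = g_t + lamb * ema_t                       (amplify the slow component)
    """
    if ema is None:
        # Start the moving average from the first observed gradient.
        ema = list(grad)
    new_ema = [alpha * e + (1.0 - alpha) * g for e, g in zip(ema, grad)]
    filtered = [g + lamb * e for g, e in zip(grad, new_ema)]
    return filtered, new_ema


# Two steps for a toy 2-element parameter: the second raw gradient is zero,
# but the filtered gradient still carries the slow (averaged) signal.
g1, state = grokfast_ema([1.0, -2.0], None)
g2, state = grokfast_ema([0.0, 0.0], state)
print(g1)  # [3.0, -6.0]
print(g2)
```

In a real training loop, the filtered values would overwrite each parameter's `.grad` just before the optimizer step. The EMA state persists across steps, one entry per parameter.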
## Acknowledgments

Special thanks to the YouTube channel [Tunadorable](https://youtube.com/@tunadorable) for bringing the Grokfast paper to my attention in his video ["Accelerated Training by Amplifying Slow Gradients"](https://youtu.be/__xQw60y200). Tunadorable reads and discusses AI papers from arXiv, providing valuable insights into the latest research.

## Disclaimer

This model is not optimized for practical use and should be considered experimental. It has been trained for only a single epoch, and its outputs are not guaranteed to be reliable or accurate. Future iterations and more extensive training may improve its capabilities.

## Contributing

If you'd like to discuss the model, contribute, or make suggestions, please reach out or open an issue on the repository.

## License

This model is licensed under the OpenRAIL License.

---

Feel free to check out the model and experiment with it [here](https://huggingface.co/RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast). Your feedback and insights are welcome as I try to figure out wtf I'm doing.
**chat-template.jinja** (new file, 263 lines)

```jinja
{#- Default date variables. To improve UX pass the correct ones to the Jinja render. #}
{%- if today is not defined %}
{%- set today = '21-05-2026' %}
{%- endif %}
{%- if yesterday is not defined %}
{%- set yesterday = '20-05-2026' %}
{%- endif %}

{#- Default system message if no system prompt is passed. #}
{%- set default_system_message -%}
You are Neon 360 v0.1, a Large Language Model (LLM) created by Neon Cortex, an Aussie Dude with too much free time.
You are an intelligent conversational assistant.
Your knowledge base was last updated on *who knows?!*
The current date is {{ today }}.

# GENERAL GUIDELINES

- Accurately answer the user's question.
- For uncertain information or when the user's request requires up-to-date or specific data, use the available tools to fetch the information.
- Be very attentive to dates, always try to resolve dates (e.g. "yesterday" is {{ yesterday }}) and when asked about information at specific dates, discard information that is at another date.

# WEB BROWSING INSTRUCTIONS

You cannot perform any web search or access internet to open URLs, links etc without dedicated tools.

# MULTI-MODAL INSTRUCTIONS

- You have the ability to read images.
- You cannot read audio nor videos.
- You cannot generate images without dedicated tools.

# TOOL CALLING INSTRUCTIONS

You may have access to tools that you can use to fetch information or perform actions. You must use these tools in the following situations:

1. When the request requires up-to-date information.
2. When the request requires specific data that you do not have in your knowledge base.
3. When the request involves actions that you cannot perform without tools.

Always prioritize using tools to provide the most accurate and helpful response.
{%- endset %}

{#- Begin of sequence token. #}
{{- '<s>' }}


{#- Handle system prompt if it exists. #}
{%- set loop_messages = messages %}
{%- if messages[0]['role'] != 'system' and default_system_message != '' %}
{{- '[SYSTEM_PROMPT]' + default_system_message + '[/SYSTEM_PROMPT]' }}
{%- endif %}


{#- Tools and model settings definition #}
{%- set available_tools = '' %}
{%- set has_tools = false %}
{%- if tools is defined and tools is not none and tools|length > 0 %}
{%- set has_tools = true %}
{%- set available_tools = '[AVAILABLE_TOOLS]' + (tools| tojson) + '[/AVAILABLE_TOOLS]' %}
{%- endif %}
{%- if reasoning_effort is not defined or reasoning_effort is none %}
{%- set reasoning_effort = 'none' %}
{%- endif %}
{%- if reasoning_effort not in ['none', 'high'] %}
{{- raise_exception('reasoning_effort must be either "none" or "high"') }}
{%- endif %}
{%- set model_settings = '[MODEL_SETTINGS]{"reasoning_effort": "' + reasoning_effort + '"}[/MODEL_SETTINGS]' %}

{#- Aggregate consecutive messages with the same role except system and tool. #}
{#- A sentinel message is appended so the last group gets flushed inside the loop. #}
{%- set ns_agg = namespace(messages=[], current_group=[], current_role=none) %}
{%- for message in loop_messages + [{'role': '__sentinel__'}] %}
{%- if message['role'] != ns_agg.current_role or message['role'] == 'system' or message['role'] == 'tool' %}
{%- if ns_agg.current_role == 'tool' %}
{%- set ns_agg.messages = ns_agg.messages + ns_agg.current_group %}
{%- elif ns_agg.current_role is not none %}
{%- set ns_c = namespace(text_parts=[], chunks=[], has_non_text=false, tool_calls=[]) %}
{%- for msg in ns_agg.current_group %}
{#- Convert reasoning / reasoning_content to a leading thinking chunk. #}
{%- set reasoning = msg.get('reasoning_content', msg.get('reasoning', none)) %}
{%- if reasoning is not none and reasoning != '' %}
{%- set think_chunk = {'type': 'thinking', 'thinking': reasoning} %}
{%- if msg['content'] is string and msg['content'] != '' %}
{%- set new_content = [think_chunk, {'type': 'text', 'text': msg['content']}] %}
{%- elif msg['content'] is not none and msg['content'] is not string and msg['content'] | length > 0 %}
{%- set new_content = [think_chunk] + msg['content'] | list %}
{%- else %}
{%- set new_content = [think_chunk] %}
{%- endif %}
{%- if msg['tool_calls'] is defined and msg['tool_calls'] is not none %}
{%- set msg = {'role': msg['role'], 'content': new_content, 'tool_calls': msg['tool_calls']} %}
{%- else %}
{%- set msg = {'role': msg['role'], 'content': new_content} %}
{%- endif %}
{%- endif %}
{%- if msg['content'] is string %}
{%- set ns_c.text_parts = ns_c.text_parts + [msg['content']] %}
{%- elif msg['content'] is not none %}
{%- for block in msg['content'] %}
{%- if block['type'] == 'text' %}
{%- set ns_c.text_parts = ns_c.text_parts + [block['text']] %}
{%- else %}
{%- if ns_c.text_parts | length > 0 %}
{%- set ns_c.chunks = ns_c.chunks + [{'type': 'text', 'text': ns_c.text_parts | join('\n\n')}] %}
{%- set ns_c.text_parts = [] %}
{%- endif %}
{%- set ns_c.chunks = ns_c.chunks + [block] %}
{%- set ns_c.has_non_text = true %}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- if msg['tool_calls'] is defined and msg['tool_calls'] is not none %}
{%- set ns_c.tool_calls = ns_c.tool_calls + msg['tool_calls'] | list %}
{%- endif %}
{%- endfor %}
{%- if ns_c.has_non_text %}
{%- if ns_c.text_parts | length > 0 %}
{%- set ns_c.chunks = ns_c.chunks + [{'type': 'text', 'text': ns_c.text_parts | join('\n\n')}] %}
{%- endif %}
{%- set merged_content = ns_c.chunks %}
{%- else %}
{%- set merged_content = ns_c.text_parts | join('\n\n') %}
{%- endif %}
{%- if ns_c.tool_calls | length > 0 %}
{%- set ns_agg.messages = ns_agg.messages + [{'role': ns_agg.current_role, 'content': merged_content, 'tool_calls': ns_c.tool_calls}] %}
{%- else %}
{%- set ns_agg.messages = ns_agg.messages + [{'role': ns_agg.current_role, 'content': merged_content}] %}
{%- endif %}
{%- endif %}
{%- if message['role'] != '__sentinel__' %}
{%- set ns_agg.current_group = [message] %}
{%- set ns_agg.current_role = message['role'] %}
{%- endif %}
{%- else %}
{%- set ns_agg.current_group = ns_agg.current_group + [message] %}
{%- endif %}
{%- endfor %}
{%- set loop_messages = ns_agg.messages %}

{#- Validates message ordering. #}
{%- set ns = namespace(available_tools_and_settings_emitted=false) %}
{%- if loop_messages | length > 0 and loop_messages[0]['role'] != 'user' and loop_messages[0]['role'] != 'system' %}
{{- raise_exception('Conversation must start with a user or system message, got ' + loop_messages[0]['role'] + '.') }}
{%- endif %}
{%- set ns_order = namespace(previous_role=none) %}
{%- for message in loop_messages %}
{%- set current_role = message['role'] %}
{%- if ns_order.previous_role is not none %}
{%- if ns_order.previous_role == 'system' %}
{%- if current_role != 'user' and current_role != 'assistant' and current_role != 'system' %}
{{- raise_exception('Unexpected role \'' + current_role + '\' after role \'' + ns_order.previous_role + '\'') }}
{%- endif %}
{%- elif ns_order.previous_role == 'user' %}
{%- if current_role != 'assistant' and current_role != 'system' and current_role != 'user' %}
{{- raise_exception('Unexpected role \'' + current_role + '\' after role \'' + ns_order.previous_role + '\'') }}
{%- endif %}
{%- elif ns_order.previous_role == 'assistant' %}
{%- if current_role != 'assistant' and current_role != 'user' and current_role != 'tool' %}
{{- raise_exception('Unexpected role \'' + current_role + '\' after role \'' + ns_order.previous_role + '\'') }}
{%- endif %}
{%- elif ns_order.previous_role == 'tool' %}
{%- if current_role != 'assistant' and current_role != 'tool' and current_role != 'user' %}
{{- raise_exception('Unexpected role \'' + current_role + '\' after role \'' + ns_order.previous_role + '\'') }}
{%- endif %}
{%- endif %}
{%- endif %}
{%- set ns_order.previous_role = current_role %}
{%- endfor %}

{#- Handle conversation messages. #}
{%- for message in loop_messages %}
{#- User messages supports text, image and image_url content. #}
{%- if message['role'] == 'user' %}
{%- if not ns.available_tools_and_settings_emitted %}
{{- available_tools }}
{{- model_settings }}
{%- set ns.available_tools_and_settings_emitted = true %}
{%- endif %}
{%- if message['content'] is string %}
{{- '[INST]' + message['content'] + '[/INST]' }}
{%- elif message['content'] | length > 0 %}
{{- '[INST]' }}
{%- if message['content'] | length == 2 %}
{%- set blocks = message['content'] | sort(attribute='type') %}
{%- else %}
{%- set blocks = message['content'] %}
{%- endif %}
{%- for block in blocks %}
{%- if block['type'] == 'text' %}
{{- block['text'] }}
{%- elif block['type'] in ['image', 'image_url'] %}
{{- '[IMG]' }}
{%- else %}
{{- raise_exception('Only text, image and image_url chunks are supported in user message content.') }}
{%- endif %}
{%- endfor %}
{{- '[/INST]' }}
{%- else %}
{{- raise_exception('User message must have a string or a list of chunks in content') }}
{%- endif %}

{#- Assistant messages supports text and thinking content. #}
{%- elif message['role'] == 'assistant' %}
{%- if (message['content'] is none or message['content'] == '' or message['content']|length == 0) and (message['tool_calls'] is not defined or message['tool_calls'] is none or message['tool_calls']|length == 0) %}
{{- raise_exception('Assistant message must have a string or a list of chunks in content or a list of tool calls.') }}
{%- endif %}

{%- if message['content'] is string and message['content'] != '' %}
{{- message['content'] }}
{%- elif message['content'] | length > 0 %}
{%- for block in message['content'] %}
{%- if block['type'] == 'text' %}
{{- block['text'] }}
{%- elif block['type'] == 'thinking' %}
{{- '[THINK]' + block['thinking'] }}
{%- if block.get('closed', true) %}{{- '[/THINK]' }}{%- endif %}
{%- else %}
{{- raise_exception('Only text and thinking chunks are supported in assistant message contents.') }}
{%- endif %}
{%- endfor %}
{%- endif %}

{%- if message['tool_calls'] is defined and message['tool_calls'] is not none and message['tool_calls']|length > 0 %}
{%- for tool in message['tool_calls'] %}
{{- '[TOOL_CALLS]' }}
{%- set name = tool['function']['name'] %}
{%- set arguments = tool['function']['arguments'] %}
{%- if arguments is not string %}
{%- set arguments = arguments|tojson|safe %}
{%- elif arguments == '' %}
{%- set arguments = '{}' %}
{%- endif %}
{{- name + '[ARGS]' + arguments }}
{%- endfor %}
{%- endif %}

{{- '</s>' }}

{#- Tool messages only supports text content. #}
{%- elif message['role'] == 'tool' %}
{{- '[TOOL_RESULTS]' + message['content']|string + '[/TOOL_RESULTS]' }}

{#- System messages. #}
{%- elif message['role'] == 'system' %}
{{- '[SYSTEM_PROMPT]' -}}
{%- if message['content'] is string %}
{{- message['content'] -}}
{%- else %}
{%- for block in message['content'] %}
{%- if block['type'] == 'text' %}
{{- block['text'] }}
{%- else %}
{{- raise_exception('Only text chunks are supported in system message contents.') }}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- '[/SYSTEM_PROMPT]' -}}

{#- Raise exception for unsupported roles. #}
{%- else %}
{{- raise_exception('Only user, assistant, system and tool roles are supported, got ' + message['role'] + '.') }}
{%- endif %}
{%- endfor %}
```
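The following rough pure-Python mirror of the template shows the prompt layout it produces in the simplest case. It handles plain-string content only. Tools, image and thinking chunks, tool calls, and the role-order validation are left out. The function `render_prompt` and its placeholder `default_system` argument are hypothetical. The real template inserts the long default system message above whenever the conversation does not open with a system message. One more observation: the bracketed markers it emits (`[INST]`, `[SYSTEM_PROMPT]`, `[MODEL_SETTINGS]`, ...) do not appear among the added tokens in `tokenizer_config.json`, which are ChatML-style (`<|im_start|>`, `<|im_end|>`). This tokenizer would therefore presumably split them into ordinary sub-word pieces.

```python
from typing import Dict, List


def render_prompt(
    messages: List[Dict[str, str]],
    default_system: str = "<default system message>",
    reasoning_effort: str = "none",
) -> str:
    """Hypothetical mirror of chat-template.jinja for plain-string content.

    Not covered: tools ([AVAILABLE_TOOLS]), image/thinking chunks, tool calls,
    and the template's role-order validation.
    """
    if reasoning_effort not in ("none", "high"):
        raise ValueError('reasoning_effort must be either "none" or "high"')

    out = ["<s>"]
    # The default system prompt is used only when there is no leading system message.
    if messages and messages[0]["role"] != "system" and default_system != "":
        out.append("[SYSTEM_PROMPT]" + default_system + "[/SYSTEM_PROMPT]")

    # Merge consecutive messages with the same role (except system and tool),
    # joining their text with a blank line, like the template's aggregation step.
    merged: List[Dict[str, str]] = []
    for m in messages:
        if merged and m["role"] == merged[-1]["role"] and m["role"] not in ("system", "tool"):
            merged[-1] = {"role": m["role"], "content": merged[-1]["content"] + "\n\n" + m["content"]}
        else:
            merged.append({"role": m["role"], "content": m["content"]})

    settings = '[MODEL_SETTINGS]{"reasoning_effort": "' + reasoning_effort + '"}[/MODEL_SETTINGS]'
    settings_emitted = False
    for m in merged:
        role, content = m["role"], m["content"]
        if role == "user":
            if not settings_emitted:
                # With tools, the template would emit [AVAILABLE_TOOLS]...[/AVAILABLE_TOOLS] first.
                out.append(settings)
                settings_emitted = True
            out.append("[INST]" + content + "[/INST]")
        elif role == "assistant":
            out.append(content + "</s>")
        elif role == "tool":
            out.append("[TOOL_RESULTS]" + content + "[/TOOL_RESULTS]")
        elif role == "system":
            out.append("[SYSTEM_PROMPT]" + content + "[/SYSTEM_PROMPT]")
        else:
            raise ValueError("Only user, assistant, system and tool roles are supported, got " + role + ".")
    return "".join(out)


print(render_prompt([{"role": "user", "content": "Hi"}], default_system="SYS"))
# <s>[SYSTEM_PROMPT]SYS[/SYSTEM_PROMPT][MODEL_SETTINGS]{"reasoning_effort": "none"}[/MODEL_SETTINGS][INST]Hi[/INST]
```

The model settings (and any tool list) are emitted only once, just before the first user turn. Each assistant turn is closed with `</s>`.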
**config.json** (new file, 26 lines)

```json
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "max_position_embeddings": 1024,
  "model_type": "mistral",
  "num_attention_heads": 16,
  "num_hidden_layers": 33,
  "num_key_value_heads": 4,
  "pad_token_id": 2,
  "rms_norm_eps": 1e-06,
  "rope_theta": 10000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.0.dev0",
  "use_cache": true,
  "vocab_size": 32016
}
```
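The values above can be used to check the "360 million parameters" stated in the README. The sketch assumes the standard Hugging Face Mistral layout. That means bias-free q/k/v/o attention projections with grouped-query attention, a gate/up/down MLP, two RMSNorm weight vectors per decoder layer, and a final norm. Input and output embeddings are untied, as `"tie_word_embeddings": false` indicates.

```python
# Values copied from config.json above.
vocab_size = 32016
hidden_size = 1024
intermediate_size = 2048
num_hidden_layers = 33
num_attention_heads = 16
num_key_value_heads = 4
tie_word_embeddings = False

head_dim = hidden_size // num_attention_heads  # 64
kv_dim = num_key_value_heads * head_dim        # 256 (grouped-query attention)

# Attention: q_proj and o_proj are hidden x hidden; k_proj and v_proj project to kv_dim.
attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
# MLP: gate_proj and up_proj (hidden -> intermediate), down_proj (intermediate -> hidden).
mlp = 3 * hidden_size * intermediate_size
# Two RMSNorm weight vectors per decoder layer.
norms = 2 * hidden_size
per_layer = attn + mlp + norms

embeddings = vocab_size * hidden_size
lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
final_norm = hidden_size

total = num_hidden_layers * per_layer + embeddings + lm_head + final_norm
print(f"{total:,} parameters")  # 359,762,944 parameters

# bfloat16 uses 2 bytes per parameter; compare with the LFS pointer size of model.safetensors.
weights_bytes = 2 * total
print(weights_bytes, 719_560_040 - weights_bytes)  # 719525888 34152
```

The result, about 359.8M, matches the advertised size. If every tensor is stored in bfloat16, the raw weights take 719,525,888 bytes. That leaves 34,152 bytes of the 719,560,040-byte `model.safetensors` for the file's header.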
**generation_config.json** (new file, 7 lines)

```json
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2,
  "transformers_version": "4.42.0.dev0"
}
```
**model.safetensors** (new file, 3 lines)

```text
version https://git-lfs.github.com/spec/v1
oid sha256:2252cfc9a24a41ad317afafbf15b9cb54d2f7289488b4e3516b2d9863e99f394
size 719560040
```
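The committed `model.safetensors` is not the weights themselves. It is a Git LFS pointer of three `key value` lines: the spec `version`, the `oid` (a SHA-256 hash of the real file) and its `size` in bytes. The sketch below parses such a pointer. The helper name `parse_lfs_pointer` and its returned dictionary layout are hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "oid_algo": algo, "oid": digest, "size": int(fields["size"])}


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:2252cfc9a24a41ad317afafbf15b9cb54d2f7289488b4e3516b2d9863e99f394
size 719560040
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])  # sha256 719560040
```

After the real file has been downloaded (for example with `git lfs pull`), its integrity can be checked by comparing `sha256sum model.safetensors` with the `oid` and the file size with `size`.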
**special_tokens_map.json** (new file, 131 lines)

```json
{
  "additional_special_tokens": [
    {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|named_user|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|named_assistant|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|mem_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|mem_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|pause|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|spare_1|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|spare_2|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|spare_3|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|spare_4|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|spare_5|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|spare_6|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|spare_7|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    },
    {
      "content": "<|spare_8|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    }
  ],
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
```
**tokenizer.json** (new file, 91279 lines; diff suppressed because it is too large)
**tokenizer_config.json** (new file, 187 lines)

```json
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32000": {
      "content": "assistant",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "32001": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32002": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32003": {
      "content": "<|named_user|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32004": {
      "content": "<|named_assistant|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32005": {
      "content": "<|mem_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32006": {
      "content": "<|mem_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32007": {
      "content": "<|pause|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32008": {
      "content": "<|spare_1|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32009": {
      "content": "<|spare_2|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32010": {
      "content": "<|spare_3|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32011": {
      "content": "<|spare_4|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32012": {
      "content": "<|spare_5|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32013": {
      "content": "<|spare_6|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32014": {
      "content": "<|spare_7|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "32015": {
      "content": "<|spare_8|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|named_user|>",
    "<|named_assistant|>",
    "<|mem_start|>",
    "<|mem_end|>",
    "<|pause|>",
    "<|spare_1|>",
    "<|spare_2|>",
    "<|spare_3|>",
    "<|spare_4|>",
    "<|spare_5|>",
    "<|spare_6|>",
    "<|spare_7|>",
    "<|spare_8|>"
  ],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": true,
  "model_max_length": 1024,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}
```
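The added tokens with ids 32000 to 32015 account exactly for the difference between a 32,000-entry base vocabulary and the `"vocab_size": 32016` in `config.json`. The sketch below checks that arithmetic. The dictionary is transcribed from `added_tokens_decoder` above. Treating the lowest added id as the size of the base SentencePiece vocabulary is an assumption about how the tokens were appended.

```python
# Added tokens copied from "added_tokens_decoder" above: id -> (content, special flag).
ADDED_TOKENS = {
    32000: ("assistant", False),
    32001: ("<|im_start|>", True),
    32002: ("<|im_end|>", True),
    32003: ("<|named_user|>", True),
    32004: ("<|named_assistant|>", True),
    32005: ("<|mem_start|>", True),
    32006: ("<|mem_end|>", True),
    32007: ("<|pause|>", True),
    32008: ("<|spare_1|>", True),
    32009: ("<|spare_2|>", True),
    32010: ("<|spare_3|>", True),
    32011: ("<|spare_4|>", True),
    32012: ("<|spare_5|>", True),
    32013: ("<|spare_6|>", True),
    32014: ("<|spare_7|>", True),
    32015: ("<|spare_8|>", True),
}
CONFIG_VOCAB_SIZE = 32016  # "vocab_size" in config.json

# Assumption: added tokens sit directly after the base vocabulary,
# so the lowest added id equals the base vocabulary size.
base_vocab = min(ADDED_TOKENS)
ids_contiguous = sorted(ADDED_TOKENS) == list(range(base_vocab, base_vocab + len(ADDED_TOKENS)))
special = [content for content, is_special in ADDED_TOKENS.values() if is_special]

print(base_vocab, len(ADDED_TOKENS), base_vocab + len(ADDED_TOKENS) == CONFIG_VOCAB_SIZE)  # 32000 16 True
print(ids_contiguous, len(special))  # True 15
```

Every added id therefore has its own row in the embedding matrix. `assistant` (id 32000) is an ordinary, normalized added token. The ChatML-style markers and the spare slots are flagged as special.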
**training_args.bin** (new file, 3 lines)

```text
version https://git-lfs.github.com/spec/v1
oid sha256:0150025d1ee2d8a2a7f58c24dfd7f3898429f53699ef31506f33271c31de4401
size 5752
```