Initialize project; model provided by the ModelHub XC community

Model: Nexusflow/NexusRaven-13B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-23 08:26:13 +08:00
commit 610836b00e
21 changed files with 94723 additions and 0 deletions

39
.gitattributes vendored Normal file

@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
NexusRaven.png filter=lfs diff=lfs merge=lfs -text
pytorch_model-00001-of-00003.bin filter=lfs diff=lfs merge=lfs -text
pytorch_model-00002-of-00003.bin filter=lfs diff=lfs merge=lfs -text
pytorch_model-00003-of-00003.bin filter=lfs diff=lfs merge=lfs -text

3
NexusRaven.png Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:471f80e524e4d6975700113e2a1564e1d727fed3e3e1aeea928ca95d523782d3
size 1102527

185
README.md Normal file

@@ -0,0 +1,185 @@
---
license: llama2
base_model: codellama/CodeLlama-13b-Instruct-hf
model-index:
- name: NexusRaven-13B
results: []
---
# NexusRaven-13B: Surpassing the state-of-the-art in open-source function calling LLMs.
<p align="center">
<a href="https://huggingface.co/Nexusflow" target="_blank">Nexusflow HF</a> - <a href="http://nexusflow.ai/blog" target="_blank">NexusRaven blog post</a> - <a href="https://huggingface.co/Nexusflow/NexusRaven-13B" target="_blank">NexusRaven-13B</a> - <a href="https://x.com/NexusflowX/status/1707470614012035561?s=20" target="_blank">NexusRaven-13B Twitter Thread</a> - <a href="https://github.com/nexusflowai/NexusRaven/" target="_blank">NexusRaven-13B Github</a> - <a href="https://huggingface.co/datasets/Nexusflow/NexusRaven_API_evaluation" target="_blank">NexusRaven API evaluation dataset</a>
</p>
<p align="center" width="100%">
<a><img src="NexusRaven.png" alt="NexusRaven" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
</p>
Table of contents
- [NexusRaven-13B: Surpassing the state-of-the-art in open-source function calling LLMs.](#nexusraven-13b-surpassing-the-state-of-the-art-in-open-source-function-calling-llms)
- [Introducing NexusRaven-13B](#introducing-nexusraven-13b)
- [NexusRaven model usage](#nexusraven-model-usage)
- [Training procedure](#training-procedure)
- [Training hyperparameters](#training-hyperparameters)
- [Framework versions](#framework-versions)
- [Limitations](#limitations)
- [License](#license)
- [References](#references)
- [Citation](#citation)
- [Contact](#contact)
This model is a fine-tuned version of [codellama/CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf).
## Introducing NexusRaven-13B
NexusRaven is an open-source and commercially viable function calling LLM that surpasses the state-of-the-art in function calling capabilities.
📊 Performance Highlights: With our demonstration retrieval system, NexusRaven-13B achieves a 95% success rate in using cybersecurity tools such as CVE/CPE Search and VirusTotal, while prompting GPT-4 achieves 64%. It has significantly lower cost and faster inference speed compared to GPT-4.
🔧 Generalization to the Unseen: NexusRaven-13B generalizes to tools never seen during model training, achieving a success rate comparable to GPT-3.5 in a zero-shot setting and significantly outperforming all other open-source LLMs of similar size.
🔥 Commercially Permissive: The training of NexusRaven-13B does not involve any data generated by proprietary LLMs such as GPT-4. You have full control of the model when deployed in commercial applications.
<p align="center" width="100%">
<a><img src="Single-Attempt_Function_Calling.png" alt="NexusRaven" style="width: 80%; min-width: 300px; display: block; margin: auto;"></a>
<a><img src="Zero-shot_Evaluation.png" alt="NexusRaven" style="width: 80%; min-width: 300px; display: block; margin: auto;"></a>
</p>
## NexusRaven model usage
NexusRaven accepts a list of Python functions. These functions can do anything, including sending GET/POST requests to external APIs. To generate a function call, the model needs only two things for each function: its Python signature and an appropriate docstring.
NexusRaven is highly compatible with langchain. See [langchain_example.py](https://huggingface.co/Nexusflow/NexusRaven-13B/blob/main/langchain_example.py). An example without langchain can be found in [non_langchain_example.py](https://huggingface.co/Nexusflow/NexusRaven-13B/blob/main/non_langchain_example.py).
Please note that the model sometimes reflects on its answer. The reflection might help in some rare cases, but to avoid spending unnecessary tokens during inference we highly recommend stopping generation with a stop sequence of `["\nReflection:"]`. Our langchain example does this.
More information about how to prompt the model can be found in [prompting_readme.md](prompting_readme.md).
The "Initial Answer" can be executed to run the function.
### Quickstart
You can run the model on a GPU using the following code.
```python
# Please `pip install transformers accelerate`
from transformers import pipeline
pipeline = pipeline(
"text-generation",
model="Nexusflow/NexusRaven-13B",
torch_dtype="auto",
device_map="auto",
)
prompt_template = """
<human>:
OPTION:
<func_start>def hello_world(n : int)<func_end>
<docstring_start>
\"\"\"
Prints hello world to the user.
Args:
n (int) : Number of times to print hello world.
\"\"\"
<docstring_end>
OPTION:
<func_start>def hello_universe(n : int)<func_end>
<docstring_start>
\"\"\"
Prints hello universe to the user.
Args:
n (int) : Number of times to print hello universe.
\"\"\"
<docstring_end>
User Query: Question: {question}
Please pick a function from the above options that best answers the user query and fill in the appropriate arguments.<human_end>
"""
prompt = prompt_template.format(question="Please print hello world 10 times.")
result = pipeline(prompt, max_new_tokens=100, return_full_text=False, do_sample=False)[0]["generated_text"]
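# Keep only the "Initial Answer" call and drop any trailing reflection.
start_str = "Initial Answer: "
end_str = "\nReflection:"
start_idx = result.find(start_str)
start_idx = 0 if start_idx == -1 else start_idx + len(start_str)
end_idx = result.find(end_str, start_idx)
# str.find returns -1 when the marker is absent; in that case keep the rest of the text.
function_call = (result[start_idx:] if end_idx == -1 else result[start_idx:end_idx]).strip()
print(f"Generated Call: {function_call}")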
```
This will output:
```text
Generated Call: hello_world(10)
```
The generated call can then be executed.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 2.0
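The total batch sizes listed above follow from the per-device batch size, the number of devices, and the gradient accumulation steps. A quick arithmetic check, with variable names chosen to mirror the list (they are illustrative, not taken from the training code):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1              # per-device training batch size
eval_batch_size = 1               # per-device evaluation batch size
num_devices = 8
gradient_accumulation_steps = 16

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 8
```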
### Framework versions
- Transformers 4.33.2
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3
## Limitations
1. We highly recommend using a stop criterion of `["\nReflection:"]`. The model was trained to first generate an answer and then reflect on it, either improving the answer or keeping it the same. However, this "chain of thought" is often not helpful, and the final answer is seldom better than the initial one. Therefore, we strongly recommend executing the Initial Answer as the main call.
2. The model works best when it is connected with a retriever when there are a multitude of functions, as a large number of functions will saturate the context window of this model.
3. The model can be prone to generating incorrect calls. Please ensure that proper guardrails are in place to catch errant behavior.
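
One possible guardrail is to parse the generated call before running it, rather than passing it straight to `eval`. The sketch below is only an illustration: `safe_call` and the `allowed` mapping are hypothetical names, not part of NexusRaven, and it assumes the model emits a single call with literal arguments such as `hello_world(10)`.

```python
import ast


def safe_call(call_text, allowed):
    """Run `call_text` only if it is a single call to a function listed in
    `allowed` and every argument is a plain Python literal."""
    node = ast.parse(call_text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"Not a simple function call: {call_text!r}")
    if node.func.id not in allowed:
        raise ValueError(f"Function not allowed: {node.func.id!r}")
    if any(kw.arg is None for kw in node.keywords):
        raise ValueError("Keyword unpacking (**kwargs) is not allowed")
    # ast.literal_eval rejects names, nested calls, and starred arguments.
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return allowed[node.func.id](*args, **kwargs)


# Hypothetical tool used only for this example.
def hello_world(n: int):
    return "hello world " * n


print(safe_call("hello_world(2)", {"hello_world": hello_world}))
```

A call that names an unknown function, uses attribute access, or passes a non-literal argument raises `ValueError` instead of being executed.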
## License
This model was trained on commercially viable data and is licensed under the [Llama 2 community license](https://huggingface.co/codellama/CodeLlama-13b-hf/blob/main/LICENSE) following the original [CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf/) model.
## References
We thank the CodeLlama team for their amazing models!
```
@misc{rozière2023code,
title={Code Llama: Open Foundation Models for Code},
author={Baptiste Rozière and Jonas Gehring and Fabian Gloeckle and Sten Sootla and Itai Gat and Xiaoqing Ellen Tan and Yossi Adi and Jingyu Liu and Tal Remez and Jérémy Rapin and Artyom Kozhevnikov and Ivan Evtimov and Joanna Bitton and Manish Bhatt and Cristian Canton Ferrer and Aaron Grattafiori and Wenhan Xiong and Alexandre Défossez and Jade Copet and Faisal Azhar and Hugo Touvron and Louis Martin and Nicolas Usunier and Thomas Scialom and Gabriel Synnaeve},
year={2023},
eprint={2308.12950},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
## Citation
```
@misc{nexusraven,
title={NexusRaven: Surpassing the state-of-the-art in open-source function calling LLMs},
author={Nexusflow.ai team},
year={2023},
url={http://nexusflow.ai/blog}
}
```
## Contact
Please reach out to info@nexusflow.ai for any questions!

Binary file not shown. Size: 72 KiB

BIN
Zero-shot_Evaluation.png Normal file

Binary file not shown. Size: 144 KiB

10
added_tokens.json Normal file

@@ -0,0 +1,10 @@
{
"<bot>:": 32017,
"<bot_end>": 32019,
"<docstring_end>": 32023,
"<docstring_start>": 32022,
"<func_end>": 32021,
"<func_start>": 32020,
"<human>:": 32016,
"<human_end>": 32018
}

27
config.json Normal file

@@ -0,0 +1,27 @@
{
"_name_or_path": "codellama/CodeLlama-13b-Instruct-hf",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"low_cpu_mem_usage": true,
"max_position_embeddings": 16384,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 40,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 1000000,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.33.2",
"use_cache": true,
"vocab_size": 32024
}

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.33.2"
}

148
langchain_example.py Normal file

@@ -0,0 +1,148 @@
from typing import List, Literal, Union
import math
from langchain.tools.base import StructuredTool
from langchain.agents import (
Tool,
AgentExecutor,
LLMSingleActionAgent,
AgentOutputParser,
)
from langchain.schema import AgentAction, AgentFinish, OutputParserException
from langchain.prompts import StringPromptTemplate
from langchain.llms import HuggingFaceTextGenInference
from langchain.chains import LLMChain
##########################################################
# Step 1: Define the functions you want to articulate. ###
##########################################################
def calculator(
    input_a: float,
    input_b: float,
    operation: Literal["add", "subtract", "multiply", "divide"],
):
    """
    Computes a calculation.
    Args:
    input_a (float) : Required. The first input.
    input_b (float) : Required. The second input.
    operation (string): The operation. Choices include: add to add two numbers, subtract to subtract two numbers, multiply to multiply two numbers, and divide to divide them.
    """
    match operation:
        case "add":
            return input_a + input_b
        case "subtract":
            return input_a - input_b
        case "multiply":
            return input_a * input_b
        case "divide":
            return input_a / input_b


def cylinder_volume(radius, height):
    """
    Calculate the volume of a cylinder.
    Parameters:
    - radius (float): The radius of the base of the cylinder.
    - height (float): The height of the cylinder.
    Returns:
    - float: The volume of the cylinder.
    """
    if radius < 0 or height < 0:
        raise ValueError("Radius and height must be non-negative.")
    volume = math.pi * (radius**2) * height
    return volume
#############################################################
# Step 2: Let's define some utils for building the prompt ###
#############################################################
RAVEN_PROMPT = """
{raven_tools}
User Query: Question: {input}
Please pick a function from the above options that best answers the user query and fill in the appropriate arguments.<human_end>"""
# Set up a prompt template
class RavenPromptTemplate(StringPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools: List[Tool]

    def format(self, **kwargs) -> str:
        prompt = "<human>:\n"
        for tool in self.tools:
            func_signature, func_docstring = tool.description.split(" - ", 1)
            prompt += f'\nOPTION:\n<func_start>def {func_signature}<func_end>\n<docstring_start>\n"""\n{func_docstring}\n"""\n<docstring_end>\n'
        kwargs["raven_tools"] = prompt
        return self.template.format(**kwargs).replace("{{", "{").replace("}}", "}")


class RavenOutputParser(AgentOutputParser):
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if agent should finish
        if "Initial Answer:" in llm_output:
            return AgentFinish(
                return_values={
                    "output": llm_output.strip()
                    .split("\n")[1]
                    .replace("Initial Answer: ", "")
                    .strip()
                },
                log=llm_output,
            )
        else:
            raise OutputParserException(f"Could not parse LLM output: `{llm_output}`")
##################################################
# Step 3: Build the agent with these utilities ###
##################################################
inference_server_url = "<YOUR ENDPOINT URL>"
assert (
    inference_server_url != "<YOUR ENDPOINT URL>"
), "Please provide your own HF inference endpoint URL!"
llm = HuggingFaceTextGenInference(
    inference_server_url=inference_server_url,
    temperature=0.001,
    max_new_tokens=400,
    do_sample=False,
)
tools = [
    StructuredTool.from_function(calculator),
    StructuredTool.from_function(cylinder_volume),
]
raven_prompt = RavenPromptTemplate(
    template=RAVEN_PROMPT, tools=tools, input_variables=["input"]
)
llm_chain = LLMChain(llm=llm, prompt=raven_prompt)
output_parser = RavenOutputParser()
agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["\nReflection:"],
    allowed_tools=tools,
)
agent_chain = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
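call = agent_chain.run(
    "I have a cake that is about 3 centimeters high and 200 centimeters in radius. How much cake do I have?"
)
# The agent returns the call as source text. eval() returns the call's value; exec() always returns None.
print(eval(call))
call = agent_chain.run("What is 1+10?")
print(eval(call))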

142
non_langchain_example.py Normal file

@@ -0,0 +1,142 @@
from typing import Literal
import math
import inspect
from transformers import pipeline
##########################################################
# Step 1: Define the functions you want to articulate. ###
##########################################################
def calculator(
    input_a: float,
    input_b: float,
    operation: Literal["add", "subtract", "multiply", "divide"],
):
    """
    Computes a calculation.
    Args:
    input_a (float) : Required. The first input.
    input_b (float) : Required. The second input.
    operation (string): The operation. Choices include: add to add two numbers, subtract to subtract two numbers, multiply to multiply two numbers, and divide to divide them.
    """
    match operation:
        case "add":
            return input_a + input_b
        case "subtract":
            return input_a - input_b
        case "multiply":
            return input_a * input_b
        case "divide":
            return input_a / input_b


def cylinder_volume(radius, height):
    """
    Calculate the volume of a cylinder.
    Parameters:
    - radius (float): The radius of the base of the cylinder.
    - height (float): The height of the cylinder.
    Returns:
    - float: The volume of the cylinder.
    """
    if radius < 0 or height < 0:
        raise ValueError("Radius and height must be non-negative.")
    volume = math.pi * (radius**2) * height
    return volume
#############################################################
# Step 2: Let's define some utils for building the prompt ###
#############################################################
def format_functions_for_prompt(*functions):
    formatted_functions = []
    for func in functions:
        source_code = inspect.getsource(func)
        docstring = inspect.getdoc(func)
        formatted_functions.append(
            f"OPTION:\n<func_start>{source_code}<func_end>\n<docstring_start>\n{docstring}\n<docstring_end>"
        )
    return "\n".join(formatted_functions)


##############################
# Step 3: Construct Prompt ###
##############################
def construct_prompt(user_query: str):
    formatted_prompt = format_functions_for_prompt(calculator, cylinder_volume)
    formatted_prompt += f"\n\nUser Query: Question: {user_query}\n"
    prompt = (
        "<human>:\n"
        + formatted_prompt
        + "Please pick a function from the above options that best answers the user query and fill in the appropriate arguments.<human_end>"
    )
    return prompt


#######################################
# Step 4: Execute the function call ###
#######################################
def execute_function_call(model_output):
    # Ignore everything after "Reflection" since that is not essential.
    function_call = (
        model_output[0]["generated_text"]
        .strip()
        .split("\n")[1]
        .replace("Initial Answer:", "")
        .strip()
    )
    try:
        return eval(function_call)
    except Exception as e:
        return str(e)
if __name__ == "__main__":
    # Build the model
    text_gen = pipeline(
        "text-generation",
        model="Nexusflow/NexusRaven-13B",
        device="cuda",
    )

    # Compute a simple equation
    prompt = construct_prompt("What is 1+10?")
    model_output = text_gen(
        prompt, do_sample=False, max_new_tokens=400, return_full_text=False
    )
    result = execute_function_call(model_output)
    print("Model Output:", model_output)
    print("Execution Result:", result)

    # Compute the volume of a cylinder.
    # No stop sequence is passed here: execute_function_call keeps only the
    # "Initial Answer" line and discards any reflection that follows it.
    prompt = construct_prompt(
        "I have a cake that is about 3 centimeters high and 200 centimeters in diameter. How much cake do I have?"
    )
    model_output = text_gen(
        prompt,
        do_sample=False,
        max_new_tokens=400,
        return_full_text=False,
    )
    result = execute_function_call(model_output)
    print("Model Output:", model_output)
    print("Execution Result:", result)

172
prompting_readme.md Normal file

@@ -0,0 +1,172 @@
## How To Formulate The Prompt
NexusRaven-13B is instruction-tuned with a structured prompt for the function calling task. To get the most out of the model, be sure to follow this prompt structure.
### Prompt Template
The following is the prompt template you can use to fill in your items.
#### ChatML-Like
Although this model is single-turn only, it uses ```<human>:``` and ```<human_end>``` tags to delimit human sequences and ```<bot>:``` and ```<bot_end>``` tags to delimit bot sequences. We highly recommend stopping generation at "Reflection:": the model was trained to reflect on its initial answer, but that reflection is not required during inference.
#### Defining Functions
Please use the following template to define a single function:
```python
OPTION:
<func_start>def {function_signature}<func_end>
<docstring_start>
"""
{function_docstring}
"""
<docstring_end>
```
For example, to define a hello world function, your prompt will look like this:
```python
OPTION:
<func_start>def hello_world(n : int)<func_end>
<docstring_start>
"""
Prints hello world to the user.
Args:
n (int) : Number of times to print hello world.
"""
<docstring_end>
```
We use the default pythonic way of defining the docstrings. The model should be able to learn and properly use an appropriately formatted python docstring, but it is always highly recommended to describe your function's purpose and the arguments well in the docstring.
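The template above can also be filled in programmatically from a real Python function. The following is a minimal sketch using the standard `inspect` module; `format_option` is a hypothetical helper name, not part of the NexusRaven code. Note that `inspect.signature` renders annotations as `n: int`, which differs slightly in spacing from the hand-written `n : int` in the example above.

```python
import inspect


def format_option(func):
    """Render one function in the OPTION layout shown above."""
    signature = f"{func.__name__}{inspect.signature(func)}"
    docstring = inspect.getdoc(func) or ""
    return (
        "OPTION:\n"
        f"<func_start>def {signature}<func_end>\n"
        "<docstring_start>\n"
        f'"""\n{docstring}\n"""\n'
        "<docstring_end>"
    )


def hello_world(n: int):
    """
    Prints hello world to the user.
    Args:
    n (int) : Number of times to print hello world.
    """
    for _ in range(n):
        print("hello world")


print(format_option(hello_world))
```

Several functions can be rendered this way and joined with newlines to form the list of options.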
#### Defining User Query
The first line in the following template contains the user's actual question. The second line is the instruction to the model. Please see the template below:
```python
User Query: Question: {input}
Please pick a function from the above options that best answers the user query and fill in the appropriate arguments.<human_end>
```
So, your query might look like this for the following user question (but please notice that the instruction does not change):
```python
User Query: Question: Please print hello world 10 times.
Please pick a function from the above options that best answers the user query and fill in the appropriate arguments.<human_end>
```
#### FewShot Examples
You'll notice that we have "User Query:" and "Question:". This might look redundant for zeroshot, but it serves a strong purpose for fewshot. You can include fewshot examples between the "User Query" and "Question", and the model will leverage those examples. Here is a demonstration of the prompt.
```python
User Query: Here are some examples of similar function calls to generate. Example: How do I announce I'm here to this world someone 3 times? Answer: hello_world(3). Example: How do I tell someone helloworld 2 times? Answer: hello_world(2). Now, please answer this question. Question: Please print hello world 10 times.
Please pick a function from the above options that best answers the user query and fill in the appropriate arguments.<human_end>
```
You can include your demonstration examples between ```User Query``` and ```Question```, and ensure you add your final question to the model after ```Question```.
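As a rough illustration of this layout, the sketch below assembles the two-line user query from an optional list of demonstrations. `build_user_query` and `INSTRUCTION` are hypothetical names introduced here; the wording of the example sentences follows the demonstration above.

```python
INSTRUCTION = (
    "Please pick a function from the above options that best answers "
    "the user query and fill in the appropriate arguments.<human_end>"
)


def build_user_query(question, examples=()):
    """Build the user query; `examples` is a sequence of (question, answer)
    pairs placed between "User Query:" and "Question:"."""
    parts = ["User Query:"]
    if examples:
        parts.append("Here are some examples of similar function calls to generate.")
        for example_question, example_answer in examples:
            parts.append(f"Example: {example_question} Answer: {example_answer}.")
        parts.append("Now, please answer this question.")
    parts.append(f"Question: {question}")
    # The instruction line stays the same for every query.
    return " ".join(parts) + "\n" + INSTRUCTION


print(build_user_query(
    "Please print hello world 10 times.",
    examples=[("How do I tell someone helloworld 2 times?", "hello_world(2)")],
))
```

With no examples, the function produces the zeroshot form `User Query: Question: ...` followed by the instruction line.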
## Putting It All Together
Please start your prompt with ```<human>:```.
To prompt the model in zeroshot, please do something like this:
```python
prompt_template = """
<human>:
OPTION:
<func_start>def hello_world(n : int)<func_end>
<docstring_start>
\"\"\"
Prints hello world to the user.
Args:
n (int) : Number of times to print hello world.
\"\"\"
<docstring_end>
OPTION:
<func_start>def hello_universe(n : int)<func_end>
<docstring_start>
\"\"\"
Prints hello universe to the user.
Args:
n (int) : Number of times to print hello universe.
\"\"\"
<docstring_end>
User Query: Question: {question}
Please pick a function from the above options that best answers the user query and fill in the appropriate arguments.<human_end>
"""
prompt = prompt_template.format(question="Please print hello world 10 times.")
```
You're welcome to add an arbitrary number of functions in the same format. Using this driver code:
```python
# Please `pip install transformers accelerate`
from transformers import pipeline
pipeline = pipeline(
"text-generation",
model="Nexusflow/NexusRaven-13B",
torch_dtype="auto",
device_map="auto",
)
result = pipeline(prompt, max_new_tokens=100, return_full_text=False, do_sample=False)[0]["generated_text"]
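# Keep only the "Initial Answer" call and drop any trailing reflection.
start_str = "Initial Answer: "
end_str = "\nReflection:"
start_idx = result.find(start_str)
start_idx = 0 if start_idx == -1 else start_idx + len(start_str)
end_idx = result.find(end_str, start_idx)
# str.find returns -1 when the marker is absent; in that case keep the rest of the text.
function_call = (result[start_idx:] if end_idx == -1 else result[start_idx:end_idx]).strip()
print(f"Generated Call: {function_call}")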
```
The call will print out:
```text
Generated Call: hello_world(10)
```
To prompt the model in fewshot, please do something like this:
```python
PROMPT = \
"""
<human>:
OPTION:
<func_start>def hello_world(n : int)<func_end>
<docstring_start>
\"\"\"
Prints hello world to the user.
Args:
n (int) : Number of times to print hello world.
\"\"\"
<docstring_end>
OPTION:
<func_start>def hello_universe(n : int)<func_end>
<docstring_start>
\"\"\"
Prints hello universe to the user.
Args:
n (int) : Number of times to print hello universe.
\"\"\"
<docstring_end>
User Query: Example: How do I announce I'm here to this world someone 3 times? Answer: hello_world(3). Example: How do I tell someone hello universe 2 times? Answer: hello_universe(2). Now, please answer this question. Question: Please print hello universe 14 times.
Please pick a function from the above options that best answers the user query and fill in the appropriate arguments.<human_end>"""
```
Running the driver code above with this prompt will print:
```text
Generated Call: hello_universe(14)
```


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:702d3695b1740ce1b2f68633ed8c87b035fd73045a49982b130ae9b1c54e03db
size 9948965193


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a3db7d7b6e41ea3ed6029199e9ba57e18e8472cab177aceac298427385a369e9
size 9904155408


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6ca6418e96d6476e0de360b501c0877053df126beea05f3dd6cc89e274ab5db9
size 6179223967


@@ -0,0 +1,370 @@
{
"metadata": {
"total_size": 26032220160
},
"weight_map": {
"lm_head.weight": "pytorch_model-00003-of-00003.bin",
"model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
"model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
"model.norm.weight": "pytorch_model-00003-of-00003.bin"
}
}

34
special_tokens_map.json Normal file

@@ -0,0 +1,34 @@
{
"additional_special_tokens": [
"<human>:",
"<bot>:",
"<human_end>",
"<bot_end>",
"<func_start>",
"<func_end>",
"<docstring_start>",
"<docstring_end>"
],
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"pad_token": "</s>",
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

93526
tokenizer.json Normal file

File diff suppressed because it is too large

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:45ccb9c8b6b561889acea59191d66986d314e7cbd6a78abc6e49b139ca91c1e6
size 500058

45
tokenizer_config.json Normal file

@@ -0,0 +1,45 @@
{
"additional_special_tokens": [
"▁<PRE>",
"▁<MID>",
"▁<SUF>",
"▁<EOT>"
],
"bos_token": {
"__type": "AddedToken",
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": false,
"eos_token": {
"__type": "AddedToken",
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"eot_token": "▁<EOT>",
"fill_token": "<FILL_ME>",
"legacy": null,
"middle_token": "▁<MID>",
"model_max_length": 8192,
"pad_token": null,
"prefix_token": "▁<PRE>",
"sp_model_kwargs": {},
"suffix_token": "▁<SUF>",
"tokenizer_class": "CodeLlamaTokenizer",
"truncation_side": "left",
"unk_token": {
"__type": "AddedToken",
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"use_default_system_prompt": false
}

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e2229f51a3700a9037c6a1468c229ae7a33a8b1dff83d12728e20c730a73b78
size 6075