Initialize project; model provided by the ModelHub XC community

Model: strykes/emberforge-3b-reasoner
Source: Original Platform
ModelHub XC
2026-05-30 19:09:18 +08:00
commit 7c36fbd792
28 changed files with 5552 additions and 0 deletions

39
.gitattributes vendored Normal file

@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
gguf/Nanbeige4.1-3B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
gguf/Nanbeige4.1-3B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
gguf/Nanbeige4.1-3B-f16.gguf filter=lfs diff=lfs merge=lfs -text

76
README.md Normal file

@@ -0,0 +1,76 @@
---
language:
- en
license: apache-2.0
tags:
- transformers
- safetensors
- gguf
- peft
- qlora
- reasoning
base_model:
- Nanbeige/Nanbeige4.1-3B
library_name: transformers
pipeline_tag: text-generation
---
# EmberForge-3B-Reasoner
A private, reasoning-focused fine-tune of Nanbeige4.1-3B, released by `strykes`.
## Included Artifacts
- Merged full model (Safetensors) at repo root for HF benchmarking
- LoRA adapter in `adapter/`
- GGUF in `gguf/`:
  - `Nanbeige4.1-3B-Q5_K_M.gguf`
  - `Nanbeige4.1-3B-Q4_K_M.gguf`
  - `Nanbeige4.1-3B-f16.gguf`
- Optional archive in `archives/`
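As a rough sanity check on the merged checkpoint, the parameter count can be derived from the values in `config.json`. The sketch below assumes a standard bias-free Llama-style layout (the helper name `count_llama_parameters` is illustrative, not part of any library) and compares the result with the `total_parameters` and `total_size` fields of the Safetensors index in this commit.

```python
# Values copied from config.json in this repository.
CONFIG = {
    "hidden_size": 2560,
    "intermediate_size": 10496,
    "num_hidden_layers": 32,
    "num_attention_heads": 20,
    "num_key_value_heads": 4,
    "head_dim": 128,
    "vocab_size": 166144,
    "tie_word_embeddings": False,
}


def count_llama_parameters(cfg: dict) -> int:
    """Count weights for a bias-free Llama-style decoder (assumed layout)."""
    h = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]
    # q_proj, k_proj, v_proj, o_proj
    attention = h * q_out + 2 * h * kv_out + q_out * h
    # gate_proj, up_proj, down_proj
    mlp = 3 * h * cfg["intermediate_size"]
    # input_layernorm and post_attention_layernorm
    norms = 2 * h
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    if not cfg["tie_word_embeddings"]:
        embeddings *= 2  # separate lm_head matrix
    # The trailing "+ h" accounts for the final norm.
    return embeddings + cfg["num_hidden_layers"] * per_layer + h


total = count_llama_parameters(CONFIG)
print(total)      # compare with "total_parameters" in the index
print(total * 2)  # bytes at 2 bytes per weight (float16)
```

Under these assumptions the count agrees with the index metadata (3,933,637,120 parameters, 7,867,274,240 bytes), which is consistent with the `float16` dtype declared in `config.json`.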
## Training Snapshot
- Base: `Nanbeige/Nanbeige4.1-3B`
- Method: QLoRA fine-tuning with Unsloth, followed by merging the adapter into the base weights
- Data: ~3.5k synthetic reasoning samples
- Epochs: 2
- Sequence length: 4096
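For a sense of the adapter's size, the number of LoRA weights can be estimated from the adapter config in `adapter/` (`r` = 64, `lora_alpha` = 128, seven target projections) and the layer shapes in `config.json`. This is a minimal sketch: the helper name is illustrative, and it assumes every listed projection is adapted in all 32 layers with the usual two low-rank matrices per projection.

```python
# LoRA hyperparameters copied from the adapter config in this commit.
R = 64
LORA_ALPHA = 128

# Layer shapes derived from config.json.
HIDDEN, INTERMEDIATE, LAYERS = 2560, 10496, 32
Q_OUT = 20 * 128   # num_attention_heads * head_dim
KV_OUT = 4 * 128   # num_key_value_heads * head_dim

# (fan_in, fan_out) for each entry in target_modules.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_parameter_count(shapes: dict, r: int, layers: int) -> int:
    """Each adapted projection adds A (r x fan_in) and B (fan_out x r)."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * layers


adapter_params = lora_parameter_count(TARGET_SHAPES, R, LAYERS)
scaling = LORA_ALPHA / R        # standard LoRA scaling, since use_rslora is false
fp32_bytes = adapter_params * 4

# Size of the LFS object listed after the adapter config in this commit;
# treating it as the adapter weights file is an assumption.
ADAPTER_FILE_BYTES = 455142376
print(adapter_params, scaling, ADAPTER_FILE_BYTES - fp32_bytes)
```

The estimate comes to about 113.8M adapter weights with a scaling factor of 2.0. At 4 bytes per weight that is within roughly 60 KB of the 455,142,376-byte LFS object, which would be consistent with fp32 storage plus a small file header, though the commit does not state the adapter dtype.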
## Notes
- Intended for research and benchmarking.
- Validate outputs before critical use.
## Benchmarks (2026-02-24)
### Local lm-eval results (this finetune)
| Task | Metric | Score |
|---|---:|---:|
| mmlu | acc,none | 59.98% |
| gsm8k | exact_match,flexible-extract | 62.40% |
| arc_challenge | acc_norm,none | 31.74% |
| hellaswag | acc_norm,none | 56.07% |
| winogrande | acc,none | 50.04% |
| piqa | acc_norm,none | 63.22% |
| boolq | acc,none | 74.37% |
| truthfulqa_mc2 | acc,none | 45.34% |
### Public references
- Base model (`Nanbeige/Nanbeige4.1-3B`) author-published benchmarks are listed in:
  - `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`
- Frontier references (Claude/GPT/Gemini) are included in the same comparison report.
### Reproducibility artifacts
- `benchmarks/lm-eval-2026-02-24/summary_v3.tsv`
- `benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json`
- `benchmarks/lm-eval-2026-02-24/run_v3.log`
- `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`
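The percentages in the results table can be regenerated from `summary_v3.tsv`. The sketch below is illustrative only: it embeds three rows copied from that file rather than reading it from disk, and the helper names `load_scores` and `as_percent` are made up for this example.

```python
# Three rows copied from summary_v3.tsv; the real file has 70 lines.
SAMPLE = (
    "task\tmetric\tvalue\n"
    "arc_challenge\tacc_norm,none\t0.3174061433447099\n"
    "gsm8k\texact_match,flexible-extract\t0.6239575435936315\n"
    "mmlu\tacc,none\t0.5997721122347244\n"
)


def load_scores(text: str) -> dict:
    """Map task name to (metric, value), skipping the header row."""
    scores = {}
    for line in text.strip().splitlines()[1:]:
        # Splitting on any whitespace works because no field contains spaces.
        task, metric, value = line.split()
        scores[task] = (metric, float(value))
    return scores


def as_percent(value: float) -> str:
    """Format a fraction in [0, 1] the way the README table does."""
    return f"{value * 100:.2f}%"


scores = load_scores(SAMPLE)
for task, (metric, value) in scores.items():
    print(f"| {task} | {metric} | {as_percent(value)} |")
```

To process the full file, the same function could be applied to the contents of `benchmarks/lm-eval-2026-02-24/summary_v3.tsv`.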
### Caveat
Public model-card comparisons are not always apples-to-apples with lm-evaluation-harness settings (prompting, few-shot, decoding, and benchmark versions can differ).

210
adapter/README.md Normal file

@@ -0,0 +1,210 @@
---
base_model: Nanbeige/Nanbeige4.1-3B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Nanbeige/Nanbeige4.1-3B
- lora
- sft
- transformers
- trl
- unsloth
---
# Model Card for Model ID
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use [optional]
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
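A minimal sketch of the prompt layout that the bundled `chat_template.jinja` appears to produce for a plain conversation without tools. It is a simplified re-implementation for illustration, not the template itself: it handles only `system` and `user` turns, requires an explicit leading system message (the real template falls back to a built-in default system prompt when none is given), and ignores tool calls and `<think>` handling. The function name `render_prompt` is hypothetical.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_prompt(messages: list, add_generation_prompt: bool = True) -> str:
    """Render system/user turns in the ChatML-style layout used by the template."""
    if not messages or messages[0]["role"] != "system":
        raise ValueError("this sketch expects an explicit leading system message")
    parts = []
    for message in messages:
        if message["role"] not in ("system", "user"):
            raise ValueError("only system and user turns are covered by this sketch")
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Opens the assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = render_prompt(
    [
        {"role": "system", "content": "You are a careful reasoning assistant."},
        {"role": "user", "content": "What is 17 * 23?"},
    ]
)
print(prompt)
```

In practice the tokenizer's own chat-template support should be preferred, since it also covers assistant turns, reasoning content, and tool calls.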
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[More Information Needed]
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
#### Speeds, Sizes, Times [optional]
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
[More Information Needed]
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Dataset Card if possible. -->
[More Information Needed]
#### Factors
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
[More Information Needed]
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
#### Summary
## Model Examination [optional]
<!-- Relevant interpretability work for the model goes here -->
[More Information Needed]
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## Technical Specifications [optional]
### Model Architecture and Objective
[More Information Needed]
### Compute Infrastructure
[More Information Needed]
#### Hardware
[More Information Needed]
#### Software
[More Information Needed]
## Citation [optional]
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]
### Framework versions
- PEFT 0.18.1


@@ -0,0 +1,50 @@
{
"alora_invocation_tokens": null,
"alpha_pattern": {},
"arrow_config": null,
"auto_mapping": {
"base_model_class": "LlamaForCausalLM",
"parent_library": "transformers.models.llama.modeling_llama",
"unsloth_fixed": true
},
"base_model_name_or_path": "Nanbeige/Nanbeige4.1-3B",
"bias": "none",
"corda_config": null,
"ensure_weight_tying": false,
"eva_config": null,
"exclude_modules": null,
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layer_replication": null,
"layers_pattern": null,
"layers_to_transform": null,
"loftq_config": {},
"lora_alpha": 128,
"lora_bias": false,
"lora_dropout": 0,
"megatron_config": null,
"megatron_core": "megatron.core",
"modules_to_save": null,
"peft_type": "LORA",
"peft_version": "0.18.1",
"qalora_group_size": 16,
"r": 64,
"rank_pattern": {},
"revision": null,
"target_modules": [
"down_proj",
"up_proj",
"gate_proj",
"o_proj",
"k_proj",
"v_proj",
"q_proj"
],
"target_parameters": null,
"task_type": "CAUSAL_LM",
"trainable_token_indices": null,
"use_dora": false,
"use_qalora": false,
"use_rslora": false
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7983f9ec6827018eeffa27618229f4c6a1326ee107c8fbe2c268301afcb47e22
size 455142376


@@ -0,0 +1,9 @@
{
"</think>": 166104,
"</tool_call>": 166106,
"<think>": 166103,
"<tool_call>": 166105,
"<|endoftext|>": 166102,
"<|im_end|>": 166101,
"<|im_start|>": 166100
}

137
adapter/chat_template.jinja Normal file

@@ -0,0 +1,137 @@
{%- if tools %}
{{- '<|im_start|>system
' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '
' }}
{%- else %}
{{- '你是一位工具函数调用专家,你会得到一个问题和一组可能的工具函数。根据问题,你需要进行一个或多个函数/工具调用以实现目的,请尽量尝试探索通过工具解决问题。
如果没有一个函数可以使用,请直接使用自然语言回复用户。
如果给定的问题缺少函数所需的参数,请使用自然语言进行提问,向用户询问必要信息。
如果调用结果已经足够回答用户问题,请对历史结果进行总结,使用自然语言回复用户。' }}
{%- endif %}
{{- "# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>" }}
{%- for tool in tools %}
{{- "
" }}
{{- tool | tojson }}
{%- endfor %}
{{- "
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{\"name\": <function-name>, \"arguments\": <args-json-object>}
</tool_call><|im_end|>
" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system
' + messages[0].content + '<|im_end|>
' }}
{%- else %}
{{- '<|im_start|>system
你是南北阁一款由BOSS直聘自主研发并训练的专业大语言模型。<|im_end|>
' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '
' + content + '<|im_end|>' + '
' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('
').split('<think>')[-1].lstrip('
') %}
{%- set content = content.split('</think>')[-1].lstrip('
') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index or keep_all_think or (extra_body is defined and extra_body.keep_all_think) %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '
<think>
' + reasoning_content.strip('
') + '
</think>
' + content.lstrip('
') }}
{%- else %}
{{- '<|im_start|>' + message.role + '
' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '
' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '
' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>
{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}
</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>
' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '
<tool_response>
' }}
{{- content }}
{{- '
</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>
' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant
' }}
{%- endif %}


@@ -0,0 +1,33 @@
{
"additional_special_tokens": [
"<|endoftext|>"
],
"bos_token": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

3
adapter/tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb41d04798b714520a9b075727b0226538b7330254299062742c50ec8374bc36
size 2782298


@@ -0,0 +1,103 @@
{
"add_bos_token": true,
"add_eos_token": false,
"add_prefix_space": true,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"166100": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"166101": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"166102": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"166103": {
"content": "<think>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"166104": {
"content": "</think>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"166105": {
"content": "<tool_call>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"166106": {
"content": "</tool_call>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|endoftext|>"
],
"bos_token": "<|im_start|>",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"extra_special_tokens": {},
"legacy": false,
"model_max_length": 262144,
"pad_token": "<unk>",
"padding_side": "left",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": false
}

9
added_tokens.json Normal file

@@ -0,0 +1,9 @@
{
"</think>": 166104,
"</tool_call>": 166106,
"<think>": 166103,
"<tool_call>": 166105,
"<|endoftext|>": 166102,
"<|im_end|>": 166101,
"<|im_start|>": 166100
}


@@ -0,0 +1,70 @@
# EmberForge-3B Benchmark Comparison (Public + Local)
Generated: 2026-02-24
## 1) Your Finetuned Model (local lm-eval run)
Model: `strykes/emberforge-3b-reasoner`
| Task | Metric | Score |
|---|---:|---:|
| mmlu | acc,none | 59.98% |
| gsm8k | exact_match,flexible-extract | 62.40% |
| arc_challenge | acc_norm,none | 31.74% |
| hellaswag | acc_norm,none | 56.07% |
| winogrande | acc,none | 50.04% |
| piqa | acc_norm,none | 63.22% |
| boolq | acc,none | 74.37% |
| truthfulqa_mc2 | acc,none | 45.34% |
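When reading the multiple-choice scores above, it can help to compare each one with the accuracy expected from random guessing. The sketch below uses approximate chance levels that are assumptions based on the usual answer formats (four options for MMLU, ARC-Challenge, and HellaSwag; two for WinoGrande, PIQA, and BoolQ); `gsm8k` and `truthfulqa_mc2` are left out because they have no simple chance baseline.

```python
# Scores (percent) copied from the table above.
SCORES = {
    "mmlu": 59.98,
    "arc_challenge": 31.74,
    "hellaswag": 56.07,
    "winogrande": 50.04,
    "piqa": 63.22,
    "boolq": 74.37,
}

# Approximate random-guess accuracy (percent); assumed from answer formats.
CHANCE = {
    "mmlu": 25.0,
    "arc_challenge": 25.0,
    "hellaswag": 25.0,
    "winogrande": 50.0,
    "piqa": 50.0,
    "boolq": 50.0,
}


def margin_over_chance(scores: dict, chance: dict) -> dict:
    """Percentage points by which each score exceeds its chance level."""
    return {task: round(scores[task] - chance[task], 2) for task in scores}


margins = margin_over_chance(SCORES, CHANCE)
for task, margin in sorted(margins.items(), key=lambda item: item[1]):
    print(f"{task}: +{margin:.2f} points over chance")
```

By this rough measure, `winogrande` sits essentially at chance and `arc_challenge` only a few points above it, which may point to a prompt-format or answer-extraction mismatch in the harness run rather than to the model's actual ability; the commit does not include enough detail to tell.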
## 2) Public Base Model (Nanbeige4.1-3B)
Model: `Nanbeige/Nanbeige4.1-3B` (author-reported benchmarks)
| Benchmark | Published Score |
|---|---:|
| Live-Code-Bench-V6 | 76.90% |
| AIME 2026 I | 87.40% |
| HMMT Nov | 77.92% |
| GPQA | 83.80% |
| HLE (Text-only) | 12.60% |
| Arena-Hard-v2 | 73.20% |
| BFCL-V4 | 56.50% |
| Tau2-Bench | 48.57% |
Note: Nanbeige's published benchmarks do not overlap with your lm-eval task set (`mmlu`, `gsm8k`, `arc_challenge`, etc.), so no apples-to-apples delta can be computed without rerunning identical tasks on the base model.
## 3) Public Frontier Reference (Claude / GPT / Gemini) on overlapping classic tasks
Source benchmark table: Anthropic Claude 3 model card (March 2024).
| Benchmark | Your model | Claude 3 Opus | Claude 3 Sonnet | GPT-4 | Gemini 1.0 Ultra | Gemini 1.5 Pro |
|---|---:|---:|---:|---:|---:|---:|
| MMLU (5-shot) | 59.98% | 86.80% | 79.00% | 86.40% | 83.70% | 81.90% |
| GSM8K | 62.40% | 95.00% | 92.30% | 92.00% | 94.40% | 91.70% |
| ARC-Challenge (25-shot) | 31.74% | 96.40% | 93.20% | 96.30% | — | — |
| HellaSwag (10-shot) | 56.07% | 95.40% | 89.00% | 95.30% | 87.80% | 92.50% |
| WinoGrande (5-shot) | 50.04% | 88.50% | 75.10% | 87.50% | — | — |
## 4) Latest Frontier Snapshot (2025-2026, non-overlapping tasks)
Source benchmark table: Claude Opus 4.5 system card, Table 2.3.A.
| Benchmark | Claude Opus 4.5 | Claude Sonnet 4.5 | Claude Opus 4.1 | Gemini 3 Pro | GPT-5.1 |
|---|---:|---:|---:|---:|---:|
| SWE-bench Verified | 80.9% | 77.2% | 74.5% | 76.2% | 76.3% |
| Terminal-bench 2.0 | 59.3% | 50.0% | 46.5% | 54.2% | 47.6% |
| ARC-AGI-2 (Verified) | 37.6% | 13.6% | — | 31.1% | 17.6% |
| GPQA Diamond | 87.0% | 83.4% | 81.0% | 91.9% | 88.1% |
| MMMU (validation) | 80.7% | 77.8% | 77.1% | — | 85.4% |
| MMMLU | 90.8% | 89.1% | 89.5% | 91.8% | 91.0% |
Note: These are newer references but still not directly comparable to your current lm-eval task set.
## 5) Caveats
- Your run uses `lm-evaluation-harness` with specific settings; public model-card numbers may use different prompts, few-shot counts, decoding, or evaluation code.
- Frontier references in Section 3 are older than current 2026 generations but are official primary-source numbers on overlapping classic benchmarks.
- Frontier references in Section 4 are current (2025-2026) but mostly on different benchmarks.
## Sources
- Local run artifact: `/workspace/evals/main_results_v3.json/strykes__emberforge-3b-reasoner/results_2026-02-24T00-06-21.474293.json`
- Nanbeige model card: https://huggingface.co/Nanbeige/Nanbeige4.1-3B
- Anthropic Claude 3 model card (benchmarks table): https://www-cdn.anthropic.com/c6a80a657af445f40e31afac050f3bf76d3b1404.pdf
- Anthropic model cards index: https://www.anthropic.com/system-cards
- Anthropic Claude Opus 4.5 system card: https://www-cdn.anthropic.com/bf10f64990cfda0ba858290be7b8cc6317685f47.pdf

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -0,0 +1,70 @@
task metric value
arc_challenge acc_norm,none 0.3174061433447099
boolq acc,none 0.7437308868501529
gsm8k exact_match,flexible-extract 0.6239575435936315
hellaswag acc_norm,none 0.560744871539534
mmlu acc,none 0.5997721122347244
mmlu_abstract_algebra acc,none 0.43
mmlu_anatomy acc,none 0.6074074074074074
mmlu_astronomy acc,none 0.6973684210526315
mmlu_business_ethics acc,none 0.62
mmlu_clinical_knowledge acc,none 0.6415094339622641
mmlu_college_biology acc,none 0.8263888888888888
mmlu_college_chemistry acc,none 0.53
mmlu_college_computer_science acc,none 0.54
mmlu_college_mathematics acc,none 0.5
mmlu_college_medicine acc,none 0.5953757225433526
mmlu_college_physics acc,none 0.5
mmlu_computer_security acc,none 0.68
mmlu_conceptual_physics acc,none 0.5872340425531914
mmlu_econometrics acc,none 0.35964912280701755
mmlu_electrical_engineering acc,none 0.6413793103448275
mmlu_elementary_mathematics acc,none 0.5317460317460317
mmlu_formal_logic acc,none 0.5
mmlu_global_facts acc,none 0.33
mmlu_high_school_biology acc,none 0.7548387096774194
mmlu_high_school_chemistry acc,none 0.6009852216748769
mmlu_high_school_computer_science acc,none 0.69
mmlu_high_school_european_history acc,none 0.7696969696969697
mmlu_high_school_geography acc,none 0.7272727272727273
mmlu_high_school_government_and_politics acc,none 0.7461139896373057
mmlu_high_school_macroeconomics acc,none 0.6435897435897436
mmlu_high_school_mathematics acc,none 0.45555555555555555
mmlu_high_school_microeconomics acc,none 0.7773109243697479
mmlu_high_school_physics acc,none 0.5165562913907285
mmlu_high_school_psychology acc,none 0.8
mmlu_high_school_statistics acc,none 0.5694444444444444
mmlu_high_school_us_history acc,none 0.7156862745098039
mmlu_high_school_world_history acc,none 0.7974683544303798
mmlu_human_aging acc,none 0.600896860986547
mmlu_human_sexuality acc,none 0.6946564885496184
mmlu_humanities acc,none 0.5300743889479277
mmlu_international_law acc,none 0.7851239669421488
mmlu_jurisprudence acc,none 0.7222222222222222
mmlu_logical_fallacies acc,none 0.6932515337423313
mmlu_machine_learning acc,none 0.42857142857142855
mmlu_management acc,none 0.6893203883495146
mmlu_marketing acc,none 0.8034188034188035
mmlu_medical_genetics acc,none 0.69
mmlu_miscellaneous acc,none 0.6717752234993615
mmlu_moral_disputes acc,none 0.5953757225433526
mmlu_moral_scenarios acc,none 0.2446927374301676
mmlu_nutrition acc,none 0.6764705882352942
mmlu_other acc,none 0.6269713550048278
mmlu_philosophy acc,none 0.6559485530546624
mmlu_prehistory acc,none 0.6265432098765432
mmlu_professional_accounting acc,none 0.4397163120567376
mmlu_professional_law acc,none 0.4745762711864407
mmlu_professional_medicine acc,none 0.6838235294117647
mmlu_professional_psychology acc,none 0.5915032679738562
mmlu_public_relations acc,none 0.6
mmlu_security_studies acc,none 0.7020408163265306
mmlu_social_sciences acc,none 0.6906077348066298
mmlu_sociology acc,none 0.7711442786069652
mmlu_stem acc,none 0.5883285759594037
mmlu_us_foreign_policy acc,none 0.78
mmlu_virology acc,none 0.45180722891566266
mmlu_world_religions acc,none 0.7192982456140351
piqa acc_norm,none 0.6322089227421109
truthfulqa_mc2 acc,none 0.45340473177307805
winogrande acc,none 0.500394632991318

137
chat_template.jinja Normal file

@@ -0,0 +1,137 @@
{%- if tools %}
{{- '<|im_start|>system
' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '
' }}
{%- else %}
{{- '你是一位工具函数调用专家,你会得到一个问题和一组可能的工具函数。根据问题,你需要进行一个或多个函数/工具调用以实现目的,请尽量尝试探索通过工具解决问题。
如果没有一个函数可以使用,请直接使用自然语言回复用户。
如果给定的问题缺少函数所需的参数,请使用自然语言进行提问,向用户询问必要信息。
如果调用结果已经足够回答用户问题,请对历史结果进行总结,使用自然语言回复用户。' }}
{%- endif %}
{{- "# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>" }}
{%- for tool in tools %}
{{- "
" }}
{{- tool | tojson }}
{%- endfor %}
{{- "
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{\"name\": <function-name>, \"arguments\": <args-json-object>}
</tool_call><|im_end|>
" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system
' + messages[0].content + '<|im_end|>
' }}
{%- else %}
{{- '<|im_start|>system
你是南北阁一款由BOSS直聘自主研发并训练的专业大语言模型。<|im_end|>
' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '
' + content + '<|im_end|>' + '
' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('
').split('<think>')[-1].lstrip('
') %}
{%- set content = content.split('</think>')[-1].lstrip('
') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index or keep_all_think or (extra_body is defined and extra_body.keep_all_think) %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '
<think>
' + reasoning_content.strip('
') + '
</think>
' + content.lstrip('
') }}
{%- else %}
{{- '<|im_start|>' + message.role + '
' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '
' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '
' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>
{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}
</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>
' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '
<tool_response>
' }}
{{- content }}
{{- '
</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>
' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant
' }}
{%- endif %}

32
config.json Normal file

@@ -0,0 +1,32 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 166100,
"dtype": "float16",
"embd_pdrop": 0.0,
"eos_token_id": 166101,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 2560,
"initializer_range": 0.02,
"intermediate_size": 10496,
"max_position_embeddings": 262144,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 20,
"num_hidden_layers": 32,
"num_key_value_heads": 4,
"pad_token_id": 0,
"pretraining_tp": 1,
"resid_pdrop": 0.0,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 70000000,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": true,
"vocab_size": 166144
}

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 166100,
"eos_token_id": 166101,
"pad_token_id": 0,
"transformers_version": "4.57.6"
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4a5a2f9028a7ff9959b5cc08fc01228ff67b9c7d0ddaa41c086acd3c43e4210b
size 2443112064

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:171f542b60aac86574aec155af15d036e4ca4d8c44f74d42eab770d17af19339
size 2825268864

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:113fea20515ed173bda89873e8dc81a24839872c5ad4d06cbbb477afabe24006
size 7871576704

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7ac64308cdbf331f061103bf29939acb3d8718f238f75903706de5ddae9fd16b
size 4982284224

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:25ad3c5f1e8f149f0cf17555f2850072f0bbef27e4554f7cf4d26fc7931f3673
size 2885023544

View File

@@ -0,0 +1,299 @@
{
"metadata": {
"total_parameters": 3933637120,
"total_size": 7867274240
},
"weight_map": {
"lm_head.weight": "model-00002-of-00002.safetensors",
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
"model.norm.weight": "model-00002-of-00002.safetensors"
}
}
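
The `total_parameters` and `total_size` values in the index metadata can be cross-checked against the dimensions in config.json. The sketch below assumes the usual Llama layout that the weight map suggests (bias-free q/k/v/o and gate/up/down projections, two RMSNorm weights per layer, one final norm, untied embeddings); the function name `llama_parameter_count` is hypothetical.

```python
def llama_parameter_count(hidden_size, intermediate_size, num_layers,
                          num_heads, num_kv_heads, head_dim,
                          vocab_size, tied_embeddings):
    """Count weights for a bias-free Llama-style decoder."""
    attn = hidden_size * num_heads * head_dim            # q_proj
    attn += 2 * hidden_size * num_kv_heads * head_dim    # k_proj and v_proj
    attn += num_heads * head_dim * hidden_size           # o_proj
    mlp = 3 * hidden_size * intermediate_size            # gate, up, down
    norms = 2 * hidden_size                              # input + post-attention
    per_layer = attn + mlp + norms
    # embed_tokens, plus a separate lm_head when embeddings are not tied
    embeddings = vocab_size * hidden_size * (1 if tied_embeddings else 2)
    final_norm = hidden_size
    return num_layers * per_layer + embeddings + final_norm


# Values copied from config.json.
params = llama_parameter_count(
    hidden_size=2560,
    intermediate_size=10496,
    num_layers=32,
    num_heads=20,
    num_kv_heads=4,
    head_dim=128,
    vocab_size=166144,
    tied_embeddings=False,
)
size_bytes = params * 2  # float16 stores two bytes per parameter

if __name__ == "__main__":
    print(params, size_bytes)
```

With these inputs the count comes to 3933637120 parameters and 7867274240 bytes, matching the metadata block. Roughly 850 million of those parameters sit in the two vocabulary-sized matrices.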

special_tokens_map.json Normal file

@@ -0,0 +1,33 @@
{
"additional_special_tokens": [
"<|endoftext|>"
],
"bos_token": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
}

tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1d8f0326910136aca20831249220b38ce5299527647bc8c6b65404485c479740
size 18451122

tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb41d04798b714520a9b075727b0226538b7330254299062742c50ec8374bc36
size 2782298

tokenizer_config.json Normal file

@@ -0,0 +1,102 @@
{
"add_bos_token": true,
"add_eos_token": false,
"add_prefix_space": true,
"added_tokens_decoder": {
"0": {
"content": "<unk>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": true
},
"166100": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"166101": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"166102": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"166103": {
"content": "<think>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"166104": {
"content": "</think>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"166105": {
"content": "<tool_call>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
},
"166106": {
"content": "</tool_call>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|endoftext|>"
],
"bos_token": "<|im_start|>",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"extra_special_tokens": {},
"legacy": true,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<unk>",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "LlamaTokenizer",
"unk_token": "<unk>",
"use_default_system_prompt": false
}
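
The `model_max_length` value above equals `int(1e30)`, which reads as a "no limit set" placeholder rather than a real context length (that interpretation is an assumption here, not something this file states). A minimal sketch of deriving a usable limit follows; the helper name `effective_max_length` is hypothetical, and the fallback to `max_position_embeddings` from config.json is one possible policy.

```python
UNSET_MAX_LENGTH = int(1e30)  # assumption: treated as "no tokenizer limit configured"

# Values copied from tokenizer_config.json and config.json.
TOKENIZER_MAX_LENGTH = 1000000000000000019884624838656
MAX_POSITION_EMBEDDINGS = 262144
VOCAB_SIZE = 166144
ADDED_TOKEN_IDS = [0, 1, 2, 166100, 166101, 166102, 166103, 166104, 166105, 166106]


def effective_max_length(model_max_length, max_position_embeddings):
    """Fall back to the model's positional limit when the tokenizer limit is unset."""
    if model_max_length >= UNSET_MAX_LENGTH:
        return max_position_embeddings
    return min(model_max_length, max_position_embeddings)


limit = effective_max_length(TOKENIZER_MAX_LENGTH, MAX_POSITION_EMBEDDINGS)
# Every added token id must index into the embedding matrix.
ids_in_range = all(0 <= token_id < VOCAB_SIZE for token_id in ADDED_TOKEN_IDS)

if __name__ == "__main__":
    print(limit, ids_in_range)
```

Passing an explicit limit such as this one when truncating inputs avoids relying on the placeholder value.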