Initialize project; model provided by the ModelHub XC community
Model: reaperdoesntknow/Symiotic-14B · Source: Original Platform
.gitattributes · vendored · Normal file · 36 lines
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
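Every pattern above is routed through Git LFS, so these files live as small pointer stubs in git and the real payloads sit in LFS storage. A rough way to check which repo files would be LFS-tracked, using a subset of the rules above (illustrative only; `fnmatch` approximates but does not exactly match gitattributes glob semantics such as `saved_model/**/*`):

```python
import fnmatch

# Subset of the .gitattributes patterns above (illustrative).
lfs_patterns = ["*.safetensors", "*.bin", "*.pt", "tokenizer.json"]

for name in ["model.bin", "config.json", "tokenizer.json", "memory.pt"]:
    tracked = any(fnmatch.fnmatch(name, p) for p in lfs_patterns)
    print(f"{name}: {'LFS' if tracked else 'plain git'}")
```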
README.md · Normal file · 147 lines
@@ -0,0 +1,147 @@
---
license: afl-3.0
datasets:
- 0xZee/dataset-CoT-Advanced-Calculus-268
language:
- en
base_model:
- Qwen/Qwen3-14B
pipeline_tag: text-generation
library_name: transformers
tags:
- qwen3
- symbiotic
- symbioticai
- llm
- Symbols
- convergentintel
---

# SymbioticLM-14B

**Model Type**: Hybrid Symbolic–Transformer with Persistent Memory
**Base Model**: Qwen3-14B
**Framework**: PyTorch + Hugging Face Transformers
**Purpose**: Full-scale cognitive reasoning model with self-organizing memory and generative symbolic evolution

---

## Overview

SymbioticLM-14B is a 17.8-billion-parameter symbolic–transformer hybrid that tightly couples high-capacity neural representation with structured symbolic cognition. Designed to match or exceed the performance of top-tier LLMs in symbolic domains, it supports persistent memory, entropic recall, multi-stage symbolic routing, and self-organizing knowledge structures.

This model is ideal for advanced reasoning agents, research assistants, and symbolic math/code generation systems.

---
## Architecture Highlights

- **Backbone**: Qwen3-14B transformer with rotary embeddings + FlashAttention
- **Symbolic Dim**: 8192
- **Symbolic Modules**:
  - ThoughtDynamicsLNN (multi-head LSTM attention)
  - LiquidThoughtProcessor
  - CrystallineProcessor (DNAConv GNN)
  - HelicalDNAProcessor (linear helical encoding)
- **Memory**: 4096 symbolic states in FP32, retrieved using entropy + contextual similarity
- **Dream Mode**: Background symbolic simulation for open-ended cognition
- **Router**: Intent classifier + entropy gating for processor path selection (see the sketch below)
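To make the routing idea concrete, here is a minimal, hypothetical sketch of entropy gating. Everything in it (the `route` function, the threshold value, the mapping of intents to processors) is an illustration of the concept, not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F

def entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the categorical distribution given by logits."""
    p = F.softmax(logits, dim=-1)
    return -(p * p.clamp_min(1e-9).log()).sum(dim=-1)

def route(intent_logits: torch.Tensor, threshold: float = 1.0) -> str:
    """Hypothetical gate: a confident intent is dispatched to one symbolic
    processor; a high-entropy (ambiguous) intent falls back to the plain
    transformer path. The threshold is illustrative."""
    if entropy(intent_logits).item() > threshold:
        return "transformer"
    processors = ["ThoughtDynamicsLNN", "LiquidThoughtProcessor",
                  "CrystallineProcessor", "HelicalDNAProcessor"]
    return processors[intent_logits.argmax().item()]

print(route(torch.tensor([4.0, 0.1, 0.1, 0.1])))  # -> ThoughtDynamicsLNN
print(route(torch.tensor([1.0, 1.0, 1.0, 1.0])))  # -> transformer (entropy ln 4 ≈ 1.39)
```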

---

## Files Included

| File | Description |
|---------------------------|------------------------------------------------------|
| `model.bin` | Transformer weights (LFS) |
| `model.safetensors` | Memory-safe weights, optimized for loading |
| `memory.pt` | Bank of 4096 symbolic memory vectors |
| `config.json` | Model and architecture metadata |
| `generation_config.json` | Top-p, temperature, and other decoding settings |
| `tokenizer.json` | Full tokenizer with symbolic tag support |
| `added_tokens.json` | Tags like `<D_LIM>`, `<PROOF>`, `<BY_MEASURE>`, etc. |
| `special_tokens_map.json` | Special-token mapping for the tokenizer |
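A minimal loading sketch follows. Since `config.json` declares the `Qwen3ForCausalLM` architecture, the standard Transformers API loads the transformer backbone; the symbolic memory bank (`memory.pt`) is not wired in by this path and would need the repository's own code. The sampling values mirror `generation_config.json` (do_sample with temperature 0.6, top_p 0.95, top_k 20); the prompt is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "reaperdoesntknow/Symiotic-14B"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user",
             "content": "State the limit definition of the derivative."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings taken from generation_config.json.
out = model.generate(inputs, max_new_tokens=512,
                     temperature=0.6, top_p=0.95, top_k=20)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```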
---

## Intended Uses

- Multi-step conversational agents with persistent memory
- Long-form symbolic theorem generation and proof planning
- Scientific dialogue, symbolic simulations, math/code synthesis
- Reasoning in fuzzy, discontinuous, or non-smooth problem domains

---

## Limitations

- Memory requires curation and seeding for maximum utility
- Symbolic cognition is not instruction-tuned for general QA
- FlashAttention and the symbolic modules increase VRAM usage during generation

---

## Citations

Please cite "SymbioticLM" when using the symbolic memory components in research or applications.

---

## Convergent Intelligence Portfolio

*Part of the [Symbiotic AI Series](https://huggingface.co/reaperdoesntknow) by [Convergent Intelligence LLC: Research Division](https://huggingface.co/reaperdoesntknow)*
## Mathematical Foundations: Discrepancy Calculus (DISC)

SymbioticLM's persistent memory and symbolic evolution connect to Discrepancy Calculus through **self-generating completeness** (Ch. 3 of the DISC monograph) and **symbolic-root domains**. The discrepancy operator

$$Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|}\, dt$$

quantifies the local mismatch between integration and differentiation. In the symbolic-transformer context, $D$ measures the gap between what the symbolic system encodes (discrete structure) and what the transformer integrates (continuous representation). The self-generating completeness theorem establishes that completeness emerges dynamically via energy computation on symbolic-root domains: the mathematical foundation for why symbolic-neural hybrids can produce structure that neither component generates alone.
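Wherever $f$ is differentiable at $x$, the integrand tends to $|f'(x)|$ as $t \to x$, so $Df(x) = |f'(x)|$; the operator only carries extra information at non-smooth points. A quick numerical check of the $\varepsilon$-approximation (illustrative only, not taken from the monograph):

```python
def D(f, x, eps=1e-6, n=1000):
    """Midpoint-rule estimate of (1/eps) * integral over [x, x+eps]
    of |f(t) - f(x)| / |t - x| dt."""
    h = eps / n
    total = 0.0
    for i in range(n):
        t = x + (i + 0.5) * h
        total += abs(f(t) - f(x)) / abs(t - x) * h
    return total / eps

print(D(lambda t: t * t, 1.0))  # smooth point: ~2.0 = |f'(1)|
print(D(abs, 0.0))              # kink at 0: ~1.0, the one-sided slope
```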
The **discrepancy energy** $E_{\text{disc}}[f] = \frac{1}{2}\int w(x)\,(Df(x))^2\, d\mu(x)$ provides a natural stability criterion for the memory consolidation process: memory states with bounded discrepancy energy are stable; those with divergent energy indicate structural transitions requiring reorganization.

Full theory: *"On the Formal Analysis of Discrepancy Calculus"* (Colca, 2026; Convergent Intelligence LLC: Research Division).
## Related Models

| Model | Downloads | Format |
|-------|-----------|--------|
| [Symbiotic-1B](https://huggingface.co/reaperdoesntknow/Symbiotic-1B) | 4 | HF |
| [Symbiotic-8B](https://huggingface.co/reaperdoesntknow/Symbiotic-8B) | 4 | HF |
| [Symbiotic-Beta](https://huggingface.co/reaperdoesntknow/Symbiotic-Beta) | 3 | HF |

### Top Models from Our Lab

| Model | Downloads |
|-------|-----------|
| [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil) | 501 |
| [LFM2.5-1.2B-Distilled-SFT](https://huggingface.co/reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT) | 342 |
| [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT) | 302 |
| [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF) | 203 |
| [Qwen3-1.7B-Coder-Distilled-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT-GGUF) | 194 |

**Total portfolio: 49 models, 22,598 total downloads**

*Last updated: 2026-03-28 12:57 UTC*

<!-- CIX-CROSSLINK-START -->

---

## From the Convergent Intelligence Portfolio

**[DistilQwen Collection](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)** — Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.

Top model: [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT) — 508 downloads

Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org/10.57967/hf/8165)

*Convergent Intelligence LLC: Research Division*

<!-- CIX-CROSSLINK-END -->
added_tokens.json · Normal file · 28 lines
@@ -0,0 +1,28 @@
{
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
}
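A small check (assuming the tokenizer loads with the standard Transformers API) that the reasoning and tool tags resolve to the ids listed above:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("reaperdoesntknow/Symiotic-14B")
for tag in ["<think>", "</think>", "<tool_call>", "</tool_call>"]:
    print(tag, tok.convert_tokens_to_ids(tag))
# Expected per added_tokens.json: 151667, 151668, 151657, 151658
```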
config.json · Normal file · 31 lines
@@ -0,0 +1,31 @@
{
  "_attn_implementation_autoset": true,
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 17408,
  "max_position_embeddings": 40960,
  "max_window_layers": 40,
  "model_type": "qwen3",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.3",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
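As a sanity check on these values (illustrative arithmetic, not repository code): the dense-transformer parameter count implied by config.json works out to exactly 14,768,307,200 (about 14.77 B), which equals `total_size / 4` bytes in model.safetensors.index.json below; the 17.8 B figure in the overview presumably also counts the symbolic modules, which are not in these shards.

```python
# Back-of-envelope parameter count from config.json (standard Qwen3 layout assumed).
vocab, hidden, layers = 151936, 5120, 40
inter, heads, kv_heads, head_dim = 17408, 40, 8, 128

embed   = vocab * hidden                      # input embeddings
lm_head = vocab * hidden                      # untied output head (tie_word_embeddings: false)
attn    = 2 * hidden * heads * head_dim \
        + 2 * hidden * kv_heads * head_dim    # q/o plus k/v projections
norms   = 2 * head_dim + 2 * hidden           # q_norm, k_norm, two RMSNorms
mlp     = 3 * hidden * inter                  # gate, up, down projections
total   = embed + lm_head + layers * (attn + norms + mlp) + hidden  # + final norm
print(total)                                  # 14768307200 = 59073228800 bytes / 4
```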
generation_config.json · Normal file · 74 lines
@@ -0,0 +1,74 @@
{
  "max_length": 20,
  "max_new_tokens": null,
  "min_length": 0,
  "min_new_tokens": null,
  "early_stopping": false,
  "max_time": null,
  "stop_strings": null,
  "do_sample": true,
  "num_beams": 1,
  "num_beam_groups": 1,
  "penalty_alpha": null,
  "dola_layers": null,
  "use_cache": true,
  "cache_implementation": null,
  "cache_config": null,
  "return_legacy_cache": null,
  "prefill_chunk_size": null,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "min_p": null,
  "typical_p": 1.0,
  "epsilon_cutoff": 0.0,
  "eta_cutoff": 0.0,
  "diversity_penalty": 0.0,
  "repetition_penalty": 1.0,
  "encoder_repetition_penalty": 1.0,
  "length_penalty": 1.0,
  "no_repeat_ngram_size": 0,
  "bad_words_ids": null,
  "force_words_ids": null,
  "renormalize_logits": false,
  "constraints": null,
  "forced_bos_token_id": null,
  "forced_eos_token_id": null,
  "remove_invalid_values": false,
  "exponential_decay_length_penalty": null,
  "suppress_tokens": null,
  "begin_suppress_tokens": null,
  "forced_decoder_ids": null,
  "sequence_bias": null,
  "token_healing": false,
  "guidance_scale": null,
  "low_memory": null,
  "watermarking_config": null,
  "num_return_sequences": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_scores": false,
  "output_logits": null,
  "return_dict_in_generate": false,
  "pad_token_id": 151643,
  "bos_token_id": 151643,
  "eos_token_id": [
    151645,
    151643
  ],
  "encoder_no_repeat_ngram_size": 0,
  "decoder_start_token_id": null,
  "is_assistant": false,
  "num_assistant_tokens": 20,
  "num_assistant_tokens_schedule": "constant",
  "assistant_confidence_threshold": 0.4,
  "prompt_lookup_num_tokens": null,
  "max_matching_ngram_size": null,
  "assistant_early_exit": null,
  "assistant_lookbehind": 10,
  "target_lookbehind": 10,
  "disable_compile": false,
  "generation_kwargs": {},
  "_from_model_config": false,
  "transformers_version": "4.51.3"
}
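Note that this file ships `max_length: 20` with `max_new_tokens: null`. In the standard Transformers API, `generate()` falls back to `max_length` when no token budget is given, so outputs would be capped at 20 total tokens. Continuing the loading sketch from the README section, pass a budget explicitly:

```python
# max_length=20 in generation_config.json would truncate answers;
# an explicit max_new_tokens overrides it.
out = model.generate(inputs, max_new_tokens=1024,
                     temperature=0.6, top_p=0.95, top_k=20)
```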
merges.txt · Normal file · 151388 lines
File diff suppressed because it is too large.
model-00001-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:da4e028119ef8bc718cedc7ec554e548ac00d1b1db5c45a660530bf39b5af98e
size 4684558368

model-00002-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bc7aa23c6a75cef984b3c8a71f57772e90836d1693bc364beab4f794e333d680
size 4676778920

model-00003-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ad3c47b50dd243618c6ca9fca3ef5d4e096c026c4956a506ee0a22ccc438a2dc
size 4928480056

model-00004-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7411117e02260a269005757dfa8b6d0f80d8e33ed7f393a27bd568d03187af73
size 4928480080

model-00005-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8a2d3b7f54b24ccf24ede0fbad390c796be26e6cc3e336513d8efd52b32b304c
size 4676778952

model-00006-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c498722354638879b074c746816747ea280564b4a28978c158d0248b745b650a
size 4928480096

model-00007-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d40d78446eedebe75e9c9da4081ee0e3e86ac05e67bac32c2b345293e85ba2d
size 4928480096

model-00008-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:df03d235247c6eaf96364d096ae49869af0b5cab9c5b00ce7843564dafdcd4b0
size 4676778952

model-00009-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4becc4f0b1cd604a798f2d02d917d5f16fafa4c204ac13cf464d092e42047c84
size 4928480096

model-00010-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:73830f584e55f1820c9e9cee3190c8bc92fd41e14dea4eb717f7947124d13c8e
size 4928480096

model-00011-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c2f032a543c44f64d73e98d08396dfa651e616b980cda719fe1958e071ad08b0
size 4676778952

model-00012-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f8f74af26752a43bf2508f87bb495d73356fc052d43193afccdf9e581ed3ea7
size 2999075752

model-00013-of-00013.safetensors · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:facba960024e8c4baad69193baddfaa7616e9e0cd9d6b539a4e84c8c6d182343
size 3111649408

model.bin · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:31be40d2534f26f6bf225a10a4f1824e553f05eb062f985fd53a7d59da531ec0
size 15582153002
model.safetensors.index.json · Normal file · 450 lines
@@ -0,0 +1,450 @@
{
  "metadata": {
    "total_size": 59073228800
  },
  "weight_map": {
    "lm_head.weight": "model-00013-of-00013.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00002-of-00013.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
    "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00013.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.self_attn.k_norm.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.self_attn.q_norm.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.self_attn.k_norm.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.self_attn.q_norm.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.12.self_attn.k_norm.weight": "model-00004-of-00013.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.12.self_attn.q_norm.weight": "model-00004-of-00013.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.self_attn.k_norm.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.self_attn.q_norm.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.self_attn.k_norm.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.self_attn.q_norm.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00006-of-00013.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
    "model.layers.15.self_attn.k_norm.weight": "model-00005-of-00013.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.15.self_attn.q_norm.weight": "model-00005-of-00013.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00005-of-00013.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.self_attn.k_norm.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.self_attn.q_norm.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.self_attn.k_norm.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.self_attn.q_norm.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.self_attn.k_norm.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.self_attn.q_norm.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00007-of-00013.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
    "model.layers.19.self_attn.k_norm.weight": "model-00006-of-00013.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.19.self_attn.q_norm.weight": "model-00006-of-00013.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00006-of-00013.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.self_attn.k_norm.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.self_attn.q_norm.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.self_attn.k_norm.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.self_attn.q_norm.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.self_attn.k_norm.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.self_attn.q_norm.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.self_attn.k_norm.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.self_attn.q_norm.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00008-of-00013.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
    "model.layers.23.self_attn.k_norm.weight": "model-00007-of-00013.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.23.self_attn.q_norm.weight": "model-00007-of-00013.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00007-of-00013.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.self_attn.k_norm.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.self_attn.q_norm.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.self_attn.k_norm.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.self_attn.q_norm.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00009-of-00013.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
    "model.layers.26.self_attn.k_norm.weight": "model-00008-of-00013.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.26.self_attn.q_norm.weight": "model-00008-of-00013.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00008-of-00013.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.self_attn.k_norm.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.self_attn.q_norm.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.self_attn.k_norm.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.self_attn.q_norm.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.self_attn.k_norm.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.self_attn.q_norm.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.self_attn.k_norm.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.self_attn.q_norm.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00010-of-00013.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
    "model.layers.30.self_attn.k_norm.weight": "model-00009-of-00013.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.30.self_attn.q_norm.weight": "model-00009-of-00013.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00009-of-00013.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.self_attn.k_norm.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.self_attn.q_norm.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.input_layernorm.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.self_attn.k_norm.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.self_attn.q_norm.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.32.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.input_layernorm.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.mlp.down_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.mlp.gate_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.mlp.up_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.post_attention_layernorm.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.self_attn.k_norm.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.self_attn.q_norm.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.33.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.34.input_layernorm.weight": "model-00011-of-00013.safetensors",
    "model.layers.34.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.34.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.34.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.34.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
    "model.layers.34.self_attn.k_norm.weight": "model-00010-of-00013.safetensors",
    "model.layers.34.self_attn.k_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.34.self_attn.o_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.34.self_attn.q_norm.weight": "model-00010-of-00013.safetensors",
    "model.layers.34.self_attn.q_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.34.self_attn.v_proj.weight": "model-00010-of-00013.safetensors",
    "model.layers.35.input_layernorm.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.self_attn.k_norm.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.self_attn.q_norm.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.35.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.input_layernorm.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.mlp.down_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.post_attention_layernorm.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.self_attn.k_norm.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.self_attn.q_norm.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.36.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.37.input_layernorm.weight": "model-00012-of-00013.safetensors",
    "model.layers.37.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.37.mlp.gate_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.37.mlp.up_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.37.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
    "model.layers.37.self_attn.k_norm.weight": "model-00011-of-00013.safetensors",
    "model.layers.37.self_attn.k_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.37.self_attn.o_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.37.self_attn.q_norm.weight": "model-00011-of-00013.safetensors",
    "model.layers.37.self_attn.q_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.37.self_attn.v_proj.weight": "model-00011-of-00013.safetensors",
    "model.layers.38.input_layernorm.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.self_attn.k_norm.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.self_attn.q_norm.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.38.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.input_layernorm.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.mlp.down_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.mlp.gate_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.mlp.up_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.post_attention_layernorm.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.self_attn.k_norm.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.self_attn.k_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.self_attn.o_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.self_attn.q_norm.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.self_attn.q_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.39.self_attn.v_proj.weight": "model-00012-of-00013.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.4.self_attn.k_norm.weight": "model-00002-of-00013.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.4.self_attn.q_norm.weight": "model-00002-of-00013.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00013.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.self_attn.k_norm.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.self_attn.q_norm.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.self_attn.k_norm.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.self_attn.q_norm.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.self_attn.k_norm.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.self_attn.q_norm.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.8.self_attn.k_norm.weight": "model-00003-of-00013.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.8.self_attn.q_norm.weight": "model-00003-of-00013.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00013.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.self_attn.k_norm.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.self_attn.q_norm.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00004-of-00013.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00004-of-00013.safetensors",
    "model.norm.weight": "model-00012-of-00013.safetensors"
  }
}
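The index file is what lets loaders fetch only the shards they need. A minimal lookup using the standard library (file path assumes a local checkout):

```python
import json

with open("model.safetensors.index.json") as f:
    index = json.load(f)

print(index["metadata"]["total_size"])        # 59073228800 bytes (~59.07 GB)
print(index["weight_map"]["lm_head.weight"])  # model-00013-of-00013.safetensors
```

The total corresponds to 4 bytes per parameter (59,073,228,800 / 4 = 14,768,307,200), so the shards appear to be serialized in FP32 even though config.json declares `torch_dtype: bfloat16`; the per-shard file sizes listed above exceed the tensor bytes only by a few kilobytes of safetensors headers.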
special_tokens_map.json · Normal file · 31 lines
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json · Normal file · 3 lines
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
240
tokenizer_config.json
Normal file
240
tokenizer_config.json
Normal file
@@ -0,0 +1,240 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n    {%- set index = (messages|length - 1) - loop.index0 %}\n    {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n        {%- set ns.multi_step_tool = false %}\n        {%- set ns.last_query_index = index %}\n    {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {%- set content = message.content %}\n        {%- set reasoning_content = '' %}\n        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n            {%- set reasoning_content = message.reasoning_content %}\n        {%- else %}\n            {%- if '</think>' in message.content %}\n                {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n                {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n            {%- endif %}\n        {%- endif %}\n        {%- if loop.index0 > ns.last_query_index %}\n            {%- if loop.last or (not loop.last and reasoning_content) %}\n                {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n            {%- else %}\n                {{- '<|im_start|>' + message.role + '\\n' + content }}\n            {%- endif %}\n        {%- else %}\n            {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- endif %}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n    {%- if enable_thinking is defined and enable_thinking is false %}\n        {{- '<think>\\n\\n</think>\\n\\n' }}\n    {%- endif %}\n{%- endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
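The chat_template above renders ChatML turns, inlines tool signatures and `<tool_call>` blocks, and splits assistant messages around `<think>`/`</think>` reasoning segments; its final branch shows that passing `enable_thinking=False` pre-seeds an empty think block after the generation prompt. A hedged usage sketch (the local path is an assumption; `apply_chat_template` forwards extra keyword arguments such as `enable_thinking` into the template context):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./SymbioticLM-14B")  # path is an assumption

messages = [
    {"role": "system", "content": "You are a symbolic reasoning assistant."},
    {"role": "user", "content": "Differentiate x**3 * sin(x)."},
]

# Renders <|im_start|>...<|im_end|> turns plus the trailing assistant prompt.
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # template then emits an empty <think>\n\n</think> block
)
print(prompt)
```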
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long
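vocab.json (diff suppressed above for length) is the BPE string-to-id table consumed by the Qwen2Tokenizer declared in tokenizer_config.json. A sketch of inspecting it directly, assuming a fetched local copy:

```python
import json

# Sketch: peek at the raw BPE vocabulary (local path is an assumption).
with open("SymbioticLM-14B/vocab.json", encoding="utf-8") as f:
    vocab = json.load(f)  # token string -> integer id

print(len(vocab))  # base vocabulary size, before the added tokens listed above
# ids 151643 and up (<|endoftext|>, <|im_start|>, ...) come from
# added_tokens_decoder, so they may not appear in this file at all.
```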