Initialize project; model provided by the ModelHub XC community

Model: Igriscodes/qwen3-4b-tool-gguf Source: Original Platform
2026-06-19 01:52:16 +08:00
commit c0b0befaf3
11 changed files with 274 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,44 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+qwen3-4b-tool-f16.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-4b-tool-q2_k.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-4b-tool-q3_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-4b-tool-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-4b-tool-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-4b-tool-q5_0.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-4b-tool-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-4b-tool-q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-4b-tool-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,203 @@
+---
+license: mpl-2.0
+base_model: Igriscodes/qwen3-4b-tool
+tags:
+- tool-use
+- function-calling
+- reinforcement-learning
+- mcp
+- gguf
+- quantized
+pipeline_tag: text-generation
+language:
+- en
+---
+
+# Qwen3-4B-Agentic-MCP-RL - GGUF
+
+This repository contains GGUF quantizations of [Igriscodes/qwen-tool](https://huggingface.co/Igriscodes/qwen-tool), a fine-tuned `Qwen/Qwen3-4B` model optimized for multi-step tool use and structured payload delivery via the **Model Context Protocol (MCP)**.
+
+The base model was aligned using **Proximal Policy Optimization (PPO)** on strict JSON validation, execution tracking, and tool-error recovery loops. These GGUF files allow for low-latency, low-memory local inference on edge devices, CPU-only systems, and Apple Silicon.
+
+## Available Quantizations
+
+* **Q2_K**: Maximum compression. Significant loss in logic, not recommended for complex tool-use but fits on ultra-low-memory devices.
+* **Q3_K_M**: Balanced 3-bit compression. Better logic than Q2, suitable for highly constrained memory footprints.
+* **Q4_0**: Standard legacy 4-bit quantization. Faster on certain older hardware architectures but slightly lower quality than K-quants.
+* **Q4_K_M**: **Recommended.** Optimal balance of reasoning performance, generation speed, and VRAM savings.
+* **Q5_0**: Standard legacy 5-bit quantization. Good middle ground, but outpaced by K-quants.
+* **Q5_K_M**: High quality 5-bit compression. Retains nearly all unquantized capabilities while saving substantial VRAM.
+* **Q6_K**: 6-bit quantization. Near-zero degradation from F16 while shaving off a decent chunk of file size.
+* **Q8_0**: Maximum 8-bit fidelity. Extremely close to native F16 performance, ideal for strict syntax and reliable tool-calling.
+* **F16**: Unquantized 16-bit baseline. Full fidelity, for systems with enough memory to hold the largest file (about 8 GB).
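+
+To make the trade-offs above concrete, the sketch below compares the on-disk size of each file against the F16 file. The sizes are copied from the Git LFS pointers in this repository. The bits-per-weight column is only a rough estimate: it assumes the F16 file stores about 16 bits per weight and ignores metadata. `summarize` is an illustrative helper, not part of any library.
+
+```python
+# File sizes in bytes, copied from the Git LFS pointers in this repository.
+SIZES = {
+    "F16": 8051284768,
+    "Q8_0": 4280404768,
+    "Q6_K": 3306260768,
+    "Q5_K_M": 2889513248,
+    "Q5_0": 2823711008,
+    "Q4_K_M": 2497280288,
+    "Q4_0": 2369546528,
+    "Q3_K_M": 2075617568,
+    "Q2_K": 1669499168,
+}
+
+
+def summarize(sizes, baseline="F16"):
+    """Return (name, size_gb, ratio_to_baseline, approx_bits_per_weight) rows,
+    sorted from smallest to largest file."""
+    rows = []
+    for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
+        ratio = size / sizes[baseline]
+        # Assumption: the baseline file holds roughly 16 bits per weight.
+        rows.append((name, size / 1e9, ratio, 16 * ratio))
+    return rows
+
+
+for name, gb, ratio, bpw in summarize(SIZES):
+    print(f"{name:7s} {gb:5.2f} GB  {ratio:4.0%} of F16  ~{bpw:.1f} bits/weight")
+```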
+
+## Local Deployment Quickstart
+
+### Using Ollama
+Ollama can run models directly from Hugging Face via the `hf.co` registry prefix. Pick the quantization tag you want; the first run downloads the file before starting the model:
+
+```bash
+# Q2_K (Extreme compression)
+ollama run hf.co/Igriscodes/qwen3-4b-tool-gguf:Q2_K
+
+# Q3_K_M (Medium 3-bit)
+ollama run hf.co/Igriscodes/qwen3-4b-tool-gguf:Q3_K_M
+
+# Q4_0 (Legacy 4-bit)
+ollama run hf.co/Igriscodes/qwen3-4b-tool-gguf:Q4_0
+
+# Q4_K_M (Recommended balanced version)
+ollama run hf.co/Igriscodes/qwen3-4b-tool-gguf:Q4_K_M
+
+# Q5_0 (Legacy 5-bit)
+ollama run hf.co/Igriscodes/qwen3-4b-tool-gguf:Q5_0
+
+# Q5_K_M (High-fidelity 5-bit)
+ollama run hf.co/Igriscodes/qwen3-4b-tool-gguf:Q5_K_M
+
+# Q6_K (Deep 6-bit)
+ollama run hf.co/Igriscodes/qwen3-4b-tool-gguf:Q6_K
+
+# Q8_0 (Near-lossless 8-bit)
+ollama run hf.co/Igriscodes/qwen3-4b-tool-gguf:Q8_0
+
+# F16 (High-fidelity unquantized float version)
+ollama run hf.co/Igriscodes/qwen3-4b-tool-gguf:F16
+
+```
+
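+Alternatively, build a custom Ollama model from a downloaded GGUF file by following the setup guide below.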
+
+## Ollama Setup Guide
+
+To run this model locally with full tool-calling (function calling) and thinking capabilities, you can easily package it into an **Ollama** model using the provided template configuration.
+
+### 1. Create the Modelfile
+
+Save the configuration block below, unchanged, as a file named `Modelfile` in the same directory as your downloaded GGUF file.
+
+> 💡 **Note:** If you are using a different quantization format than the `q4_k_m` example below, make sure to update the `FROM` line to match your exact `.gguf` filename.
+
+```text
+# Point to your quantized GGUF file
+FROM ./qwen3-4b-tool-q4_k_m.gguf
+
+# Custom template optimizing tool-use syntax and thought blocks
+TEMPLATE """{{- $lastUserIdx := -1 -}}
+{{- range $idx, $msg := .Messages -}}
+{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
+{{- end }}
+{{- if or .System .Tools }}<|im_start|>system
+{{ if .System }}{{ .System }}
+
+{{ end }}
+{{- if .Tools }}# Tools
+
+You may call one or more functions to assist with the user query.
+
+You are provided with function signatures within <tools></tools> XML tags:
+<tools>
+{{- range .Tools }}
+{"type": "function", "function": {{ .Function }}}
+{{- end }}
+</tools>
+
+For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
+<tool_call>
+{"name": <function-name>, "arguments": <args-json-object>}
+</tool_call>
+{{- end -}}
+<|im_end|>
+{{ end }}
+{{- range $i, $_ := .Messages }}
+{{- $last := eq (len (slice $.Messages $i)) 1 -}}
+{{- if eq .Role "user" }}<|im_start|>user
+{{ .Content }}<|im_end|>
+{{ else if eq .Role "assistant" }}<|im_start|>assistant
+{{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
+<think>{{ .Thinking }}</think>
+{{ end -}}
+{{ if .Content }}{{ .Content }}{{ end }}
+{{- if .ToolCalls }}
+{{- range .ToolCalls }}
+<tool_call>
+{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
+</tool_call>
+{{- end }}
+{{- end }}{{ if not $last }}<|im_end|>
+{{ end }}
+{{- else if eq .Role "tool" }}<|im_start|>user
+<tool_response>
+{{ .Content }}
+</tool_response><|im_end|>
+{{ end }}
+{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
+<think>
+{{ end }}
+{{- end }}"""
+
+# Inference parameters optimized for structured reasoning
+PARAMETER temperature 0.6
+PARAMETER num_ctx 8192
+PARAMETER num_gpu -1
+PARAMETER top_k 20
+PARAMETER top_p 0.95
+PARAMETER repeat_penalty 1
+PARAMETER stop <|im_start|>
+PARAMETER stop <|im_end|>
+
+```
+
+### 2. Build and Run the Model
+
+Open your terminal, navigate to the directory containing your `Modelfile` and your `.gguf` file, and execute the build command:
+
+```bash
+ollama create qwen3-4b-tool --file Modelfile
+
+```
+
+Once the build process completes, you can launch and interact with your new custom model natively via Ollama:
+
+```bash
+ollama run qwen3-4b-tool
+
+```
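+
+Once the model is built, tool definitions can be passed to it through Ollama's local REST API. The sketch below only builds the request body for the `/api/chat` endpoint and prints it. The `get_weather` tool and the helper names are hypothetical, and the URL assumes Ollama's default address `http://localhost:11434`.
+
+```python
+import json
+import urllib.request
+
+# Assumption: Ollama is listening on its default local address.
+OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
+
+# Hypothetical tool definition, for illustration only.
+WEATHER_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Look up the current weather for a city.",
+        "parameters": {
+            "type": "object",
+            "properties": {"city": {"type": "string"}},
+            "required": ["city"],
+        },
+    },
+}
+
+
+def build_chat_request(prompt, tools, model="qwen3-4b-tool"):
+    """Build a non-streaming /api/chat request body with tool definitions."""
+    return {
+        "model": model,
+        "messages": [{"role": "user", "content": prompt}],
+        "tools": tools,
+        "stream": False,
+    }
+
+
+def send(payload, url=OLLAMA_CHAT_URL):
+    """POST the payload as JSON and return the decoded response.
+
+    Requires a running Ollama server; it is not called in this sketch.
+    """
+    request = urllib.request.Request(
+        url,
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+    )
+    with urllib.request.urlopen(request) as response:
+        return json.load(response)
+
+
+payload = build_chat_request("What is the weather in Paris?", [WEATHER_TOOL])
+print(json.dumps(payload, indent=2))
+```
+
+With the Ollama server running, `send(payload)` should return a JSON object; any tool calls the model made are expected under `message.tool_calls` in that response.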
+
+### Using Python (`llama-cpp-python`)
+
+First, install the library. `Llama.from_pretrained` also requires the `huggingface-hub` package to download files from the Hub:
+
+```bash
+pip install llama-cpp-python huggingface-hub
+```
+
+Depending on your hardware constraints, load either the unquantized F16 file or a quantized file using the snippets below:
+
+#### Option 1: High Fidelity (F16 Precision)
+
+```python
+from llama_cpp import Llama
+
+llm = Llama.from_pretrained(
+    repo_id="Igriscodes/qwen3-4b-tool-gguf",
+    filename="qwen3-4b-tool-f16.gguf",
+    n_ctx=2048,
+    n_gpu_layers=-1 # Use -1 to offload all layers to GPU (Metal/CUDA)
+)
+
+```
+
+#### Option 2: Low Resource (Q4 Quantization)
+
+```python
+from llama_cpp import Llama
+
+llm = Llama.from_pretrained(
+    repo_id="Igriscodes/qwen3-4b-tool-gguf",
+    filename="qwen3-4b-tool-q4_k_m.gguf",  # must match a file in this repo (Q4_K_M shown)
+    n_ctx=2048,
+    n_gpu_layers=-1 # Offloads all layers to the GPU; set to 0 for CPU-only execution
+)
+
+```
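+
+The Modelfile template above asks the model to return each function call as a JSON object inside `<tool_call></tool_call>` tags. When working with raw generated text, those blocks have to be extracted by the caller. The following is a minimal sketch, assuming the output follows that format; `extract_tool_calls` and the sample string are hypothetical and use only the Python standard library.
+
+```python
+import json
+import re
+
+# Matches one <tool_call>...</tool_call> block; DOTALL lets the JSON span lines.
+TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
+
+
+def extract_tool_calls(text):
+    """Return a list of {"name", "arguments"} dicts found in `text`.
+
+    Blocks whose body is not valid JSON, or lacks a "name" key, are skipped.
+    """
+    calls = []
+    for body in TOOL_CALL_RE.findall(text):
+        try:
+            payload = json.loads(body)
+        except json.JSONDecodeError:
+            continue  # malformed JSON: ignore this block
+        if isinstance(payload, dict) and "name" in payload:
+            calls.append({
+                "name": payload["name"],
+                "arguments": payload.get("arguments", {}),
+            })
+    return calls
+
+
+# Hypothetical model output, for illustration only.
+sample = (
+    "<think>The user wants the weather.</think>\n"
+    "<tool_call>\n"
+    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
+    "</tool_call>"
+)
+print(extract_tool_calls(sample))
+```
+
+Skipping malformed blocks rather than raising keeps a single bad call from discarding the other calls in the same response; stricter callers may prefer to surface the error to the model instead.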
--- a/qwen3-4b-tool-f16.gguf
+++ b/qwen3-4b-tool-f16.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:62ddb81c318da8f5c61cac4fa8dfd50a1e21c8c057fd12761e9a6be792260844
+size 8051284768
--- a/qwen3-4b-tool-q2_k.gguf
+++ b/qwen3-4b-tool-q2_k.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:06787895f30308af73ecf5d25879501ece8e0f3e9bbcda287126dac1a781b95d
+size 1669499168
--- a/qwen3-4b-tool-q3_k_m.gguf
+++ b/qwen3-4b-tool-q3_k_m.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e116e6b3a3e1e7085681866fde33c03917e1ce4b7a4bae1a813d86000ee13cda
+size 2075617568
--- a/qwen3-4b-tool-q4_0.gguf
+++ b/qwen3-4b-tool-q4_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d7a95312996e0f7857dd7b82d90c94f2ce820fd9e5307773cb83899bee3b0bdc
+size 2369546528
--- a/qwen3-4b-tool-q4_k_m.gguf
+++ b/qwen3-4b-tool-q4_k_m.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5c25a63b804d1d285037bcf05998363fa014b245c3f99cfde2114a09d823b202
+size 2497280288
--- a/qwen3-4b-tool-q5_0.gguf
+++ b/qwen3-4b-tool-q5_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7da789cdbf2346be7a3d2bd37dcbe8f8c689172bd68636acecb36483bd73bda8
+size 2823711008
--- a/qwen3-4b-tool-q5_k_m.gguf
+++ b/qwen3-4b-tool-q5_k_m.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3e98a124f1900144e440cc0a025c5ce083094eaaee10044aaaeac708df599474
+size 2889513248
--- a/qwen3-4b-tool-q6_k.gguf
+++ b/qwen3-4b-tool-q6_k.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e11db09a622c2048408bd69bd828f67df8a3605250d4a119ffbdfe3f0846877a
+size 3306260768
--- a/qwen3-4b-tool-q8_0.gguf
+++ b/qwen3-4b-tool-q8_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dccfd20677d342fc7fcb3973fb99ef8cc8dfd9d50cb1957ccce1b5508c2f54f7
+size 4280404768