---
library_name: transformers
license: mit
base_model:
- LocoreMind/LocoOperator-4B
tags:
- code
- agent
- tool-calling
- distillation
- qwen3
- gguf
- llama-cpp
language:
- en
pipeline_tag: text-generation
---
# This is an imatrix quantization of [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B), made by [SimplySara](https://huggingface.co/SimplySara)

| Model                             |   Size_GB |   BPW |     PPL_Q |   KLD_Mean |   KLD_Max | Top_P_Match   |
|:----------------------------------|----------:|------:|----------:|-----------:|----------:|:--------------|
| LocoOperator-4B-BF16.gguf         |     7.498 | 16.01 |   9.24309 |  -1.2e-05  |   4e-06   | 100.000%      |
| LocoOperator-4B-MXFP4_MOE.gguf    |     3.986 |  8.51 |   9.24606 |   0.001835 |   2.98238 | 97.518%       |
| LocoOperator-4B-i1-MXFP4_MOE.gguf |     3.986 |  8.51 |   9.24606 |   0.001835 |   2.98238 | 97.518%       |
| LocoOperator-4B-Q8_0.gguf         |     3.986 |  8.51 |   9.24606 |   0.001835 |   2.98238 | 97.518%       |
| LocoOperator-4B-i1-Q8_0.gguf      |     3.986 |  8.51 |   9.24606 |   0.001835 |   2.98238 | 97.518%       |
| LocoOperator-4B-Q6_K.gguf         |     3.079 |  6.58 |   9.27926 |   0.0068   |  10.5686  | 95.526%       |
| LocoOperator-4B-i1-Q6_K.gguf      |     3.079 |  6.58 |   9.295   |   0.006075 |  15.9945  | 95.857%       |
| LocoOperator-4B-i1-Q5_1.gguf      |     2.841 |  6.07 |   9.28859 |   0.01364  |   2.98838 | 94.135%       |
| LocoOperator-4B-Q5_1.gguf         |     2.841 |  6.07 |   9.43222 |   0.022675 |  16.3454  | 93.161%       |
| LocoOperator-4B-Q5_K_M.gguf       |     2.691 |  5.75 |   9.35457 |   0.017023 |  12.3947  | 93.635%       |
| LocoOperator-4B-i1-Q5_K_M.gguf    |     2.691 |  5.75 |   9.2965  |   0.013153 |   7.78613 | 94.257%       |
| LocoOperator-4B-i1-Q5_0.gguf      |     2.636 |  5.63 |   9.42255 |   0.019663 |  17.94    | 93.208%       |
| LocoOperator-4B-Q5_0.gguf         |     2.63  |  5.62 |   9.41521 |   0.023403 |  31.4019  | 92.839%       |
| LocoOperator-4B-Q5_K_S.gguf       |     2.63  |  5.62 |   9.44087 |   0.022119 |  13.6483  | 92.800%       |
| LocoOperator-4B-i1-Q5_K_S.gguf    |     2.63  |  5.62 |   9.28767 |   0.014865 |   7.65169 | 93.702%       |
| LocoOperator-4B-Q4_1.gguf         |     2.418 |  5.16 |   9.66722 |   0.074718 |  15.0861  | 87.757%       |
| LocoOperator-4B-i1-Q4_1.gguf      |     2.418 |  5.16 |   9.45293 |   0.038707 |  13.8444  | 90.574%       |
| LocoOperator-4B-Q4_K_M.gguf       |     2.326 |  4.97 |   9.48239 |   0.048236 |  15.3105  | 90.300%       |
| LocoOperator-4B-i1-Q4_K_M.gguf    |     2.326 |  4.97 |   9.48582 |   0.03368  |  13.551   | 91.233%       |
| LocoOperator-4B-IQ4_NL.gguf       |     2.229 |  4.76 |   9.60891 |   0.050173 |  11.4324  | 89.708%       |
| LocoOperator-4B-i1-Q4_K_S.gguf    |     2.22  |  4.74 |   9.47603 |   0.039843 |  10.0551  | 90.557%       |
| LocoOperator-4B-Q4_K_S.gguf       |     2.22  |  4.74 |   9.80236 |   0.068821 |  15.209   | 88.513%       |
| LocoOperator-4B-i1-IQ4_NL.gguf    |     2.218 |  4.74 |   9.50223 |   0.039414 |   8.18964 | 90.573%       |
| LocoOperator-4B-i1-Q4_0.gguf      |     2.213 |  4.73 |   9.79026 |   0.063915 |  12.6928  | 88.737%       |
| LocoOperator-4B-Q4_0.gguf         |     2.207 |  4.71 |   9.86629 |   0.074527 |  13.2501  | 87.758%       |
| LocoOperator-4B-IQ4_XS.gguf       |     2.129 |  4.55 |   9.62193 |   0.051911 |  11.0682  | 89.705%       |
| LocoOperator-4B-i1-IQ4_XS.gguf    |     2.115 |  4.52 |   9.49687 |   0.040098 |   7.03875 | 90.402%       |
| LocoOperator-4B-Q3_K_L.gguf       |     2.086 |  4.45 |  10.2476  |   0.121944 |  27.0257  | 84.146%       |
| LocoOperator-4B-i1-Q3_K_L.gguf    |     2.086 |  4.45 |   9.90811 |   0.090874 |  15.8122  | 86.154%       |
| LocoOperator-4B-Q3_K_M.gguf       |     1.933 |  4.13 |  10.7021  |   0.15788  |  20.2044  | 82.662%       |
| LocoOperator-4B-i1-Q3_K_M.gguf    |     1.933 |  4.13 |   9.98057 |   0.102708 |  16.8243  | 85.354%       |
| LocoOperator-4B-i1-IQ3_M.gguf     |     1.828 |  3.9  |  10.1634  |   0.137347 |  14.6883  | 83.180%       |
| LocoOperator-4B-IQ3_M.gguf        |     1.828 |  3.9  |  14.2539  |   0.557713 |  19.4397  | 67.631%       |
| LocoOperator-4B-IQ3_S.gguf        |     1.769 |  3.78 |  15.0624  |   0.619131 |  20.122   | 65.931%       |
| LocoOperator-4B-i1-IQ3_S.gguf     |     1.769 |  3.78 |  10.1755  |   0.142066 |  17.0028  | 83.139%       |
| LocoOperator-4B-i1-Q3_K_S.gguf    |     1.757 |  3.75 |  10.8886  |   0.171224 |  28.3373  | 82.133%       |
| LocoOperator-4B-Q3_K_S.gguf       |     1.757 |  3.75 |  11.5475  |   0.237895 |  30.6868  | 79.412%       |
| LocoOperator-4B-i1-IQ3_XS.gguf    |     1.69  |  3.61 |  10.3629  |   0.168783 |  14.3358  | 81.928%       |
| LocoOperator-4B-i1-Q2_K.gguf      |     1.555 |  3.32 |  12.1574  |   0.328652 |  18.6622  | 75.570%       |
| LocoOperator-4B-i1-IQ3_XXS.gguf   |     1.555 |  3.32 |  11.2795  |   0.263448 |  25.251   | 77.569%       |
| LocoOperator-4B-Q2_K.gguf         |     1.555 |  3.32 |  17.153   |   0.713596 |  16.3946  | 64.880%       |
| LocoOperator-4B-i1-Q2_K_S.gguf    |     1.456 |  3.11 |  13.1709  |   0.450125 |  18.3826  | 71.231%       |
| LocoOperator-4B-i1-IQ2_M.gguf     |     1.409 |  3.01 |  14.0857  |   0.544764 |  18.5618  | 67.933%       |
| LocoOperator-4B-i1-IQ2_S.gguf     |     1.32  |  2.82 |  15.0717  |   0.621189 |  24.0981  | 65.722%       |
| LocoOperator-4B-i1-IQ2_XS.gguf    |     1.261 |  2.69 |  16.8277  |   0.750336 |  19.2128  | 63.162%       |
| LocoOperator-4B-i1-IQ2_XXS.gguf   |     1.161 |  2.48 |  27.5988  |   1.32144  |  14.6807  | 52.522%       |
| LocoOperator-4B-i1-IQ1_M.gguf     |     1.05  |  2.24 |  49.0978  |   1.9323   |  16.5947  | 44.067%       |
| LocoOperator-4B-i1-IQ1_S.gguf     |     0.983 |  2.1  | 139.951   |   3.03274  |  16.0947  | 28.387%       |
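
The KLD and Top_P_Match columns compare each quantized file against the BF16 reference. As a rough illustration of what such columns measure, the sketch below computes mean and maximum KL divergence plus the top-token agreement rate from two sets of per-position logits. It assumes Top_P_Match means "share of positions where both models rank the same token first". The function names are hypothetical, and this is not the tool that produced the table.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def compare_logits(ref_logits, quant_logits):
    """Return (mean KLD, max KLD, top-token match rate) over token positions."""
    klds, matches = [], 0
    for ref, quant in zip(ref_logits, quant_logits):
        ref_lp, quant_lp = log_softmax(ref), log_softmax(quant)
        # KL(ref || quant) = sum_i p_ref(i) * (log p_ref(i) - log p_quant(i))
        klds.append(sum(math.exp(r) * (r - q) for r, q in zip(ref_lp, quant_lp)))
        # Count positions where both models rank the same token first.
        matches += ref.index(max(ref)) == quant.index(max(quant))
    return sum(klds) / len(klds), max(klds), matches / len(klds)
```

Under this definition an identical model scores a mean KLD of 0 and a 100% match, so the tiny nonzero values in the BF16 row are presumably numerical noise.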


------

<div align="center">
  <img src="assets/loco_operator.png" width="55%" alt="LocoOperator" />
</div>

<br>

<div align="center">

[![MODEL](https://img.shields.io/badge/Model-FFB300?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/LocoreMind/LocoOperator-4B)
[![GGUF](https://img.shields.io/badge/GGUF-FF6F00?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/LocoreMind/LocoOperator-4B-GGUF)
[![Blog](https://img.shields.io/badge/Blog-4285F4?style=for-the-badge&logo=google-chrome&logoColor=white)](https://locoremind.com/blog/loco-operator)
[![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/LocoreMind/LocoOperator)
[![Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&logoColor=white)](https://colab.research.google.com/github/LocoreMind/LocoOperator/blob/main/LocoOperator_4B.ipynb)

</div>

## Introduction

**LocoOperator-4B** is a 4B-parameter tool-calling agent model trained via knowledge distillation from **Qwen3-Coder-Next** inference traces. It specializes in multi-turn codebase exploration (reading files, searching code, and navigating project structures) within a Claude Code-style agent loop. Designed as a local subagent, it runs via llama.cpp at zero API cost.

|  | LocoOperator-4B |
|:--|:--|
| **Base Model** | [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) |
| **Teacher Model** | Qwen3-Coder-Next |
| **Training Method** | Full-parameter SFT (distillation) |
| **Training Data** | 170,356 multi-turn conversation samples |
| **Max Sequence Length** | 16,384 tokens |
| **Training Hardware** | 4x NVIDIA H200 141GB SXM5 |
| **Training Time** | ~25 hours |
| **Framework** | MS-SWIFT |

## Key Features

- **Tool-Calling Agent**: Generates structured `<tool_call>` JSON for Read, Grep, Glob, Bash, Write, Edit, and Task (subagent delegation)
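- **100% JSON Validity and Argument Correctness**: Every tool call in the evaluation is valid JSON with all required arguments. The teacher model also emits 100% valid JSON but supplies all required arguments in only 87.6% of its calls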
- **Local Deployment**: GGUF quantized, runs on Mac Studio via llama.cpp at zero API cost
- **Lightweight Explorer**: 4B parameters, optimized for fast codebase search and navigation
- **Multi-Turn**: Handles conversation depths of 3–33 messages with consistent tool-calling behavior
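
To consume the structured `<tool_call>` output in an agent loop, the calls first have to be pulled out of the generated text. The following is a minimal sketch, assuming each block wraps a JSON object with `name` and `arguments` keys. The helper name and the sample argument names (`pattern`, `path`) are illustrative, not taken from the model's tool schema.

```python
import json
import re

# Matches the JSON payload between <tool_call> and </tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Return a list of parsed tool calls found in a model response."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of failing the whole turn
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls

# Hypothetical response text; the argument names are assumptions.
sample = (
    "<tool_call>\n"
    '{"name": "Grep", "arguments": {"pattern": "find_project_root", "path": "/abs/path"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
```

A response with no `<tool_call>` block yields an empty list, which the caller can treat as a plain-text answer.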

## Performance

Evaluated on 65 multi-turn conversation samples from diverse open-source projects (scipy, fastapi, arrow, attrs, gevent, gunicorn, etc.), with labels generated by Qwen3-Coder-Next.

### Core Metrics

| Metric | Score |
|:-------|:-----:|
| **Tool Call Presence Alignment** | **100%** (65/65) |
| **First Tool Type Match** | **65.6%** (40/61) |
| **JSON Validity** | **100%** (76/76) |
| **Argument Syntax Correctness** | **100%** (76/76) |

On this evaluation set the model matches the teacher on *when* to use tools vs. when to respond with text (100% presence alignment). Tool type mismatches are between semantically similar tools (e.g., Grep vs. Read), which are different but often valid strategies.
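
The two alignment metrics above can be sketched as follows. This is an illustrative reconstruction, not the authors' evaluation script. It assumes each sample is reduced to the ordered list of tool names it calls, and that first-tool match is computed only over samples where both the model and the teacher call at least one tool.

```python
def presence_alignment(pred, ref):
    """Fraction of samples where prediction and reference agree on whether any tool is called."""
    agree = sum(bool(p) == bool(r) for p, r in zip(pred, ref))
    return agree / len(ref)

def first_tool_match(pred, ref):
    """Among samples where both sides call a tool, fraction whose first tool name matches."""
    pairs = [(p[0], r[0]) for p, r in zip(pred, ref) if p and r]
    return sum(p == r for p, r in pairs) / len(pairs)

# Toy data: three samples, each a list of tool names in call order.
pred = [["Grep"], [], ["Bash", "Read"]]
ref = [["Read"], [], ["Bash"]]
print(presence_alignment(pred, ref))  # 1.0
print(first_tool_match(pred, ref))    # 0.5
```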

### Tool Distribution Comparison

<div align="center">
  <img src="assets/tool_distribution.png" width="80%" alt="Tool Distribution Comparison" />
</div>

### JSON & Argument Syntax Correctness

| Model | JSON Valid | Argument Syntax Valid |
|:------|:---------:|:--------------------:|
| **LocoOperator-4B** | 76/76 (100%) | 76/76 (100%) |
| Qwen3-Coder-Next (teacher) | 89/89 (100%) | 78/89 (87.6%) |

> LocoOperator-4B achieves perfect structured output. The teacher model has 11 tool calls with missing required arguments (empty `arguments: {}`).
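
The distinction between the two columns can be illustrated with a small checker: a call such as `{"name": "Read", "arguments": {}}` parses as JSON but fails the argument check. This sketch assumes a hypothetical table of required arguments per tool. The real tool schemas are not given in this card.

```python
import json

# Hypothetical required-argument table; the actual schemas may differ.
REQUIRED_ARGS = {"Read": ["file_path"], "Grep": ["pattern"], "Bash": ["command"]}

def check_tool_call(raw):
    """Return (json_valid, args_valid) for one raw tool-call payload string."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, False
    if not isinstance(call, dict):
        return True, False
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or not isinstance(args, dict):
        return True, False
    # Valid only if every required key for this tool is present.
    return True, all(key in args for key in REQUIRED_ARGS.get(name, []))

print(check_tool_call('{"name": "Read", "arguments": {}}'))  # (True, False)
```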

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LocoreMind/LocoOperator-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the messages
messages = [
    {
        "role": "system",
        "content": "You are a read-only codebase search specialist.\n\nCRITICAL CONSTRAINTS:\n1. STRICTLY READ-ONLY: You cannot create, edit, delete, move files, or run any state-changing commands. Use tools/bash ONLY for reading (e.g., ls, find, cat, grep).\n2. EFFICIENCY: Spawn multiple parallel tool calls for faster searching.\n3. OUTPUT RULES: \n   - ALWAYS use absolute file paths.\n   - STRICTLY NO EMOJIS in your response.\n   - Output your final report directly. Do not use colons before tool calls.\n\nENV: Working directory is /Users/developer/workspace/code-analyzer (macOS, zsh)."
    },
    {
        "role": "user",
        "content": "Analyze the Black codebase at `/Users/developer/workspace/code-analyzer/projects/black`.\nFind and explain:\n1. How Black discovers config files.\n2. The exact search order for config files.\n3. Supported config file formats.\n4. Where this configuration discovery logic lives in the codebase.\n\nReturn a comprehensive answer with relevant code snippets and absolute file paths."
    }
]

# prepare the model input
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
```

## Local Deployment

For GGUF quantized deployment with llama.cpp, hybrid proxy routing, and batch analysis pipelines, refer to our [GitHub repository](https://github.com/LocoreMind/LocoOperator).
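
As a minimal sketch of talking to a locally served GGUF file, the snippet below builds a request for the OpenAI-compatible chat endpoint exposed by llama.cpp's `llama-server`. It assumes a server is already running on `localhost:8080` (for example, started with `llama-server -m <file>.gguf --port 8080`). The helper names are hypothetical, and this is not the hybrid proxy setup from the GitHub repository.

```python
import json
import urllib.request

def build_chat_request(messages, base_url="http://localhost:8080", max_tokens=512):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {"messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(messages, **kwargs):
    """Send the request and return the assistant message text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat(messages)` with the same `messages` list as in the Quick Start should return the model's first turn, including any `<tool_call>` blocks, provided the server applies the model's chat template.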

## Training Details

| Parameter | Value |
|:----------|:------|
| Base model | Qwen3-4B-Instruct-2507 |
| Teacher model | Qwen3-Coder-Next |
| Method | Full-parameter SFT |
| Training data | 170,356 samples |
| Hardware | 4x NVIDIA H200 141GB SXM5 |
| Parallelism | DDP (no DeepSpeed) |
| Precision | BF16 |
| Epochs | 1 |
| Batch size | 2/GPU, gradient accumulation 4 (effective batch 32) |
| Learning rate | 2e-5, warmup ratio 0.03 |
| Max sequence length | 16,384 tokens |
| Template | qwen3_nothinking |
| Framework | MS-SWIFT |
| Training time | ~25 hours |
| Checkpoint | Step 2524 |

## Known Limitations

- First-tool-type match is 65.6% — the model sometimes picks a different (but not necessarily wrong) tool than the teacher
- Tends to under-generate parallel tool calls compared to the teacher (76 vs 89 total calls across 65 samples)
- Preference for Bash over Read may indicate the model defaults to shell commands where file reads would be more appropriate
- Evaluated on 65 samples only; larger-scale evaluation needed

## License

MIT

## Acknowledgments

- [Qwen Team](https://huggingface.co/Qwen) for the Qwen3-4B-Instruct-2507 base model
- [MS-SWIFT](https://github.com/modelscope/ms-swift) for the training framework
- [llama.cpp](https://github.com/ggerganov/llama.cpp) for efficient local inference
- [Anthropic](https://www.anthropic.com/) for the Claude Code agent loop design that inspired this work