Initialize project; model provided by the ModelHub XC community

Model: clglavan/magos-k8s-0.6b Source: Original Platform
2026-05-31 16:57:17 +08:00
commit 02da0760c5
11 changed files with 477 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,39 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 magos-k8s-0.6b-f16.gguf filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
 magos-k8s-0.6b-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
 magos-k8s-0.6b-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,229 @@
 ---
 license: apache-2.0
 language:
  - en
 base_model: Qwen/Qwen3-0.6B
 library_name: transformers
 pipeline_tag: text-generation
 tags:
  - kubernetes
  - devops
  - sft
  - cpt
  - qwen3
  - gguf
 ---
 # magos-k8s-0.6b
 A small (0.6B parameter) **Kubernetes debugging assistant**, fine-tuned from
 **Qwen3-0.6B** on Kubernetes documentation, the full Kubernetes API reference
 (every resource Kind), the kubectl command reference, and Prometheus alert
 runbooks.
 ## Purpose
 magos exists to **nudge any input — vague, off-topic, or underspecified — into
 a debugging mindset**: identify the missing information, propose the
 `kubectl`/`promtool`/log-inspection step that would resolve the ambiguity,
 and respond with a concrete next action instead of speculation. This bias
 toward "what would I check next?" is trained in by design, not an emergent
 property.
 That bias is what makes magos **ideal for autonomous devops agents** running
 in a planner → executor loop. The model reliably emits the *next executable
 move* (a kubectl invocation, a YAML patch, a metric to scrape) rather than
 prose, which is what an agent needs to make progress on its own. It runs
 locally at ~600 MB (Q8) and is fast enough to be the inner-loop reasoner of
 an agent that also carries longer-horizon planning elsewhere.
 ### Pair with RAG for best results
 magos is intentionally small and ships frozen knowledge from the
 documentation snapshot it was trained on. **Pairing it with a retrieval
 layer** (your live cluster's `kubectl explain` output, your team's runbooks,
 your fleet's current CRD schemas, recent incident postmortems) lifts answer
 quality substantially while keeping the model itself tiny — you get the
 debugging-mindset reflex from magos plus authoritative, current facts from
 your retriever, instead of having to grow the model to memorise everything.
 A typical setup: retrieve 2–5 short snippets relevant to the user's symptom
 and prepend them to the prompt; magos will weave them into its
 next-step-first response.
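The prepend step described above can be sketched as follows. This is a minimal illustration, not part of the released tooling: `build_prompt`, the `[context N]` labels, and the sample snippets are hypothetical, and the retriever that produces the snippets is assumed to exist elsewhere.

```python
def build_prompt(symptom: str, snippets: list[str], max_snippets: int = 5) -> str:
    """Prepend up to max_snippets retrieved snippets to the user's symptom."""
    # Drop empty snippets and cap the count so the prompt stays short.
    kept = [s.strip() for s in snippets if s.strip()][:max_snippets]
    if not kept:
        return symptom
    context = "\n\n".join(
        f"[context {i}]\n{s}" for i, s in enumerate(kept, start=1)
    )
    return f"{context}\n\nSymptom: {symptom}"


# Hypothetical retrieved snippets for one symptom.
messages = [{
    "role": "user",
    "content": build_prompt(
        "Pod web-7f9c is stuck in CrashLoopBackOff.",
        [
            "kubectl describe pod web-7f9c: Last State: Terminated, Reason: OOMKilled",
            "Runbook: raise the memory limit only after checking for leaks.",
        ],
    ),
}]
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the chat examples under "How to use", so it can be passed to either runtime unchanged.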
 ## What's new in v8 (vs v7)
 | | v7 | **v8** |
 |---|---|---|
 | Stage 2 training examples | ~6,100 | **~6,740** (+10%) |
 | YAML bucket | 780 unfiltered | **521 schema-filtered** — every example's apiVersion+fields validated against the K8s v1.34 OpenAPI spec; ~33% invented-field examples dropped before training |
 | Anti-hallucination contrast bucket | none | **~317 new examples** teaching wrong-vs-right pairs for kubectl flags, YAML field names, and diagnosis patterns mined from v7's actual failures |
 | General-instruct mix | none | **~600 Alpaca examples** (~9%) blended in to defend against catastrophic forgetting of base reasoning |
 | Stage 2 LR / epochs | 1.5e-5 / 2 epochs | **1.5e-5 / 2 epochs** (unchanged — proven recipe) |
 | Stage 2 eval_loss | 1.667 | 1.716 (slightly higher — expected, since 9% of examples are out-of-K8s-distribution Alpaca) |
 ### Why these changes
 v7's main weaknesses surfaced in agent-usability review:
 1. **Specific flag/field hallucinations**: `--show-namespace`, `--limit` on
   `kubectl logs`, `volumeAccessModes`, `autoscaling/v2beta3`. We mined the
   actual hallucinations v7 produced across 75 benchmark verdicts (817
   occurrences) and built targeted **contrast pairs** — for each known wrong
   pattern, a paired Q&A that explicitly contrasts it with the correct one.
 2. **YAML schema invention**: v7's YAML bucket was not validated post-synth.
   v8 runs each example through the v1.34 OpenAPI lookup and drops any
   example with >2 invented field paths.
 3. **General-reasoning regression**: v7 lost 3 points on the general bucket
   vs v6. v8 mixes in a small Alpaca slice so non-K8s prompts stay sharp.
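The YAML filter in item 2 can be illustrated with a small sketch. The real pipeline is not published here, so everything below is an assumption: `KNOWN_FIELDS` is a hand-written stand-in for an OpenAPI lookup, `field_paths` and `keep_example` are hypothetical names, and dropping examples with an unknown apiVersion/Kind pair is one plausible reading of "validated".

```python
# Hypothetical stand-in for an OpenAPI lookup: valid leaf paths per (apiVersion, kind).
KNOWN_FIELDS = {
    ("v1", "PersistentVolumeClaim"): {
        "metadata.name",
        "spec.accessModes",
        "spec.resources.requests.storage",
        "spec.storageClassName",
    },
}


def field_paths(obj, prefix=""):
    """Yield the dotted path of every leaf value in a parsed manifest."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from field_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from field_paths(item, prefix)
    else:
        yield prefix


def keep_example(manifest: dict, max_invented: int = 2) -> bool:
    """Keep a manifest unless it has more than max_invented unknown field paths."""
    known = KNOWN_FIELDS.get((manifest.get("apiVersion"), manifest.get("kind")))
    if known is None:
        return False  # assumption: unknown apiVersion/kind pairs are dropped
    paths = {p for p in field_paths(manifest) if p not in ("apiVersion", "kind")}
    return len(paths - known) <= max_invented


pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}
print(keep_example(pvc))
```

With the threshold at two, a manifest containing a single invented field such as `volumeAccessModes` still passes, which matches the ">2 invented field paths" rule stated above.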
 ### Benchmarks (3-judge consensus, anonymized review of v6 vs v7 vs v8 across 25 prompts)
 Each of 25 prompts was evaluated by 3 independent reviewers who saw the
 responses **anonymized as A/B/C** with the rubric for that prompt. Reviewers
 were forced to produce explicit reasoning, list verified facts and
 hallucinations, and rate `agent_usable` before assigning a 1-5 score. Final
 per-prompt score is the **median of the 3 reviewers' scores**.
 | Bucket | Max | v6 | v7 | **v8** |
 |---|---|---|---|---|
 | kubectl/CLI accuracy   | 30 | 8  | 10 | **14** (+4) |
 | YAML manifest validity | 25 | 6  | 11 | **12** (+1) |
 | Debugging diagnose     | 30 | 9  | **10** | 8 (-2) |
 | Prometheus runbook     | 25 | **7** | **7** | 6 (-1) |
 | General reasoning      | 15 | 14 | 12 | **15** (+3) |
 | **Total**              | **125** | **44 (35%)** | **50 (40%)** | **55 (44%)** |
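The scoring arithmetic can be checked with a short sketch. The bucket numbers are copied from the table above; `prompt_score` and the sample reviewer scores are illustrative only. The bucket maxima are consistent with 6, 5, 6, 5, and 3 prompts at 5 points each, though that split is an inference rather than something stated here.

```python
from statistics import median


def prompt_score(reviewer_scores):
    """Final per-prompt score: the median of the three reviewers' 1-5 scores."""
    return int(median(reviewer_scores))


# name: (max, v6, v7, v8), copied from the benchmark table.
BUCKETS = {
    "kubectl/CLI accuracy": (30, 8, 10, 14),
    "YAML manifest validity": (25, 6, 11, 12),
    "Debugging diagnose": (30, 9, 10, 8),
    "Prometheus runbook": (25, 7, 7, 6),
    "General reasoning": (15, 14, 12, 15),
}

# Column-wise sums: [max, v6, v7, v8].
totals = [sum(row[i] for row in BUCKETS.values()) for i in range(4)]
max_total, *version_totals = totals

print(prompt_score([2, 5, 3]))  # one outlier reviewer does not move the median
for name, total in zip(("v6", "v7", "v8"), version_totals):
    print(f"{name}: {total}/{max_total} = {total / max_total:.0%}")
```

Using the median rather than the mean means a single unusually harsh or generous reviewer cannot shift a prompt's score on their own.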
 **Headline:** v8 takes the largest single-version jump yet in kubectl
 accuracy (+4 points on a 30-point bucket) and recovers full general-reasoning
 performance, at a small cost in Diagnose and Runbook accuracy (-2 and -1).
 The Alpaca mix successfully defended against forgetting; the contrast bucket
 visibly suppressed the specific hallucinated flags v7 was repeating.
 **Honest absolute level:** even v8 scores 44% on this benchmark. The judges
 grade strictly for *agent-usability* — a single invented flag or wrong
 apiVersion is enough to mark a response as not-executable. v8 is the best
 version of magos yet, but there is substantial room to grow toward 100%.
 To pin a specific version when loading:
 ```python
 from transformers import AutoModelForCausalLM
 # Pin a release with `revision`; "v7", "v6", "v5", "v3", and "v2" select previous versions.
 model = AutoModelForCausalLM.from_pretrained("clglavan/magos-k8s-0.6b", revision="v8")
 ```
 ## What it's good at
 - **kubectl command construction** — v8's strongest area. Real flags,
  correct flag forms, and none of the inventions seen in v7, such as
  `--show-namespace` or `--limit` on `kubectl logs`.
 - **YAML manifest generation** — Pod, Deployment, Service, NetworkPolicy,
  PVC, HPA, ConfigMap, Secret, RBAC and ~70 other top-level Kinds are covered
  by a training set that was schema-validated for apiVersion and field names.
 - **Diagnosing pasted errors** — `kubectl describe` output, log lines, alert
  payloads → root cause + next-step suggestions
 - **Prometheus alert handling** — meaning + diagnostic steps for the
  prometheus-operator runbook set (KubePodCrashLooping, etcdBackendQuotaLowSpace,
  AlertmanagerClusterDown, etc.)
 - **Agent-style outputs** — short, command-first responses suitable for
  autonomous execution rather than human reading
 - **Basic general reasoning** — Alpaca mix preserves math, generic CS facts,
  short explanations
 ## What it's not good at
 - Multi-step planning or complex tool chains — it's a 0.6B model
 - **Subtle/rare flags** — common flags are reliable; rare-but-real flags are
  still sometimes hallucinated. Always sanity-check with `kubectl --help`.
 - **Multi-flag combinations on the same command** — accuracy drops as flag
  count goes up
 - Knowledge of features released after the source docs were captured (mid-2026)
 - Long-form thinking — SFT suppressed Qwen3's `<think>` behavior
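Because rare flags can still be hallucinated, an agent may want to screen a generated command before running it. The sketch below is one hedged way to do that: `unknown_flags` is a hypothetical helper, and `LOGS_FLAGS` is a hand-written subset of `kubectl logs` flags used for illustration. In practice the allowed set would come from the `kubectl logs --help` output of your own client version.

```python
import shlex

# Illustrative subset of real `kubectl logs` flags; not a complete list.
LOGS_FLAGS = {
    "-f", "--follow", "-p", "--previous", "-c", "--container",
    "-n", "--namespace", "-l", "--selector",
    "--tail", "--since", "--timestamps", "--all-containers",
}


def unknown_flags(command: str, allowed: set[str]) -> list[str]:
    """Return the flags in command that are not in the allowed set."""
    found = []
    for token in shlex.split(command):
        if token == "--":
            break  # stop at a bare "--" separator
        if token.startswith("-") and len(token) > 1:
            # Compare only the flag name, so "--tail=100" matches "--tail".
            found.append(token.split("=", 1)[0])
    return [flag for flag in found if flag not in allowed]


generated = "kubectl logs web-0 -n prod --tail=100 --limit 50"
print(unknown_flags(generated, LOGS_FLAGS))
```

A non-empty result is a signal to reject or re-prompt rather than execute. The check is deliberately simple: it does not understand flag values, so a separately passed negative number would be reported as a flag.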
 ## How to use
 ### llama.cpp / Ollama / LM Studio
 Three GGUF quantization levels are included — pick one:
 | File | Size | Quality |
 |---|---|---|
 | `magos-k8s-0.6b-f16.gguf` | 1.2 GB | reference (unquantized, 16-bit) |
 | `magos-k8s-0.6b-q8_0.gguf` | 639 MB (610 MiB) | effectively identical to f16 at about half the size; **recommended** |
 | `magos-k8s-0.6b-q4_k_m.gguf` | 397 MB (378 MiB) | smallest, with some quality loss: `kubectl` flag/argument mistakes appear more often than with q8/f16. Fine for casual use, not recommended for accuracy-critical tasks. |
 Example with `llama-cpp-python`:
 ```python
 from llama_cpp import Llama
 llm = Llama(model_path="magos-k8s-0.6b-q8_0.gguf", n_ctx=4096, chat_format="chatml")
 resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Drain node worker-3 ignoring DaemonSets and deleting local-storage pods."}],
    temperature=0.05,
    repeat_penalty=1.15,
    max_tokens=512,
 )
 print(resp["choices"][0]["message"]["content"])
 ```
 Keep `temperature=0.05` and `repeat_penalty=1.15` set explicitly: they are
 recommended settings rather than library defaults, and 0.6B models tend to loop
 on longer structured outputs without a repetition penalty.
 ### Hugging Face transformers
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 tok   = AutoTokenizer.from_pretrained("clglavan/magos-k8s-0.6b")
 model = AutoModelForCausalLM.from_pretrained("clglavan/magos-k8s-0.6b",
                                             dtype="bfloat16",
                                             device_map="auto")
 messages = [{"role": "user", "content": "Give me a NetworkPolicy that denies all egress from app pods except DNS."}]
 prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = tok(prompt, return_tensors="pt").to(model.device)
 out = model.generate(**inputs, max_new_tokens=384,
                     do_sample=True, temperature=0.05,
                     top_p=0.95, top_k=20, repetition_penalty=1.15)
 print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
 ```
 ## Training
 | | |
 |---|---|
 | Base model | Qwen/Qwen3-0.6B |
 | Method | Two stage: **continued pre-training (CPT) → supervised fine-tuning (SFT)**. Both full-weight (no LoRA). |
 | **Stage 1 corpus** | ~8.5k document chunks: kubernetes.io docs + blog (~6.5k), Kubernetes API reference v1.34 (~1.9k), Prometheus alert runbooks (~106). **Reused from v5/v6/v7 — corpus unchanged.** |
 | Stage 1 tokens | ~6.5M |
 | Stage 1 LR | 5e-6, cosine, 3% warmup, 1 epoch |
 | **Stage 2 corpus (v8)** | ~6,740 synthetic Q&A pairs. Distribution: K8s debugging (~1.7k), K8s API field/schema (~1.3k), Prometheus runbook (~1.0k, 10 examples per runbook), kubectl reference (~1.3k, 15 per subcommand), **schema-filtered YAML bucket (~520)**, **anti-hallucination contrast bucket (~317)**, **general-instruct mix (~600)** |
 | Stage 2 LR | 1.5e-5, cosine, 3% warmup, 2 epochs |
 | Micro batch / grad accum | 1 / 16 (effective batch 16) |
 | Precision | bfloat16 |
 | Sequence length | 2048 |
 | Stage 1 eval_loss | 1.71 |
 | Stage 2 eval_loss | 1.72 (v7 was 1.67; the small regression reflects the 9% Alpaca slice being out-of-K8s-distribution — judge benchmark is the real measure) |
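As a rough worked example, the Stage 2 numbers in the table imply the optimizer step counts below. This is a back-of-the-envelope sketch under stated assumptions: a single device, no sequence packing, the approximate corpus size of 6,740 used in full (the held-out evaluation split is not known and is ignored), and warmup rounded up.

```python
import math

examples = 6740              # approximate Stage 2 corpus size from the table
micro_batch, grad_accum = 1, 16
epochs, warmup_ratio = 2, 0.03

# One optimizer step consumes micro_batch * grad_accum examples.
effective_batch = micro_batch * grad_accum
steps_per_epoch = math.ceil(examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup steps:    {warmup_steps}")
```

The actual step counts of the training run may differ if any of these assumptions does not hold.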
 ## Files
 - `model.safetensors` — fine-tuned weights, HF format (1.2 GB, bf16)
 - `magos-k8s-0.6b-f16.gguf` — GGUF, unquantized 16-bit (1.2 GB)
 - `magos-k8s-0.6b-q8_0.gguf` — GGUF, 8-bit quantization (639 MB, 610 MiB)
 - `magos-k8s-0.6b-q4_k_m.gguf` — GGUF, 4-bit quantization (397 MB, 378 MiB)
 - `tokenizer.json`, `tokenizer_config.json` — Qwen3 tokenizer
 - `chat_template.jinja` — Qwen3 ChatML template
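 - `config.json`, `generation_config.json` — standard HF configs; `generation_config.json` sets `temperature=0.6`, `top_p=0.95`, `top_k=20` and no repetition penalty, so pass the lower temperature and the penalty from "How to use" explicitly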
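File sizes can be quoted in decimal megabytes or binary mebibytes, and the two differ by about 5% at this scale. The sketch below converts the byte counts recorded in the Git LFS pointers for these files; the `SIZES` dictionary and the helper names are only for illustration.

```python
# Byte counts as recorded in the Git LFS pointer files.
SIZES = {
    "model.safetensors": 1_192_135_096,
    "magos-k8s-0.6b-f16.gguf": 1_198_182_048,
    "magos-k8s-0.6b-q8_0.gguf": 639_446_688,
    "magos-k8s-0.6b-q4_k_m.gguf": 396_704_416,
}


def to_mb(n_bytes: int) -> float:
    """Decimal megabytes (10**6 bytes)."""
    return n_bytes / 10**6


def to_mib(n_bytes: int) -> float:
    """Binary mebibytes (2**20 bytes)."""
    return n_bytes / 2**20


for name, n_bytes in SIZES.items():
    print(f"{name}: {to_mb(n_bytes):.0f} MB = {to_mib(n_bytes):.0f} MiB")
```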
 ## Limitations and intended use
 This is a small experimental model. Always verify any command, YAML, or
 behavioral claim against current Kubernetes documentation before running in
 production. It is intended for learning, prototyping, and as a component in
 local devops agents — not as an authoritative source.
 ## License
 Apache 2.0. Inherits from the Qwen3-0.6B base model license. The training data
 is derived from the official Kubernetes documentation (CC-BY 4.0) and the
 prometheus-operator Prometheus runbooks (Apache 2.0).
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,89 @@
 {%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
 {%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
 {%- endif %}
 {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
 {%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
 {%- endfor %}
 {%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
 {%- endfor %}
 {%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
 {%- endif %}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,63 @@
 {
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": null,
  "dtype": "bfloat16",
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 28,
  "model_type": "qwen3",
  "num_attention_heads": 16,
  "num_hidden_layers": 28,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-06,
  "rope_parameters": {
    "rope_theta": 1000000,
    "rope_type": "default"
  },
  "sliding_window": null,
  "tie_word_embeddings": true,
  "transformers_version": "5.9.0",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,12 @@
 {
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "5.9.0"
 }
--- a/magos-k8s-0.6b-f16.gguf
+++ b/magos-k8s-0.6b-f16.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:5d00d2589ffa29fc6e0fd4bfefd070f54c3555140bcb5f1cefb3eb3269b12ffb
 size 1198182048
--- a/magos-k8s-0.6b-q4_k_m.gguf
+++ b/magos-k8s-0.6b-q4_k_m.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:bae88ab1824619a01dc3739754881e3245a2d5ab06a3eb5c699c232fb316a8db
 size 396704416
--- a/magos-k8s-0.6b-q8_0.gguf
+++ b/magos-k8s-0.6b-q8_0.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a5ae2c010edb9fb2478e0aab3217424df05ebe160d0564f2bfaf47dbaa969d50
 size 639446688
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:edf2a71c96c438504458ef887f3aeda7d9992984a5832e8eb162a21c08a73ee0
 size 1192135096
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
 size 11422650
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,30 @@
 {
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "is_local": true,
  "local_files_only": false,
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
 }