Model: clglavan/magos-k8s-0.6b
Source: Original Platform

2026-05-31 16:57:17 +08:00

magos-k8s-0.6b

A small (0.6B parameter) Kubernetes debugging assistant, fine-tuned from Qwen3-0.6B on Kubernetes documentation, the full Kubernetes API reference (every resource Kind), the kubectl command reference, and Prometheus alert runbooks.

Purpose

magos exists to nudge any input — vague, off-topic, or underspecified — into a debugging mindset: identify the missing information, propose the kubectl/promtool/log-inspection step that would resolve the ambiguity, and respond with a concrete next action instead of speculation. This bias toward "what would I check next?" is trained in by design, not an emergent property.

That bias is what makes magos well suited to autonomous devops agents running in a planner → executor loop. The model is trained to emit the next executable move (a kubectl invocation, a YAML patch, a metric to scrape) rather than prose, which is what an agent needs to make progress on its own. At ~600 MB (Q8) it runs locally and is fast enough to serve as the inner-loop reasoner of an agent that handles longer-horizon planning elsewhere.
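As a rough illustration of the planner → executor loop described above, the sketch below feeds a model callable into a loop that extracts one kubectl command per turn and passes it to an executor. Everything in it is hypothetical: `next_action`, `run_loop`, the `fake_model` and `fake_execute` stand-ins, and the read-only allowlist are assumptions made for illustration, not part of magos or of any agent framework. Nothing is actually executed against a cluster.

```python
import shlex
from typing import Callable, List, Optional

# Hypothetical allowlist: only read-only kubectl verbs may pass through.
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "explain"}

def next_action(response: str) -> Optional[List[str]]:
    """Return the first line of the response that is a kubectl command, as argv."""
    for line in response.splitlines():
        line = line.strip().lstrip("$ ").strip()
        if line.startswith("kubectl "):
            return shlex.split(line)
    return None

def run_loop(model: Callable[[str], str],
             execute: Callable[[List[str]], str],
             symptom: str,
             max_steps: int = 3) -> List[List[str]]:
    """Ask the model for a next step, run it if allowed, feed the output back."""
    history: List[List[str]] = []
    context = symptom
    for _ in range(max_steps):
        argv = next_action(model(context))
        if argv is None or len(argv) < 2 or argv[1] not in READ_ONLY_VERBS:
            break  # stop on prose-only answers or anything outside the allowlist
        history.append(argv)
        context = f"{symptom}\n\nOutput of `{' '.join(argv)}`:\n{execute(argv)}"
    return history

# Stand-ins so the sketch runs without a model or a cluster.
def fake_model(prompt: str) -> str:
    if "Output of" in prompt:
        return "kubectl delete pod web-0 -n prod"  # mutating verb: blocked by the allowlist
    return "Check the pod events first:\nkubectl describe pod web-0 -n prod"

def fake_execute(argv: List[str]) -> str:
    return "(stubbed command output)"

steps = run_loop(fake_model, fake_execute, "web-0 is in CrashLoopBackOff")
print(steps)
```

In a real agent, `fake_model` would wrap a call to the model and `fake_execute` would run the command under whatever sandboxing and review policy the operator chooses; the allowlist is one possible guard, not a recommendation from the model authors.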

Pair with RAG for best results

magos is intentionally small and ships frozen knowledge from the documentation snapshot it was trained on. Pairing it with a retrieval layer (your live cluster's kubectl explain output, your team's runbooks, your fleet's current CRD schemas, recent incident postmortems) lifts answer quality substantially while keeping the model itself tiny — you get the debugging-mindset reflex from magos plus authoritative, current facts from your retriever, instead of having to grow the model to memorise everything. A typical setup: retrieve 2–5 short snippets relevant to the user's symptom and prepend them to the prompt; magos will weave them into its next-step-first response.
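The "retrieve a few snippets and prepend them" setup can be sketched as a small prompt builder. The layout of the prompt (a "Reference snippets" header with numbered entries), the function name `build_messages`, and the sample snippets are all assumptions for illustration; the model card does not prescribe a specific format, and the retriever itself is out of scope here.

```python
from typing import Dict, List

def build_messages(question: str, snippets: List[str], max_snippets: int = 5) -> List[Dict[str, str]]:
    """Prepend up to max_snippets non-empty retrieved snippets to the user's question."""
    kept = [s.strip() for s in snippets if s.strip()][:max_snippets]
    if not kept:
        return [{"role": "user", "content": question}]
    context = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(kept, start=1))
    content = f"Reference snippets:\n{context}\n\nQuestion: {question}"
    return [{"role": "user", "content": content}]

# Hypothetical snippets, standing in for whatever a retriever returns.
messages = build_messages(
    "Pods in namespace shop are stuck Pending. What should I check next?",
    [
        "Runbook: Pending pods are usually caused by unschedulable nodes or unbound PVCs.",
        "Team note: the shop namespace has a ResourceQuota on requests.cpu.",
    ],
)
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the `messages` argument used in the usage examples in this README, so it can be passed to either chat API unchanged.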

What's new in v8 (vs v7)

	v7	v8
Stage 2 training examples	~6,100	~6,740 (+10%)
YAML bucket	780 unfiltered	521 schema-filtered — every example's apiVersion+fields validated against the K8s v1.34 OpenAPI spec; ~33% invented-field examples dropped before training
Anti-hallucination contrast bucket	none	~317 new examples teaching wrong-vs-right pairs for kubectl flags, YAML field names, and diagnosis patterns mined from v7's actual failures
General-instruct mix	none	~600 Alpaca examples (~9%) blended in to defend against catastrophic forgetting of base reasoning
Stage 2 LR / epochs	1.5e-5 / 2 epochs	1.5e-5 / 2 epochs (unchanged — proven recipe)
Stage 2 eval_loss	1.667	1.716 (slightly higher — expected, since 9% of examples are out-of-K8s-distribution Alpaca)

Why these changes

v7's main weaknesses surfaced in agent-usability review:

Specific flag/field hallucinations: --show-namespace, --limit on kubectl logs, volumeAccessModes, autoscaling/v2beta3. We mined the actual hallucinations v7 produced across 75 benchmark verdicts (817 occurrences) and built targeted contrast pairs — for each known wrong pattern, a paired Q&A that explicitly contrasts it with the correct one.
YAML schema invention: v7's YAML bucket was not validated post-synth. v8 runs each example through the v1.34 OpenAPI lookup and drops any example with >2 invented field paths.
General-reasoning regression: v7 lost 2 points on the general bucket vs v6 (14 → 12 in the benchmark table). v8 mixes in a small Alpaca slice so non-K8s prompts stay sharp.
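The schema filter described in the list above (drop any example with more than two invented field paths) might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' pipeline: `KNOWN_PVC_PATHS` is a tiny hand-written stand-in for a lookup built from the OpenAPI spec, and the helper names are hypothetical.

```python
from typing import Any, Iterator, Set

def field_paths(obj: Any, prefix: str = "") -> Iterator[str]:
    """Yield a dotted path for every key in a nested manifest (list items share the parent path)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from field_paths(value, path)
    elif isinstance(obj, list):
        for item in obj:
            yield from field_paths(item, prefix)

def invented_paths(manifest: dict, known: Set[str]) -> Set[str]:
    """Field paths used by the manifest that the schema lookup does not know."""
    return {p for p in field_paths(manifest) if p not in known}

def keep_example(manifest: dict, known: Set[str], max_invented: int = 2) -> bool:
    """Keep the example unless it has more than max_invented unknown paths."""
    return len(invented_paths(manifest, known)) <= max_invented

# Hand-written stand-in for a schema lookup; a real one would be derived from the OpenAPI spec.
KNOWN_PVC_PATHS = {
    "apiVersion", "kind", "metadata", "metadata.name",
    "spec", "spec.accessModes", "spec.resources",
    "spec.resources.requests", "spec.resources.requests.storage",
}

one_bad_field = {
    "apiVersion": "v1", "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data"},
    "spec": {"volumeAccessModes": ["ReadWriteOnce"],
             "resources": {"requests": {"storage": "1Gi"}}},
}
three_bad_fields = {
    "apiVersion": "v1", "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data"},
    "spec": {"volumeAccessModes": ["ReadWriteOnce"], "storageTier": "fast", "replicas": 2},
}

print(invented_paths(one_bad_field, KNOWN_PVC_PATHS), keep_example(one_bad_field, KNOWN_PVC_PATHS))
print(keep_example(three_bad_fields, KNOWN_PVC_PATHS))
```

A flat set of paths is the simplest possible lookup; free-form maps such as labels and annotations would need special handling so their keys are not counted as invented fields.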

Benchmarks (3-judge consensus, anonymized review of v6 vs v7 vs v8 across 25 prompts)

Each of the 25 prompts was evaluated by 3 independent reviewers, who saw the responses anonymized as A/B/C alongside the rubric for that prompt. Reviewers were required to give explicit reasoning, list verified facts and hallucinations, and rate agent_usable before assigning a 1-5 score. The final per-prompt score is the median of the 3 reviewers' scores.

Bucket	Max	v6	v7	v8
kubectl/CLI accuracy	30	8	10	14 (+4)
YAML manifest validity	25	6	11	12 (+1)
Debugging diagnose	30	9	10	8 (-2)
Prometheus runbook	25	7	7	6 (-1)
General reasoning	15	14	12	15 (+3)
Total	125	44 (35%)	50 (40%)	55 (44%)
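The scoring arithmetic can be reproduced in a few lines. The sketch below takes the median of three reviewer scores for one prompt and recomputes the v8 total and percentage from the bucket rows in the table above; the three sample reviewer scores are made up for illustration.

```python
from statistics import median

def prompt_score(reviewer_scores):
    """Final per-prompt score: the median of the three reviewers' 1-5 scores."""
    return median(reviewer_scores)

# (v8 score, bucket maximum), copied from the table above.
v8_buckets = {
    "kubectl/CLI accuracy": (14, 30),
    "YAML manifest validity": (12, 25),
    "Debugging diagnose": (8, 30),
    "Prometheus runbook": (6, 25),
    "General reasoning": (15, 15),
}

total = sum(score for score, _ in v8_buckets.values())
max_total = sum(maximum for _, maximum in v8_buckets.values())
percent = round(100 * total / max_total)

print(prompt_score([2, 4, 3]))    # 3
print(total, max_total, percent)  # 55 125 44
```

Each prompt contributes at most 5 points, so the 125-point maximum corresponds to the 25 prompts.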

Headline: v8 takes the largest single-version jump yet in kubectl accuracy (+4 points on a 30-point bucket) and recovers full general-reasoning performance, at a small cost in Diagnose and Runbook accuracy (-2 and -1). The Alpaca mix successfully defended against forgetting; the contrast bucket visibly suppressed the specific hallucinated flags v7 was repeating.

Honest absolute level: even v8 scores 44% on this benchmark. The judges grade strictly for agent-usability — a single invented flag or wrong apiVersion is enough to mark a response as not-executable. v8 is the best version of magos yet, but there is substantial room to grow toward 100%.

To pin a specific version when loading:

from transformers import AutoModelForCausalLM

# Pin the v8 revision; use revision="v7" / "v6" / "v5" / "v3" / "v2" for previous versions
model = AutoModelForCausalLM.from_pretrained("clglavan/magos-k8s-0.6b", revision="v8")

What it's good at

kubectl command construction: v8's strongest area. Real flags in their correct forms, without the invented flags v7 produced (such as --show-namespace, or --limit on kubectl logs).
YAML manifest generation: Pod, Deployment, Service, NetworkPolicy, PVC, HPA, ConfigMap, Secret, RBAC and ~70 other top-level Kinds are covered by the schema-validated training set, which targets correct apiVersion and field names.
Diagnosing pasted errors — kubectl describe output, log lines, alert payloads → root cause + next-step suggestions
Prometheus alert handling — meaning + diagnostic steps for the prometheus-operator runbook set (KubePodCrashLooping, etcdBackendQuotaLowSpace, AlertmanagerClusterDown, etc.)
Agent-style outputs — short, command-first responses suitable for autonomous execution rather than human reading
Basic general reasoning — Alpaca mix preserves math, generic CS facts, short explanations

What it's not good at

Multi-step planning or complex tool chains — it's a 0.6B model
Subtle/rare flags — common flags are reliable; rare-but-real flags are still sometimes hallucinated. Always sanity-check with kubectl --help.
Multi-flag combinations on the same command — accuracy drops as flag count goes up
Knowledge of features released after the source docs were captured (mid-2026)
Long-form thinking — SFT suppressed Qwen3's <think> behavior
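One way to automate the "sanity-check with kubectl --help" advice from the list above is to compare the long flags in a generated command against the flags that appear in the subcommand's help text. The sketch below is an assumption-laden illustration: `unknown_flags` is a hypothetical helper, and `SAMPLE_HELP` is a hand-written stand-in, not real kubectl output.

```python
import re
import shlex
from typing import List, Set

LONG_FLAG = re.compile(r"--[a-z0-9][a-z0-9-]*")

def known_flags(help_text: str) -> Set[str]:
    """Collect every long flag mentioned anywhere in the help text."""
    return set(LONG_FLAG.findall(help_text))

def unknown_flags(command: str, help_text: str) -> List[str]:
    """Return long flags used in the command that the help text never mentions."""
    known = known_flags(help_text)
    used = [tok.split("=", 1)[0]
            for tok in shlex.split(command)
            if tok.startswith("--") and len(tok) > 2]
    return [flag for flag in used if flag not in known]

# Hand-written stand-in for help output; in practice this text would be
# captured from the real CLI, e.g. the output of `kubectl logs --help`.
SAMPLE_HELP = """
    --follow     Stream logs as they are written
    --previous   Print logs for the previous container instance
    --since      Only return logs newer than a relative duration
    --tail       Number of recent log lines to show
"""

print(unknown_flags("kubectl logs web-0 --tail=50 --limit=100", SAMPLE_HELP))
print(unknown_flags("kubectl logs web-0 --tail=50 --since=1h", SAMPLE_HELP))
```

A check like this only catches flags that do not exist at all. It does not validate flag values or short flags, and global flags (listed by `kubectl options`) would need to be merged into the known set.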

How to use

llama.cpp / Ollama / LM Studio

Three GGUF quantization levels are included — pick one:

File	Size	Quality
magos-k8s-0.6b-f16.gguf	1.2 GB	reference (unquantized 16-bit weights)
magos-k8s-0.6b-q8_0.gguf	610 MB	effectively identical to f16 at half the size; recommended
magos-k8s-0.6b-q4_k_m.gguf	379 MB	smallest, with some quality loss: kubectl flag/argument mistakes appear more often than with q8/f16. Fine for casual use, not recommended for accuracy-critical tasks.

Example with llama-cpp-python:

from llama_cpp import Llama

llm = Llama(model_path="magos-k8s-0.6b-q8_0.gguf", n_ctx=4096, chat_format="chatml")
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Drain node worker-3 ignoring DaemonSets and deleting local-storage pods."}],
    temperature=0.05,
    repeat_penalty=1.15,
    max_tokens=512,
)
print(resp["choices"][0]["message"]["content"])

The temperature=0.05 and repeat_penalty=1.15 settings matter: without a repetition penalty, 0.6B models tend to loop on longer structured outputs.

Hugging Face transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

tok   = AutoTokenizer.from_pretrained("clglavan/magos-k8s-0.6b")
model = AutoModelForCausalLM.from_pretrained("clglavan/magos-k8s-0.6b",
                                             dtype="bfloat16",
                                             device_map="auto")

messages = [{"role": "user", "content": "Give me a NetworkPolicy that denies all egress from app pods except DNS."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=384,
                     do_sample=True, temperature=0.05,
                     top_p=0.95, top_k=20, repetition_penalty=1.15)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training


Base model	Qwen/Qwen3-0.6B
Method	Two-stage: continued pre-training (CPT) → supervised fine-tuning (SFT). Both full-weight (no LoRA).
Stage 1 corpus	~8.5k document chunks: kubernetes.io docs + blog (~6.5k), Kubernetes API reference v1.34 (~1.9k), Prometheus alert runbooks (~106). Reused from v5/v6/v7 — corpus unchanged.
Stage 1 tokens	~6.5M
Stage 1 LR	5e-6, cosine, 3% warmup, 1 epoch
Stage 2 corpus (v8)	~6,740 synthetic Q&A pairs. Distribution: K8s debugging (~1.7k), K8s API field/schema (~1.3k), Prometheus runbook (~1.0k, 10 examples per runbook), kubectl reference (~1.3k, 15 per subcommand), schema-filtered YAML bucket (~520), anti-hallucination contrast bucket (~317), general-instruct mix (~600)
Stage 2 LR	1.5e-5, cosine, 3% warmup, 2 epochs
Micro batch / grad accum	1 / 16 (effective batch 16)
Precision	bfloat16
Sequence length	2048
Stage 1 eval_loss	1.71
Stage 2 eval_loss	1.72 (v7 was 1.67; the small regression reflects the 9% Alpaca slice being out-of-K8s-distribution — judge benchmark is the real measure)
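To make the Stage 2 schedule concrete, the sketch below derives an approximate step count from the table and evaluates a linear-warmup, cosine-decay learning rate at a few points. It rests on several assumptions that the table does not confirm: that all ~6,740 examples are used for training (no eval hold-out), that partial batches round up, and that the cosine decays to zero. `lr_at` is a hypothetical helper, not the trainer's own scheduler.

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float, warmup_frac: float = 0.03) -> float:
    """Linear warmup to peak_lr over warmup_frac of training, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Values from the table: ~6,740 examples, micro batch 1 x grad accum 16, 2 epochs, LR 1.5e-5.
examples, effective_batch, epochs, peak_lr = 6740, 1 * 16, 2, 1.5e-5

steps_per_epoch = math.ceil(examples / effective_batch)  # assumes partial batches round up
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * 0.03)

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step, total_steps, peak_lr))
```

Under these assumptions training runs for a little over 800 optimizer steps, with warmup covering only the first couple of dozen of them.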

Files

model.safetensors — fine-tuned weights, HF format (1.2 GB, bf16)
magos-k8s-0.6b-f16.gguf — GGUF, full precision (1.2 GB)
magos-k8s-0.6b-q8_0.gguf — GGUF, 8-bit quantization (610 MB)
magos-k8s-0.6b-q4_k_m.gguf — GGUF, 4-bit quantization (379 MB)
tokenizer.json, tokenizer_config.json — Qwen3 tokenizer
chat_template.jinja — Qwen3 ChatML template
config.json, generation_config.json — standard HF configs (with magos sampling defaults)

Limitations and intended use

This is a small experimental model. Always verify any command, YAML, or behavioral claim against current Kubernetes documentation before running in production. It is intended for learning, prototyping, and as a component in local devops agents — not as an authoritative source.

License

Apache 2.0, inherited from the Qwen3-0.6B base model license. The training data is derived from the official Kubernetes documentation (CC-BY 4.0) and the prometheus-operator runbooks (Apache 2.0).
