This model is part of an execution-grounded SQL debugging workflow built on OpenEnv tasks. The key idea is to optimize for runtime correctness rather than only text-level plausibility.
The training/evaluation workflow uses:
- A fast bridge phase on Qwen2.5-Coder-0.5B-Instruct for environment wiring checks.
- A baseline/eval track with Qwen2.5-Coder-7B-Instruct and benchmark comparisons.
- GRPO-based optimization signals from SQL execution outcomes, grader feedback, and task completion behavior.
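The execution-grounded signal described above can be sketched as a reward function. This is a minimal illustration, not the project's actual grader: the function name, scoring values, and the in-memory SQLite sandbox are assumptions made for the example.

```python
import sqlite3

def execution_reward(candidate_sql: str, setup_sql: str, expected_rows: list) -> float:
    """Hypothetical execution-grounded reward: 1.0 if the candidate query
    runs and returns the expected rows, partial credit if it merely
    executes, 0.0 on any execution error."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)          # build the task schema/data
        rows = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        return 0.0                             # execution failure
    finally:
        conn.close()
    return 1.0 if rows == expected_rows else 0.1  # correctness vs. mere validity

# Example task: the misspelled table name scores 0.0, the fixed query 1.0.
setup = "CREATE TABLE users (id INTEGER); INSERT INTO users VALUES (1);"
print(execution_reward("SELECT * FROM userss;", setup, [(1,)]))  # 0.0
print(execution_reward("SELECT * FROM users;", setup, [(1,)]))   # 1.0
```

Rewards of this shape are what make the workflow "execution-grounded": the score comes from running the SQL, not from comparing its text to a reference.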
- SQL correctness can still degrade on unseen schemas or dialects.
- Benchmark-style gains do not guarantee equivalent production reliability.
- Model outputs should be reviewed before being executed in sensitive environments.
## Recommendations

- Keep SQL execution sandboxed during evaluation.
- Use schema introspection + error inspection loops.
- Add reviewer/guardrail checks for risky query classes.
- Track run artifacts and compare against deterministic graders, not only manual inspection.
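The sandboxing and introspection recommendations above can be sketched as follows. This uses an in-memory SQLite database as the sandbox; the helper names and the error-feedback shape are illustrative assumptions, not part of this model's API.

```python
import sqlite3

def introspect_schema(conn: sqlite3.Connection) -> str:
    """Collect CREATE statements so the model sees the actual schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL"
    ).fetchall()
    return "\n".join(r[0] for r in rows)

def run_with_feedback(conn: sqlite3.Connection, query: str):
    """Execute in the sandbox; return (rows, None) on success or
    (None, error message) so a repair loop can feed the error back."""
    try:
        return conn.execute(query).fetchall(), None
    except sqlite3.Error as e:
        return None, f"{type(e).__name__}: {e}"

conn = sqlite3.connect(":memory:")   # sandboxed: nothing touches real data
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
print(introspect_schema(conn))       # schema context for the prompt
rows, err = run_with_feedback(conn, "SELECT * FROM userss;")
print(err)                           # e.g. OperationalError: no such table: userss
```

The schema string and the error message are exactly the kind of context the debugging loop would append to the next prompt.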
## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Fix this SQL query based on schema and error context: SELECT * FROM userss;"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Environmental Impact

This model was trained and evaluated across iterative cloud and local workflows. Exact carbon accounting is not yet logged in this card.
## Citation
If you use this work, cite the project repository and model page: