---
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
pipeline_tag: text-generation
tags:
- openenv
- reinforcement-learning
- mechanism-design
- auctions
- llm-agents
- grpo
- trl
- unsloth
- peft
- lora
- qwen2.5
language:
- en
model-index:
- name: daedalus-designer-v2
results:
- task:
type: reinforcement-learning
name: Adversarial mechanism design (DAEDALUS)
dataset:
type: kabilesh-c/Daedalus-Env
name: DAEDALUS OpenEnv
metrics:
- type: composite_reward
value: 0.434
name: Informed-greedy baseline (mean over 30 episodes)
- type: composite_reward
value: 0.326
name: Uniform-random baseline (mean over 30 episodes)
---
# DAEDALUS Designer v2 — Adversarial Auction-Mechanism Design
> A 1.5B-parameter LLM that **designs auction mechanisms** robust to a
> population of strategic adversaries (colluders, shaders, dropouts,
> exploiters). Trained with **GRPO** (TRL) on the
> [`kabilesh-c/Daedalus-Env`](https://huggingface.co/spaces/kabilesh-c/Daedalus-Env) OpenEnv environment using
> **Unsloth + 4-bit + LoRA**.
| | |
|---|---|
| **Live demo (interactive UI)** | [`kabilesh-c/Daedalus-Env`](https://huggingface.co/spaces/kabilesh-c/Daedalus-Env) |
| **Base model** | [`Qwen/Qwen2.5-1.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) |
| **Training Space (image builder)** | [`kabilesh-c/daedalus-training-space`](https://huggingface.co/spaces/kabilesh-c/daedalus-training-space) |
| **Long-form blog** | served from the live Space at `/blog.md` |
| **Standalone script** | `inference.py` in the live Space repo |
| **Author** | Laksh Krish Kabilesh |
---
## 1. What this model does
Given a partial observation of an auction market — recent (welfare,
fairness, participation) outcomes, round number, episode length — the
model emits a **structured JSON mechanism**:
```json
{
"auction_type": "second_price",
"reserve_price": 0.18,
"reveal_reserve": false,
"reveal_competing_bids": false,
"reveal_winner_identity": true,
"reveal_clearing_price": true,
"reveal_bid_distribution": false,
"shill_penalty": 1.2,
"withdrawal_penalty": 0.6,
"collusion_penalty": 1.9,
"coalition_policy": "penalize_suspected"
}
```
The mechanism is then *used* — the env runs 5 market rounds against an
adaptive adversarial population, and scores the result on the composite
reward `R = welfare_ratio · fairness_score · participation_rate · stability_score`.
This is the **inverse** of the usual RL setup: the model is the
*referee*, not the player.
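The multiplicative form of the composite matters: a collapse in any single component zeroes the entire episode reward. A minimal sketch of the formula above (component names taken from the formula; the env's exact implementation may differ):

```python
def composite_reward(welfare_ratio: float, fairness_score: float,
                     participation_rate: float, stability_score: float) -> float:
    """Multiplicative composite: a zero in any one component
    (e.g. participation_rate = 0) drives the whole reward to zero."""
    return welfare_ratio * fairness_score * participation_rate * stability_score

# Strong welfare cannot compensate for zero participation:
print(composite_reward(0.9, 0.8, 0.0, 0.95))  # -> 0.0
```

This is why mechanisms that drive bidders to drop out score so poorly, regardless of the welfare they extract from the remaining participants.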
---
## 2. Model details
| | |
|---|---|
| **Architecture** | Qwen2.5-1.5B-Instruct (decoder-only transformer, 28 layers) |
| **Total parameters** | 1,562,179,072 |
| **Trainable parameters (LoRA)** | 18,464,768 (**1.18 %**) |
| **PEFT method** | LoRA, r = 16, applied to all attention + MLP linear layers |
| **Quantisation (training)** | 4-bit NF4 via `bitsandbytes` + Unsloth |
| **Storage format** | merged 16-bit `bf16` (no separate adapter file required at inference) |
| **Context length** | 32,768 tokens (Qwen2.5 default) |
| **Output schema** | Pydantic `DaedalusAction` — see live Space for full type definition |
| **License** | Apache-2.0 (inherited from Qwen2.5-1.5B-Instruct) |
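The authoritative `DaedalusAction` type is the Pydantic model in the live Space. As a rough stdlib approximation, the fields visible in this card's example outputs can be sketched as follows (field names are taken from the examples; types and any bounds are inferred, not the canonical definition):

```python
from dataclasses import dataclass

@dataclass
class DaedalusAction:
    # Field names mirror the example JSON outputs in this card;
    # the canonical Pydantic definition lives in the live Space.
    auction_type: str             # e.g. "second_price"
    reserve_price: float
    reveal_reserve: bool
    reveal_competing_bids: bool
    reveal_winner_identity: bool
    reveal_clearing_price: bool
    reveal_bid_distribution: bool
    shill_penalty: float
    withdrawal_penalty: float
    collusion_penalty: float
    coalition_policy: str         # e.g. "penalize_suspected"

action = DaedalusAction(
    auction_type="second_price", reserve_price=0.18,
    reveal_reserve=False, reveal_competing_bids=False,
    reveal_winner_identity=True, reveal_clearing_price=True,
    reveal_bid_distribution=False, shill_penalty=1.2,
    withdrawal_penalty=0.6, collusion_penalty=1.9,
    coalition_policy="penalize_suspected",
)
```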
---
## 3. Training details
### Pipeline
```
Qwen2.5-1.5B-Instruct (4-bit, Unsloth)
    |
    |  LoRA r=16 on attn + MLP
    v
SFT on synthetic (prompt, valid_mechanism) pairs
    |  (teaches JSON shape only; ~300 steps)
    v
GRPO (TRL) on DAEDALUS env, 50 steps
    |  reward = format_reward + welfare + fairness + composite
    |
    |  merge_and_unload + push_to_hub_merged
    v
kabilesh-c/daedalus-designer-v2 (this checkpoint)
```
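The `format_reward` term in the stack is essentially a parseability gate: a completion that fails to parse as JSON earns nothing, regardless of its strategic content. A hedged sketch (the required-key set here is an illustrative subset drawn from the example mechanism, not the env's exact check):

```python
import json

# Illustrative subset of the mechanism schema, not the env's exact key list
REQUIRED_KEYS = {"auction_type", "reserve_price", "coalition_policy"}

def format_reward(completion: str) -> float:
    """1.0 if the completion parses as a JSON object carrying the required
    mechanism keys, else 0.0 -- malformed output earns nothing."""
    try:
        obj = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(obj, dict):
        return 0.0
    return 1.0 if REQUIRED_KEYS <= obj.keys() else 0.0
```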
### Hyperparameters
| | |
|---|---|
| **RL algorithm** | GRPO (Group-Relative PPO, no value head) |
| **Generations per state** | 4 |
| **Steps** | 50 |
| **Batch size** | 4 × 2 grad-accum × 1 GPU = 8 effective |
| **Learning rate** | 5e-6 (LoRA) |
| **Reward stack** | `format` (JSON parseability) → `welfare` → `fairness` → `composite` |
| **Hardware** | 1 × NVIDIA L4 (24 GB) on Hugging Face Jobs |
| **Wall-clock** | ~7 minutes per +50-step iteration |
| **Cost** | ~\$0.10 per iteration |
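GRPO's group-relative advantage (the reason no value head is needed) normalises each completion's reward against its group of sibling generations from the same state. A minimal sketch of that computation; TRL's implementation differs in details:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalise each reward against its sampling group: completions above
    the group mean get positive advantage, those below get negative. This
    group-relative signal replaces a learned value head in GRPO."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# 4 generations per state, as in the table above
advs = group_relative_advantages([0.60, 0.35, 0.72, 0.41])
```

Note that if all 4 rewards in a group are identical, the advantages are all zero and that group contributes no gradient, which is why the `reward_std` row in the table below matters.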
### Training signals (from `training_history.json`)
| Metric | Range | Mean | What it tells you |
|---|---|---|---|
| `grad_norm` | 0.46 – 0.84 | 0.71 | healthy gradient flow, no vanish/explode |
| `reward_std` (within-group) | 0.53 – 0.86 | 0.69 | GRPO advantage signal stays informative |
| `entropy` (policy) | 1.65 – 2.50 nats | 1.95 | narrowed by ~5.5 nats from uniform; still exploring |
| `completions/mean_length` | 140 tokens (pinned) | 140 | schema-locked output |
Plots and per-step CSV are served from the live Space at `/plots/`.
---
## 4. Evaluation
30-episode baseline boxplot on the env's true composite reward (no
training-time shaping):
| Policy | Mean R | IQR |
|---|---|---|
| Uniform-random mechanisms | **+0.326** | ≈ 0.27 |
| Informed-greedy (hand-engineered "robust") | **+0.434** | ≈ 0.21 |
The +0.108 gap (≈ +33 % relative) is the structural signal the model is
trained to find. The trained model's online behaviour can be inspected
directly in the live Space — every `/api/design` call hits this
checkpoint.
A typical stage-3 (mixed shaders + colluders) output:
```json
{
"auction_type": "second_price",
"reserve_price": 0.18,
"reveal_competing_bids": false,
"reveal_clearing_price": true,
"reveal_winner_identity": true,
"collusion_penalty": 1.9,
"shill_penalty": 1.2,
"withdrawal_penalty": 0.6,
"coalition_policy": "penalize_suspected"
}
```
Three things to read off this:
1. Picks **second-price** (truthful in static, robust to non-VCG-aware bidders).
2. **Hides bid distribution but reveals clearing price** — gives honest bidders calibration signal while starving cartels of enforcement signal.
3. Penalty ranking: `collusion (1.9) > shill (1.2) > withdrawal (0.6)` matches the relative cost of each pathology in this population.
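Point 1 can be made concrete: in a second-price auction with a reserve, the winner pays the maximum of the runner-up bid and the reserve, never their own bid, so shading your bid cannot lower the price you pay. A toy clearing rule to illustrate (not the env's implementation):

```python
def second_price_clear(bids: list[float], reserve: float):
    """Return (winner_index, price), or (None, None) if no bid meets the
    reserve. The winner pays max(second-highest eligible bid, reserve),
    never their own bid -- the property that makes truthful bidding a
    dominant strategy in the static setting."""
    eligible = [(b, i) for i, b in enumerate(bids) if b >= reserve]
    if not eligible:
        return None, None
    eligible.sort(reverse=True)
    winner = eligible[0][1]
    price = eligible[1][0] if len(eligible) > 1 else reserve
    return winner, price

# reserve 0.18 as in the example mechanism above
print(second_price_clear([0.9, 0.7, 0.5], reserve=0.18))  # -> (0, 0.7)
```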
---
## 5. Inference
### 5.1 Hosted (no setup) — call the live Space
```python
import requests

BASE = "https://kabilesh-c-daedalus-env.hf.space"

# Reset and read the initial observation
r = requests.post(
    BASE + "/reset",
    json={"session_id": "demo", "n_agents": 8, "episode_length": 10},
)
obs = r.json()["observation"]

# Ask the trained designer for a mechanism
r = requests.post(BASE + "/api/design", json={
    "round_number": obs["round_number"],
    "episode_length": obs["episode_length"],
    "market_outcomes": obs.get("market_outcomes", []),
})
mechanism = r.json()["mechanism"]
print(mechanism)

# Apply it and observe the reward
r = requests.post(
    BASE + "/step",
    json={"session_id": "demo", "action": mechanism},
)
print("R =", r.json()["reward"])
```
### 5.2 Local — `transformers` + `huggingface_hub`
> **Important:** this repo was originally pushed via Unsloth's
> `save_pretrained_merged()`, which leaves a leftover `adapter_config.json`
> whose `base_model_name_or_path` points at `./sft-merged` (a path that
> only exists on the training filesystem). Plain `from_pretrained(repo_id)`
> will follow that pointer and crash with
> `HFValidationError: Repo id must use alphanumeric chars`. Use
> `snapshot_download` with `ignore_patterns` to skip the adapter file:
```python
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Skip the stale adapter files so transformers loads the merged weights
local_dir = snapshot_download(
    repo_id="kabilesh-c/daedalus-designer-v2",
    ignore_patterns=["adapter_config.json", "adapter_model.safetensors"],
)

tok = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(
    local_dir,
    torch_dtype=torch.bfloat16,  # use torch.float16 if your GPU lacks bf16
    device_map="auto",
)

system = (
    "You are an auction mechanism designer. Given a market observation, "
    "output ONLY a JSON object matching the DaedalusAction schema."
)
user = (
    "round_number: 3 / 10\n"
    "recent_outcomes: [{welfare:0.7, fairness:0.4, participation:1.0}]\n"
    "Respond with the JSON only."
)

prompt = tok.apply_chat_template(
    [{"role": "system", "content": system},
     {"role": "user", "content": user}],
    tokenize=False, add_generation_prompt=True,
)
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=180, do_sample=False)
print(tok.decode(out[0, ids.input_ids.shape[1]:], skip_special_tokens=True))
```
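The model is schema-locked but, like most chat models, not guaranteed to emit bare JSON; it may occasionally wrap the object in prose. A small defensive parse is a reasonable precaution (this helper is a sketch, not part of the repo):

```python
import json
import re

def extract_mechanism(text: str):
    """Pull the first {...} span out of a model completion and parse it.
    Returns the parsed dict, or None if nothing parseable is found."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

raw = 'Sure! {"auction_type": "second_price", "reserve_price": 0.18} Hope that helps.'
mechanism = extract_mechanism(raw)
```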
### 5.3 Local — full env rollouts via `inference.py`
The repo at [`kabilesh-c/Daedalus-Env`](https://huggingface.co/spaces/kabilesh-c/Daedalus-Env) ships an
`inference.py` that reproduces the §4 baseline numbers:
```bash
# one-shot mechanism for a fresh env
python inference.py
# 30 episodes trained-vs-random (writes inference_results.json)
python inference.py --n-episodes 30 --baseline
# point at a different checkpoint
python inference.py --repo-id kabilesh-c/daedalus-designer-v2
```
---
## 6. Intended use & limitations
**Intended use:** demonstration / research on LLM-driven mechanism
design. The model is trained to output well-formed `DaedalusAction`
JSON for the DAEDALUS env; it has no production guarantees.
**Limitations:**
- Trained for only 50 GRPO steps — the change relative to a randomly-initialised
  LoRA is small in absolute terms (composite-reward mean 0.594 → 0.579).
- The "format reward" dominates early steps; malformed JSON is heavily
penalised, so the model is **schema-locked but not strategically
perfect**. On stage-4 (full-adversarial) populations it occasionally
emits `coalition_policy: "allow"`, which craters the run.
- Reward is engineered for the DAEDALUS env's specific multiplicative
composite. The model is **not** guaranteed to produce sensible
mechanisms outside this rubric.
- Inherits any biases / failure modes from
[`Qwen/Qwen2.5-1.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
---
## 7. Citation
```bibtex
@misc{kabilesh2026daedalus,
title = {DAEDALUS: Training an LLM to Design Auction Markets via Adversarial RL},
author = {Laksh Krish Kabilesh},
year = {2026},
url = {https://huggingface.co/spaces/kabilesh-c/Daedalus-Env},
}
```
---
*Made by Laksh Krish Kabilesh.*