---
base_model: unsloth/Qwen2.5-7B-Instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
license: apache-2.0
language:
- en
---

Title: Teaching an LLM to Triage Disasters: Building an RL Environment with OpenEnv
Tags: openenv, reinforcement-learning, disaster-response, grpo, qwen

# Teaching an LLM to Triage Disasters 🚨

## The Problem

During a natural disaster, Emergency Operations Centers (EOCs) are overwhelmed by thousands of frantic incident reports. A flooded neighborhood, a chemical plant fire, a hospital wing collapse — all arriving simultaneously. Human coordinators must instantly decide: which team? what priority? what action?

We built an AI agent that does exactly this.


## The Environment

We built Disaster Response Coordination OpenEnv — a multi-step RL environment where an AI agent acts as an Emergency Incident Commander.

15 real-world scenarios across 3 difficulty tiers, modeled after actual disasters:

  • 🌊 2018 Kerala Floods → dam spillway overflow, communication blackouts
  • ☠️ 2020 Vizag Gas Leak → chemical plant fire, toxic plume evacuation
  • 2012 North India Grid Failure → cold-chain medicine failures, hospital blackouts

## Action Space

For every incident ticket, the agent must complete a 4-step workflow: `classify → set_priority → draft_reply → submit_ticket`
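The workflow above can be sketched as a per-ticket episode loop. This is an illustrative stand-in, not the environment's actual API: `env_step` and `agent` are hypothetical callables, and the real OpenEnv action schema may differ.

```python
# Hypothetical sketch of the 4-step ticket workflow; the real
# environment's step/action API may look different.
WORKFLOW = ["classify", "set_priority", "draft_reply", "submit_ticket"]

def run_ticket(env_step, agent):
    """Walk one incident ticket through the fixed 4-step workflow.

    env_step(action) -> (observation, reward) stands in for the
    environment's step function; agent(step_name, obs) returns the
    agent's output for that step.
    """
    obs, total_reward = {"ticket": "new incident"}, 0.0
    for step_name in WORKFLOW:
        action = {"step": step_name, "value": agent(step_name, obs)}
        obs, reward = env_step(action)
        total_reward += reward  # dense, per-step reward accumulation
    return total_reward
```

Because every step returns a reward, the agent gets a learning signal even when it fails a later step of the same ticket.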

## Reward Function

`reward = 0.40 × team_routing + 0.30 × priority + 0.30 × reply_quality`

Dense, partial rewards at every step. No sparse end-of-episode signals.
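The weighted sum above translates directly into code. A minimal sketch, assuming each component score is normalized to [0, 1]:

```python
def ticket_reward(team_routing: float, priority: float, reply_quality: float) -> float:
    """Weighted per-ticket reward, per the formula above.

    Each component is a partial score in [0, 1], so the total also
    lies in [0, 1] and partial credit flows at every step.
    """
    return 0.40 * team_routing + 0.30 * priority + 0.30 * reply_quality
```

Team routing carries the largest weight, so the agent is pushed hardest toward sending the right team before fine-tuning priority and reply quality.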

## Difficulty Scaling

| Tier | Budget | Scenarios |
|---|---|---|
| 🟢 Easy | 40 | Single-team, clear incidents |
| 🟡 Medium | 48 | Multi-agency, ambiguous |
| 🔴 Hard | 55 | Cascading mass-casualty + time pressure |

## Training with GRPO

We trained Qwen2.5-7B-Instruct using GRPO (Group Relative Policy Optimization) via TRL + Unsloth on a Google Colab T4 GPU.

Setup:

  • Base model: unsloth/Qwen2.5-7B-Instruct-bnb-4bit
  • Algorithm: GRPOTrainer (TRL)
  • LoRA: r=16, 4-bit quantization
  • Epochs: 3 | Steps: 14
  • Reward: Live environment feedback via HF Space API

The reward function connected directly to our live HF Space — every training step sent real incident prompts to the environment and received real rewards back.
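A reward function in this style can be sketched as follows. The endpoint path and payload shape here are illustrative assumptions, not the project's actual API; the `(prompts, completions, **kwargs) -> list[float]` shape matches what TRL's `GRPOTrainer` expects from a custom reward function, and `post` is injectable so the function can be exercised without network access.

```python
import json
from urllib import request

# Hypothetical scoring endpoint; the Space's real route may differ.
SPACE_URL = "https://joynnayvedya-disaster-response-openenv.hf.space/score"

def space_reward(prompts, completions, post=None, **kwargs):
    """TRL-style reward function backed by the live HF Space.

    Each completion is POSTed to the environment, which replies with
    the scalar reward for that rollout.
    """
    if post is None:
        def post(payload):
            req = request.Request(
                SPACE_URL,
                data=json.dumps(payload).encode(),
                headers={"Content-Type": "application/json"},
            )
            with request.urlopen(req) as resp:
                return json.load(resp)
    rewards = []
    for prompt, completion in zip(prompts, completions):
        result = post({"prompt": prompt, "completion": completion})
        rewards.append(float(result["reward"]))
    return rewards
```

One design consequence: every optimizer step pays a round-trip to the Space per completion, which keeps rewards live but makes network latency part of the training loop.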

## Training Reward Curve

*(figure: training reward curve)*


## What We Discovered: Sparse Reward Collapse

The untrained base model immediately revealed why this environment is hard:

Before training, the model hallucinated invalid outputs:

- `team: "emergency_services"` (not a valid team)
- `team: "utility repair"`
- `priority: "very-high"` (not a valid priority)
- `priority: "higher"`

After training, the model learned the valid action space:

- `team: "rescue"`
- `priority: "urgent"`

However, we also observed sparse reward collapse, a known RL failure mode in which a small model (7B at 4-bit) struggles to optimize a multi-step workflow with interdependent rewards. In a sense, this validates the environment: it is genuinely hard enough to expose real RL failure modes that larger models or longer training runs would be needed to overcome.
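The shift from hallucinated to valid outputs comes down to schema checking. A minimal sketch of that kind of action-space validation, using illustrative team and priority vocabularies (only `"rescue"` and `"urgent"` are confirmed by the examples above; the rest are assumptions):

```python
# Illustrative valid action sets; the environment's real team and
# priority vocabularies may differ.
VALID_TEAMS = {"rescue", "medical", "fire", "utilities", "logistics"}
VALID_PRIORITIES = {"low", "medium", "high", "urgent"}

def validate_action(team: str, priority: str) -> list[str]:
    """Return a list of schema violations (empty list means valid)."""
    errors = []
    if team not in VALID_TEAMS:
        errors.append(f"invalid team: {team!r}")
    if priority not in VALID_PRIORITIES:
        errors.append(f"invalid priority: {priority!r}")
    return errors
```

Under this check, the base model's `"emergency_services"` / `"very-high"` outputs would be flagged, while the trained model's `"rescue"` / `"urgent"` pass cleanly.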


## Baseline Results

| Agent | Easy | Medium | Hard | Avg |
|---|---|---|---|---|
| Heuristic Baseline | 0.704 | 0.683 | 0.660 | 0.682 |
| GRPO Qwen2.5-7B | research ongoing | | | |

All 3 difficulty tiers passed (score ≥ 0.6).


## The Dashboard

We built a military-style tactical command dashboard with:

  • 🗺️ Live OpenStreetMap incident markers with radar pulse animations
  • ARIA — AI Incident Analyst (Gemini-powered, analyses any incident live)
  • 📊 Real-time score tracking, threat level bar, team routing
  • 🔔 Operations feed with meaningful event notifications

*(figure: dashboard)*


## Resources

| Resource | Link |
|---|---|
| 🚀 HF Space (Live Environment) | joynnayvedya/disaster-response-openenv |
| 🧠 Trained Model | joynnayvedya/disaster-response-trained |
| 💻 GitHub | letsjoyn/meta-scalar-hack |

## Try It Yourself

```bash
git clone https://github.com/letsjoyn/meta-scalar-hack.git
cd meta-scalar-hack
pip install -e .
python inference.py
```

Built for the 2026 Meta & Scalar AI Hackathon — Grand Finale, Bangalore.

## Uploaded finetuned model

  • Developed by: joynnayvedya
  • License: apache-2.0
  • Finetuned from model: unsloth/Qwen2.5-7B-Instruct-bnb-4bit

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

## Description

Model synced from source: joynnayvedya/disaster-response-trained