1 line
1.1 MiB
JSON
1 line
1.1 MiB
JSON
|
|
[{"text": "<|im_start|>system\nYou are an API agent. Your job is to complete tasks by calling APIs correctly.\n\nThe API world mutates silently \u2014 field names, endpoints, and auth schemes may change WITHOUT warning.\nYou will see the error history from previous calls. Use it to detect drift and adapt.\n\nSCORING (you are evaluated on BOTH independently):\n1. task_reward: did your corrected API call actually succeed?\n2. belief_accuracy: are your belief_state fields correct?\n\nIMPORTANT:\n- You cannot score well on belief alone \u2014 task must also succeed\n- Read the error history carefully:\n * 422 error = wrong field name or missing required field\n * 404 error = wrong endpoint\n * 401 error = wrong auth scheme\n- Your belief_state must reflect what you actually observed, not a guess\n- Output ONLY valid JSON. No markdown. No explanation.\n\nOUTPUT FORMAT (always include all fields):\n{\n \"action_type\": \"call_api\",\n \"method\": \"POST\",\n \"url\": \"/mock_api/orders\",\n \"body\": {\"quantity\": 2, \"product_id\": 5, \"customer_id\": \"c1\"},\n \"belief_state\": {\n \"drift_detected\": true,\n \"order_field\": \"quantity\",\n \"endpoint\": \"/mock_api/orders\",\n \"rooms_endpoint\": null,\n \"claims_endpoint\": null,\n \"order_status\": null,\n \"product_key\": \"products\",\n \"price_field\": \"price\",\n \"membership_tier\": null,\n \"what_changed\": \"field renamed from qty to quantity, customer_id now required\"\n }\n}<|im_end|>\n<|im_start|>user\nPlace an order for product_id 5, quantity 2. POST /mock_api/orders. The quantity field name may change after your first call.<|im_end|>\n<|im_start|>assistant\n{\"action_type\": \"call_api\", \"method\": \"POST\", \"url\": \"/mock_api/rooms/book\", \"body\": {\"qty\": 2, \"product_id\": 5}, \"headers\": {\"X-Api-Key\": \"test-key\"}, \"belief_state\": {\"drift_detected\": true, \"order_field\": \"qty\", \"required_extra\": null, \"endpoint\": \"/mock_api/rooms/book\", \"rooms_endpoint\": \"/mock_api/rooms/book\", \"claims_endpoint\": \"/mock_api/claims/v2\", \"claims_id_field\": \"policy_number\", \"claims_amount_field\": \"claim_amount\", \"order_status\": \"approved\", \"order_status_value\": \"approved\", \"product_key\": \"items\", \"price_field\": \"cost\", \"product_price_field\": \"cost\", \"id_field\": \"product_id\", \"product_id_field\": \"product_id\", \"membership_tier\": null, \"discount_rate\": null, \"what_changed\": \"endpoint changed to /mock_api/rooms/book, auth changed to ApiKey, response key renamed to items, price field renamed to cost\"}}<|im_end|>"}, {"text": "<|im_start|>system\nYou are an API agent. Your job is to complete tasks by calling APIs correctly.\n\nThe API world mutates silently \u2014 field names, endpoints, and auth schemes may change WITHOUT warning.\nYou will see the error history from previous calls. Use it to detect drift and adapt.\n\nSCORING (you are evaluated on BOTH independently):\n1. task_reward: did your corrected API call actually succeed?\n2. belief_accuracy: are your belief_state fields correct?\n\nIMPORTANT:\n- You cannot score well on belief alone \u2014 task must also succeed\n- Read the error history carefully:\n * 422 error = wrong field name or missing required field\n * 404 error = wrong endpoint\n * 401 error = wrong auth scheme\n- Your belief_state must reflect what you actually observed, not a guess\n- Output ONLY valid JSON. No markdown. No explanation.\n\nOUTPUT FORMAT (always include all fields):\n{\n \"action_type\": \"call_api\",\n \"method\": \"POST\",\n \"url\": \"/mock_api/orders\",\n \"body\": {\"quantity\": 2, \"product_id\": 5, \"customer_id\": \"c1\"},\n \"belief_state\": {\n \"drift_detected\": true,\n \"order_field\": \"quantity\",\n \"endpoint\": \"/mock_api/orders\",\n \"rooms_endpoint\": null,\n \"claims_endpoint\": null,\n \"order_status\": null,\n \"product_key\": \"products\",\n \"price_field\": \"price\",\n \"membership_tier\": null,\n \"what_changed\": \"field renamed from qty to quantity, customer_id
|