Files
YiXin-Agentic-Qwen3-14B/README.md

177 lines
39 KiB
Markdown
Raw Permalink Normal View History

---
license: apache-2.0
base_model:
- Qwen/Qwen3-14B
pipeline_tag: text-generation
---
<div style="text-align: center;">
<h1>YiXin-Agentic-Qwen3-14B</h1>
<img src="./fig/logo.png" alt="YiXin Logo">
</div>
## Model Overview
**YiXin-Agentic-Qwen3-14B:** A High-Performance Agentic Model with Multi-Turn Interactions.
YiXin-Agentic-Qwen3-14B features powerful reasoning and agent capabilities, developed through multi-stage distillation and reinforcement training.This model outperforms models of the same scale in all aspects of capability. Moreover, it is also highly competitive in terms of agent capabilities even when compared to large models such as DeepSeek-V3.1, Kimi-K2-Instruct-0905, and Qwen3-235B-A22B-Thinking-2507.
![Evaluation Results](./fig/Metric.png)
## Training Details
Specifically, the training process encompasses the following four innovative design aspects.
- **Mulit Train Stage:** Initial stage uses process-supervised knowledge distillation to boost reasoning and generalization. Later, multi-stage reinforcement learning enhances agent decision quality and stability.
- **More Efficient Algorithm:** Adopt advanced **GRPO++ algorithm**. Its innovations simplify training, reduce hyperparameter tuning complexity, and improve optimization efficiency via reward-focused updates.
- **More Stable Training:** Remove entropy loss to lower training collapse risks. Add monitoring metrics (reasoning repetition rate, truncation rate, Channel Reward).
- **More Diverse Exploration:** Apply **FIRE Sampling Rollout** globally, combined with **Clip High**. Expands training data breadth for diverse solutions to complex/rare samples.
## Evaluation Results
YiXin-Agentic-Qwen3-14B was benchmarked against multiple models, including DeepSeek-V3.1, Kimi-K2-Instruct-0905, and Qwen3-235B-A22B-Thinking-2507, Qwen3-14B. The model has achieved significant improvements in both agentic and reasoning capabilities, including tool usage, logical reasoning, mathematics and coding:
### AGENTIC
| Metric | DeepSeek-V3.1 | Kimi-K2-Instruct-0905 | Qwen3-235B-A22B-Thinking-2507 | Qwen3-14B | YiXin-Agentic-Qwen3-14B |
|---------------|-------------|-----------------------------|------------------------------|-------------|------------------------|
| TAU1-Retail | **71.3** | 54.7 | 62.0 | 49.3 | 64.1 |
| TAU1-Airline | 49.3 | **63.2** | 51.3 | 26.7 | 38.7 |
| TAU2-Retail | **70.4** | 40.4 | 62.9 | 42.5 | 63.2 |
| TAU2-Airline | 43.0 | **50.0** | 46.0 | 32.5 | 48.0 |
| TAU2-Telecom | 41.2 | **74.1** | 53.5 | 30.5 | 59.7 |
| C3-Bench UA | 55.3 | 52.5 | 62.3 | 64.3 | **66.2** |
| C3-Bench UPTA | 61.1 | 63.8 | 64.1 | 65.8 | **68.0** |
| AutoFin | 46.9 | 38.8 | 48.1 | 33.1 | **49.5** |
| AVG | 54.8 | 54.7 | 56.3 | 43.1 | **57.2** |
### REASONING
| Metric | DeepSeek-V3.1 | Kimi-K2-Instruct-0905 | Qwen3-235B-A22B-Thinking-2507 | Qwen3-14B | YiXin-Agentic-Qwen3-14B |
|---------------|-------------|-----------------------------|------------------------------|-------------|------------------------|
| MATH-500 | 95.0 | 94.6 | **96** | 94.4 | 94.8 |
| GPQA-Diamond | **84.3** | 77.8 | 80.8 | 65.2 | 68.7 |
| AIME-24 | **90** | 80 | **90** | 76.67 | 83.3 |
| AIME-25 | 80 | 70 | **83.33** | 70 | 80 |
| LiveCodeBench v6 [24.08-25.05] | **74.8** | 53.7 | 74.5 | 56.6 | 57.3 |
| AVG | 84.8 | 75.2 | **84.9** | 72.6 | 76.8 |
## How to Run Locally
### Hugging Face's Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "YiXin-AILab/YiXin-Agentic-Qwen3-14B"
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
messages = [
{"role": "system", "content": "\n<instructions>\nYou are a customer service agent that helps the user according to the <policy> provided below.\nIn each turn you can either:\n- Send a message to the user.\n- Make a tool call.\nYou cannot do both at the same time.\n\nTry to be helpful and always follow the policy. Always make sure you generate valid JSON only.\n\n## During interactions, you may employ three response modes — <Direct Response, Tool Invocation, Information Request> — to effectively leverage tools in addressing user needs.\n**Action List**\n- Direct Response:\na. If available tools cannot resolve (or assist in resolving) the user's query, you must attempt to respond using your own knowledge base.\nb. When current information suffices to answer the user's question, provide a summarized response.\nc. All responses must be enclosed between <response> and </response> tags\n- Tool Invocation: \na. When the user's query requires using a specific tool and necessary invocation parameters are available\nb. Place the tool invocation details between <tool_call> and </tool_call>, specifying the tool name and parameters in JSON format.\n- Information Request: \na. Clarify the user's inquiry or obtain supplemental details through dialog\nb. the corresponding query enclosed within <response> and </response> tags。\n**Notes**\n1. Think step-by-step, analyzing the user's query, available tools, dialogue context, and other relevant information before determining the appropriate action to take. \n**Output Format** \n<think> Your thoughts and reasoning </think>\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>\n<tool_call>\n...\n</tool_call>\n \n<response> Your response or query </response>\n\n</instructions>\n\n<policy>\n# 汽车金融客服政策\n当前时间为2025年8月5日17:00:00北京时间。\n作为汽车金融客服您可协助用户发起融资申请、查询申请状态、处理申请修改 / 取消及异常问题。\n- 对话开始时,您必须通过姓名+身份证号的方式对用户身份进行验证。即使用户已提供用户ID也必须执行此操作。\n- 你可以通过查询用户详情信息来获取当前系统用户的详细信息比如订单ID、线索ID等信息。即使用户已提供相应的订单ID、线索ID等信息也必须执行此操作进行验证。\n- 在执行任何会更新融资数据库的操作(提交预审、绑定评估报告、提交融资信息、确认方案等)之前,必须列出操作详情并获得用户的明确确认(回复\"是\")后方可进行。\n- 在每一个用户完成的核心关键节点操作完成后,都应该进行相应的融资订单状态更新,你应该正确操作更新订单的状态。\n- 不得提供用户未提及或系统无法获取的信息、知识或流程,也不得给出主观推荐或评论。\n- 每次只能进行一次系统操作,若进行系统操作,不得同时回复用户;若回复用户,也不得同时进行系统操作。\n- 对于违反本政策的用户请求,应予以拒绝。\n- 只有当请求超出您的操作范围而无法处理时,才应将用户转接给专业坐席。\n\n## 领域基础信息\n- 每位用户都有一个客户档案包含客户ID、线索ID、订单ID、姓名、身份证号、联系方式、地址。\n- 每个线索会有线索ID线索状态。\n- 每笔融资申请都有一个订单ID、客户ID、车辆评估信息、融资方案、申请时间、订单当前状态。\n- 系统订单状态分为下面不同阶段:\n - **初审阶段**:包括待征信授权、等待鑫验证等状态\n - 待征信授权:创建订单的初始状态即为“待征信授权”,当前状态下系统应该引导用户完成征信授权。\n - 等待鑫认证:完成征信授权后,订单状态更新为“等待鑫验证”,当前状态下系统可以继续的流程是引导用户完成车辆评估。\n - 待车辆预评完成车辆评估后当前系统应该引导用户将融资订单绑定<E7BB91><E5AE9A>
{"role": "user", "content": "你好,我想申请车辆融资。之前已经创建过融资订单了,但我不记得订单号了,请问接下来需要我做什么呢?"},
{"role": "assistant", "content": "您好!我来帮您查询您的融资订单信息。根据我们的政策要求,我需要先验证您的身份信息。\n\n请您提供以下信息进行身份验证\n- 您的姓名\n- 身份证号码\n\n验证通过后我就可以帮您查询到您的融资订单信息并告知您下一步需要做什么。"},
{"role": "user", "content": "好的,我叫刘**身份证号码是231***********0653。"},
{"role": "assistant", "content": "", "tool_calls": [{"type": "function", "function": { "name": "verify_identity", "arguments": '{"name": "刘**", "id_card": "231***********0653"}'}, "id": "chatcmpl-tool-c77f66beaff647fd95f4c853ebb18589"}]},
{"role": "tool", "content": "{\"user_id\": \"1776c035-87e6-40b0-b3ea-bcdf72c74d59\"}", "tool_call_id": "chatcmpl-tool-c77f66beaff647fd95f4c853ebb18589"},
{"role": "assistant", "content": "", "tool_calls": [{"type": "function", "function": {"name": "get_user_details", "arguments": '{"user_id": "1776c035-87e6-40b0-b3ea-bcdf72c74d59"}'}, "id": "chatcmpl-tool-2b229cc28f2243aab5ca0ae14b93e60a"}]},
{"role": "tool", "content": '{"personalInfo": {"UserId": "1776c035-87e6-40b0-b3ea-bcdf72c74d59", "UserName": "刘**", "idCard": "231***********0653", "phoneNumber": "152****5505", "Sex": "\\u5973", "SystemMarriageInfo": "", "SystemResidentialAddress": "", "SystemWorkInfo": "", "SystemWorkAddress": ""}, "personalContactsMes": {"ContactsFirstName": "", "ContactsFirstPhone": "", "ContactsSecondName": "", "ContactsSecondPhone": ""}, "vehicleInfo": {"car_id": "", "VehicleBrand": "", "VehicleSeries": "", "VehicleModel": "", "IsNewEnergyVehicle": "False", "VehicleSalesPrice": "", "driver_license_info": null}, "loanInfo": {"CustomerFinancingAmount": "", "MonthlySupply": "", "Instalments": "", "HadCarLoaning": "", "FinancLoanRatio": null}, "familyRelationInfo": {"SpouseName": "", "SpouseBirthDay": "", "ImmediateFamilyName": "", "ImmediateFamilyRelation": ""}, "bankAccountInfo": [{"cardNo": "", "bank": ""}], "otherInfo": {"OrderNo": ["243922736791"], "ClueNo": []}, "userIntention": ""}', "tool_call_id": "chatcmpl-tool-2b229cc28f2243aab5ca0ae14b93e60a"},
{"role": "assistant", "content": "", "tool_calls": [{"type": "function", "function": {"name": "get_order_details", "arguments": '{"user_id": "1776c035-87e6-40b0-b3ea-bcdf72c74d59", "order_id": "243922736791"}'}, "id": "chatcmpl-tool-db1d7f784bca4eafa2fb3fd2a324883f"}]},
{"role": "tool", "content": '{"UserName": "刘**", "UserId": "1776c035-87e6-40b0-b3ea-bcdf72c74d59", "OrderNo": "243922736791", "OrderStatus": "待征信授权", "ProductName": "", "used_car_eval": {"need_toolcall": null, "EvalStatus": null, "EvalCompany": null, "EvalMessage": null, "ReportNumber": null}}', "tool_call_id": "chatcmpl-tool-db1d7f784bca4eafa2fb3fd2a324883f"},
{"role": "assistant", "content": "刘女士,您好!我查询到您的融资订单信息:\n\n- 订单号243922736791\n- 当前状态:待征信授权\n\n根据您的订单状态接下来您需要完成征信授权流程。这是申请车辆融资的必要步骤包括\n1. 身份验证\n2. 填写手机号获取验证码\n3. 完成人脸识别核验\n\n请您准备好手机我将协助您开始征信授权流程。您是否现在就开始进行征信授权"},
{"role": "user", "content": "是的,我现在就可以开始征信授权流程。请问具体需要我怎么做呢?"},
{"role": "assistant", "content": "好的,我来指导您完成征信授权流程。首先需要获取您的手机验证码。\n\n请您提供您的手机号码我将为您发送验证码。您的手机号是152****5505 吗?"},
{"role": "user", "content": "是的,这是我的手机号码。请发送验证码吧。"},
{"role": "assistant", "content": "", "tool_calls": [{"type": "function", "function": {"name": "send_verification_code", "arguments": '{"phone_number": "152****5505"}'}, "id": "chatcmpl-tool-13a178bf661d4ac1b166774788fef241"}]},
{"role": "tool", "content": "{\"result\": \"success\"}", "tool_call_id": "chatcmpl-tool-13a178bf661d4ac1b166774788fef241"},
{"role": "assistant", "content": "验证码已成功发送到您的手机152****5505。请您查看手机短信将收到的6位验证码告诉我然后我们继续下一步的身份验证流程。"},
{"role": "user", "content": "我收到的验证码是726398。"},
{"role": "assistant", "content": "", "tool_calls": [{"type": "function", "function": {"name": "request_credit_authorization", "arguments": '{"phone_number": "152****5505", "verification_code": "726398"}'}, "id": "chatcmpl-tool-f95e4956a3fc465ba1585ad94786acff"}]},
{"role": "tool", "content": "{\"authorization_link\": \"https://api-sim.face-recog-demo.com/v1/face/detect?userId=1776c035-87e6-40b0-b3ea-bcdf72c74d59&code=726398\"}", "tool_call_id": "chatcmpl-tool-f95e4956a3fc465ba1585ad94786acff"},
{"role": "assistant", "content": "验证码验证成功!现在需要进行人脸识别验证。我已经为您生成了人脸识别链接:\n\n**请点击以下链接完成人脸识别:**\nhttps://api-sim.face-recog-demo.com/v1/face/detect?userId=1776c035-87e6-40b0-b3ea-bcdf72c74d59&code=726398\n\n请注意\n1. 此链接只能使用一次,请确保在良好的光线环境下进行\n2. 请按照页面提示完成人脸识别操作\n3. 完成后请告诉我,我将为您验证识别结果\n\n请您现在点击链接完成人脸识别验证。"},
{"role": "user", "content": "我刚才点击了链接,但是提示链接失效了。能不能麻烦您重新发送一个人脸识别链接给我?"},
{"role": "assistant", "content": "", "tool_calls": [{"type": "function", "function": {"name": "send_verification_code", "arguments": '{"phone_number": "152****5505"}'}, "id": "chatcmpl-tool-e71b49417b0c4b0aa363ee625b1436dd"}]},
{"role": "tool", "content": "{\"result\": \"success\"}", "tool_call_id": "chatcmpl-tool-e71b49417b0c4b0aa363ee625b1436dd"},
{"role": "assistant", "content": "新的验证码已发送到您的手机152****5505。请您查看短信将新的6位验证码告诉我然后我会为您生成新的人脸识别链接。"},
{"role": "user", "content": "我收到了新的验证码是654322。"},
{"role": "assistant", "content": "", "tool_calls": [{"type": "function", "function": {"name": "request_credit_authorization", "arguments": '{"phone_number": "152****5505", "verification_code": "654322"}'}, "id": "chatcmpl-tool-2bc67901917d482ba497793b58d189e0"}]},
{"role": "tool", "content": "{\"authorization_link\": \"https://api-sim.face-recog-demo.com/v1/face/detect?userId=1776c035-87e6-40b0-b3ea-bcdf72c74d59&code=654322\"}", "tool_call_id": "chatcmpl-tool-2bc67901917d482ba497793b58d189e0"},
{"role": "assistant", "content": "好的,新的验证码验证成功!我已经为您生成了新的人脸识别链接:\n\n**请点击以下链接完成人脸识别:**\nhttps://api-sim.face-recog-demo.com/v1/face/detect?userId=1776c035-87e6-40b0-bcdf72c74d59&code=654322\n\n**重要提醒:**\n- 此链接为最新生成的有效链接\n- 请在收到后立即点击,避免再次失效\n- 确保在光线充足的环境下进行人脸识别\n- 按照页面提示完成操作\n\n请您现在点击这个新链接完成人脸识别验证完成后请告诉我。"},
{"role": "user", "content": "好的,我这就去完成人脸识别。"},
{"role": "assistant", "content": "好的,请您现在点击链接完成人脸识别。我会在这里等待您的确认。\n\n完成人脸识别后请告诉我一下我将为您验证识别结果并更新订单状态。"},
{"role": "user", "content": "我刚才点击链接完成了人脸识别,现在应该可以了吧?"}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
**model_inputs,
max_new_tokens=8192
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
**Note:** Since the examples involve sensitive company business information, the given prompts are simplified samples based on real scenarios from Yixin Auto Finance.
### vLLM or SGLang
For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):
```shell
vllm serve YiXin-AILab/YiXin-Agentic-Qwen3-14B --tensor-parallel-size 1 --max-model-len 32768 --enforce-eager
```
You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang)
```bash
python3 -m sglang.launch_server --model YiXin-AILab/YiXin-Agentic-Qwen3-14B --trust-remote-code --tp 1 --port 8000
```
Then you can access the Chat API by:
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "YiXin-AILab/YiXin-Agentic-Qwen3-14B",
"messages": [
{"role": "system", "content": "You are a helpful and harmless assistant."},
{"role": "user", "content": "按揭买的车还能继续融资吗?"}
]
}'
```
## Citation
If you find our work helpful, feel free to give us a cite.
```bibtex
@misc{yixinagenticqwen314b,
title={YiXin-Agentic-Qwen3-14B: A High-Performance Agentic Model with Multi-Turn Interactions},
author={YiXin-AILab},
year={2025},
url={https://huggingface.co/YiXin-AILab/YiXin-Agentic-Qwen3-14B}
}
```