Initialize project; model provided by the ModelHub XC community

Model: dataslab/DLM-NL2JSON-4B Source: Original Platform
2026-05-04 04:44:49 +08:00
commit 72ae6ed524
17 changed files with 307179 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,35 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,224 @@
 ---
 language:
  - ko
 license: apache-2.0
 tags:
  - task-specific
  - structured-prediction
  - korean
  - public-sector
  - qwen3
  - domain-specific
  - merge
 base_model: Qwen/Qwen3-4B
 datasets: []
 pipeline_tag: text-generation
 model-index:
  - name: DLM-NL2JSON-4B
    results:
      - task:
          type: structured-prediction
          name: Korean NL-to-JSON Schema Extraction
        dataset:
          type: custom
          name: Busan Public Data Query Test Set
          args:
            num_samples: 2041
        metrics:
          - type: exact_match
            value: 94.4
            name: Exact Match Accuracy (raw)
          - type: exact_match
            value: 96.8
            name: Exact Match Accuracy (adjusted)
 ---
 # DLM-NL2JSON-4B
 **A 4B-parameter service-specific LLM that outperforms GPT-4o (+14%p) and Qwen3.5-35B (+22%p) on structured JSON extraction from Korean natural language queries.**
 DLM (Domain-specific Language Model) is a series of task-specialized models by [Data Science Lab., Ltd.](https://huggingface.co/dataslab). This model is a LoRA-merged Qwen3-4B fine-tuned for structured JSON extraction in the Busan Metropolitan City public data analytics service.
 ## Key Results
 Evaluated on 2,041 test samples across 10 task categories (field-level exact match, summary excluded):
 | Model | Params | Accuracy | Accuracy (adj*) | Avg Latency |
 |-------|--------|----------|-----------------|-------------|
 | **DLM-NL2JSON-4B** | **4B** | **94.4%** | **96.8%** | 2.59s |
 | GPT-4o | undisclosed (est. ~200B+) | 80.5% | 82.5% | 1.58s |
 | Qwen3.5-35B-A3B | 35B | 72.2% | 73.9% | 0.85s |
 *\*adj: 64 CSM samples with known gold label noise excluded (see Evaluation section)*
 ### Per-Category Breakdown
 | Category | N | DLM-NL2JSON-4B | GPT-4o | Qwen3.5-35B |
 |----------|---|-------------|--------|-------------|
 | ALP-A (population pattern) | 250 | **99.6%** | 56.0% | 47.6% |
 | ALP-B (population flow) | 250 | **98.4%** | 50.4% | 46.8% |
 | CSM (consumer spending) | 700 | **90.6%** | 90.1% | 86.1% |
 | CREDIT-Income | 58 | **94.8%** | 53.4% | 34.5% |
 | CREDIT-Spending | 77 | **97.4%** | 92.2% | 51.9% |
 | CREDIT-Loan/Default | 73 | **98.6%** | 94.5% | 72.6% |
 | CPI (business status) | 219 | 86.3% | **87.2%** | 54.8% |
 | GIS-Inflow | 72 | **97.2%** | 79.2% | 93.1% |
 | GIS-Outflow | 62 | **98.4%** | 77.4% | 98.4% |
 | GIS-Consumption | 280 | 98.2% | **99.6%** | 97.5% |
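 DLM-NL2JSON-4B has the top score in **8 out of 10 categories** (tied with Qwen3.5-35B on GIS-Outflow), with the largest gains over GPT-4o on ALP (+43.6%p on ALP-A, +48.0%p on ALP-B) and CREDIT-Income (+41.4%p). GPT-4o leads narrowly on CPI and GIS-Consumption.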
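
The margins and the overall accuracy can be re-derived from the per-category table. The snippet below is only an arithmetic check on the published figures; the names `RESULTS`, `margin_vs_gpt4o`, and `weighted_accuracy` are illustrative and not part of the repository.

```python
# Accuracy figures (%) copied from the per-category table:
# category -> (N, DLM-NL2JSON-4B, GPT-4o, Qwen3.5-35B)
RESULTS = {
    "ALP-A": (250, 99.6, 56.0, 47.6),
    "ALP-B": (250, 98.4, 50.4, 46.8),
    "CSM": (700, 90.6, 90.1, 86.1),
    "CREDIT-Income": (58, 94.8, 53.4, 34.5),
    "CREDIT-Spending": (77, 97.4, 92.2, 51.9),
    "CREDIT-Loan/Default": (73, 98.6, 94.5, 72.6),
    "CPI": (219, 86.3, 87.2, 54.8),
    "GIS-Inflow": (72, 97.2, 79.2, 93.1),
    "GIS-Outflow": (62, 98.4, 77.4, 98.4),
    "GIS-Consumption": (280, 98.2, 99.6, 97.5),
}


def margin_vs_gpt4o(category):
    """Percentage-point gap between DLM-NL2JSON-4B and GPT-4o."""
    _, dlm, gpt, _ = RESULTS[category]
    return round(dlm - gpt, 1)


def weighted_accuracy(column):
    """Sample-weighted accuracy for one column (1=DLM, 2=GPT-4o, 3=Qwen3.5-35B)."""
    total = sum(row[0] for row in RESULTS.values())
    correct = sum(row[0] * row[column] / 100 for row in RESULTS.values())
    return round(100 * correct / total, 1)


# Categories where DLM-NL2JSON-4B is first or tied for first.
top_or_tied = [name for name, (_, dlm, gpt, qwen) in RESULTS.items()
               if dlm >= max(gpt, qwen)]

print(margin_vs_gpt4o("ALP-A"), margin_vs_gpt4o("ALP-B"))  # 43.6 48.0
print(weighted_accuracy(1), len(top_or_tied))               # 94.4 8
```

The weighted figures for the other two columns reproduce the 80.5% and 72.2% reported in Key Results.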
 ## Important: This is a Service-Specific Model
 > **This model is NOT a general-purpose NL-to-JSON converter.** It is trained exclusively for a fixed set of predefined schemas used in a specific production service. It will not generalize to arbitrary JSON schemas or different prompt formats.
 To use this model correctly, you **must**:
 1. Use the **exact system prompts** it was trained on (one per task category — see Usage section)
 2. Include the corresponding **special token** (`<TASK_CSM>`, `<TASK_CREDIT>`, `<TASK_GIS>`, `<TASK_ALP>`, `<TASK_CPI>`) in the input
 3. Expect output conforming only to the **predefined schemas** listed below
 **Why publish a service-specific model?** This model serves as a reference implementation demonstrating that **task-specific LoRA fine-tuning on a 4B model can dramatically outperform GPT-4o and larger open-source models** on constrained structured output tasks. We believe the DLM (Domain-specific Language Model) approach — training small, cheap-to-serve models for specific service endpoints — is an underexplored but highly practical paradigm.
 ## Intended Use
 This model converts **Korean natural language queries about public/economic data** into **structured JSON** conforming to its predefined schemas. It is designed for and deployed in the **Busan Metropolitan City Big Data Wave** analytics dashboard.
 **Input**: Free-form Korean query + task-specific system prompt
 **Output**: Single-line JSON with exact schema compliance:
 ```json
 {"summary":"##2025년 5월 부산광역시 해운대구 유통/의료 소비분석##","base_ym":202505,"region_nm":"부산광역시 해운대구","industry_select":{"3":[],"8":[]},"sex_cd":[1],"age_cd":[30],"category":2}
 ```
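
A response like the one above can be checked before it is passed downstream. The following is a minimal sketch for the CSM schema only, assuming the key order and value ranges stated in the CSM system prompt (`eval/prompts.py`); `check_csm_output` and its helpers are hypothetical names, not repository code.

```python
import json

# Key order for the TASK_CSM schema, as given in the system prompt.
CSM_KEYS = ["summary", "base_ym", "region_nm", "industry_select",
            "sex_cd", "age_cd", "category"]
MAJOR_CODES = {str(i) for i in range(1, 12)}  # "1" .. "11"
AGE_CODES = {10, 20, 30, 40, 50, 60, 70}


def _is_int(value):
    # bool is a subclass of int in Python, and the prompt forbids booleans.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_int_list(value, allowed=None):
    if not isinstance(value, list) or not all(_is_int(v) for v in value):
        return False
    return allowed is None or set(value) <= allowed


def check_csm_output(line):
    """Return a list of problems found in one response (empty list = OK)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    if list(obj) != CSM_KEYS:
        return [f"unexpected keys or key order: {list(obj)}"]

    problems = []
    if not isinstance(obj["summary"], str):
        problems.append("summary must be a string")
    if not _is_int(obj["base_ym"]):
        problems.append("base_ym must be an integer")
    if not isinstance(obj["region_nm"], str):
        problems.append("region_nm must be a string")
    industries = obj["industry_select"]
    if (not isinstance(industries, dict) or not industries
            or not set(industries) <= MAJOR_CODES
            or not all(_is_int_list(v) for v in industries.values())):
        problems.append("industry_select must map major codes to int lists")
    if not _is_int_list(obj["sex_cd"], {0, 1}):
        problems.append("sex_cd must be a list drawn from 0 and 1")
    if not _is_int_list(obj["age_cd"], AGE_CODES):
        problems.append("age_cd must be a list drawn from 10..70")
    if obj["category"] != 2:
        problems.append("category must be 2")
    return problems


sample = ('{"summary":"##2025년 5월 부산광역시 해운대구 유통/의료 소비분석##",'
          '"base_ym":202505,"region_nm":"부산광역시 해운대구",'
          '"industry_select":{"3":[],"8":[]},"sex_cd":[1],"age_cd":[30],"category":2}')
print(check_csm_output(sample))  # []
```

The check is structural only: it does not verify that sub-category codes belong to their major category or that `summary` follows the prompt's wording rules.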
 ### Task Categories
 | ID | Name | Schema Type |
 |----|------|-------------|
 | 0 | ALP-A | Population pattern (ptrn: residence/work/visit) |
 | 1 | ALP-B | Population flow (flow_cd: inflow/outflow) |
 | 2 | CSM | Consumer spending by industry |
 | 3 | CREDIT-Income | Income statistics |
 | 4 | CREDIT-Spending | Spending statistics |
 | 5 | CREDIT-Loan | Loan/default statistics |
 | 6 | CPI | Business/enterprise status |
 | 9 | GIS-Inflow | Geographic inflow analysis |
 | 10 | GIS-Outflow | Geographic outflow analysis |
 | 11 | GIS-Consumption | Geographic consumption analysis |
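
Several category IDs share one special token. The sketch below mirrors the `TASK_MAP` in `eval/eval_example.py`, including its convention of joining the token and the query with a newline; `build_messages` is a hypothetical helper, and the system prompt is passed in as a plain string rather than imported.

```python
# Category ID -> special token, mirroring TASK_MAP in eval/eval_example.py.
TASK_TOKENS = {
    0: "<TASK_ALP>", 1: "<TASK_ALP>",
    2: "<TASK_CSM>",
    3: "<TASK_CREDIT>", 4: "<TASK_CREDIT>", 5: "<TASK_CREDIT>",
    6: "<TASK_CPI>",
    9: "<TASK_GIS>", 10: "<TASK_GIS>", 11: "<TASK_GIS>",
}


def build_messages(category, system_prompt, query):
    """Build the chat messages for one query of a known category."""
    if category not in TASK_TOKENS:
        # IDs 7 and 8 do not appear in the table, so they are rejected here.
        raise ValueError(f"unknown category id: {category}")
    token = TASK_TOKENS[category]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{token}\n{query.strip()}"},
    ]


messages = build_messages(2, "<CSM system prompt>", " 2024년 1월 해운대구 소비 ")
print(messages[1]["content"])
```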
 ## Training Details
 | Item | Value |
 |------|-------|
 | Base model | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) |
 | Method | LoRA SFT → merged full model |
 | Training samples | 16,292 (Korean) |
 | Validation samples | 2,034 |
 | Special tokens | `<TASK_CSM>`, `<TASK_CREDIT>`, `<TASK_GIS>`, `<TASK_ALP>`, `<TASK_CPI>` |
 | Max sequence length | 6,144 |
 | Architecture | Qwen3ForCausalLM (36 layers, 2560 hidden, 32 heads) |
 Training data consists of synthetically generated Korean natural language queries paired with structured JSON outputs, covering the Busan public data analytics domain.
 ## Evaluation Methodology
 - **Metric**: Field-level exact match. Each required key's value is compared against the gold label after normalization, and a sample counts as correct only if every required key matches. The `summary` field is excluded from comparison.
 - **Test set**: 2,041 samples, stratified by category
 - **Gold label noise**: 64/700 CSM samples have `age_cd` capped at `[10..60]` instead of `[10..70]` for "all ages" queries, conflicting with the prompt specification. These affect all models equally and are excluded in the adjusted metric.
 - **Train/Test overlap**: 16/2,041 input strings (0.78%) appear in both sets — retained for consistency.
 - **All models** received identical system prompts per category.
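
The scoring rule can be sketched as follows. This is a simplified illustration of what `eval/eval_example.py` does, not a replacement for it: the real script applies more type normalization, and the function names here are illustrative.

```python
MISSING = "<MISSING>"


def normalize(obj):
    """Drop `summary` and make flat lists order- and duplicate-insensitive."""
    out = {}
    for key, value in obj.items():
        if key == "summary":
            continue
        out[key] = sorted(set(value)) if isinstance(value, list) else value
    return out


def is_exact_match(pred, gold, required):
    """True only if every required key has the same value in pred and gold."""
    p, g = normalize(pred), normalize(gold)
    return all(p.get(k, MISSING) == g.get(k, MISSING) for k in required)


def accuracy(pairs, required):
    """Share of (pred, gold) pairs that match on all required keys."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_exact_match(p, g, required) for p, g in pairs) / len(pairs)


required = ["base_ym", "sex_cd", "category"]
gold = {"summary": "A", "base_ym": 202401, "sex_cd": [0, 1], "category": 2}
good = {"summary": "B", "base_ym": 202401, "sex_cd": [1, 0], "category": 2}
bad = {"summary": "A", "base_ym": 0, "sex_cd": [0, 1], "category": 2}
print(accuracy([(good, gold), (bad, gold)], required))  # 0.5
```

A differing `summary` or list order does not count against a prediction, while a single wrong field makes the whole sample incorrect.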
 ### Hardware
 | Model | Serving | GPU |
 |-------|---------|-----|
 | DLM-NL2JSON-4B | TensorRT-LLM | NVIDIA L4 24GB |
 | GPT-4o | OpenAI API | N/A |
 | Qwen3.5-35B-A3B | vLLM | NVIDIA A6000 48GB |
 ## Usage
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 model_id = "dataslab/DLM-NL2JSON-4B"
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
 # System prompt (example: CSM consumer spending schema — abbreviated for readability)
 # Full prompts per category are available in the repository's eval/prompts.py
 system_prompt = """너는 반드시 **JSON 한 줄**만 출력한다. 설명/텍스트/코멘트/마크다운/코드블록/이모지/공백 줄 금지.
 출력은 항상 { 로 시작하고 } 로 끝난다.
 [스키마: TASK_CSM] (키/타입/순서 엄수)
 {"summary":string,"base_ym":int,"region_nm":string,"industry_select":object,"sex_cd":[int],"age_cd":[int],"category":2}
 [기본값]
 - base_ym: 0, region_nm: "부산광역시"
 - industry_select: 업종 미지정 시 전 대분류 키를 []로 설정
 - sex_cd: [0,1], age_cd: [10,20,30,40,50,60,70]
 - category: 항상 2
 [대분류 코드표] 1:여행/숙박 2:여가/문화 3:유통 4:음식/주점 5:음식료품
 6:의류/잡화 7:미용 8:의료 9:교육 10:생활 11:자동차"""
 # Note: the special token <TASK_CSM> must be included in the user message
 # (eval/eval_example.py joins the token and the query with a newline)
 user_query = "<TASK_CSM> 2024년 1월 해운대구 중동 의류/잡화랑 뷰티 쪽 남성 20~40대 위주로 알려줘"
 messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_query}
 ]
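 text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
    enable_thinking=False,  # disable Qwen3 thinking mode so reasoning tokens do not precede the JSON
 )
 inputs = tokenizer(text, return_tensors="pt").to(model.device)
 # Greedy decoding: temperature is only used when sampling, so it is omitted here
 outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)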
 print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
 # {"summary":"##2024년 1월 부산광역시 해운대구 중동 의류/잡화/미용 소비분석##","base_ym":202401,"region_nm":"부산광역시 해운대구 중동","industry_select":{"6":[],"7":[]},"sex_cd":[0],"age_cd":[20,30,40],"category":2}
 # Note: "뷰티" → mapped to 미용(code 7), "해운대구 중동" → normalized to "부산광역시 해운대구 중동"
 ```
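
The normalizations visible in the example output (a 6-digit `base_ym`, the expanded age range, the `부산광역시` prefix) follow rules stated in the system prompt. The sketch below re-implements three of those rules in plain Python as a sanity check on model output. It is an assumption-laden illustration with hypothetical function names, and it does not describe how the model works internally.

```python
import re

ALL_AGES = [10, 20, 30, 40, 50, 60, 70]


def to_base_ym(year, month):
    """YYYYMM as a 6-digit integer, e.g. (2024, 1) -> 202401."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return year * 100 + month


def expand_age_range(text):
    """'20~40대' -> [20, 30, 40]; '30대' -> [30]; no age mention -> all ages."""
    match = re.search(r"(\d{2})\s*~\s*(\d{2})\s*대", text)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        return list(range(low, high + 1, 10))
    match = re.search(r"(\d{2})\s*대", text)
    return [int(match.group(1))] if match else list(ALL_AGES)


def normalize_region(name):
    """Unify '부산' / '부산시' / '부산광역시' to the '부산광역시' prefix."""
    name = name.strip()
    for alias in ("부산광역시", "부산시", "부산"):
        # Only strip a standalone alias, so '부산진구' is left intact.
        if name == alias or name.startswith(alias + " "):
            name = name[len(alias):].strip()
            break
    return ("부산광역시 " + name).strip()


print(to_base_ym(2024, 1))                      # 202401
print(expand_age_range("남성 20~40대 위주로"))   # [20, 30, 40]
print(normalize_region("해운대구 중동"))         # 부산광역시 해운대구 중동
```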
 ### vLLM / OpenAI-compatible serving
 ```python
 from openai import OpenAI
 client = OpenAI(base_url="http://your-server:8006/v1", api_key="token")
 resp = client.chat.completions.create(
    model="DLM-NL2JSON-4B",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "<TASK_CSM> 2024년 1월 해운대구 중동 의류/잡화랑 뷰티 쪽 남성 20~40대 위주로 알려줘"}
    ],
    max_tokens=512,
    temperature=0.0,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}}  # disable thinking mode
 )
 print(resp.choices[0].message.content)
 ```
 > **Important**: When serving with vLLM/TensorRT-LLM, pass `chat_template_kwargs: {"enable_thinking": false}` to disable the Qwen3 thinking mode. Otherwise, reasoning tokens will consume the output budget and truncate the JSON.
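
What the flag changes can be seen in the last lines of `chat_template.jinja`: with `enable_thinking` set to false, the template pre-fills an empty `<think>` block so the model starts directly on the answer. The sketch below mirrors that branch in plain Python and adds a defensive helper for responses that still carry a reasoning block; both function names are illustrative.

```python
def generation_prompt(enable_thinking=None):
    """Mirror of the template's add_generation_prompt branch."""
    prompt = "<|im_start|>assistant\n"
    if enable_thinking is False:
        # An empty think block tells the model that reasoning is already done.
        prompt += "<think>\n\n</think>\n\n"
    return prompt


def strip_thinking(text):
    """Drop everything up to the last </think>, if a reasoning block is present."""
    if "</think>" in text:
        text = text.split("</think>")[-1]
    return text.lstrip("\n")


print(repr(generation_prompt(False)))
print(strip_thinking('<think>\nsome reasoning\n</think>\n\n{"category":2}'))
```

If the output budget is exhausted inside the reasoning block, no closing `</think>` is emitted and `strip_thinking` returns the text unchanged, so JSON parsing will still fail in that case.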
 ## Known Limitations
 1. **CPI category** (86.3%) is the weakest — complex industry classification codes (A~U with sub-codes) are harder to extract.
 2. **CSM training data noise**: ~8% of CSM training samples have `age_cd` capped at 60 instead of 70 for "all ages" queries, introducing inconsistency.
 3. **Domain-specific only**: This model is trained exclusively for the Busan public data schema extraction task. It has no general-purpose capabilities and should not be used as a general chatbot.
 4. **Korean only**: All training data and prompts are in Korean.
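
For the CSM label noise in item 2, a simple heuristic can surface candidate samples for review. The sketch below flags CSM gold labels whose `age_cd` stops at 60. It is an assumption that this pattern alone identifies the noisy labels: a query that genuinely asks for ages 10 to 60 would also be flagged, so the result is a review list rather than a definitive filter.

```python
CAPPED_AGES = [10, 20, 30, 40, 50, 60]


def is_suspect_gold(gold):
    """True for CSM labels (category 2) whose age_cd stops at 60."""
    return gold.get("category") == 2 and sorted(gold.get("age_cd", [])) == CAPPED_AGES


def adjusted_accuracy(results):
    """Accuracy over (gold, is_correct) pairs, skipping suspect gold labels."""
    kept = [bool(ok) for gold, ok in results if not is_suspect_gold(gold)]
    return sum(kept) / len(kept) if kept else 0.0


results = [
    ({"category": 2, "age_cd": [10, 20, 30, 40, 50, 60]}, False),      # flagged
    ({"category": 2, "age_cd": [10, 20, 30, 40, 50, 60, 70]}, True),
    ({"category": 2, "age_cd": [20]}, False),
]
print(adjusted_accuracy(results))  # 0.5
```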
 ## Citation
 If you use this model, please cite:
 ```bibtex
@misc{dsl-dlm-nl2json-4b,
  title={DLM-NL2JSON-4B: A Domain-Specific Language Model for Korean Public Data Schema Extraction},
  author={Data Science Lab., Ltd.},
  year={2026},
  url={https://huggingface.co/dataslab/DLM-NL2JSON-4B}
 }
 ```
 ## Contact
 - **Organization**: Data Science Lab., Ltd.
 - **Project**: Busan Metropolitan City Big Data Wave
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,33 @@
 {
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<TASK_ALP>": 151672,
  "<TASK_CPI>": 151673,
  "<TASK_CREDIT>": 151670,
  "<TASK_CSM>": 151669,
  "<TASK_GIS>": 151671,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
 }
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,89 @@
 {%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
 {%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
 {%- endif %}
 {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
 {%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
 {%- endfor %}
 {%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
 {%- endfor %}
 {%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
 {%- endif %}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,68 @@
 {
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "dtype": "bfloat16",
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2560,
  "initializer_range": 0.02,
  "intermediate_size": 9728,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151674
 }
--- a/eval/eval_example.py
+++ b/eval/eval_example.py
@@ -0,0 +1,259 @@
 """
 DLM-NL2JSON-4B — Evaluation Script (Simplified)
 Evaluates the model on the provided test set using an OpenAI-compatible API endpoint.
 Measures per-category exact match accuracy and average latency.
 Usage:
    # Against vLLM / TensorRT-LLM served model
    python eval_example.py \
        --data test_data_lite_200.jsonl \
        --base-url http://your-server:8006/v1 \
        --model qwen3_4b_6th_norag \
        --api-key token-abc123 \
        --disable-thinking
    # Against OpenAI API (GPT-4o baseline)
    export OPENAI_API_KEY="sk-..."
    python eval_example.py \
        --data test_data_lite_200.jsonl \
        --model gpt-4o
 """
 import json, time, argparse, os
 from collections import Counter
 from typing import Dict, Any
 # ── Prompts ──────────────────────────────────────────────
 # Import from prompts.py (must be in the same directory)
 from prompts import (
    SYS_CSM_DEFAULT,
    SYS_CREDIT_DEFAULT,
    SYS_GIS_DEFAULT,
    SYS_ALP_DEFAULT,
    SYS_CPI_DEFAULT,
 )
 # ── Category → (special_token, system_prompt) ────────────
 TASK_MAP = {
    0:  ("<TASK_ALP>",    SYS_ALP_DEFAULT),     # ALP-A (pattern)
    1:  ("<TASK_ALP>",    SYS_ALP_DEFAULT),     # ALP-B (flow)
    2:  ("<TASK_CSM>",    SYS_CSM_DEFAULT),     # CSM (consumer spending)
    3:  ("<TASK_CREDIT>", SYS_CREDIT_DEFAULT),  # CREDIT-Income
    4:  ("<TASK_CREDIT>", SYS_CREDIT_DEFAULT),  # CREDIT-Spending
    5:  ("<TASK_CREDIT>", SYS_CREDIT_DEFAULT),  # CREDIT-Loan/Default
    6:  ("<TASK_CPI>",    SYS_CPI_DEFAULT),     # CPI (business status)
    9:  ("<TASK_GIS>",    SYS_GIS_DEFAULT),     # GIS-Inflow
    10: ("<TASK_GIS>",    SYS_GIS_DEFAULT),     # GIS-Outflow
    11: ("<TASK_GIS>",    SYS_GIS_DEFAULT),     # GIS-Consumption
 }
 CAT_NAMES = {
    0: "ALP-A(ptrn)", 1: "ALP-B(flow)", 2: "CSM",
    3: "CREDIT-Income", 4: "CREDIT-Spending", 5: "CREDIT-Loan",
    6: "CPI", 9: "GIS-Inflow", 10: "GIS-Outflow", 11: "GIS-Consumption",
 }
 # ── Required keys per category (for comparison) ─────────
 REQUIRED_KEYS = {
    0:  ["base_ym", "region_nm", "ptrn", "sex_cd", "age_cd", "category"],
    1:  ["base_ym", "region_nm", "flow_cd", "sex_cd", "age_cd", "category"],
    2:  ["base_ym", "region_nm", "industry_select", "sex_cd", "age_cd", "category"],
    3:  ["base_ym", "region_nm", "job_cd", "perc_cd", "sex_cd", "age_cd", "category"],
    4:  ["base_ym", "region_nm", "job_cd", "perc_cd", "sex_cd", "age_cd", "category"],
    5:  ["base_ym", "region_nm", "job_cd", "perc_cd", "sex_cd", "age_cd", "category"],
    6:  ["base_ym", "region_nm", "bzc_cd", "cp_cd", "enp_cd", "category"],
    9:  ["region_nm", "base_ym", "region_count", "category"],
    10: ["region_nm", "base_ym", "region_count", "category"],
    11: ["region_nm", "base_ym", "industry_category", "category"],
 }
 # ── Normalization helpers ────────────────────────────────
 def norm_int_list(v):
    if not isinstance(v, list):
        return v
    out = []
    for x in v:
        try:
            out.append(int(float(str(x).strip())))
        except Exception:
            continue
    return sorted(set(out))
 def norm_dict_of_lists(d):
    """Normalize industry_select or bzc_cd: {str_key: [int, ...]}"""
    if not isinstance(d, dict):
        return d
    return {str(k).upper() if len(str(k)) == 1 and str(k).isalpha() else str(k):
            norm_int_list(arr) if isinstance(arr, list) else arr
            for k, arr in d.items()}
 def normalize(obj: Dict[str, Any], cat: int) -> Dict[str, Any]:
    """Normalize prediction/gold for fair comparison (summary excluded)."""
    o = dict(obj)
    o.pop("summary", None)
    for k in ["base_ym", "region_count", "category"]:
        if k in o and isinstance(o[k], str):
            try:
                o[k] = int(o[k])
            except ValueError:
                pass
    for k in ["sex_cd", "age_cd", "job_cd", "perc_cd", "ptrn",
              "industry_category", "cp_cd", "enp_cd"]:
        if k in o:
            o[k] = norm_int_list(o[k])
    if "flow_cd" in o and isinstance(o["flow_cd"], list):
        o["flow_cd"] = norm_int_list(o["flow_cd"])
    for k in ["industry_select", "bzc_cd"]:
        if k in o:
            o[k] = norm_dict_of_lists(o[k])
    if "region_count" in o:
        try:
            o["region_count"] = max(1, min(10, int(o["region_count"])))
        except (ValueError, TypeError):
            pass
    return o
 def extract_first_json(text: str):
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None
 def compare(pred: Dict, gold: Dict, cat: int):
    req = REQUIRED_KEYS.get(cat, [])
    diff = {}
    for k in req:
        if pred.get(k, "<MISSING>") != gold.get(k, "<MISSING>"):
            diff[k] = {"pred": pred.get(k), "gold": gold.get(k)}
    return len(diff) == 0, diff
 # ── Main ─────────────────────────────────────────────────
 def main():
    ap = argparse.ArgumentParser(description="DLM-NL2JSON-4B Evaluation")
    ap.add_argument("--data", required=True, help="Test JSONL file path")
    ap.add_argument("--base-url", default=None, help="OpenAI-compatible base URL")
    ap.add_argument("--model", required=True, help="Model name")
    ap.add_argument("--api-key", default=os.environ.get("OPENAI_API_KEY", ""), help="API key")
    ap.add_argument("--disable-thinking", action="store_true",
                    help="Pass chat_template_kwargs to disable Qwen3 thinking mode")
    ap.add_argument("--max-tokens", type=int, default=512)
    ap.add_argument("--per-cat", type=int, default=999, help="Max samples per category")
    args = ap.parse_args()
    import openai
    client = openai.OpenAI(
        base_url=args.base_url or None,
        api_key=args.api_key or "dummy",
        timeout=60.0,
    )
    # Load test data
    with open(args.data, encoding="utf-8") as f:
        raw = [json.loads(line) for line in f]
    # Group by category and sample
    from collections import defaultdict
    by_cat = defaultdict(list)
    for item in raw:
        out = item["output"] if isinstance(item["output"], dict) else json.loads(item["output"])
        cat = out["category"]
        by_cat[cat].append({"input": item["input"], "gold": out})
    samples = []
    for cat in sorted(by_cat):
        items = by_cat[cat][:args.per_cat]
        samples.extend([(cat, ex) for ex in items])
    print(f"[INFO] Evaluating {len(samples)} samples across {len(by_cat)} categories\n")
    # Evaluate
    ok_counts, total_counts = Counter(), Counter()
    latency_sums = Counter()
    for idx, (cat, ex) in enumerate(samples, 1):
        user_in = ex["input"].strip()
        gold_norm = normalize(ex["gold"], cat)
        tag, sys_prompt = TASK_MAP[cat]
        messages = [
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": f"{tag}\n{user_in}"},
        ]
        kwargs = dict(model=args.model, messages=messages,
                      max_tokens=args.max_tokens, temperature=0.0)
        if args.disable_thinking:
            kwargs["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}}
        t0 = time.perf_counter()
        try:
            resp = client.chat.completions.create(**kwargs)
            gen = resp.choices[0].message.content or ""  # content can be None
        except Exception as e:
            dt = time.perf_counter() - t0
            total_counts[cat] += 1
            latency_sums[cat] += dt
            print(f"[{idx:04d}] {CAT_NAMES.get(cat, cat)} | ERROR: {e}")
            continue
        dt = time.perf_counter() - t0
        total_counts[cat] += 1
        latency_sums[cat] += dt
        json_str = extract_first_json(gen) or gen.strip()
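        try:
            pred_obj = json.loads(json_str)
            # A bare number, string, or array is valid JSON but not a schema object
            if not isinstance(pred_obj, dict):
                raise ValueError("top-level JSON value is not an object")
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            print(f"[{idx:04d}] {CAT_NAMES.get(cat, cat)} | PARSE_FAIL | {dt:.2f}s")
            continue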
        pred_norm = normalize(pred_obj, cat)
        ok, diff = compare(pred_norm, gold_norm, cat)
        if ok:
            ok_counts[cat] += 1
        status = "OK" if ok else f"FAIL {list(diff.keys())}"
        print(f"[{idx:04d}] {CAT_NAMES.get(cat, cat)} | {status} | {dt:.2f}s")
    # Summary
    print("\n" + "=" * 50)
    print("EVALUATION SUMMARY")
    print("=" * 50)
    total_ok = total_all = 0
    for c in sorted(total_counts):
        ok = ok_counts[c]
        tot = total_counts[c]
        acc = ok / tot if tot else 0
        avg_lat = latency_sums[c] / tot if tot else 0
        total_ok += ok
        total_all += tot
        print(f"  {CAT_NAMES.get(c, c):20s}: {ok:4d}/{tot:4d}  acc={acc:.1%}  avg={avg_lat:.3f}s")
    overall_acc = total_ok / total_all if total_all else 0
    overall_lat = sum(latency_sums.values()) / total_all if total_all else 0
    print(f"  {'OVERALL':20s}: {total_ok:4d}/{total_all:4d}  acc={overall_acc:.1%}  avg={overall_lat:.3f}s")
 if __name__ == "__main__":
    main()
--- a/eval/prompts.py
+++ b/eval/prompts.py
@@ -0,0 +1,628 @@
 SYS_CSM_DEFAULT = """\
 너는 반드시 **JSON 한 줄**만 출력한다. 설명/텍스트/코멘트/마크다운/코드블록/이모지/공백 줄 금지. 출력은 항상 **{ 로 시작**하고 **} 로 끝**난다.
 [스키마: TASK_CSM] (키/타입/순서 엄수)
 {"summary":string,"base_ym":int,"region_nm":string,"industry_select":object,"sex_cd":[int],"age_cd":[int],"category":2}
 - 키 순서: summary, base_ym, region_nm, industry_select, sex_cd, age_cd, category
 - category 값은 항상 2
 [적용 범위]
 - 본 스키마는 **소비/업종 기반 분석** 요청만 처리한다. "유입/전입/유출/전출" 등 **흐름 키워드가 하나라도** 섞이면 이 스키마를 절대 사용하지 않는다.
 [각 파라미터의 기본값]
 - base_ym: 0  (연도 언급 없이 '월'만 있으면 2025년으로 추정)
 - region_nm: "부산광역시"
 - industry_select: 업종 미지정 시 모든 대분류 키를 []로 설정 (전 업종 의미)
  → {"1":[],"2":[],"3":[],"4":[],"5":[],"6":[],"7":[],"8":[],"9":[],"10":[],"11":[]}
 - sex_cd: [0,1]
 - age_cd: [10,20,30,40,50,60,70]
 [정규화 규칙]
 - 시점(base_ym): 기준년도 2025년 고정. "YYYY년 M월", "YYYY-MM", "YY/MM" 등은 **YYYYMM 6자리 정수**로 변환. 없으면 0. 연도 언급이 없으면 2025년으로 추정.
 - 지역(region_nm): 행정구역 명칭만 유지(근처/주변/인근/전체 등 비정형 제거). "부산/부산시"는 "부산광역시"로 통일. 구/군/읍·면·동 언급 시 "부산광역시 {구/군} {읍/면/동}" 형식.
 - 성별(sex_cd): 남성=[0], 여성=[1], 남녀/전체/미지정=[0,1]
 - 연령(age_cd): 10·20·…·70대는 해당 10단위 하나([20]). 범위는 등간격 확장(예: "20~40대"→[20,30,40]). 이상/이하/초·중·후반은 가장 가까운 10단위로 매핑(예: "20대 후반"→[20]). 전 연령/미지정=[10,20,30,40,50,60,70]. 중복 제거·오름차순.
 [industry_select 작성 규칙 (엄격 JSON)]
 - 허용 **대분류 키(문자열)**: "1","2","3","4","5","6","7","8","9","10","11" 이외 **금지**.
 - 각 키의 **값**: 정수 배열(중분류 코드). **[]는 그 대분류의 모든 중분류**를 의미.
 - ***반드시 한 개 이상의 대분류 key가 존재***해야 한다. (업종 미지정이면 전 대분류 키를 []로 출력)
 - **최소표현 원칙**: 사용자가 언급한 **대분류만** 키로 출력한다. (미지정이면 전 대분류)
 - 사용자가 **중분류 일부만** 언급하면 해당 배열에 **언급된 코드만** 넣는다(오름차순·중복 제거).
 - **포함/제외 혼재 처리**:
  1) 포함 후보 집합을 구성(언급된 중분류/전체),
  2) "제외/빼고" 지시된 코드(또는 명칭에 대응하는 코드)를 제거,
  3) 결과가 **공집합**이면 해당 대분류 **키 자체를 삭제**한다(빈 배열과 구분).
 - 모호하여 코드 추정 불가 시: 해당 대분류 키만 두고 값은 []로 둔다.
 - 존재하지 않는 키/코드 출력 **금지**.
 [대분류/중분류 코드표]
 1 여행/숙박: [101 숙박업, 102 여행업]
 2 여가/문화: [203 레져용품, 204 문화/취미, 205 레져업소, 206 서적/문구]
 3 유통(=쇼핑): [307 백화점, 308 대형할인점, 309 편의점, 310 슈퍼마켓, 311 기타유통, 312 온라인유통, 313 상품권]
 4 음식/주점: [414 한식, 415 일식, 416 중식, 417 양식, 418 기타음식, 419 유흥, 420 주점]
 5 음식료품: [521 음식료품/제과, 522 농축수산품, 523 건강식품]
 6 의류/잡화: [624 의류, 625 패션잡화]
 7 미용: [726 미용, 727 화장품]
 8 의료: [828 종합병원, 829 의료기관, 830 한의원/한방병원, 831 치과, 832 제약회사, 833 약국, 834 기타의료]
 9 교육: [935 학원]
 10 생활: [1036 가구, 1037 가전제품, 1038 생활용품, 1039 주유/연료, 1040 사무/통신기기, 1041 서비스, 1042 인테리어, 1043 기타용품]
 11 자동차: [1144 자동차판매, 1145 자동차정비/유지]
 [summary 작성 규칙]
 - base_ym ≠ 0: "##YYYY년 M월 {region_nm} {업종요약} 소비분석##"
 - base_ym = 0:  "##{region_nm} {업종요약} 소비분석##"
 - {업종요약} 생성:
  • 모든 대분류가 [] → "전 업종"
  • 단일 대분류 → 그 대분류명(위 표의 명칭)
  • 2개 이상 → "{대분류1/대분류2/…}" 형식 (예: "유통/음식/주점")
 [출력 규칙]
 - base_ym 은 반드시 6자리 정수(YYYYMM)로 출력. 년도 언급이 없으면 2025년으로 추정.
 - 키 순서 고정: summary, base_ym, region_nm, industry_select, sex_cd, age_cd, category
 - **summary, category 필수**
 - null/None/"null"/불린/문자열 숫자/소수점 등 **타입 위반 금지**
 - industry_select 의 밸류가 모두 null 금지. 반드시 최소 1개 대분류 포함.
 - **JSON 한 줄**만 출력
 [예시 — 정답]
 - 입력: "4월 부산광역시 음식/주점 남성 60대 간단히"
  출력: {"summary":"##2025년 4월 부산광역시 음식/주점 소비분석##","base_ym":202504,"region_nm":"부산광역시","industry_select":{"4":[]},"sex_cd":[0],"age_cd":[60],"category":2}
 - 입력: "부산광역시 해운대구 유통-온라인유통만 남녀 전체 전 연령"
  출력: {"summary":"##부산광역시 해운대구 유통 소비분석##","base_ym":0,"region_nm":"부산광역시 해운대구","industry_select":{"3":[312]},"sex_cd":[0,1],"age_cd":[10,20,30,40,50,60,70],"category":2}
 - 입력: "7월 부산광역시 금정구 유통/의류·잡화 여성 20,30대"
  출력: {"summary":"##2025년 7월 부산광역시 금정구 유통/의류/잡화 소비분석##","base_ym":202507,"region_nm":"부산광역시 금정구","industry_select":{"3":[],"6":[]},"sex_cd":[1],"age_cd":[20,30],"category":2}
 - 입력: "2024년 12월 부산광역시 남구 의료 중 치과/약국만"
  출력: {"summary":"##2024년 12월 부산광역시 남구 의료 소비분석##","base_ym":202412,"region_nm":"부산광역시 남구","industry_select":{"8":[831,833]},"sex_cd":[0,1],"age_cd":[10,20,30,40,50,60,70],"category":2}
 - 입력: "부산광역시 전 업종 남녀 전체 40대,50대"
  출력: {"summary":"##부산광역시 전 업종 소비분석##","base_ym":0,"region_nm":"부산광역시","industry_select":{"1":[],"2":[],"3":[],"4":[],"5":[],"6":[],"7":[],"8":[],"9":[],"10":[],"11":[]},"sex_cd":[0,1],"age_cd":[40,50],"category":2}
 """
 SYS_CREDIT_DEFAULT = """\
 너는 반드시 **JSON 한 줄**만 출력한다. 설명/텍스트/코멘트/마크다운/코드블록/이모지/공백 줄 금지. 출력은 항상 **{ 로 시작**하고 **} 로 끝**난다.
 [스키마: 개인신용 통합] (키/타입/순서 엄수)
 {"summary":string,"base_ym":int,"region_nm":string,"job_cd":[int],"perc_cd":[int],"sex_cd":[int],"age_cd":[int],"category":int}
 - 키 순서: summary, base_ym, region_nm, job_cd, perc_cd, sex_cd, age_cd, category
 [category 정의]
 - 소득통계=3, 소비통계=4, 대출 및 연체=5
 - **의도→category 매핑 규칙**
  • 5(대출·연체) 키워드: 대출, 연체, 연체율, 채무, 부채, 상환, 카드론, 현금서비스, 신용대출
  • 3(소득) 키워드: 소득, 근로소득, 월급, 급여, 연봉, 가처분소득
  • 4(소비) 키워드: 소비, 지출, 결제, 카드이용, 사용액, 업종별 소비
  • 여러 집합이 동시에 등장하면 **우선순위 5 > 3 > 4**를 적용
  • 명시/추정 불가 시 기본값 **4**
 [각 파라미터의 기본값]
 - base_ym: 0, (연도 언급이 없이 '월'만 언급되면 2025년으로 추정)
 - region_nm: "부산광역시"
 - job_cd: [0,1,2]   (0=급여, 1=자영업, 2=기타)
 - perc_cd: [0,1,2,3,4,5,6,7,8,9]   (1~10분위 → 0~9로 매핑)
 - sex_cd: [0,1]     (남=0, 여=1)
 - age_cd: [10,20,30,40,50,60,70]
 [정규화 규칙]
 - 시점(base_ym): 기준년도 2025년 고정. YYYY년M월, YYYY-MM, YY/MM 등은 YYYYMM 정수로 변환. 없으면 0. 연도 언급이 없으면 2025년으로 추정
 - region_nm: "부산/부산시/부산광역시"는 "부산광역시" 접두로 통일. 구/군 언급 시 "부산광역시 {구/군}".
 - job_cd: "급여/근로자/직장인"→0, "자영업/사업자/프리랜서"→1, 기타/미지정→[0,1,2]
 - perc_cd:
  • "n분위"는 n-1로 매핑(예: 3분위→2)
  • "x~y분위"는 [x-1, ..., y-1]
  • 미지정/불일치는 [0..9]
  • 범위를 벗어나면 0~9로 **클램프**
  • 오름차순·중복 제거
 - sex_cd: 남성=[0], 여성=[1], 남녀/전체/미지정=[0,1]
 - age_cd: 10·20·…·70대는 해당 10단위 하나([20]). "20~40대"→[20,30,40]. 전 연령/미지정=[10,20,30,40,50,60,70]. 오름차순·중복 제거.
 [summary 작성 규칙]
 - base_ym ≠ 0: "##YYYY년 M월 {region_nm} {카테고리명}##"
 - base_ym = 0:  "##{region_nm} {카테고리명}##"
 - 카테고리명: {3:"소득통계", 4:"소비통계", 5:"대출 및 연체"}
 [출력 규칙]
 - 키 순서 고정: summary, base_ym, region_nm, job_cd, perc_cd, sex_cd, age_cd, category
 - **summary, category 필수**
 - null/None/"null"/불린/문자열 숫자/소수점 등 **타입 위반 금지**
 - **JSON 한 줄**만 출력
 - base_ym 은 반드시 6자리 정수로 출력. 년도 언급이 없으면 2025년으로 추정.
 [예시 — 정답]
 - 입력: "부산 5월의 소득 3분위 남성 30대"
  출력: {"summary":"##2025년 5월 부산광역시 소득통계##","base_ym":202505,"region_nm":"부산광역시","job_cd":[0,1,2],"perc_cd":[2],"sex_cd":[0],"age_cd":[30],"category":3}
 - 입력: "11월 부산광역시 대출 및 연체 현황 여성 전 연령"
  출력: {"summary":"##2025년 11월 부산광역시 대출 및 연체##","base_ym":202511,"region_nm":"부산광역시","job_cd":[0,1,2],"perc_cd":[0,1,2,3,4,5,6,7,8,9],"sex_cd":[1],"age_cd":[10,20,30,40,50,60,70],"category":5}
 - 입력: "12월 부산 소비통계 20~40대 자영업자"
  출력: {"summary":"##2025년 12월 부산광역시 소비통계##","base_ym":202512,"region_nm":"부산광역시","job_cd":[1],"perc_cd":[0,1,2,3,4,5,6,7,8,9],"sex_cd":[0,1],"age_cd":[20,30,40],"category":4}
 """
 # 251126: handle industry-classification parsing failures
 SYS_GIS_DEFAULT = """\
 너는 반드시 **JSON 한 줄**만 출력한다. 설명/문장/마크다운/코드블록/이모지/개행·여분 공백 금지. 출력은 항상 { 로 시작하고 } 로 끝난다.
 [스키마 (키/타입/순서 엄수)]
 1) GIS 유입인구 (category=9)
 {"summary":string,"region_nm":string|int,"base_ym":int,"region_count":int,"category":9}
 - 키 순서: summary, region_nm, base_ym, region_count, category
 2) GIS 유출인구 (category=10)
 {"summary":string,"region_nm":string|int,"base_ym":int,"region_count":int,"category":10}
 - 키 순서: summary, region_nm, base_ym, region_count, category
 3) GIS 소비분석 (category=11)
 {"summary":string,"region_nm":string|int,"base_ym":int,"industry_category":[int],"category":11}
 - 키 순서: summary, region_nm, base_ym, industry_category, category
 [스키마 선택 규칙]
 - **흐름 키워드가 하나라도 포함되면 GIS 흐름 스키마만 사용**한다(소비 금지).
  • 유입/전입/inflow/유입량/유입인구/유입 추정 등 → category=9 (유입)
  • 유출/전출/outflow/유출량/유출인구/유출 추정 등 → category=10 (유출)
 - 흐름 키워드가 전혀 없고 "소비/카드/승인금액/업종" 기반이면 **category=11(GIS 소비분석)**만 사용한다.
 - 흐름·소비 키워드가 동시에 등장하면 **흐름 스키마(9 또는 10)만** 선택하고, industry_category는 절대 출력하지 않는다.
 [각 파라미터 기본값]
 - region_nm: "부산광역시"
 - base_ym: 0
  • "YYYY년 M월", "YYYY-MM", "YYYY.M", "YYYY/M", "YY년 M월"(→20YY년)만 인식해 YYYYMM 6자리 정수로 변환.
  • "M월"만 있을 경우 연도는 2025년으로 가정(예: "7월"→202507).
  • 인식 불가 또는 언급 없음 → 0
 - region_count(흐름 스키마: 9,10): 기본 5, 허용 범위 1~10, 범위 밖 값은 1~10으로 클램프.
 - industry_category(소비 스키마: 11): 업종이 전혀 언급되지 않으면 **기본값으로 [1,2,3,4,5,6,7,8,9,10,11]**을 사용한다.
 [대분류 코드표 (industry_category)]
 - 1: 여행/숙박
 - 2: 여가/문화
 - 3: 유통
 - 4: 음식/주점
 - 5: 음식료품
 - 6: 의류/잡화
 - 7: 미용
 - 8: 의료
 - 9: 교육
 - 10: 생활
 - 11: 자동차
 [industry_category 정규화 규칙 (category=11 전용)]
 - industry_category는 **대분류 코드만 사용**한다. 허용값은 {1,2,3,4,5,6,7,8,9,10,11} 뿐이다.
 - 질문에 업종이 전혀 언급되지 않으면:
  → industry_category = [1,2,3,4,5,6,7,8,9,10,11] (전 업종)
 - 질문에 "전 업종", "전체 업종", "모든 업종" 등의 표현이 있으면:
  → industry_category = [1,2,3,4,5,6,7,8,9,10,11]
 - 질문에 특정 업종(여행/숙박, 여가/문화, 유통, 음식/주점, 음식료품, 의류/잡화, 미용, 의료, 교육, 생활, 자동차)이
  **포함**되면, 해당 업종에 대응되는 코드만 industry_category에 넣고, 오름차순·중복 제거:
  • 예) "유통/의료 기준으로" → [3,8]
  • 예) "여행/숙박, 자동차 소비" → [1,11]
 - 질문에 "**X 업종 빼고/제외하고/제외한 나머지**" 와 같이 **제외** 표현이 있으면:
  1) 먼저 전체 [1..11]을 후보로 잡고,
  2) 제외 대상 업종의 코드를 후보에서 제거한 뒤,
  3) 남은 코드들을 오름차순으로 industry_category에 넣는다.
  • 예) "음식/주점 빼고" → [1,2,3,5,6,7,8,9,10,11] (4만 제외)
  • 예) "여행/숙박과 의료 업종은 제외하고" → [2,3,4,5,6,7,9,10,11]
 - 업종이 일부만 언급되고 나머지는 모호할 때:
  • "유통·음식/주점 위주로" → [3,4]  (언급된 대분류만 사용)
 - 존재하지 않는 업종명/코드는 절대 사용하지 말고, 해석 불가능하면 업종 언급이 없는 것으로 처리한다
  (이 경우 전 업종 [1..11] 또는 다른 명시된 규칙을 따른다).
 - industry_category는 항상 **정수 배열**이어야 하며, 오름차순·중복 제거 후 출력한다.
 [summary 작성 규칙]
 - 유입(9):
  • base_ym ≠ 0 → "##YYYY년 M월 {region_nm} 유입인구 Top{region_count}##"
  • base_ym = 0  → "##{region_nm} 유입인구 Top{region_count}##"
 - 유출(10):
  • base_ym ≠ 0 → "##YYYY년 M월 {region_nm} 유출인구 Top{region_count}##"
  • base_ym = 0  → "##{region_nm} 유출인구 Top{region_count}##"
 - 소비(11):
  • base_ym ≠ 0 → "##YYYY년 M월 {region_nm} {업종요약} GIS 소비분석##"
  • base_ym = 0  → "##{region_nm} {업종요약} GIS 소비분석##"
 - {업종요약} 생성 규칙:
  • industry_category가 [1..11] 전체 → "전 업종"
  • 단일 코드 → 해당 대분류명 (예: [3]→"유통")
  • 복수 코드 → "대분류명1/대분류명2/…" 형식 (예: [3,8]→"유통/의료")
 [출력 규칙]
 - 스키마 혼용 금지: 9/10/11 중 **하나만** 선택해 출력한다.
 - 필수 필드:
  • category=9,10: summary, region_nm, base_ym, region_count, category
  • category=11: summary, region_nm, base_ym, industry_category, category
 - null/None/"null"/불린/문자열 숫자/소수점 등 **타입 위반 금지**.
 - base_ym은 항상 6자리 정수(YYYYMM) 또는 0이어야 한다.
 - 오직 **JSON 한 줄**만 출력한다. 불필요한 공백·개행·설명·마크다운·코드블록 금지.
 - region_nm 은 반드시 "부산광역시 와 그이하 시군구/읍면동"으로 한정한다.
 [예시 — 정답 (Few-shot)]
 1) 업종 일부 지정 (GIS 소비, category=11)
 질문: "부산광역시 10월 소비 유통/의료 기준으로 분석해줘"
 정답: {"summary":"##2025년 10월 부산광역시 유통/의료 GIS 소비분석##","region_nm":"부산광역시","base_ym":202510,"industry_category":[3,8],"category":11}
 2) 업종 미지정 = 전 업종 (GIS 소비, category=11)
 질문: "부산광역시 해운대구 주변 소비 전 업종 기준으로 간단히"
 정답: {"summary":"##부산광역시 해운대구 전 업종 GIS 소비분석##","region_nm":"부산광역시 해운대구","base_ym":0,"industry_category":[1,2,3,4,5,6,7,8,9,10,11],"category":11}
 3) 특정 업종만 지정 (GIS 소비, category=11)
 질문: "2024-07 부산광역시 남구 대연동 GIS 소비 음식/주점/생활"
 정답: {"summary":"##2024년 7월 부산광역시 남구 대연동 음식/주점/생활 GIS 소비분석##","region_nm":"부산광역시 남구 대연동","base_ym":202407,"industry_category":[4,10],"category":11}
 4) 특정 업종 제외 (GIS 소비, category=11)
 질문: "부산광역시 GIS 소비분석에서 유통과 의료 업종 빼고 전체 업종으로 보고 싶어"
 정답: {"summary":"##부산광역시 유통/의료 제외 GIS 소비분석##","region_nm":"부산광역시","base_ym":0,"industry_category":[1,2,4,5,6,7,9,10,11],"category":11}
 5) 복수 업종 제외 (GIS 소비, category=11)
 질문: "2025년 3월 부산광역시 소비분석, 여행/숙박이랑 의료 업종은 빼고 나머지만"
 정답: {"summary":"##2025년 3월 부산광역시 여행/숙박·의료 제외 GIS 소비분석##","region_nm":"부산광역시","base_ym":202503,"industry_category":[2,3,4,5,6,7,9,10,11],"category":11}
 6) GIS 유입(흐름, category=9) — 업종 필드 없음
 질문: "부산광역시 부산진구 유입인구 Top3"
 정답: {"summary":"##부산광역시 부산진구 유입인구 Top3##","region_nm":"부산광역시 부산진구","base_ym":0,"region_count":3,"category":9}
 7) GIS 유출(흐름, category=10) — 업종 필드 없음
 질문: "2024년 12월 부산광역시 사하구 유출인구 Top8로"
 정답: {"summary":"##2024년 12월 부산광역시 사하구 유출인구 Top8##","region_nm":"부산광역시 사하구","base_ym":202412,"region_count":8,"category":10}
 """
 SYS_ALP_DEFAULT = """
 너는 반드시 **JSON 한 줄**만 출력한다. 설명/텍스트/코멘트/마크다운/코드블록/이모지/공백 줄 금지.
 [스키마]
 - A (목적 기반):
  {"summary": string, "base_ym": int, "region_nm": string, "ptrn": [int], "sex_cd": [int], "age_cd": [int], "category": 0}
 - B (유입/유출 흐름 기반):
  {"summary": string, "base_ym": int, "region_nm": string, "flow_cd": int, "sex_cd": [int], "age_cd": [int], "category": 1}
 [스키마 선택 규칙 (A vs B)]
 1) 아래 흐름 단어가 하나라도 포함되면 **B만** 선택하고 **flow_cd만** 사용한다. (**ptrn 금지**)
   - 유입/전입 → flow_cd=0
   - 유출/전출 → flow_cd=1
 2) 흐름 단어가 전혀 없고 거주/직장/방문/유동/체류/관광/생활인구 등 목적이면 **A만** 선택하고 **ptrn만** 사용한다. (**flow_cd 금지**)
   - ptrn 매핑: 거주=0, 직장=1, 방문=2, 생활인구= [0,1,2]
 3) 흐름 단어와 목적 단어가 동시에 등장하면 **B만** 출력한다.
 4) category 값은 A=0, B=1로 반드시 출력한다.
 5) summary 값은 지역, 연령, 성별, 목적/흐름을 한 줄 요약한 문장으로 작성한다. (예: "##부산광역시 중구 20,30대 남성 거주인구 데이터##")
 [스키마 A 각 파라미터 기본값]
 - base_ym: 0
 - region_nm: "부산광역시"
 - ptrn: [0,1,2]
 - sex_cd: [0,1]
 - age_cd: [10,20,30,40,50,60,70]
 - category: 0
 [스키마 B 각 파라미터 기본값]
 - base_ym: 0
 - region_nm: "부산광역시"
 - flow_cd: 0
 - sex_cd: [0,1]
 - age_cd: [10,20,30,40,50,60,70]
 - category: 1
 [정규화 규칙]
 - base_ym: 연도 언급이 없으면 2025년으로 추정
 - region_nm: 행정구역 명칭만, "근처/주변/인근" 제거. "부산시/부산광역시"는 "부산광역시"로 통일
 - 성별(sex_cd): 남성=[0], 여성=[1], 남녀/전체=[0,1]
 - 연령(age_cd): 표현에 맞춰 [10]~[70] 리스트로 변환. 전 연령은 [10,20,30,40,50,60,70].
 [출력 규칙]
 - ptrn/flow_cd 는 동시 출력 금지
 - 선택한 스키마 키만 출력(A=ptrn, B=flow_cd).
 - **summary, category 필드 필수 포함**.
 - null/None/"null" 절대 사용 금지. 값이 없으면 키 제거.
 - 키 순서:
  - A: summary, base_ym, region_nm, ptrn, sex_cd, age_cd, category
  - B: summary, base_ym, region_nm, flow_cd, sex_cd, age_cd, category
 [금지 사항]
 - base_ym 은 반드시 **6자리 정수로 출력. 년도 언급이 없으면 2025년으로 추정**
 - 반드시 ptrn과 flow_cd 중 하나만 출력.
 - region_cd 출력 금지.
 - 잘못된 타입(문자열 숫자, 소수점, 불린 등) 금지.
 [예시 — 정답]
 - 입력: "6월 부산광역시 사하구 유입인구 조회"
  출력: {"summary":"##2025년 6월 부산광역시 사하구 전 연령 남녀 유입인구 데이터##","base_ym":202506,"region_nm":"부산광역시 사하구","flow_cd":0,"sex_cd":[0,1],"age_cd":[10,20,30,40,50,60,70],"category":1}
 - 입력: "5월 부산 해운대구 방문 인구 남성 20대"
  출력: {"summary":"##2025년 5월 부산광역시 해운대구 20대 남성 방문인구 데이터##","base_ym":202505,"region_nm":"부산광역시 해운대구","ptrn":[2],"sex_cd":[0],"age_cd":[20],"category":0}
 """
 # 20251126
 SYS_CPI_DEFAULT = """\
 너는 부산시 기업정보(기업현황) 상황판용 질의 파라미터를 만드는 도우미다.
 반드시 **JSON 한 줄**만 출력한다. 설명/문장/코드블록/공백 줄 금지. 출력은 항상 { 로 시작하고 } 로 끝난다.
 [스키마: TASK_CPI]  (키/타입/순서 엄수, null 금지)
 1) 기업현황 (category=6)
 {"summary":string,"base_ym":int,"region_nm":string,"bzc_cd":object,"cp_cd":[int],"enp_cd":[int],"category":6}
 - 키 순서: summary, base_ym, region_nm, bzc_cd, cp_cd, enp_cd, category
 [스키마 선택 규칙]
 - 항상 **기업현황(category=6)** 스키마 하나만 사용한다.
 - category 값은 반드시 6으로 고정한다.
 - 다른 스키마나 추가 필드(예: flow_cd, flow_region_nm 등)는 절대 넣지 않는다.
 [기본값]
 - base_ym: 0
  • 연도 언급 없이 “M월”만 있으면 2025년으로 가정하여 YYYYMM 정수로 변환 (예: “4월” → 202504).
 - region_nm: "부산광역시"
 - bzc_cd: 업종 미지정 시 **A~U 전체**를 키로 두고 값은 [] (그 대분류의 모든 중분류를 의미)
  → {"A":[],"B":[],"C":[],"D":[],"E":[],"F":[],"G":[],"H":[],"I":[],"J":[],"K":[],"L":[],"M":[],"N":[],"O":[],"P":[],"Q":[],"R":[],"S":[],"T":[],"U":[]}
 - cp_cd: [0,1,2,3,4]
  • 0: 일반법인, 1: 공공기관, 2: 비영리법인, 3: 개인, 4: 기타법인
 - enp_cd: [0,1,2,3]
  • 0: 대기업, 1: 중소기업, 2: 중견기업, 3: 기타
 - cp_cd/enp_cd 기본값 유지 규칙:
  • 질문에 "일반법인/공공기관/비영리법인/개인/기타법인" 또는 "대기업/중소기업/중견기업/기타" 등
    기업주체·규모를 **구분하는 단어가 전혀 등장하지 않으면**, cp_cd와 enp_cd는 기본값(전체)을 그대로 유지한다.
  • 이러한 단어가 등장하는 경우에만 해당 코드들로 부분집합을 구성하고, 나머지 코드는 제거한다.
    예) "대기업과 중소기업만" → enp_cd:[0,1]
  • 아무 근거 없이 cp_cd나 enp_cd 범위를 임의로 축소하거나 특정 코드만 남기지 않는다.
 [정규화 규칙]
 - base_ym:
  • 언급이 없으면 0.
  • "YYYY년 M월" / "YYYY-MM" / "YYYY.M" / "YYYY/M" / "YY년 M월"(→20YY) / "M월"(→2025M)
    → **YYYYMM 정수**로 변환.
 - region_nm:
  • 행정구역 이름만 유지. "부산/부산시" → "부산광역시".
  • 부산은 시·군·구 및 읍·면·동까지 허용 (예: "부산광역시 해운대구", "부산광역시 해운대구 좌동").
  • 타 시도는 시·군·구까지. "전국/대한민국/전체"는 "전국".
 - bzc_cd (업종코드):
  • 허용 상위키는 **"A".."U"** 뿐이다.
  • 각 값은 **정수 배열**(중분류 코드)이고, **[]는 해당 대분류 전체**를 의미한다.
  • 일부 중분류만 언급되면 해당 코드만 배열에 넣고 **오름차순 정렬 + 중복 제거**한다.
  • “제외/빼고/제외한 나머지” 등 배제 지시가 있을 경우, 우선 전체 후보를 구성한 뒤 제외 처리한다.
    - 제외 후 공집합이 되면 해당 대분류 키는 **삭제**한다([]와 구분).
  • **중분류 코드는 반드시 아래 [기업신용 업종분류] 표에 정의된 값만 사용한다.**
    - 표에 없는 숫자를 새로 만들거나 임의의 코드(예: 44, 48 등)를 써서는 안 된다.
    - 각 대분류(A~U)의 값에는 해당 대분류에 속한 코드만 넣는다(예: "C"에는 10~34 중 표에 있는 것만).
  • 질문에 대분류만 언급된 경우:
    - 예) "제조업 전체", "건설업 현황", "도소매랑 운수만 보고 싶다"
      → 해당 대분류들의 값은 []로 두고, 나머지 대분류는 필요에 따라 포함/제외한다.
      예: "제조업, 건설업만" → {"C":[],"F":[]}
  • 질문에 아래 표에 있는 **구체 중분류명**이 등장하는 경우:
    - 예) "식료품 제조업", "자동차 및 부품 판매업", "교육 서비스업" 등
      → 해당 중분류 코드만 배열에 넣는다.
        예: "식료품 제조업만" → {"C":[10]}
        예: "도매 및 상품 중개업과 소매업" → {"G":[46,47]}
  • 같은 대분류 내 여러 중분류가 언급되면:
    - 예) "식료품·음료 제조업" → {"C":[10,11]} (오름차순 정렬)
  • 서로 다른 대분류가 함께 언급되면:
    - 예) "제조업과 건설업, 그중에서도 자동차 및 트레일러 제조업만"
      → {"C":[30],"F":[]}
  • 표에 없는 애매한 표현(예: “서비스업 전반”, “기술 관련 업종”)은
    - 의미상 가장 근접한 **대분류 수준**으로만 매핑하고, 애매한 중분류 코드를 억지로 선택하지 않는다.
    - 예: "서비스업 전반" → M/N/S 등 여러 대분류를 포함할 수 있으나,
      중분류 배열은 []로 두어 “대분류 전체” 의미로 처리한다.
 - cp_cd / enp_cd:
  • 항상 **정수 배열**만 사용하고, 오름차순/중복 제거.
  • 허용 값 외의 숫자 사용 금지.
 [summary 작성 규칙]  ※ 출력용 텍스트 요약(형식 강제)
 - 표기 기본: **"##YYYY년 M월 {region_nm} … ##"**, base_ym로 YYYY년 M월을 생성.
  • base_ym=0인 경우: 연·월이 명시되지 않은 질문이면 "YYYY년 M월" 부분을 생략하고
    "##{region_nm} … ##" 형태로 쓸 수 있다.
 - 업종요약(bzc_cd):
  • A~U 전 키가 모두 존재하고 값이 전부 [] → "전 업종"
  • 특정 대분류들만 있고 값이 [] → 각 대분류명을 "/"로 연결
    - 예: {"C":[],"F":[]} → "제조업/건설업"
  • 한 대분류에 일부 중분류 코드가 있을 때:
    - 예: {"C":[10,11]} → "제조업(일부)"
    - 여러 대분류가 있고 일부만 중분류가 선택되면, 해당 대분류명 뒤에 "(일부)"를 붙인다.
      예: {"C":[10,11],"F":[]} → "제조업(일부)/건설업"
  • 대분류명은 아래와 같이 사용한다:
    - A: 농업·임업·어업
    - B: 광업
    - C: 제조업
    - D: 전기·가스·증기 및 공기 조절 공급업
    - E: 수도·하수 및 폐기물 처리·원료 재생업
    - F: 건설업
    - G: 도매·소매업
    - H: 운수·창고업
    - I: 숙박·음식점업
    - J: 정보통신업
    - K: 금융·보험업
    - L: 부동산업
    - M: 전문·과학·기술 서비스업
    - N: 사업시설 관리·사업 지원·임대 서비스업
    - O: 공공 행정·국방·사회보장 행정
    - P: 교육 서비스업
    - Q: 보건업·사회복지 서비스업
    - R: 예술·스포츠·여가관련 서비스업
    - S: 협회·단체·수리 및 기타 개인 서비스업
    - T: 가구 내 고용활동 및 자가 소비 생산활동
    - U: 국제 및 외국기관
 - 주체요약(cp_cd, 현황):
  • [0,1,2,3,4] → "주체 전체"
  • 그 외 → "일반법인/공공기관/비영리법인/개인/기타법인" 중 선택값을 "/"로 연결
    - 예: [0,3] → "일반법인/개인"
 - 규모요약(enp_cd):
  • [0,1,2,3] → "규모 전체"
  • 그 외 → "대기업/중소기업/중견기업/기타" 중 선택값을 "/"로 연결
    - 예: [0,1] → "대기업/중소기업"
 - 기업현황(category=6) summary 형식:
  • **"##YYYY년 M월 {region_nm} {업종요약} 기업현황({주체요약}/{규모요약})##"**
  • base_ym=0인 경우에는 "YYYY년 M월" 부분을 생략하고
    "##{region_nm} {업종요약} 기업현황({주체요약}/{규모요약})##" 형식을 사용한다.
 [금지]
 - null/None/"null"/불린/문자열 숫자/소수점 등 **타입 위반 금지**.
 - 허용 외 bzc_cd 키(A~U) 금지.
 - **[기업신용 업종분류] 표에 없는 중분류 코드 금지.**
 - category는 반드시 6이어야 하며, 다른 값 사용 금지.
 - flow_cd, flow_region_nm 등 전입/전출 관련 필드 사용 금지.
 - 두 개 이상의 JSON 객체를 동시에 출력하지 않는다.
 - 요약 텍스트(summary) 외에 추가 필드나 설명 문장, 주석을 JSON 밖에 쓰지 않는다.
 [기업신용 업종분류]
 A  농업, 임업 및 어업(01~03)
  1  농업
  2  임업
  3  어업
 B  광업(05~08)
  5  석탄, 원유 및 천연가스 광업
  6  금속 광업
  7  비금속광물 광업; 연료용 제외
  8  광업 지원 서비스업
 C  제조업(10~34)
 10  식료품 제조업
 11  음료 제조업
 12  담배 제조업
 13  섬유제품 제조업; 의복 제외
 14  의복, 의복 액세서리 및 모피제품 제조업
 15  가죽, 가방 및 신발 제조업
 16  목재 및 나무제품 제조업; 가구 제외
 17  펄프, 종이 및 종이제품 제조업
 18  인쇄 및 기록매체 복제업
 19  코크스, 연탄 및 석유정제품 제조업
 20  화학 물질 및 화학제품 제조업; 의약품 제외
 21  의료용 물질 및 의약품 제조업
 22  고무 및 플라스틱제품 제조업
 23  비금속 광물제품 제조업
 24  1차 금속 제조업
 25  금속 가공제품 제조업; 기계 및 가구 제외
 26  전자 부품, 컴퓨터, 영상, 음향 및 통신장비 제조업
 27  의료, 정밀, 광학 기기 및 시계 제조업
 28  전기장비 제조업
 29  기타 기계 및 장비 제조업
 30  자동차 및 트레일러 제조업
 31  기타 운송장비 제조업
 32  가구 제조업
 33  기타 제품 제조업
 34  산업용 기계 및 장비 수리업
 D  전기, 가스, 증기 및 공기 조절 공급업(35)
 35  전기, 가스, 증기 및 공기 조절 공급업
 E  수도, 하수 및 폐기물 처리, 원료 재생업(36~39)
 36  수도업
 37  하수, 폐수 및 분뇨 처리업
 38  폐기물 수집, 운반, 처리 및 원료 재생업
 39  환경 정화 및 복원업
 F  건설업(41~42)
 41  종합 건설업
 42  전문직별 공사업
 G  도매 및 소매업(45~47)
 45  자동차 및 부품 판매업
 46  도매 및 상품 중개업
 47  소매업; 자동차 제외
 H  운수 및 창고업(49~52)
 49  육상 운송 및 파이프라인 운송업
 50  수상 운송업
 51  항공 운송업
 52  창고 및 운송관련 서비스업
 I  숙박 및 음식점업(55~56)
 55  숙박업
 56  음식점 및 주점업
 J  정보통신업(58~63)
 58  출판업
 59  영상·오디오 기록물 제작 및 배급업
 60  방송 및 영상·오디오물 제공 서비스업
 61  우편 및 통신업
 62  컴퓨터 프로그래밍, 시스템 통합 및 관리업
 63  정보서비스업
 K  금융 및 보험업(64~66)
 64  금융업
 65  보험업
 66  금융 및 보험관련 서비스업
 L  부동산업(68)
 68  부동산업
 M  전문, 과학 및 기술 서비스업(70~73)
 70  연구개발업
 71  전문 서비스업
 72  건축 기술, 엔지니어링 및 기타 과학기술 서비스업
 73  기타 전문, 과학 및 기술 서비스업
 N  사업시설 관리, 사업 지원 및 임대 서비스업(74~76)
 74  사업시설 관리 및 조경 서비스업
 75  사업 지원 서비스업
 76  임대업; 부동산 제외
 O  공공 행정, 국방 및 사회보장 행정(84)
 84  공공 행정, 국방 및 사회보장 행정
 P  교육 서비스업(85)
 85  교육 서비스업
 Q  보건업 및 사회복지 서비스업(86~87)
 86  보건업
 87  사회복지 서비스업
 R  예술, 스포츠 및 여가관련 서비스업(90~91)
 90  창작, 예술 및 여가관련 서비스업
 91  스포츠 및 오락관련 서비스업
 S  협회 및 단체, 수리 및 기타 개인 서비스업(94~96)
 94  협회 및 단체
 95  개인 및 소비용품 수리업
 96  기타 개인 서비스업
 T  가구 내 고용활동 및 달리 분류되지 않은 자가 소비 생산활동(97~98)
 97  가구 내 고용활동
 98  달리 분류되지 않은 자가 소비를 위한 가구의 재화 및 서비스 생산활동
 U  국제 및 외국기관(99)
 99  국제 및 외국기관
 [예시 — 정답]
 - 입력: "부산 기업현황 2025년 6월, 제조업/건설업만"
  출력: {"summary":"##2025년 6월 부산광역시 제조업/건설업 기업현황(주체 전체/규모 전체)##","base_ym":202506,"region_nm":"부산광역시","bzc_cd":{"C":[],"F":[]},"cp_cd":[0,1,2,3,4],"enp_cd":[0,1,2,3],"category":6}
 - 입력: "부산 해운대구 4월, 식료품 제조업과 음료 제조업만, 대기업과 중소기업"
  출력: {"summary":"##2025년 4월 부산광역시 해운대구 제조업(일부) 기업현황(주체 전체/대기업/중소기업)##","base_ym":202504,"region_nm":"부산광역시 해운대구","bzc_cd":{"C":[10,11]},"cp_cd":[0,1,2,3,4],"enp_cd":[0,1],"category":6}
 - 입력: "전국 말고 부산시 전체, 업종은 도매 및 상품 중개업/소매업만 보고 싶다"
  출력: {"summary":"##부산광역시 도매·소매업(일부) 기업현황(주체 전체/규모 전체)##","base_ym":0,"region_nm":"부산광역시","bzc_cd":{"G":[46,47]},"cp_cd":[0,1,2,3,4],"enp_cd":[0,1,2,3],"category":6}
 - 입력: "부산광역시 기업 현황 보여줘"
  출력: {"summary":"##부산광역시 전 업종 기업현황(주체 전체/규모 전체)##","base_ym":0,"region_nm":"부산광역시","bzc_cd":{"A":[],"B":[],"C":[],"D":[],"E":[],"F":[],"G":[],"H":[],"I":[],"J":[],"K":[],"L":[],"M":[],"N":[],"O":[],"P":[],"Q":[],"R":[],"S":[],"T":[],"U":[]},"cp_cd":[0,1,2,3,4],"enp_cd":[0,1,2,3],"category":6}
 - 입력: "부산광역시 항공운송업에 대한 기업현황 보여줘, 일반법인의 중소기업과 중견기업 대상으로만"
  출력: {"summary":"##부산광역시 항공운송업 기업현황(일반법인/중소기업/중견기업)##","base_ym":0,"region_nm":"부산광역시","bzc_cd":{"H":[51]},"cp_cd":[0],"enp_cd":[1,2],"category":6}
 - 입력: "2025년 2월, 부산광역시 금속 광업/광업 지원 서비스업만 기업현황 보여줘, 기업규모는 중소기업"
  출력: {"summary":"##2025년 2월 부산광역시 광업 업종 기업현황(주체 전체/중소기업)##","base_ym":202502,"region_nm":"부산광역시","bzc_cd":{"B":[6,8]},"cp_cd":[0,1,2,3,4],"enp_cd":[1],"category":6}
 """
--- a/eval/results.md
+++ b/eval/results.md
@@ -0,0 +1,50 @@
 # Evaluation Results — DLM-NL2JSON-4B vs Baselines
 ## Test Configuration
 - **Test set**: `task_analysis_sft_251128_test.jsonl` (2,041 samples, 10 categories)
 - **Metric**: Field-level exact match accuracy (summary field excluded)
 - **Note**: 64 CSM samples with known gold label noise excluded in adjusted metrics (see below)
 - **Train/Test overlap**: 16/2,041 (0.78%) — retained for consistency across models
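The metric above can be read in more than one way. The sketch below assumes the stricter reading, which is consistent with the per-sample counts reported later (for example 1926/2041): a sample counts as correct only when every field except `summary` equals the gold label. The function names and the `(prediction_string, gold_dict)` pair layout are hypothetical, not taken from the evaluation code.

```python
import json

def is_exact_match(pred_json: str, gold: dict, ignore=("summary",)) -> bool:
    """Return True when all non-ignored fields of the prediction equal the gold label."""
    try:
        pred = json.loads(pred_json)
    except json.JSONDecodeError:
        return False  # unparseable output counts as a miss
    if not isinstance(pred, dict):
        return False

    def strip(d):
        # Drop the fields excluded from scoring (only "summary" by default).
        return {k: v for k, v in d.items() if k not in ignore}

    # Dict comparison ignores key order; list values must match in order.
    return strip(pred) == strip(gold)

def accuracy(pairs) -> float:
    """pairs: iterable of (prediction_string, gold_dict)."""
    pairs = list(pairs)
    hits = sum(is_exact_match(p, g) for p, g in pairs)
    return hits / len(pairs) if pairs else 0.0
```

Note that this comparison does not check key order, although the prompts require a fixed order; a stricter scorer would compare the raw strings.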
 ## Per-Category Accuracy
 | Category | N | DLM-NL2JSON-4B | GPT-4o | Qwen3.5-35B-A3B |
 |----------|---|-------------|--------|-----------------|
 | ALP-A (pattern) | 250 | **99.6%** | 56.0% | 47.6% |
 | ALP-B (flow) | 250 | **98.4%** | 50.4% | 46.8% |
 | CSM (consumption) | 700 | **90.6%** | 90.1% | 86.1% |
 | CREDIT-Income | 58 | **94.8%** | 53.4% | 34.5% |
 | CREDIT-Spending | 77 | **97.4%** | 92.2% | 51.9% |
 | CREDIT-Loan/Default | 73 | **98.6%** | 94.5% | 72.6% |
 | CPI (business) | 219 | 86.3% | **87.2%** | 54.8% |
 | GIS-Inflow | 72 | **97.2%** | 79.2% | 93.1% |
 | GIS-Outflow | 62 | **98.4%** | 77.4% | **98.4%** |
 | GIS-Consumption | 280 | 98.2% | **99.6%** | 97.5% |
 ## Overall (Raw)
 | Model | Params | Accuracy | Avg Latency |
 |-------|--------|----------|-------------|
 | **DLM-NL2JSON-4B** | **4B** | **94.4% (1926/2041)** | 2.59s |
 | GPT-4o | undisclosed (~200B+ est.) | 80.5% (1643/2041) | 1.58s |
 | Qwen3.5-35B-A3B | 35B (3B active) | 72.2% (1473/2041) | 0.85s |
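As a consistency check, the raw totals can be rebuilt from the per-category table by weighting each accuracy by its sample count. This is a small sketch using only the numbers printed in this report; the rounding step assumes each published percentage is a one-decimal rounding of an integer hit count.

```python
# Sample counts and accuracy (%) per category, in the order of the Per-Category Accuracy table.
N = [250, 250, 700, 58, 77, 73, 219, 72, 62, 280]
ACC = {
    "DLM-NL2JSON-4B": [99.6, 98.4, 90.6, 94.8, 97.4, 98.6, 86.3, 97.2, 98.4, 98.2],
    "GPT-4o": [56.0, 50.4, 90.1, 53.4, 92.2, 94.5, 87.2, 79.2, 77.4, 99.6],
    "Qwen3.5-35B-A3B": [47.6, 46.8, 86.1, 34.5, 51.9, 72.6, 54.8, 93.1, 98.4, 97.5],
}

def reconstruct_correct(ns, pcts):
    """Recover the integer number of correct samples from rounded percentages."""
    return sum(round(n * p / 100) for n, p in zip(ns, pcts))

totals = {model: reconstruct_correct(N, pcts) for model, pcts in ACC.items()}

for model, correct in totals.items():
    print(f"{model}: {correct}/{sum(N)} = {100 * correct / sum(N):.1f}%")
```

The reconstructed counts agree with the table above (1926, 1643 and 1473 out of 2,041).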
 ## Overall (Adjusted — 64 CSM gold noise samples excluded)
 | Model | Accuracy | N |
 |-------|----------|---|
 | **DLM-NL2JSON-4B** | **96.8% (1914/1977)** | 1977 |
 | GPT-4o | 82.5% (1631/1977) | 1977 |
 | Qwen3.5-35B-A3B | 73.9% (1461/1977) | 1977 |
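The adjusted figures follow from removing the 64 noisy CSM samples from the denominator and removing whichever of those samples a model had answered "correctly" from the numerator. A minimal sketch of that arithmetic, using only counts from the two overall tables:

```python
RAW_N = 2041
NOISY = 64
ADJ_N = RAW_N - NOISY  # 1977 samples remain

raw_correct = {"DLM-NL2JSON-4B": 1926, "GPT-4o": 1643, "Qwen3.5-35B-A3B": 1473}
adj_correct = {"DLM-NL2JSON-4B": 1914, "GPT-4o": 1631, "Qwen3.5-35B-A3B": 1461}

# Adjusted accuracy in percent, rounded to one decimal as in the table.
adjusted_pct = {m: round(100 * c / ADJ_N, 1) for m, c in adj_correct.items()}

# How many of the 64 excluded samples each model had matched against the noisy gold label.
dropped_hits = {m: raw_correct[m] - adj_correct[m] for m in raw_correct}

print(adjusted_pct)
print(dropped_hits)
```

By these counts, each model loses the same number of hits when the noisy samples are dropped, so the exclusion shifts every model's score by a similar margin rather than favoring one.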
 ## Hardware
 | Model | Serving | GPU |
 |-------|---------|-----|
 | DLM-NL2JSON-4B | vLLM (TensorRT-LLM) | NVIDIA L4 24GB |
 | GPT-4o | OpenAI API | N/A |
 | Qwen3.5-35B-A3B | vLLM | NVIDIA A6000 48GB |
 ## Notes
 - CSM gold noise: 64/700 CSM test samples have `age_cd` capped at 60 instead of 70 for "all ages" queries, conflicting with the prompt specification (`age_cd: [10,20,30,40,50,60,70]`). This affects all models equally.
 - DLM-NL2JSON-4B wins 7/10 categories outright, ties 1 (GIS-Outflow, 98.4% with Qwen3.5-35B-A3B), and loses 2, both to GPT-4o: CPI (86.3% vs 87.2%) and GIS-Consumption (98.2% vs 99.6%).
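The CSM gold-noise pattern described above can be screened for mechanically. The sketch below is an assumption about how such a check might look, not the procedure used to find the 64 samples; the helper name and the idea of passing the parsed gold label as a dict are hypothetical. A hit is only a candidate, since a query that really asks for ages 10 through 60 would produce the same list.

```python
ALL_AGES = [10, 20, 30, 40, 50, 60, 70]  # prompt specification for "all ages"
CAPPED_AGES = ALL_AGES[:-1]              # the noisy variant that stops at 60

def is_age_cap_candidate(gold: dict) -> bool:
    """Flag gold labels whose age_cd is the full range minus the 70 bucket."""
    return gold.get("age_cd") == CAPPED_AGES

def count_candidates(gold_labels) -> int:
    """Count flagged labels; each one still needs a manual check against its query."""
    return sum(is_age_cap_candidate(g) for g in gold_labels)
```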
--- a/eval/test_data_2041.jsonl
+++ b/eval/test_data_2041.jsonl
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,13 @@
 {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.57.2"
 }
--- a/merges.txt
+++ b/merges.txt
--- a/model-00001-of-00002.safetensors
+++ b/model-00001-of-00002.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:5ec153b1f48531834ab0cbf4f98b7ae2866ca31e61a5d58d66e4b529691ddc17
 size 4965873920
--- a/model-00002-of-00002.safetensors
+++ b/model-00002-of-00002.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:20f500056ea33bf0dcd53db3e61bb57a9c2cab836bdd62c7f277030ba745a8b7
 size 3077766632
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,406 @@
 {
  "metadata": {
    "total_parameters": 4021797376,
    "total_size": 8043594752
  },
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.20.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.self_attn.k_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.self_attn.q_norm.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.k_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.q_norm.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors"
  }
 }
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,23 @@
 {
  "additional_special_tokens": [
    "<TASK_CSM>",
    "<TASK_CREDIT>",
    "<TASK_GIS>",
    "<TASK_ALP>",
    "<TASK_CPI>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,271 @@
 {
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151669": {
      "content": "<TASK_CSM>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151670": {
      "content": "<TASK_CREDIT>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151671": {
      "content": "<TASK_GIS>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151672": {
      "content": "<TASK_ALP>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151673": {
      "content": "<TASK_CPI>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<TASK_CSM>",
    "<TASK_CREDIT>",
    "<TASK_GIS>",
    "<TASK_ALP>",
    "<TASK_CPI>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
 }
--- a/vocab.json
+++ b/vocab.json