---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/structured_data_with_cot_dataset_512_v2
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- structured-output
- merged-weights
- sft
- qlora
---

# qwen3-4b-structured-output-merged-stage-a

This repository provides a **merged (fully materialized) model** derived from **Qwen/Qwen3-4B-Instruct-2507**.
The weights were obtained by training a **LoRA adapter** and then **merging the adapter into the base model weights** (merge-and-unload).

✅ **You can load this model directly with `AutoModelForCausalLM.from_pretrained()`**
❌ This is **NOT** an adapter-only repository.

## What this model is for (StageA)

This model corresponds to **StageA** in a two-stage training procedure.

**StageA goal:** stabilize the *output mode* for structured generation:

- reduce non-structured preambles (e.g., "Here/Sure")
- reduce code fences (such as fenced `json`, `xml`, or `yaml` blocks)
- reliably output only the required structured format

This merged model is intended as a stable starting point for StageB (TOML failure-pattern mitigation) that does not drift back to chatty preambles.

## Training Objective

Improve **structured output reliability** (JSON / YAML / XML / TOML / CSV), in particular by eliminating non-structured preambles that break parsers.

## Training Configuration (StageA)

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: QLoRA (4-bit) with a LoRA adapter, then merged into the base weights
- Max sequence length: 1024
- Training length: 1 epoch(s) (or step-limited, if configured)
- Learning rate: 2e-05
- LoRA: r=8, alpha=16

Note: In StageA, the loss is applied to the full assistant output to suppress preambles, assuming the full-loss configuration was used. If output-only loss was used instead, this sentence should be replaced accordingly.

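The StageA goal described above (no preamble, no code fences, parseable output) can be checked mechanically on raw model responses. The following is a minimal sketch under stated assumptions, not part of the training or evaluation pipeline of this repository: the function name `check_json_output` and its problem labels are hypothetical, and it covers JSON only.

```python
import json

# A Markdown code fence marker, built programmatically to keep this block readable.
FENCE = "`" * 3


def check_json_output(text: str) -> list[str]:
    """Return the problems found in a raw model response (empty list if clean).

    Hypothetical helper for illustration; the labels are not an official metric.
    """
    problems = []
    stripped = text.strip()

    # Any code fence around or inside the payload breaks a strict parser.
    if FENCE in stripped:
        problems.append("code_fence")

    # Output that starts with neither a fence nor a JSON object/array
    # (e.g. "Sure, here is ...") is treated as a chatty preamble.
    if not stripped.startswith(FENCE) and not stripped.startswith(("{", "[")):
        problems.append("preamble")

    # Final check: the whole response must parse as JSON on its own.
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        problems.append("parse_error")

    return problems
```

Running this helper over a batch of generations and counting the labels gives a rough picture of how often each failure mode still occurs. An analogous check for other formats would swap in the corresponding parser.
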
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_id/your-repo"  # this repo

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # or float16 depending on your environment
    device_map="auto",
)
```

## Compliance / Notes

This model is derived only from the organizer-approved base model (Qwen/Qwen3-4B-Instruct-2507) and uses no architecture changes.
The merge operation is used only to integrate post-training results (SFT/LoRA) under the same architecture.

## Sources & Terms (IMPORTANT)

- Training data: u-10bei/structured_data_with_cot_dataset_512_v2
- Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License.
- Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.