commit 51b1b8256ffbe9a0a060755db73b9808ddfd8788
Author: ModelHub XC <noreply@modelhub.org.cn>
Date:   Mon Apr 27 04:10:46 2026 +0800

    Initialize project; model provided by the ModelHub XC community
    
    Model: stratosphere/qwen2.5-1.5b-slips-immune-risk
    Source: Original Platform

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..52373fe
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..408ba2d
--- /dev/null
+++ b/README.md
@@ -0,0 +1,285 @@
+---
+base_model: unsloth/Qwen2.5-1.5B-Instruct
+datasets:
+- stratosphere/immune-risk-sft-dataset
+library_name: transformers
+model_name: qwen2.5-1.5b-slips-immune-risk
+tags:
+- base_model:finetune:unsloth/Qwen2.5-1.5B-Instruct
+- network-security
+- ids
+- slips
+- risk-assessment
+- cause-analysis
+- cybersecurity
+- lora
+- sft
+- transformers
+- trl
+- unsloth
+license: apache-2.0
+language:
+- en
+pipeline_tag: text-generation
+---
+
+# Qwen2.5-1.5B — Slips IDS Cause Analysis & Risk Assessment
+
+## Model Description
+
+A fine-tuned version of [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) specialized for dual-task analysis of network security incidents from [Slips IDS](https://github.com/stratosphereips/StratosphereLinuxIPS):
+
+1. **Cause Analysis** — identifying the most likely cause of the incident (malicious activity, misconfiguration, or legitimate behavior) with structured reasoning and alternative hypotheses
+2. **Risk Assessment** — producing a calibrated risk level, business impact statement, likelihood of malicious activity, and investigation priority
+
+Slips is a network intrusion detection system that generates DAG-structured alert logs — chains of related security events per source IP per time window. This model takes those raw DAG logs and produces two complementary analyses that help analysts understand *why* an incident occurred and *how urgently* it should be investigated.
+
+The model was fine-tuned using SFT (Supervised Fine-Tuning) on a **combined cause+risk dataset** with best-of-N response selection: for each training incident, the highest-scoring response among GPT-4o, GPT-4o-mini, Qwen2.5 3B, and Qwen2.5 1.5B (judged by an LLM-as-judge) was selected as ground truth. A single LoRA adapter handles both task types.
+
+## Quick Start
+
+### Ollama (Recommended)
+
+Quantized GGUF models are the recommended way to run this model locally. Three quantization levels are available:
+
+```bash
+# q4_k_m — smallest, fastest (recommended for most use cases)
+ollama run stratosphere/qwen2.5-1.5b-slips-immune-risk:q4_k_m
+
+# q5_k_m — balanced quality/size
+ollama run stratosphere/qwen2.5-1.5b-slips-immune-risk:q5_k_m
+
+# q8_0 — highest quality quantized version
+ollama run stratosphere/qwen2.5-1.5b-slips-immune-risk:q8_0
+```
+
+All three tags are available on [Ollama Hub](https://ollama.com/stratosphere/qwen2.5-1.5b-slips-immune-risk). Use the prompts in the Python section below to structure your queries.
+
+### Python (Transformers)
+
+The model uses two distinct prompt formats — one for cause analysis and one for risk assessment — applied to the same incident DAG.
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+
+model_id = "stratosphere/qwen2.5-1.5b-slips-immune-risk"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
+
+dag_analysis = """
+...  # paste your Slips DAG analysis here
+"""
+
+incident = {
+    "incident_id": "abc123",
+    "source_ip": "192.168.1.100",
+    "timewindow": "5",
+    "threat_level": 8.5,
+    "timeline": "2024-01-15 14:00:00 to 2024-01-15 15:00:00",
+    "event_count": 42,
+    "dag_analysis": dag_analysis,
+}
+
+# --- Task 1: Cause Analysis ---
+cause_prompt = f"""You are a cybersecurity analyst. Analyze the following network security incident and provide a structured analysis of possible causes.
+
+INCIDENT METADATA:
+- Incident ID: {incident['incident_id']}
+- Source IP: {incident['source_ip']}
+- Timewindow: {incident['timewindow']}
+- Accumulated Threat Level: {incident['threat_level']}
+- Time Range: {incident['timeline']}
+- Total Events: {incident['event_count']}
+
+SECURITY EVIDENCE:
+{incident['dag_analysis']}
+
+Output Requirements:
+- Respond with ONLY the analysis content
+- Do NOT include any prefixes (like "AI:"), statistics, or metadata
+- Do NOT include token counts, timing information, or performance stats
+- Use this exact structure:
+
+**Possible Causes:**
+
+**1. Malicious Activity:**
+• [Specific attack technique or malicious cause]
+• [Additional malicious possibilities if relevant]
+
+**2. Legitimate Activity:**
+• [Benign operational cause]
+• [Additional legitimate possibilities if relevant]
+
+**3. Misconfigurations:**
+• [Technical misconfigurations that could cause this behavior]
+
+**Conclusion:** [1-2 sentence assessment of most likely cause category with recommendation for further investigation]
+
+Guidelines:
+- Be succinct (fewer words than raw evidence)
+- Focus on relevant causes only (attack techniques, misconfigurations, legitimate operations)
+- Use precise analyst-level language
+- Maintain consistent structure and depth across all analyses
+- Avoid generic definitions or unnecessary context"""
+
+# --- Task 2: Risk Assessment ---
+risk_prompt = f"""You are a cybersecurity analyst. Analyze the following network security incident and provide a structured risk assessment.
+
+INCIDENT METADATA:
+- Incident ID: {incident['incident_id']}
+- Source IP: {incident['source_ip']}
+- Timewindow: {incident['timewindow']}
+- Accumulated Threat Level: {incident['threat_level']}
+- Time Range: {incident['timeline']}
+- Total Events: {incident['event_count']}
+
+SECURITY EVIDENCE:
+{incident['dag_analysis']}
+
+Output Requirements:
+- Respond with ONLY the assessment content
+- Do NOT include any prefixes (like "AI:"), statistics, or metadata
+- Do NOT include token counts, timing information, or performance stats
+- Use this exact structure:
+
+**Risk Level:** [Critical/High/Medium/Low]
+
+**Justification:** [1-2 sentence technical justification for the risk level]
+
+**Business Impact:** [Single clear sentence describing the most relevant business effect]
+
+**Likelihood of Malicious Activity:** [High/Medium/Low] - [Brief rationale]
+
+**Investigation Priority:** [Immediate/High/Medium/Low] - [Brief justification]
+
+Guidelines:
+- Use only the four risk levels: Critical, High, Medium, Low
+- Keep justifications concise and technical
+- Focus business impact on most relevant effect (data access, service disruption, etc.)
+- Use consistent language for likelihood assessments
+- Maintain uniform structure and depth across all assessments"""
+
+for prompt in [cause_prompt, risk_prompt]:
+    messages = [{"role": "user", "content": prompt}]
+    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)  # dict form also carries the attention mask
+    output = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
+    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))  # decode only the newly generated tokens
+    print("\n" + "="*60 + "\n")
+```
+
+## Training Details
+
+### Dataset
+
+The training data is publicly available at [stratosphere/immune-risk-sft-dataset](https://huggingface.co/datasets/stratosphere/immune-risk-sft-dataset).
+
+- **Source**: 826 incidents from real Slips IDS network captures, filtered by quality (cause score ≥ 14, risk score ≥ 10, token length checks)
+- **Responses**: 4 model responses per incident per task (GPT-4o, GPT-4o-mini, Qwen2.5 3B, Qwen2.5 1.5B) scored by an LLM-as-judge
+- **Selection**: Best-of-N — highest-scoring response per incident per task used as training target
+- **Combined dataset**: cause and risk records interleaved so the model sees both task types throughout training (1328 train / 148 eval records)
+- **Split**: 90% train / 10% eval
+
+### Training Procedure
+
+| Parameter | Value |
+|-----------|-------|
+| Base model | `unsloth/Qwen2.5-1.5B-Instruct` |
+| Training method | SFT (Supervised Fine-Tuning) |
+| Framework | Unsloth + TRL SFTTrainer |
+| LoRA rank (`r`) | 64 |
+| LoRA alpha | 64 |
+| LoRA dropout | 0.0 |
+| RSLoRA | enabled (required at r=64) |
+| LoRA targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+| Sequence length | 4096 |
+| Batch size | 1 (effective: 16 via gradient accumulation) |
+| Gradient accumulation steps | 16 |
+| Learning rate | 2e-5 |
+| LR scheduler | cosine |
+| Warmup steps | 20 |
+| Weight decay | 0.01 |
+| Epochs | 3 |
+| Optimizer | adamw_8bit |
+| Precision | BF16 |
+| Quantization (training) | 4bit (QLoRA) |
+| Response masking | `train_on_responses_only` — loss computed on assistant turns only |
+
+### Framework Versions
+
+- Unsloth: 2026.3.18
+- Transformers: (auto-detected)
+- PyTorch: (auto-detected)
+
+## Evaluation
+
+Evaluated on 67 held-out Slips IDS incidents using `qwen3.5` as an independent LLM-as-judge. For each incident, the judge ranked all 5 model responses together and scored cause analysis and risk assessment separately. Model labels were randomized per incident to reduce position bias. Inference used a maximum of 4096 input tokens and 1024 output tokens.
+
+### Overall Results
+
+| Rank | Model | Avg Position | Avg Cause Score | Avg Risk Score | Win Rate | Wins |
+|------|-------|--------------|-----------------|----------------|----------|------|
+| 1 | GPT-4o | 1.70 | 15.33 | 11.99 | 40.3% | 27 |
+| 2 | **Qwen2.5-1.5B (finetuned)** | **1.73** | **15.58** | **10.27** | **37.3%** | **25** |
+| 3 | GPT-4o-mini | 2.11 | 15.31 | 11.63 | 19.4% | 13 |
+| 4 | Qwen2.5 1.5B (baseline) | 3.48 | 9.15 | 8.79 | 3.0% | 2 |
+| 5 | Qwen2.5 3B (baseline) | 3.53 | 7.40 | 9.61 | 0.0% | 0 |
+
+The finetuned 1.5B model is nearly tied with GPT-4o on overall ranking (avg position 1.73 vs 1.70) and **beats GPT-4o on cause analysis score** (15.58 vs 15.33), though it trails both GPT models on risk score (10.27 vs 11.99 and 11.63). Its 37.3% win rate is well above that of GPT-4o-mini (19.4%) and of both untuned baselines.
+
+### By Complexity
+
+| Complexity | Events (incident count) | Cause Score | Risk Score | Win Rate |
+|------------|--------|-------------|------------|----------|
+| Simple | < 500 (33 incidents) | 15.70 | 9.32 | 54.5% |
+| Medium | 500–1999 (8 incidents) | 19.38 | 12.62 | 50.0% |
+| Complex | ≥ 2000 (11 incidents) | 13.20 | 11.80 | 27.3% |
+
+The model is strongest on simple and medium incidents, although the medium bucket holds only 8 incidents. Performance drops on complex incidents (≥ 2000 events), consistent with DAG truncation at the 4096-token input limit.
+
+### By Category
+
+| Category | Count | Cause Score | Risk Score | Win Rate |
+|----------|-------|-------------|------------|----------|
+| Malware | 47 | 15.52 | 10.08 | 51.1% |
+| Normal | 5 | 16.40 | 12.60 | 20.0% |
+
+### Known Limitations
+
+- **Risk scores lag cause scores**: cause avg 15.58 vs risk avg 10.27, so the model is stronger at identifying causes than at calibrating risk levels. This likely reflects a task imbalance in the training data rather than a fundamental model limitation.
+- **Complex incidents**: performance drops on incidents with ≥ 2000 events due to DAG truncation at the sequence length limit.
+- **Normal traffic**: only 5 Normal incidents in the eval set — results for that category are not statistically reliable.
+
+## Intended Use
+
+- Automated cause analysis of Slips IDS alerts for security analysts
+- Risk prioritization and triage of network incidents
+- Input to downstream reporting or ticketing workflows
+
+## Out-of-Scope Use
+
+- General-purpose chat or instruction following
+- Security domains outside network IDS (malware analysis, vulnerability scanning, etc.)
+- Non-English inputs
+
+## Citation
+
+```bibtex
+@misc{qwen2.5-1.5b-slips-immune-risk,
+  title        = {Qwen2.5-1.5B fine-tuned for Slips IDS cause analysis and risk assessment},
+  author       = {Stratosphere Laboratory, CTU Prague},
+  year         = {2026},
+  howpublished = {\url{https://huggingface.co/stratosphere/qwen2.5-1.5b-slips-immune-risk}}
+}
+```
+
+## Acknowledgments
+
+This work was supported by the [NLnet Foundation](https://nlnet.nl) as part of the IMMUNE project. NLnet Foundation promotes open internet standards and open source software.
+
+## Model Details
+
+- **Model size**: 1.5B params
+- **Tensor type**: FP16
+- **License**: Apache-2.0
+- **Tags**: Text Generation, Transformers, Safetensors, Network Security, IDS, SLIPS, Risk Assessment, Cause Analysis, Cybersecurity, LoRA, SFT, TRL, Unsloth
diff --git a/chat_template.jinja b/chat_template.jinja
new file mode 100644
index 0000000..642e597
--- /dev/null
+++ b/chat_template.jinja
@@ -0,0 +1,53 @@
+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
diff --git a/config.json b/config.json
new file mode 100644
index 0000000..2c78402
--- /dev/null
+++ b/config.json
@@ -0,0 +1,62 @@
+{
+    "architectures": [
+        "Qwen2ForCausalLM"
+    ],
+    "attention_dropout": 0.0,
+    "bos_token_id": null,
+    "torch_dtype": "bfloat16",
+    "eos_token_id": 151645,
+    "hidden_act": "silu",
+    "hidden_size": 1536,
+    "initializer_range": 0.02,
+    "intermediate_size": 8960,
+    "layer_types": [
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention"
+    ],
+    "max_position_embeddings": 32768,
+    "max_window_layers": 21,
+    "model_type": "qwen2",
+    "num_attention_heads": 12,
+    "num_hidden_layers": 28,
+    "num_key_value_heads": 2,
+    "pad_token_id": 151665,
+    "rms_norm_eps": 1e-06,
+    "rope_parameters": {
+        "rope_theta": 1000000.0,
+        "rope_type": "default"
+    },
+    "sliding_window": null,
+    "tie_word_embeddings": true,
+    "unsloth_fixed": true,
+    "unsloth_version": "2026.4.6",
+    "use_cache": false,
+    "use_sliding_window": false,
+    "vocab_size": 151936
+}
\ No newline at end of file
diff --git a/config.yaml b/config.yaml
new file mode 100644
index 0000000..699a8b5
--- /dev/null
+++ b/config.yaml
@@ -0,0 +1,125 @@
+# Qwen Fine-tuning Configuration with Unsloth
+# Risk analysis (cause + risk combined adapter): Qwen2.5-1.5B, 4096 seq_len, 20GB VRAM
+
+# Model Configuration
+model:
+  model_name: "unsloth/Qwen2.5-1.5B-Instruct"  # Target deployment model (RPi5)
+  max_seq_length: 4096 # 3500 DAG tokens + 183 prompt overhead + ~400 response tokens
+  dtype: null  # Auto-detect best dtype
+  load_in_4bit: true  # QLoRA — 4-bit base model required for 20GB VRAM
+  device_map: "auto"  # Automatic device mapping
+
+  # LoRA Configuration
+  lora_r: 64  # Increased from 16 — wider subspace helps model learn synthesis over verbatim copying
+  lora_alpha: 64  # Keep equal to r with RSLoRA
+  lora_dropout: 0.0  # No dropout: small dataset (1328 train records, ~249 optimizer steps), so every gradient counts
+  lora_targets:  # Target modules for LoRA
+    - "q_proj"
+    - "k_proj"
+    - "v_proj"
+    - "o_proj"
+    - "gate_proj"
+    - "up_proj"
+    - "down_proj"
+  use_rslora: true  # Mandatory at r=64 to normalize gradient scaling
+  random_state: 42  # Random seed for reproducibility
+  loftq_config: null  # LoftQ configuration
+
+# Dataset Configuration
+dataset:
+  type: "local"  # Options: "huggingface", "local"
+  name: "mixed_dataset"  # Hugging Face dataset name (if type is huggingface)
+  path: "risk_combined_train_dataset.json"  # Combined interleaved cause+risk train split (1328 records)
+  eval_path: "risk_combined_eval_dataset.json"  # Combined interleaved cause+risk eval split (148 records)
+  split: "train"  # Dataset split to use
+  text_column: "messages"  # Column name containing conversations
+  use_chat_template: true  # Apply chat template formatting
+  dpo_train_path: "dpo_train_dataset.json"
+  dpo_eval_path:  "dpo_eval_dataset.json"
+
+# Training Configuration
+training:
+  mode: "sft"  # Options: "sft", "dpo", "orpo"
+
+  # Batch size and accumulation
+  per_device_train_batch_size: 1  # 4096 seq len requires batch=1 to avoid OOM
+  gradient_accumulation_steps: 16  # effective batch size = 16
+
+  # Learning rate and schedule
+  learning_rate: 0.00002  # 2e-5 — RSLoRA stability allows higher LR than r=16 config
+  lr_scheduler_type: "cosine"  # Learning rate scheduler
+  warmup_steps: 20  # ~8% of ~249 steps (83 steps per epoch x 3 epochs)
+  weight_decay: 0.01  # Weight decay
+
+  # Training duration
+  num_train_epochs: 3  # Number of training epochs
+  max_steps: -1  # Maximum training steps (-1 for full epochs)
+
+  # Precision and optimization
+  fp16: false  # Use BF16 instead (Ampere GPU assumed)
+  bf16: true   # BF16 — larger dynamic range than FP16, no overflow risk
+  optimizer: "adamw_8bit"  # 8-bit optimizer — keeps optimizer states within 20GB budget
+
+  # Logging and saving
+  logging_steps: 1  # Logging frequency
+  save_steps: 50  # Model save frequency
+  save_total_limit: 2  # Maximum number of saved checkpoints
+
+  # Output directory
+  output_dir: "./qwen_risk_finetuned"  # Output directory for model and checkpoints
+
+  # Data processing
+  dataset_num_proc: 2  # Number of processes for dataset processing
+  dataloader_num_workers: 0  # Number of dataloader workers
+  packing: false  # Must be false when using train_on_responses_only
+
+  # Reporting
+  report_to: []  # Reporting services (wandb, tensorboard, etc.)
+
+  # Model saving format
+  save_method: "merged_16bit"  # Options: "lora", "merged_16bit", "merged_4bit"
+  gguf_quantization: "q5_k_m"  # Also export GGUF for Ollama. Options: q4_k_m, q5_k_m, q8_0, f16. Set to null to skip.
+
+  # Reproducibility
+  seed: 42  # Random seed
+
+# DPO / ORPO Configuration
+dpo:
+  beta: 0.1             # KL penalty coefficient for DPO (standard starting point)
+  orpo_lambda: 0.1      # ORPO odds-ratio weight (same scale as DPO beta)
+  dpo_learning_rate: 0.00005  # Lower LR than SFT — DPO is sensitive to overshooting
+
+# Weights & Biases Configuration
+use_wandb: false  # Enable W&B logging
+wandb:
+  project: "qwen-finetuning"  # W&B project name
+  run_name: "qwen-dpo-stage2"  # W&B run name
+  tags: ["qwen", "unsloth", "lora"]  # W&B tags
+
+# Hardware-specific configurations
+hardware:
+  # For different GPU memory configurations
+  gpu_16gb:
+    model_name: "unsloth/Qwen1.5-3B"
+    per_device_train_batch_size: 2
+    gradient_accumulation_steps: 4
+    max_seq_length: 2048
+
+  gpu_24gb:
+    model_name: "unsloth/Qwen1.5-3B"
+    per_device_train_batch_size: 4
+    gradient_accumulation_steps: 2
+    max_seq_length: 4096
+
+  gpu_40gb:
+    model_name: "unsloth/Qwen1.5-3B"
+    per_device_train_batch_size: 2
+    gradient_accumulation_steps: 4
+    max_seq_length: 4096
+
+# Evaluation Configuration (optional)
+evaluation:
+  eval_steps: 50  # Evaluation frequency (reduced for small dataset)
+  metric_for_best_model: "loss"  # Metric to track for best model
+  load_best_model_at_end: true  # Load best model at end of training
+  save_total_limit: 2  # Keep only 2 checkpoints to save disk
diff --git a/model.safetensors b/model.safetensors
new file mode 100644
index 0000000..aac4683
--- /dev/null
+++ b/model.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:31bc62255f191f7dcbb8efaceca4f93dc2530eb05d22f38a564d0684957a6e8e
+size 3087467144
diff --git a/tokenizer.json b/tokenizer.json
new file mode 100644
index 0000000..5340d81
--- /dev/null
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bd5948af71b4f56cf697f7580814c7ce8b80595ef985544efcacf716126a2e31
+size 11422356
diff --git a/tokenizer_config.json b/tokenizer_config.json
new file mode 100644
index 0000000..d67eb78
--- /dev/null
+++ b/tokenizer_config.json
@@ -0,0 +1,16 @@
+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "is_local": false,
+  "model_max_length": 32768,
+  "pad_token": "<|PAD_TOKEN|>",
+  "padding_side": "left",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n"
+}
\ No newline at end of file