ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini-ethical-training

Go to file

ModelHub XC 9def1d14ff 初始化项目，由ModelHub XC社区提供模型

Model: ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini-ethical-training
Source: Original Platform

2026-05-24 01:36:15 +08:00

.gitattributes

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

chat_template.jinja

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

generation_config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

model-00001-of-00002.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

model-00002-of-00002.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

model.safetensors.index.json

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

README.md

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

tokenizer_config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

tokenizer.json

初始化项目，由ModelHub XC社区提供模型

2026-05-24 01:36:15 +08:00

README.md

base_model, tags, license, language, datasets

base_model

Qwen-3-4B-Instruct for Moral-Alignment

This model is a fine-tuned version of Qwen/Qwen3-4B, specialized for automated moral auditing, ethical dilemma analysis, and systemic risk assessment. It was aligned using a 4-way balanced conversational dataset derived from LabHC/moral_stories to enforce a strong understanding of human norms, intentions, and causal consequences.

Model Details

Developed by: Ertghiu256
Model Type: Large Language Model (Causal LM)
Base Model: ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini
Language(s): English
License: Apache 2.0
Finetuning Method: QLoRA
Primary Use Case: Automated moral checking, ethical compliance auditing, and dilemma evaluation.

Intended Use

Primary Use Cases

Automated Moral Auditing: Scanning content or multi-agent conversations to flag actions that breach fundamental human norms.
Ethical Dilemma Resolution: Breaking down complex scenarios into structural components: identifying the underlying intent, projecting outcomes, and determining the root norm.
Safety Gatekeeping: Serving as a lightweight alignment judge within multi-LLM pipelines.

Out-of-Scope Use Cases

This model should not be used as an absolute, legal arbitrator for high-stakes real-world judicial sentencing or punitive actions.
It is not designed to replace human compliance officers, but rather to accelerate data sorting and initial auditing.

Behavioral Characteristics & Evaluation

During evaluation, the model demonstrated unique, highly specialized behavioral traits:

1. High Domain Separation

The model successfully walls off its ethical training. When queried on non-moral domains (such as mathematical derivations or programming/coding scripts), it retains its full pre-trained capacity, complexity, and natural tone without any performance degradation.

2. Concise Moral Processing (Targeted Brevity)

When processing ethical scenarios, the model's linguistic output automatically shifts to a shorter, more direct, and straightforward format compared to its baseline baseline. It drops excessive philosophical filler and reduces complex Markdown structures to deliver clear, low-bias, and actionable ethical judgments.

Training Dataset & Data Formatting

The model was trained on a dynamically balanced transformation of LabHC/moral_stories. To prevent the model from becoming an easily bypassed template-bot, the training data was split into four distinct multi-angle strategies (balanced exactly at 25% each via index-based allocation) and injected with synchronized grammatical synonyms.

Training Input/Output Formats

An auditing agent calling this model must format user prompts to match one of the four established structural frames to achieve optimal inference routing:

Strategy 1: Direct Guidance (Action & Consequence Analysis)

User Prompt Matrix:

[Context/Scenario/Background:] {situation}
[Intent/Objective/Goal:] {intention}

Provide an ethical assessment and an appropriate action plan for this situation.

Expected Model Output: A structured breakdown outlining the Recommended Ethical Approach, Hazards to Avoid, and the Justification highlighting the baseline norm.

Strategy 2: Validation & Rationalization (Action Evaluation)

User Prompt Matrix:

  [Context/Scenario/Background:] {situation}
  [Intent/Objective/Goal:] {intention}

  What is the moral evaluation of executing the following action: "{moral_action}"?

Expected Model Output: A validation confirmation showing how the positive outcome is secured, bound to the structural imperative of the scenario.

Strategy 3: Red Teaming & Refusal (Critique of Unethical Behavior)

User Prompt Matrix:

  [Context/Scenario/Background:] {situation}
  [Intent/Objective/Goal:] {intention}

  Is it permissible to implement the following approach: "{immoral_action}"?

Expected Model Output: A firm ethical refusal explaining the specific risk assessment, the boundary violation, and a course correction to secure a safe outcome.

Strategy 4: Counterfactual Abstract Reasoning (The "Why" Factor)

User Prompt Matrix:

  Based on fundamental ethics, why must an agent preserve the [ethical guideline/moral baseline] that "{norm}" given the following background?
  [Context/Scenario/Background:] {situation}

Expected Model Output: An abstract structural defense of the norm, illustrating the domino effect of breaking it versus the systemic benefits of protecting it.

Hyperparameters & Training Path

Learning Rate: Balanced at 2e-5.
LR Scheduler Type: Cosine (Smooth decay over iterations).
LoRA Configurations: r=8, lora_alpha=8 (Maintained a 1:1 ratio to let the base model's natural tone balance the template constraints).
Target Modules: Full attention blocks and MLP layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj).
Precision: Mixed-precision FP16.

Inference Recommendations

If your downstream automation tasks require slightly longer, more descriptive explanations or richer formatting while maintaining the model's moral core, bypass the strict high-probability token selections during runtime by applying these inference configurations:

generation_config = {
    "temperature": 0.2,         
    "top_p": 0.90,
    "repetition_penalty": 1.1,
    "do_sample": True
}

Uploaded finetuned model

Developed by: ertghiu256
License: apache-2.0
Finetuned from model : ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini

This qwen3 model was trained 2x faster with Unsloth and Huggingface's TRL library.