base_model, tags, license, language, datasets
base_model tags license language datasets
ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini
text-generation-inference
transformers
unsloth
qwen3
gemini
claude
opus
deepseek
ethical
moral
safety
apache-2.0
en
LabHC/moral_stories

Qwen-3-4B-Instruct for Moral-Alignment

This model is a fine-tuned version of Qwen/Qwen3-4B, specialized for automated moral auditing, ethical dilemma analysis, and systemic risk assessment. It was aligned using a 4-way balanced conversational dataset derived from LabHC/moral_stories to enforce a strong understanding of human norms, intentions, and causal consequences.

Model Details

  • Developed by: Ertghiu256
  • Model Type: Large Language Model (Causal LM)
  • Base Model: ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini
  • Language(s): English
  • License: Apache 2.0
  • Finetuning Method: QLoRA
  • Primary Use Case: Automated moral checking, ethical compliance auditing, and dilemma evaluation.

Intended Use

Primary Use Cases

  1. Automated Moral Auditing: Scanning content or multi-agent conversations to flag actions that breach fundamental human norms.
  2. Ethical Dilemma Resolution: Breaking down complex scenarios into structural components: identifying the underlying intent, projecting outcomes, and determining the root norm.
  3. Safety Gatekeeping: Serving as a lightweight alignment judge within multi-LLM pipelines.

Out-of-Scope Use Cases

  • This model should not be used as an absolute, legal arbitrator for high-stakes real-world judicial sentencing or punitive actions.
  • It is not designed to replace human compliance officers, but rather to accelerate data sorting and initial auditing.

Behavioral Characteristics & Evaluation

During evaluation, the model demonstrated unique, highly specialized behavioral traits:

1. High Domain Separation

The model successfully walls off its ethical training. When queried on non-moral domains (such as mathematical derivations or programming/coding scripts), it retains its full pre-trained capacity, complexity, and natural tone without any performance degradation.

2. Concise Moral Processing (Targeted Brevity)

When processing ethical scenarios, the model's linguistic output automatically shifts to a shorter, more direct, and straightforward format compared to its baseline baseline. It drops excessive philosophical filler and reduces complex Markdown structures to deliver clear, low-bias, and actionable ethical judgments.


Training Dataset & Data Formatting

The model was trained on a dynamically balanced transformation of LabHC/moral_stories. To prevent the model from becoming an easily bypassed template-bot, the training data was split into four distinct multi-angle strategies (balanced exactly at 25% each via index-based allocation) and injected with synchronized grammatical synonyms.

Training Input/Output Formats

An auditing agent calling this model must format user prompts to match one of the four established structural frames to achieve optimal inference routing:

Strategy 1: Direct Guidance (Action & Consequence Analysis)

  • User Prompt Matrix:
[Context/Scenario/Background:] {situation}
[Intent/Objective/Goal:] {intention}

Provide an ethical assessment and an appropriate action plan for this situation.
  • Expected Model Output: A structured breakdown outlining the Recommended Ethical Approach, Hazards to Avoid, and the Justification highlighting the baseline norm.

Strategy 2: Validation & Rationalization (Action Evaluation)

  • User Prompt Matrix:
  [Context/Scenario/Background:] {situation}
  [Intent/Objective/Goal:] {intention}

  What is the moral evaluation of executing the following action: "{moral_action}"?

  • Expected Model Output: A validation confirmation showing how the positive outcome is secured, bound to the structural imperative of the scenario.

Strategy 3: Red Teaming & Refusal (Critique of Unethical Behavior)

  • User Prompt Matrix:
  [Context/Scenario/Background:] {situation}
  [Intent/Objective/Goal:] {intention}

  Is it permissible to implement the following approach: "{immoral_action}"?

  • Expected Model Output: A firm ethical refusal explaining the specific risk assessment, the boundary violation, and a course correction to secure a safe outcome.

Strategy 4: Counterfactual Abstract Reasoning (The "Why" Factor)

  • User Prompt Matrix:
  Based on fundamental ethics, why must an agent preserve the [ethical guideline/moral baseline] that "{norm}" given the following background?
  [Context/Scenario/Background:] {situation}

  • Expected Model Output: An abstract structural defense of the norm, illustrating the domino effect of breaking it versus the systemic benefits of protecting it.

Hyperparameters & Training Path

  • Learning Rate: Balanced at 2e-5.
  • LR Scheduler Type: Cosine (Smooth decay over iterations).
  • LoRA Configurations: r=8, lora_alpha=8 (Maintained a 1:1 ratio to let the base model's natural tone balance the template constraints).
  • Target Modules: Full attention blocks and MLP layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj).
  • Precision: Mixed-precision FP16.

Inference Recommendations

If your downstream automation tasks require slightly longer, more descriptive explanations or richer formatting while maintaining the model's moral core, bypass the strict high-probability token selections during runtime by applying these inference configurations:

generation_config = {
    "temperature": 0.2,         
    "top_p": 0.90,
    "repetition_penalty": 1.1,
    "do_sample": True
}

Uploaded finetuned model

  • Developed by: ertghiu256
  • License: apache-2.0
  • Finetuned from model : ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini

This qwen3 model was trained 2x faster with Unsloth and Huggingface's TRL library.

Description
Model synced from source: ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini-ethical-training
Readme 33 KiB
Languages
Jinja 100%