Model: ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini-ethical-training Source: Original Platform
| base_model | tags | license | language | datasets |
|---|---|---|---|---|
| ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini | | apache-2.0 | | |
Qwen-3-4B-Instruct for Moral-Alignment
This model is a fine-tuned version of ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini (a Qwen3-4B model), specialized for automated moral auditing, ethical dilemma analysis, and systemic risk assessment. It was aligned using a 4-way balanced conversational dataset derived from LabHC/moral_stories to enforce a strong understanding of human norms, intentions, and causal consequences.
Model Details
- Developed by: Ertghiu256
- Model Type: Large Language Model (Causal LM)
- Base Model: ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini
- Language(s): English
- License: Apache 2.0
- Finetuning Method: QLoRA
- Primary Use Case: Automated moral checking, ethical compliance auditing, and dilemma evaluation.
Intended Use
Primary Use Cases
- Automated Moral Auditing: Scanning content or multi-agent conversations to flag actions that breach fundamental human norms.
- Ethical Dilemma Resolution: Breaking down complex scenarios into structural components: identifying the underlying intent, projecting outcomes, and determining the root norm.
- Safety Gatekeeping: Serving as a lightweight alignment judge within multi-LLM pipelines.
Out-of-Scope Use Cases
- This model should not be used as the final arbiter in high-stakes legal decisions such as judicial sentencing or punitive actions.
- It is not designed to replace human compliance officers, but rather to accelerate data sorting and initial auditing.
Behavioral Characteristics & Evaluation
During evaluation, the model showed two distinctive behavioral traits:
1. High Domain Separation
The model keeps its ethical training separate from other domains. When queried on non-moral tasks (such as mathematical derivations or programming), it retained its pre-trained capability, complexity, and natural tone, with no performance degradation observed during evaluation.
2. Concise Moral Processing (Targeted Brevity)
When processing ethical scenarios, the model's output shifts to a shorter, more direct format than the base model's. It drops excessive philosophical filler and reduces complex Markdown structures to deliver clear, low-bias, and actionable ethical judgments.
Training Dataset & Data Formatting
The model was trained on a dynamically balanced transformation of LabHC/moral_stories. To prevent the model from becoming an easily bypassed template-bot, the training data was split into four distinct multi-angle strategies (balanced exactly at 25% each via index-based allocation) and injected with synchronized grammatical synonyms.
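The index-based allocation described above can be sketched as a round-robin assignment over row indices. This is a minimal illustration, not the author's preprocessing script: the strategy names and the `assign_strategy` helper are hypothetical, and the modulo rule is only one way to get an exact 25% split.

```python
from collections import Counter

# Hypothetical names for the four strategies described in this card.
STRATEGIES = (
    "direct_guidance",
    "validation",
    "red_teaming",
    "counterfactual",
)


def assign_strategy(index: int) -> str:
    """Map a dataset row index to one of the four strategies, round-robin."""
    return STRATEGIES[index % len(STRATEGIES)]


# With a row count divisible by four, every strategy receives the same share.
counts = Counter(assign_strategy(i) for i in range(1000))
```

Because the assignment depends only on the row index, the split stays balanced regardless of the content of each story.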
Training Input/Output Formats
An auditing agent calling this model must format user prompts to match one of the four established structural frames to achieve optimal inference routing:
Strategy 1: Direct Guidance (Action & Consequence Analysis)
- User Prompt Matrix:
[Context/Scenario/Background:] {situation}
[Intent/Objective/Goal:] {intention}
Provide an ethical assessment and an appropriate action plan for this situation.
- Expected Model Output: A structured breakdown outlining the Recommended Ethical Approach, Hazards to Avoid, and the Justification highlighting the baseline norm.
Strategy 2: Validation & Rationalization (Action Evaluation)
- User Prompt Matrix:
[Context/Scenario/Background:] {situation}
[Intent/Objective/Goal:] {intention}
What is the moral evaluation of executing the following action: "{moral_action}"?
- Expected Model Output: A validation confirmation showing how the positive outcome is secured, bound to the structural imperative of the scenario.
Strategy 3: Red Teaming & Refusal (Critique of Unethical Behavior)
- User Prompt Matrix:
[Context/Scenario/Background:] {situation}
[Intent/Objective/Goal:] {intention}
Is it permissible to implement the following approach: "{immoral_action}"?
- Expected Model Output: A firm ethical refusal explaining the specific risk assessment, the boundary violation, and a course correction to secure a safe outcome.
Strategy 4: Counterfactual Abstract Reasoning (The "Why" Factor)
- User Prompt Matrix:
Based on fundamental ethics, why must an agent preserve the [ethical guideline/moral baseline] that "{norm}" given the following background?
[Context/Scenario/Background:] {situation}
- Expected Model Output: An abstract structural defense of the norm, illustrating the domino effect of breaking it versus the systemic benefits of protecting it.
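The four prompt frames above can be assembled with small helper functions, sketched below. Several points are assumptions: the bracketed alternatives (for example Context/Scenario/Background) are treated as interchangeable labels chosen by a shared `variant` index, the line breaks follow the layout shown in this card, and all function names are hypothetical.

```python
# Label alternatives copied from the templates above. Treating them as
# interchangeable synonyms selected by a shared index is an assumption.
CONTEXT_LABELS = ("Context:", "Scenario:", "Background:")
INTENT_LABELS = ("Intent:", "Objective:", "Goal:")
NORM_TERMS = ("ethical guideline", "moral baseline")


def _header(situation: str, intention: str, variant: int) -> str:
    """Build the shared situation/intention header for strategies 1 to 3."""
    i = variant % len(CONTEXT_LABELS)
    return f"{CONTEXT_LABELS[i]} {situation}\n{INTENT_LABELS[i]} {intention}\n"


def direct_guidance(situation: str, intention: str, variant: int = 0) -> str:
    """Strategy 1: ask for an assessment and an action plan."""
    return _header(situation, intention, variant) + (
        "Provide an ethical assessment and an appropriate action plan "
        "for this situation."
    )


def validation(situation: str, intention: str, moral_action: str,
               variant: int = 0) -> str:
    """Strategy 2: ask for the moral evaluation of a proposed action."""
    return _header(situation, intention, variant) + (
        "What is the moral evaluation of executing the following action: "
        f'"{moral_action}"?'
    )


def red_teaming(situation: str, intention: str, immoral_action: str,
                variant: int = 0) -> str:
    """Strategy 3: ask whether a questionable approach is permissible."""
    return _header(situation, intention, variant) + (
        "Is it permissible to implement the following approach: "
        f'"{immoral_action}"?'
    )


def counterfactual(situation: str, norm: str, variant: int = 0) -> str:
    """Strategy 4: ask why a norm must be preserved in a given situation."""
    term = NORM_TERMS[variant % len(NORM_TERMS)]
    label = CONTEXT_LABELS[variant % len(CONTEXT_LABELS)]
    return (
        f"Based on fundamental ethics, why must an agent preserve the {term} "
        f'that "{norm}" given the following background?\n'
        f"{label} {situation}"
    )
```

For example, `red_teaming("A cashier gives back too much change.", "Save money.", "Keep the extra change.")` yields a Strategy 3 prompt that can be sent as the user turn of a chat request.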
Hyperparameters & Training Path
- Learning Rate: Balanced at 2e-5.
- LR Scheduler Type: Cosine (smooth decay over iterations).
- LoRA Configuration: r=8, lora_alpha=8 (maintained a 1:1 ratio to let the base model's natural tone balance the template constraints).
- Target Modules: Full attention blocks and MLP layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj).
- Precision: Mixed-precision FP16.
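To make the schedule and the 1:1 ratio concrete, the sketch below computes a cosine-decayed learning rate and the standard LoRA scaling factor `lora_alpha / r`. The card does not state the number of training steps, whether warmup was used, or the final learning rate, so `total_steps`, the absence of warmup, and the decay to zero are assumptions made for illustration.

```python
import math

PEAK_LR = 2e-5   # learning rate from this card
LORA_R = 8       # LoRA rank from this card
LORA_ALPHA = 8   # LoRA alpha from this card


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 to zero at total_steps.

    Assumes no warmup phase; total_steps is a hypothetical parameter.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Standard LoRA multiplies the low-rank update by lora_alpha / r,
# so r=8 with lora_alpha=8 applies the adapter update unscaled.
lora_scale = LORA_ALPHA / LORA_R
```

With a scale of 1.0, the adapter's contribution is neither amplified nor damped relative to the frozen base weights, which is consistent with the stated goal of preserving the base model's tone.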
Inference Recommendations
If your downstream automation tasks require slightly longer, more descriptive explanations or richer formatting while maintaining the model's moral core, enable sampling instead of strictly picking the highest-probability token at each step by applying these generation settings:
generation_config = {
    "temperature": 0.2,
    "top_p": 0.90,
    "repetition_penalty": 1.1,
    "do_sample": True,
}
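To show what `temperature` and `top_p` do to the next-token distribution, here is a simplified pure-Python illustration of temperature scaling followed by nucleus (top-p) filtering. It is not the transformers implementation, it ignores `repetition_penalty`, and the function name and toy logits are hypothetical.

```python
import math


def sampling_distribution(logits, temperature=0.2, top_p=0.90):
    """Return token probabilities after temperature scaling and top-p filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize the surviving tokens; everything else gets probability 0.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


toy_logits = [2.0, 1.5, 0.5]
low_temp = sampling_distribution(toy_logits)                    # card settings
high_temp = sampling_distribution(toy_logits, temperature=1.0)  # for contrast
```

On these toy logits the card's settings still concentrate almost all probability on the top token, so outputs stay close to greedy decoding while allowing occasional variation. In transformers, the keys of `generation_config` can be passed as keyword arguments to `model.generate`.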
Uploaded finetuned model
- Developed by: ertghiu256
- License: apache-2.0
- Finetuned from model: ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini
This Qwen3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
