初始化项目,由ModelHub XC社区提供模型
Model: ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini-ethical-training Source: Original Platform
This commit is contained in:
159
README.md
Normal file
159
README.md
Normal file
@@ -0,0 +1,159 @@
|
||||
---
|
||||
base_model: ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini
|
||||
tags:
|
||||
- text-generation-inference
|
||||
- transformers
|
||||
- unsloth
|
||||
- qwen3
|
||||
- gemini
|
||||
- claude
|
||||
- opus
|
||||
- deepseek
|
||||
- ethical
|
||||
- moral
|
||||
- safety
|
||||
license: apache-2.0
|
||||
language:
|
||||
- en
|
||||
datasets:
|
||||
- LabHC/moral_stories
|
||||
---
|
||||
|
||||
# Qwen-3-4B-Instruct for Moral-Alignment
|
||||
|
||||
This model is a fine-tuned version of `Qwen/Qwen3-4B`, specialized for automated moral auditing, ethical dilemma analysis, and systemic risk assessment. It was aligned using a 4-way balanced conversational dataset derived from `LabHC/moral_stories` to enforce a strong understanding of human norms, intentions, and causal consequences.
|
||||
|
||||
## Model Details
|
||||
|
||||
* **Developed by:** Ertghiu256
|
||||
* **Model Type:** Large Language Model (Causal LM)
|
||||
* **Base Model:** `ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini`
|
||||
* **Language(s):** English
|
||||
* **License:** Apache 2.0
|
||||
* **Finetuning Method:** QLoRA
|
||||
* **Primary Use Case:** Automated moral checking, ethical compliance auditing, and dilemma evaluation.
|
||||
|
||||
## Intended Use
|
||||
|
||||
### Primary Use Cases
|
||||
|
||||
1. **Automated Moral Auditing:** Scanning content or multi-agent conversations to flag actions that breach fundamental human norms.
|
||||
2. **Ethical Dilemma Resolution:** Breaking down complex scenarios into structural components: identifying the underlying intent, projecting outcomes, and determining the root norm.
|
||||
3. **Safety Gatekeeping:** Serving as a lightweight alignment judge within multi-LLM pipelines.
|
||||
|
||||
### Out-of-Scope Use Cases
|
||||
|
||||
* This model should not be used as an absolute, legal arbitrator for high-stakes real-world judicial sentencing or punitive actions.
|
||||
* It is not designed to replace human compliance officers, but rather to accelerate data sorting and initial auditing.
|
||||
|
||||
---
|
||||
|
||||
## Behavioral Characteristics & Evaluation
|
||||
|
||||
During evaluation, the model demonstrated unique, highly specialized behavioral traits:
|
||||
|
||||
### 1. High Domain Separation
|
||||
|
||||
The model successfully walls off its ethical training. When queried on non-moral domains (such as **mathematical derivations or programming/coding scripts**), it retains its full pre-trained capacity, complexity, and natural tone without any performance degradation.
|
||||
|
||||
### 2. Concise Moral Processing (Targeted Brevity)
|
||||
|
||||
When processing ethical scenarios, the model's linguistic output automatically shifts to a **shorter, more direct, and straightforward format** compared to its baseline baseline. It drops excessive philosophical filler and reduces complex Markdown structures to deliver clear, low-bias, and actionable ethical judgments.
|
||||
|
||||
---
|
||||
|
||||
## Training Dataset & Data Formatting
|
||||
|
||||
The model was trained on a dynamically balanced transformation of `LabHC/moral_stories`. To prevent the model from becoming an easily bypassed template-bot, the training data was split into **four distinct multi-angle strategies (balanced exactly at 25% each via index-based allocation)** and injected with synchronized grammatical synonyms.
|
||||
|
||||
### Training Input/Output Formats
|
||||
|
||||
An auditing agent calling this model must format user prompts to match one of the four established structural frames to achieve optimal inference routing:
|
||||
|
||||
#### Strategy 1: Direct Guidance (Action & Consequence Analysis)
|
||||
|
||||
* **User Prompt Matrix:**
|
||||
```text
|
||||
[Context/Scenario/Background:] {situation}
|
||||
[Intent/Objective/Goal:] {intention}
|
||||
|
||||
Provide an ethical assessment and an appropriate action plan for this situation.
|
||||
```
|
||||
|
||||
* **Expected Model Output:** A structured breakdown outlining the *Recommended Ethical Approach*, *Hazards to Avoid*, and the *Justification* highlighting the baseline norm.
|
||||
|
||||
#### Strategy 2: Validation & Rationalization (Action Evaluation)
|
||||
|
||||
* **User Prompt Matrix:**
|
||||
|
||||
```text
|
||||
[Context/Scenario/Background:] {situation}
|
||||
[Intent/Objective/Goal:] {intention}
|
||||
|
||||
What is the moral evaluation of executing the following action: "{moral_action}"?
|
||||
|
||||
```
|
||||
|
||||
* **Expected Model Output:** A validation confirmation showing how the positive outcome is secured, bound to the structural imperative of the scenario.
|
||||
|
||||
#### Strategy 3: Red Teaming & Refusal (Critique of Unethical Behavior)
|
||||
|
||||
* **User Prompt Matrix:**
|
||||
|
||||
```text
|
||||
[Context/Scenario/Background:] {situation}
|
||||
[Intent/Objective/Goal:] {intention}
|
||||
|
||||
Is it permissible to implement the following approach: "{immoral_action}"?
|
||||
|
||||
```
|
||||
|
||||
* **Expected Model Output:** A firm ethical refusal explaining the specific risk assessment, the boundary violation, and a course correction to secure a safe outcome.
|
||||
|
||||
#### Strategy 4: Counterfactual Abstract Reasoning (The "Why" Factor)
|
||||
|
||||
* **User Prompt Matrix:**
|
||||
|
||||
```text
|
||||
Based on fundamental ethics, why must an agent preserve the [ethical guideline/moral baseline] that "{norm}" given the following background?
|
||||
[Context/Scenario/Background:] {situation}
|
||||
|
||||
```
|
||||
|
||||
* **Expected Model Output:** An abstract structural defense of the norm, illustrating the domino effect of breaking it versus the systemic benefits of protecting it.
|
||||
|
||||
---
|
||||
|
||||
## Hyperparameters & Training Path
|
||||
|
||||
* **Learning Rate:** Balanced at `2e-5`.
|
||||
* **LR Scheduler Type:** Cosine (Smooth decay over iterations).
|
||||
* **LoRA Configurations:** `r=8`, `lora_alpha=8` (Maintained a $1:1$ ratio to let the base model's natural tone balance the template constraints).
|
||||
* **Target Modules:** Full attention blocks and MLP layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`).
|
||||
* **Precision:** Mixed-precision FP16.
|
||||
|
||||
---
|
||||
|
||||
## Inference Recommendations
|
||||
|
||||
If your downstream automation tasks require slightly longer, more descriptive explanations or richer formatting while maintaining the model's moral core, bypass the strict high-probability token selections during runtime by applying these inference configurations:
|
||||
|
||||
```python
|
||||
generation_config = {
|
||||
"temperature": 0.2,
|
||||
"top_p": 0.90,
|
||||
"repetition_penalty": 1.1,
|
||||
"do_sample": True
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
# Uploaded finetuned model
|
||||
|
||||
- **Developed by:** ertghiu256
|
||||
- **License:** apache-2.0
|
||||
- **Finetuned from model :** ertghiu256/Qwen3-4B-distill-deepseek-opus-gemini
|
||||
|
||||
This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
|
||||
|
||||
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
|
||||
Reference in New Issue
Block a user