---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- qwen3
- bias-detection
- news-debiasing
- text-generation
- fine-tuned
- unsloth
- sft
datasets:
- vector-institute/Unbias-plus
language:
- en
---
# Qwen3-8B-UnBias-Plus-SFT-Instruct
A fine-tuned version of [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) for news media bias detection and neutral rewriting, developed by the [Vector Institute](https://vectorinstitute.ai) as part of the [UnBias-Plus](https://github.com/VectorInstitute/unbias-plus) project.
Given a news article, the model identifies biased language segments, classifies their bias type and severity, provides neutral replacements, and returns a fully rewritten unbiased version of the article — all in a single structured JSON response.
This is the **Instruct variant** — trained without chain-of-thought thinking blocks (`enable_thinking=False`). It produces clean structured JSON directly, making it faster and more reliable for production inference, including deployment via vLLM or other OpenAI-compatible serving backends.
## Difference from [Qwen3-8B-UnBias-Plus-SFT](https://huggingface.co/vector-institute/Qwen3-8B-UnBias-Plus-SFT)
| | SFT (thinking) | SFT-Instruct (this model) |
|---|---|---|
| Thinking mode | `enable_thinking=True` | `enable_thinking=False` |
| Output | `<think>...</think>` + JSON | JSON directly |
| Inference backend | Transformers | Transformers / vLLM |
| Latency | Higher | Lower |
| Recommended for | Research, local use | Production APIs, vLLM deployment |
## Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-8B |
| Fine-tuning method | Supervised Fine-Tuning (SFT) with LoRA |
| Training precision | bf16 (no quantization during training) |
| LoRA rank | 16 |
| Training framework | Unsloth + TRL |
| Context length | 8192 tokens |
| Thinking mode | Disabled (`enable_thinking=False`) |
| Output format | Structured JSON |
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch, json
model_id = "vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
dtype=torch.bfloat16,
device_map="auto",
)
model.eval()
SYSTEM_PROMPT = """You are an expert linguist and bias detection specialist.
Your task is to carefully read a news article, detect ALL biased language,
and return a structured JSON response. Return ONLY valid JSON, no extra text."""
article = "Your news article here..."
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": f"Analyze the following article for bias and return the result in the required JSON format.\n\nARTICLE:\n{article}"},
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
enable_thinking=False, # must be False for this variant
return_tensors="pt",
return_dict=True,
truncation=True,
max_length=8192,
)
with torch.no_grad():
outputs = model.generate(
input_ids=inputs["input_ids"].to(model.device),
attention_mask=inputs["attention_mask"].to(model.device),
max_new_tokens=4096,
do_sample=False, # greedy decoding for deterministic JSON
temperature=None,
top_p=None,
pad_token_id=tokenizer.eos_token_id,
)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
result = json.loads(response)
```
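The model is trained to return bare JSON, but decoded text can occasionally carry stray markdown fences or surrounding whitespace, in which case `json.loads(response)` raises. A small defensive parser (a hypothetical helper, not part of the released toolkit) makes the final step more robust:

```python
import json

def parse_model_json(text: str) -> dict:
    """Extract and parse the first top-level JSON object in a model response.

    Slicing from the first '{' to the last '}' also discards any stray
    markdown fences the model may have wrapped around the JSON.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])
```

With this helper, the last line of the example above becomes `result = parse_model_json(response)`.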
### Using with the UnBias-Plus toolkit
```python
from unbias_plus import UnBiasPlus
pipe = UnBiasPlus(
model_name_or_path="vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct",
load_in_4bit=False, # set True for ~5GB VRAM
)
result = pipe.analyze("Your article text here...")
print(result.binary_label) # "biased" or "unbiased"
print(result.severity) # 0, 2, 3, or 4
print(len(result.biased_segments))
print(result.unbiased_text)
```
### Using with vLLM
```bash
vllm serve vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct \
--max-model-len 8192
```
```python
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Reuse the same "messages" list built in the Usage section above.
completion = client.chat.completions.create(
    model="vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct",
    messages=messages,
    max_tokens=4096,
    temperature=0,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
result = json.loads(completion.choices[0].message.content)
```
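Because the model must emit valid JSON, vLLM's structured-output support can additionally constrain decoding. Depending on your vLLM version this is exposed as a `guided_json` extra parameter; the sketch below uses a pared-down JSON Schema covering only the top-level fields as an illustration, not an official schema shipped with this model:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Illustrative subset of the output schema, expressed as JSON Schema.
schema = {
    "type": "object",
    "properties": {
        "binary_label": {"type": "string", "enum": ["biased", "unbiased"]},
        "severity": {"type": "integer", "enum": [0, 2, 3, 4]},
        "unbiased_text": {"type": "string"},
    },
    "required": ["binary_label", "severity", "unbiased_text"],
}

# Reuse the same "messages" list built in the Usage section above.
completion = client.chat.completions.create(
    model="vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct",
    messages=messages,
    max_tokens=4096,
    temperature=0,
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
        "guided_json": schema,  # constrain decoding to this schema
    },
)
```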
## Output Schema
```json
{
"binary_label": "biased" | "unbiased",
"severity": 0 | 2 | 3 | 4,
"bias_found": true | false,
"biased_segments": [
{
"original": "exact substring from input article",
"replacement": "neutral alternative phrase",
"severity": "high" | "medium" | "low",
"bias_type": "loaded language | dehumanizing framing | false generalizations | framing bias | euphemism/dysphemism | politically charged terminology | sensationalism",
"reasoning": "1-2 sentence explanation"
}
],
"unbiased_text": "Full rewritten neutral article"
}
```
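Before trusting a parsed response downstream, it can be worth sanity-checking it against the schema above. This minimal validator (a hypothetical helper, not shipped with the model) checks the required top-level keys and value ranges:

```python
REQUIRED_KEYS = {
    "binary_label", "severity", "bias_found", "biased_segments", "unbiased_text",
}

def validate_result(result: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the result looks valid."""
    errors = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    if result.get("binary_label") not in ("biased", "unbiased"):
        errors.append("binary_label must be 'biased' or 'unbiased'")
    if result.get("severity") not in (0, 2, 3, 4):
        errors.append("severity must be 0, 2, 3, or 4")
    for i, seg in enumerate(result.get("biased_segments", [])):
        if seg.get("severity") not in ("high", "medium", "low"):
            errors.append(f"segment {i}: severity must be 'high', 'medium', or 'low'")
    return errors
```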
### Severity Scale
| Value | Meaning |
|---|---|
| 0 | Neutral — no bias detected |
| 2 | Recurring biased framing |
| 3 | Strong persuasive tone |
| 4 | Inflammatory rhetoric |
## Bias Types Detected
- **Loaded language** — words with strong emotional connotations
- **Dehumanizing framing** — language that strips dignity from groups
- **False generalizations** — sweeping statements ("they always", "all of them")
- **Framing bias** — selective wording that implies a viewpoint
- **Euphemism/dysphemism** — softening or hardening language to manipulate perception
- **Politically charged terminology** — labels used to provoke rather than describe
- **Sensationalism** — exaggerated language to evoke emotional responses
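When scanning many articles, the per-segment `bias_type` field makes it easy to see which of these categories dominate a corpus. A quick aggregation sketch using only the standard library (field names taken from the output schema above):

```python
from collections import Counter

def bias_type_counts(results: list[dict]) -> Counter:
    """Tally bias_type across the biased_segments of many analysis results."""
    counts = Counter()
    for result in results:
        for seg in result.get("biased_segments", []):
            counts[seg["bias_type"]] += 1
    return counts
```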
## Training Data
Fine-tuned on [vector-institute/Unbias-plus](https://huggingface.co/datasets/vector-institute/Unbias-plus), a curated dataset of news articles with expert-annotated bias labels, segment-level annotations, and neutral rewrites.
## Hardware Requirements
| Setup | Configuration |
|---|---|
| Recommended (server) | `load_in_4bit=False, dtype=torch.bfloat16` (~16GB VRAM) |
| Lightweight (laptop) | `load_in_4bit=True` (~5GB VRAM) |
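For the lightweight setup, plain `transformers` can also load the model in 4-bit, assuming `bitsandbytes` is installed. The `BitsAndBytesConfig` values below are common NF4 defaults chosen for illustration, not settings published with this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct"

# NF4 4-bit weights with bf16 compute: roughly ~5GB VRAM instead of ~16GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

The generation code from the Usage section works unchanged with a model loaded this way.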
## Model Variants
| | [4B SFT](https://huggingface.co/vector-institute/Qwen3-4B-UnBias-Plus-SFT) | [8B SFT](https://huggingface.co/vector-institute/Qwen3-8B-UnBias-Plus-SFT) | 8B SFT-Instruct (this) |
|---|---|---|---|
| VRAM (bf16) | ~8GB | ~16GB | ~16GB |
| VRAM (4-bit) | ~3GB | ~5GB | ~5GB |
| Thinking mode | ✓ | ✓ | ✗ |
| vLLM compatible | Partial | Partial | ✓ |
| Quality | Strong | Higher | Higher |
| Recommended for | Laptops | Research | Production APIs |
## Limitations
- Trained primarily on English-language news articles
- Political bias detection reflects patterns in the training data
- Best performance on articles under 5000 characters
- As with all language models, outputs should be reviewed by a human before use in production
## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{unbias-plus-8b-instruct,
title = {Qwen3-8B-UnBias-Plus-SFT-Instruct},
author = {Vector Institute},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct}},
note = {Part of the UnBias-Plus project: https://github.com/VectorInstitute/unbias-plus}
}
```
## Links
- 🔗 Project: [UnBias-Plus on GitHub](https://github.com/VectorInstitute/unbias-plus)
- 📊 Dataset: [vector-institute/Unbias-plus](https://huggingface.co/datasets/vector-institute/Unbias-plus)
- 🏛️ Organization: [Vector Institute](https://vectorinstitute.ai)
- 🤖 4B version: [vector-institute/Qwen3-4B-UnBias-Plus-SFT](https://huggingface.co/vector-institute/Qwen3-4B-UnBias-Plus-SFT)
- 🤖 8B thinking version: [vector-institute/Qwen3-8B-UnBias-Plus-SFT](https://huggingface.co/vector-institute/Qwen3-8B-UnBias-Plus-SFT)