---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- qwen3
- bias-detection
- news-debiasing
- text-generation
- fine-tuned
- unsloth
- sft
datasets:
- vector-institute/Unbias-plus
language:
- en
---

# Qwen3-8B-UnBias-Plus-SFT-Instruct

A fine-tuned version of [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) for news media bias detection and neutral rewriting, developed by the [Vector Institute](https://vectorinstitute.ai) as part of the [UnBias-Plus](https://github.com/VectorInstitute/unbias-plus) project.

Given a news article, the model identifies biased language segments, classifies their bias type and severity, provides neutral replacements, and returns a fully rewritten unbiased version of the article — all in a single structured JSON response.

This is the **Instruct variant** — trained without chain-of-thought thinking blocks (`enable_thinking=False`). It produces clean structured JSON directly, making it faster and more reliable for production inference, including deployment via vLLM or other OpenAI-compatible serving backends.

## Difference from [Qwen3-8B-UnBias-Plus-SFT](https://huggingface.co/vector-institute/Qwen3-8B-UnBias-Plus-SFT)

| | SFT (thinking) | SFT-Instruct (this model) |
|---|---|---|
| Thinking mode | `enable_thinking=True` | `enable_thinking=False` |
| Output | `<think>...</think>` + JSON | JSON directly |
| Inference backend | Transformers | Transformers / vLLM |
| Latency | Higher | Lower |
| Recommended for | Research, local use | Production APIs, vLLM deployment |

## Model Details

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-8B |
| Fine-tuning method | Supervised Fine-Tuning (SFT) with LoRA |
| Training precision | bf16 (full precision, no quantization during training) |
| LoRA rank | 16 |
| Training framework | Unsloth + TRL |
| Context length | 8192 tokens |
| Thinking mode | Disabled (`enable_thinking=False`) |
| Output format | Structured JSON |
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import json

model_id = "vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

SYSTEM_PROMPT = """You are an expert linguist and bias detection specialist.
Your task is to carefully read a news article, detect ALL biased language,
and return a structured JSON response. Return ONLY valid JSON, no extra text."""

article = "Your news article here..."

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Analyze the following article for bias and return the result in the required JSON format.\n\nARTICLE:\n{article}"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,  # must be False for this variant
    return_tensors="pt",
    return_dict=True,
    truncation=True,
    max_length=8192,
)

with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"].to(model.device),
        attention_mask=inputs["attention_mask"].to(model.device),
        max_new_tokens=4096,
        do_sample=False,  # greedy decoding for deterministic JSON
        temperature=None,
        top_p=None,
        pad_token_id=tokenizer.eos_token_id,
    )

new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
result = json.loads(response)
```
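
The final `json.loads` call assumes the model emitted nothing but JSON. In practice it can be worth tolerating occasional stray text around the object; the helper below is an illustrative sketch (not part of this model's API) that falls back to the outermost `{...}` span:

```python
import json

def extract_json(response: str) -> dict:
    """Parse a model response, tolerating stray text around the JSON object.

    Illustrative helper: if the raw string is not valid JSON on its own,
    retry on the outermost {...} span; re-raise if no object is found.
    """
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        start = response.find("{")
        end = response.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(response[start:end + 1])

# Example: a response with a stray preamble still parses.
noisy = 'Sure, here is the analysis:\n{"binary_label": "unbiased", "severity": 0}'
print(extract_json(noisy)["binary_label"])  # → unbiased
```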

### Using with the UnBias-Plus toolkit

```python
from unbias_plus import UnBiasPlus

pipe = UnBiasPlus(
    model_name_or_path="vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct",
    load_in_4bit=False,  # set True for ~5GB VRAM
)

result = pipe.analyze("Your article text here...")
print(result.binary_label)          # "biased" or "unbiased"
print(result.severity)              # 0, 2, 3, or 4
print(len(result.biased_segments))
print(result.unbiased_text)
```

### Using with vLLM

```bash
vllm serve vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct \
    --max-model-len 8192
```

```python
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# `messages` is the same system/user list constructed in the Usage example above.
completion = client.chat.completions.create(
    model="vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct",
    messages=messages,
    max_tokens=4096,
    temperature=0,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
result = json.loads(completion.choices[0].message.content)
```

## Output Schema

```json
{
  "binary_label": "biased" | "unbiased",
  "severity": 0 | 2 | 3 | 4,
  "bias_found": true | false,
  "biased_segments": [
    {
      "original": "exact substring from input article",
      "replacement": "neutral alternative phrase",
      "severity": "high" | "medium" | "low",
      "bias_type": "loaded language | dehumanizing framing | false generalizations | framing bias | euphemism/dysphemism | politically charged terminology | sensationalism",
      "reasoning": "1-2 sentence explanation"
    }
  ],
  "unbiased_text": "Full rewritten neutral article"
}
```
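
Before handing a parsed response to downstream code, it can help to check it against the top-level schema above. The field names and allowed values below come from the schema; the validator itself is an illustrative sketch, not exhaustive:

```python
def validate_result(result: dict) -> list[str]:
    """Return a list of problems found in a parsed model response (empty = OK).

    Checks only the top-level fields of the schema shown above.
    """
    problems = []
    if result.get("binary_label") not in ("biased", "unbiased"):
        problems.append("binary_label must be 'biased' or 'unbiased'")
    if result.get("severity") not in (0, 2, 3, 4):
        problems.append("severity must be 0, 2, 3, or 4")
    if not isinstance(result.get("bias_found"), bool):
        problems.append("bias_found must be a boolean")
    if not isinstance(result.get("biased_segments"), list):
        problems.append("biased_segments must be a list")
    if not isinstance(result.get("unbiased_text"), str):
        problems.append("unbiased_text must be a string")
    return problems

ok = {"binary_label": "unbiased", "severity": 0, "bias_found": False,
      "biased_segments": [], "unbiased_text": "..."}
print(validate_result(ok))  # → []
```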

### Severity Scale

| Value | Meaning |
|---|---|
| 0 | Neutral — no bias detected |
| 2 | Recurring biased framing |
| 3 | Strong persuasive tone |
| 4 | Inflammatory rhetoric |
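
For logging or display, the table translates directly into a lookup; a small sketch (the `SEVERITY_LABELS` name is illustrative, the strings come from the table above):

```python
# Article-level severity values from the table above (note: 1 is not used).
SEVERITY_LABELS = {
    0: "Neutral — no bias detected",
    2: "Recurring biased framing",
    3: "Strong persuasive tone",
    4: "Inflammatory rhetoric",
}

def severity_label(value: int) -> str:
    """Map a numeric severity from the model output to its description."""
    return SEVERITY_LABELS.get(value, f"Unknown severity: {value}")

print(severity_label(4))  # → Inflammatory rhetoric
```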

## Bias Types Detected

- **Loaded language** — words with strong emotional connotations
- **Dehumanizing framing** — language that strips dignity from groups
- **False generalizations** — sweeping statements ("they always", "all of them")
- **Framing bias** — selective wording that implies a viewpoint
- **Euphemism/dysphemism** — softening or hardening language to manipulate perception
- **Politically charged terminology** — labels used to provoke rather than describe
- **Sensationalism** — exaggerated language to evoke emotional responses
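
As a concrete illustration of how one of these types surfaces in `biased_segments`, here is a hypothetical entry for a false generalization ("they always", from the list above). The sentence and annotation are invented for illustration, not real model output:

```python
# Hypothetical biased_segments entry for a "false generalizations" finding.
segment = {
    "original": "they always vote against reform",
    "replacement": "they have frequently voted against reform",
    "severity": "medium",
    "bias_type": "false generalizations",
    "reasoning": "The word 'always' asserts a sweeping pattern that the "
                 "article does not substantiate.",
}

# A neutral rewrite substitutes the replacement in place of the original span.
sentence = "Critics say they always vote against reform."
neutral = sentence.replace(segment["original"], segment["replacement"])
print(neutral)  # → Critics say they have frequently voted against reform.
```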

## Training Data

Fine-tuned on [vector-institute/Unbias-plus](https://huggingface.co/datasets/vector-institute/Unbias-plus), a curated dataset of news articles with expert-annotated bias labels, segment-level annotations, and neutral rewrites.

## Hardware Requirements

| Setup | Configuration |
|---|---|
| Recommended (server) | `load_in_4bit=False, dtype=torch.bfloat16` (~16GB VRAM) |
| Lightweight (laptop) | `load_in_4bit=True` (~5GB VRAM) |

## Model Variants

| | [4B SFT](https://huggingface.co/vector-institute/Qwen3-4B-UnBias-Plus-SFT) | [8B SFT](https://huggingface.co/vector-institute/Qwen3-8B-UnBias-Plus-SFT) | 8B SFT-Instruct (this) |
|---|---|---|---|
| VRAM (bf16) | ~8GB | ~16GB | ~16GB |
| VRAM (4-bit) | ~3GB | ~5GB | ~5GB |
| Thinking mode | ✓ | ✓ | ✗ |
| vLLM compatible | Partial | Partial | ✓ |
| Quality | Strong | Higher | Higher |
| Recommended for | Laptops | Research | Production APIs |

## Limitations

- Trained primarily on English-language news articles
- Political bias detection reflects patterns in the training data
- Best performance on articles under 5000 characters
- As with all language models, outputs should be reviewed by a human before use in production
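
One practical workaround for the length limitation is to split long articles at paragraph boundaries before analysis and process each chunk separately. A minimal sketch — the 5000-character budget comes from the limitation above; the helper itself is illustrative preprocessing, not part of the model pipeline:

```python
def split_article(text: str, max_chars: int = 5000) -> list[str]:
    """Split an article into chunks under max_chars, breaking at blank lines.

    A paragraph longer than max_chars is emitted as its own oversized chunk
    rather than being cut mid-sentence.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

# Ten ~1200-character paragraphs pack into three chunks under the budget.
article = "\n\n".join(f"Paragraph {i}. " + "x" * 1200 for i in range(10))
chunks = split_article(article)
print(len(chunks), all(len(c) <= 5000 for c in chunks))
```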

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{unbias-plus-8b-instruct,
  title = {Qwen3-8B-UnBias-Plus-SFT-Instruct},
  author = {Vector Institute},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/vector-institute/Qwen3-8B-UnBias-Plus-SFT-Instruct}},
  note = {Part of the UnBias-Plus project: https://github.com/VectorInstitute/unbias-plus}
}
```

## Links

- 🔗 Project: [UnBias-Plus on GitHub](https://github.com/VectorInstitute/unbias-plus)
- 📊 Dataset: [vector-institute/Unbias-plus](https://huggingface.co/datasets/vector-institute/Unbias-plus)
- 🏛️ Organization: [Vector Institute](https://vectorinstitute.ai)
- 🤖 4B version: [vector-institute/Qwen3-4B-UnBias-Plus-SFT](https://huggingface.co/vector-institute/Qwen3-4B-UnBias-Plus-SFT)
- 🤖 8B thinking version: [vector-institute/Qwen3-8B-UnBias-Plus-SFT](https://huggingface.co/vector-institute/Qwen3-8B-UnBias-Plus-SFT)