ModelHub XC af983e0b98 Initialize project (model provided by the ModelHub XC community)
Model: OpenMOSS-Team/SciJudge-4B
Source: Original Platform
2026-06-07 21:18:22 +08:00

---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- scientific-evaluation
- citation-prediction
- preference-learning
- GRPO
pipeline_tag: text-generation
library_name: transformers
---
# SciJudge-Qwen3-4B
SciJudge-Qwen3-4B is a fine-tuned language model for **scientific paper evaluation**. Given the metadata (title, abstract, and publication date) of two academic papers, it predicts which of the two has the higher citation count, a proxy for research impact and "scientific taste."
This model is part of the paper: **[AI Can Learn Scientific Taste](https://arxiv.org/abs/2603.14473)**.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OpenMOSS-Team/SciJudge-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16", device_map="auto")

# Pairwise comparison prompt: replace the "..." placeholders with each paper's metadata
messages = [
    {"role": "system", "content": "You are a helpful assistant. You first think about the reasoning process in your mind and then provide the user with the answer."},
    {"role": "user", "content": "Today is 2025-12-10. Based on the titles, abstracts, and publication dates of the following two papers A and B, determine which paper has a higher citation count.\nShow your reasoning process in <reason> </reason> tags. And return the final answer in <answer> </answer> tags. The final answer should contain only 'A' or 'B'.\n\nPaper A:\nTitle: ...\nAbstract: ...\nDate: ...\n\nPaper B:\nTitle: ...\nAbstract: ...\nDate: ..."},
]

# Build the chat-formatted prompt and tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a response; do_sample=True makes temperature/top_p/top_k take effect
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.8, top_k=20)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
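The user prompt in the Usage example has a fixed structure: an instruction paragraph followed by a Title/Abstract/Date block for each paper, with the verdict expected inside `<answer>` tags. The sketch below wraps that structure in two helpers. `build_user_prompt` and `parse_answer` are hypothetical names, not part of the released model or of `transformers`, and the dict layout for paper metadata is an assumption; the template wording itself is copied from the example above.

```python
import re

# Hypothetical helper (not part of the released code): fills the user-prompt template
# from the Usage example. The date defaults to the example's "2025-12-10".
def build_user_prompt(paper_a, paper_b, today="2025-12-10"):
    """paper_a / paper_b: dicts with 'title', 'abstract', 'date' keys (an assumed layout)."""
    def paper_block(label, paper):
        return f"Paper {label}:\nTitle: {paper['title']}\nAbstract: {paper['abstract']}\nDate: {paper['date']}"

    instructions = (
        f"Today is {today}. Based on the titles, abstracts, and publication dates of the "
        "following two papers A and B, determine which paper has a higher citation count.\n"
        "Show your reasoning process in <reason> </reason> tags. And return the final answer "
        "in <answer> </answer> tags. The final answer should contain only 'A' or 'B'."
    )
    return instructions + "\n\n" + paper_block("A", paper_a) + "\n\n" + paper_block("B", paper_b)


# Hypothetical helper: extracts the verdict from the last <answer>...</answer> span.
def parse_answer(response):
    """Return 'A' or 'B', or None if no well-formed answer is present."""
    matches = re.findall(r"<answer>\s*(.*?)\s*</answer>", response, flags=re.DOTALL)
    if not matches:
        return None
    answer = matches[-1].strip()
    return answer if answer in ("A", "B") else None
```

In the Usage example, the user message's `content` could be set to `build_user_prompt(paper_a, paper_b)` and the decoded `response` passed to `parse_answer`. Returning `None` for malformed output lets a caller retry or discard that sample instead of guessing a verdict.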
## Training Details
- **Base model:** Qwen3-4B-Instruct-2507
- **Training method:** GRPO (Group Relative Policy Optimization) with DAPO loss
- **Training data:** 720,341 preference pairs from arXiv papers
- **Learning rate:** 8e-7 (cosine schedule, 5% warmup)
- **Batch size:** 8 per device × 64 GPUs × 2 gradient accumulation = 1024 effective
- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
- **Precision:** bfloat16
- **KL coefficient (β):** 0.03
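The list above combines GRPO, a KL penalty with coefficient β, and a cosine learning-rate schedule with warmup. As a rough illustration of those pieces (not the authors' training code), the sketch below computes GRPO-style group-relative advantages, the per-token KL estimate from the DeepSeekMath paper that introduced GRPO, and a warmup-plus-cosine learning rate. The binary correctness reward, the group size of 8, the total step count, and the decay-to-zero floor are assumptions the card does not state, and the DAPO loss aggregation is not shown.

```python
import math
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: (r - group mean) / (group std + eps).

    Uses the population std; some implementations use the sample std instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def k3_kl(logp, ref_logp):
    """Per-token KL estimate used by GRPO: exp(d) - d - 1 with d = ref_logp - logp (never negative)."""
    d = ref_logp - logp
    return math.exp(d) - d - 1.0

def cosine_lr(step, total_steps, peak_lr=8e-7, warmup_frac=0.05):
    """Linear warmup over the first 5% of steps, then cosine decay (to 0, an assumed floor)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

BETA = 0.03  # KL coefficient from the list above
EFFECTIVE_BATCH = 8 * 64 * 2  # per-device batch x GPUs x gradient accumulation = 1024

# Hypothetical group of 8 sampled answers for one paper pair: reward 1 if the answer
# names the more-cited paper, else 0 (an assumed reward; the card does not specify it).
rewards = [1, 0, 1, 1, 0, 0, 1, 0]
advantages = group_relative_advantages(rewards)
print(EFFECTIVE_BATCH)                          # 1024
print([round(a, 3) for a in advantages])        # [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
print(BETA * k3_kl(logp=-1.2, ref_logp=-1.0))   # small positive KL penalty for one token
```

Normalizing rewards within each group is what lets GRPO drop a learned value function: a sampled answer is pushed up or down only relative to its siblings for the same paper pair, so a group in which every answer is correct (or every answer is wrong) contributes zero advantage.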
## Citation
```bibtex
@article{scijudge2025,
  title={AI Can Learn Scientific Taste},
  journal={arXiv preprint arXiv:2603.14473},
  year={2026}
}
```