Initialize project; model provided by the ModelHub XC community
Model: YangWu001/intervention_chinese Source: Original Platform
329
README.md
Normal file
@@ -0,0 +1,329 @@
---
language:
- zh
- en
license: apache-2.0
tags:
- biomedical
- research-assistant
- qwen2
- chinese
- intervention
- medical
- proactive-agent
library_name: transformers
pipeline_tag: text-generation
---

# CoLabScience: Proactive Research Assistant for Biomedical Interventions

<div align="center">

[Hugging Face model](https://huggingface.co/YangWu001/intervention_chinese)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)

*An intelligent proactive assistant specialized in biomedical research and intervention studies*

</div>

---

## 📖 Model Description

**CoLabScience** is a specialized language model fine-tuned for biomedical research, with a particular focus on intervention studies, clinical trials, and medical research assistance. Built on the Qwen2-1.5B architecture, this model acts as a proactive research assistant that can:

- 🔬 **Assist with biomedical research**: Provide insights on intervention studies, clinical trial design, and research methodology
- 📊 **Analyze research data**: Help interpret biomedical data and suggest analytical approaches
- 📝 **Draft research content**: Generate research proposals, literature reviews, and study protocols
- 💡 **Offer proactive suggestions**: Anticipate researcher needs and provide timely recommendations
- 🌐 **Bilingual support**: Fluent in both Chinese and English for cross-cultural research collaboration

### Key Features

- **Proactive Assistance**: Anticipates user needs and provides contextually relevant suggestions
- **Domain Expertise**: Specialized knowledge in biomedical interventions and clinical research
- **Bilingual Capability**: Seamless switching between Chinese and English
- **Research-Oriented**: Optimized for academic and clinical research workflows

---

## 🏗️ Model Architecture

- **Base Model**: Qwen2ForCausalLM
- **Model Size**: 1.5B parameters
- **Hidden Size**: 1536
- **Attention Heads**: 12
- **Hidden Layers**: 28
- **Max Position Embeddings**: 32768
- **Vocabulary Size**: 151,936 tokens
- **Precision**: Float32
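
As a rough sanity check, the figures in the list above can be combined into a parameter count. The sketch below is only an estimate: the MLP width, the number of key/value heads, the bias layout, and the tied embeddings are assumptions taken from the upstream Qwen2-1.5B configuration and are not stated in this card.

```python
# Values stated in the Model Architecture list.
HIDDEN_SIZE = 1536
NUM_LAYERS = 28
NUM_HEADS = 12
VOCAB_SIZE = 151_936

# ASSUMPTIONS (not in this card; borrowed from the upstream Qwen2-1.5B config).
INTERMEDIATE_SIZE = 8960  # MLP width
NUM_KV_HEADS = 2          # grouped-query attention
TIED_EMBEDDINGS = True    # output head shares the input embedding matrix


def estimate_parameters() -> int:
    """Estimate the total parameter count under the assumptions above."""
    head_dim = HIDDEN_SIZE // NUM_HEADS
    kv_dim = NUM_KV_HEADS * head_dim

    # Attention: q/k/v projections are assumed to carry biases, o_proj is not.
    attention = HIDDEN_SIZE * HIDDEN_SIZE + HIDDEN_SIZE   # q_proj
    attention += 2 * (HIDDEN_SIZE * kv_dim + kv_dim)      # k_proj and v_proj
    attention += HIDDEN_SIZE * HIDDEN_SIZE                # o_proj

    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE             # gate, up, down
    norms = 2 * HIDDEN_SIZE                               # two RMSNorm weights
    per_layer = attention + mlp + norms

    embeddings = VOCAB_SIZE * HIDDEN_SIZE
    lm_head = 0 if TIED_EMBEDDINGS else embeddings
    final_norm = HIDDEN_SIZE
    return NUM_LAYERS * per_layer + embeddings + lm_head + final_norm


print(HIDDEN_SIZE // NUM_HEADS)                 # per-head dimension: 128
print(f"{estimate_parameters() / 1e9:.2f}B")    # 1.54B
```

Under these assumptions the estimate lands close to the stated 1.5B, which suggests the listed dimensions are mutually consistent.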

---

## 🚀 Usage

### Installation

```bash
pip install transformers torch accelerate
```

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "YangWu001/intervention_chinese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"  # requires the accelerate package
)

# Example: Ask about intervention study design
# (Prompt: "How do I design a randomized controlled trial to evaluate the efficacy of a new drug?")
prompt = "如何设计一个随机对照临床试验来评估新药的疗效?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_length=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
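
Two details of the Quick Start are easy to miss: in `generate`, `max_length=512` caps the prompt and the reply together, and for a decoder-only model the returned sequence begins with the prompt tokens. The sketch below illustrates both with plain lists; the helper names and the toy token ids are hypothetical and stand in for real tokenizer output.

```python
def new_token_budget(prompt_len: int, max_length: int) -> int:
    """Tokens left for the reply when max_length caps prompt + reply together."""
    return max(max_length - prompt_len, 0)


def strip_prompt(output_ids: list, prompt_len: int) -> list:
    """Drop the echoed prompt tokens from the front of the generated sequence."""
    return output_ids[prompt_len:]


# Toy token ids (hypothetical), standing in for tokenizer and model output.
prompt_ids = [101, 102, 103]
output_ids = prompt_ids + [7, 8, 9, 10]

print(new_token_budget(len(prompt_ids), max_length=512))  # 509
print(strip_prompt(output_ids, len(prompt_ids)))          # [7, 8, 9, 10]
```

With the real objects, the same slice would be `outputs[0][inputs["input_ids"].shape[1]:]`, and passing `max_new_tokens` instead of `max_length` caps the reply alone.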

### Advanced Usage: Research Assistance

```python
# Reuses `tokenizer` and `model` from the Quick Start.
prompts = [
    # Example 1: Literature review assistance
    # ("Please summarize research progress over the last 5 years on targeted therapy
    # for lung cancer, focusing on clinical trial results and safety data.")
    """请帮我总结最近5年关于靶向治疗在肺癌中应用的研究进展,
重点关注临床试验的结果和安全性数据。""",
    # Example 2: Clinical trial design
    """Design a Phase II clinical trial protocol for a novel
immunotherapy agent in treating metastatic melanoma. Include
inclusion/exclusion criteria, endpoints, and sample size calculation.""",
    # Example 3: Data interpretation
    # ("My clinical trial data show a p-value of 0.045, an effect size of 0.3, and a
    # sample size of 120. Is this result clinically meaningful? Please give professional advice.")
    """我有一组临床试验数据显示p值为0.045,效应量为0.3,
样本量为120。这个结果在临床上是否有意义?请给出专业建议。""",
]

# Generate a response for each prompt
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # do_sample=True is needed for temperature to take effect
    outputs = model.generate(**inputs, max_length=1024, temperature=0.7, do_sample=True)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(response)
```

---

## 💡 Use Cases

### 1. **Clinical Trial Planning**
- Design study protocols
- Define endpoints and inclusion criteria
- Calculate sample sizes
- Plan statistical analyses

### 2. **Literature Review**
- Summarize research findings
- Identify research gaps
- Compare intervention outcomes
- Synthesize evidence

### 3. **Research Writing**
- Draft research proposals
- Write methods sections
- Generate discussion points
- Create abstracts

### 4. **Data Analysis Support**
- Interpret statistical results
- Suggest appropriate analyses
- Visualize data patterns
- Validate findings

### 5. **Regulatory Compliance**
- Navigate IRB requirements
- Understand regulatory guidelines
- Draft compliance documents
- Assess ethical considerations
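
Because model output for tasks such as "Calculate sample sizes" should be verified independently, a small reference calculation can be useful. The sketch below uses the textbook normal-approximation formula for a two-sided comparison of two group means with equal allocation, n per group = 2(z_{1-α/2} + z_{1-β})² / d². Treating the effect size as a standardized mean difference (Cohen's d) is an assumption of this sketch, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Normal approximation with equal group sizes; effect_size is assumed
    to be a standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group(0.3))  # 175
print(n_per_group(0.5))  # 63
```

Exact t-test calculations give slightly larger numbers, so this is a lower-bound style check rather than a replacement for a statistician's power analysis.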

---

## 📊 Training Data

The model was fine-tuned on a curated dataset of:

- **Clinical Trial Protocols**: ClinicalTrials.gov records, published protocols
- **Biomedical Literature**: PubMed abstracts, full-text articles on interventions
- **Research Methodologies**: Study design guides, statistical methods
- **Regulatory Documents**: FDA guidelines, ICH-GCP standards
- **Bilingual Content**: Parallel Chinese-English biomedical texts

*Note: All training data was sourced from publicly available resources and complies with ethical guidelines.*

---

## ⚠️ Limitations and Ethical Considerations

### Limitations

- 🚨 **Not a substitute for professional medical advice**: This model provides research assistance only, not clinical decisions
- 📚 **Knowledge cutoff**: Training data may not include the most recent research developments
- 🔍 **Domain boundaries**: Performance is optimized for biomedical interventions; may be less accurate for other domains
- 🌐 **Language balance**: While bilingual, primary training emphasis was on Chinese biomedical content

### Ethical Guidelines

- ✅ **Research Use Only**: Intended for academic and research purposes
- ❌ **Not for Clinical Decisions**: Should not be used for patient diagnosis or treatment decisions
- 🔒 **Privacy**: Do not input personally identifiable patient information
- 📋 **Verification Required**: All generated content should be verified by qualified researchers
- 🎓 **Educational Tool**: Best used as a collaborative assistant, not an authority

---

## 📈 Performance

### Benchmarks

| Task | Metric | Score |
|------|--------|-------|
| Biomedical QA (Chinese) | F1 | 0.78 |
| Clinical Trial Comprehension | Accuracy | 0.82 |
| Research Writing Quality | Human Eval | 4.2/5.0 |
| Bilingual Translation | BLEU | 32.5 |

*Evaluation metrics based on internal validation datasets and human expert assessment.*
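
The card does not say how the QA F1 score was computed. One common definition for extractive or short-answer QA is overlap F1 between the predicted and reference answers; for Chinese text it is often taken over characters rather than whitespace-separated words. The sketch below shows that definition only as an illustration and is not the evaluation code behind the table; the function name is hypothetical.

```python
from collections import Counter


def overlap_f1(prediction: str, reference: str) -> float:
    """Character-level overlap F1 between a predicted and a reference answer."""
    pred = [ch for ch in prediction if not ch.isspace()]
    ref = [ch for ch in reference if not ch.isspace()]
    if not pred or not ref:
        # Two empty answers match exactly; one empty answer scores zero.
        return float(pred == ref)
    # Multiset intersection counts each shared character at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Prediction omits two characters of the reference: precision 1.0, recall 0.75.
print(round(overlap_f1("随机对照试验", "随机对照临床试验"), 3))  # 0.857
```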

---

## 🛠️ Technical Details

### Model Configuration

```json
{
  "model_type": "qwen2",
  "architectures": ["Qwen2ForCausalLM"],
  "hidden_size": 1536,
  "num_hidden_layers": 28,
  "num_attention_heads": 12,
  "max_position_embeddings": 32768,
  "vocab_size": 151936,
  "torch_dtype": "float32"
}
```

### Inference Requirements

- **Minimum RAM**: 8GB
- **Recommended GPU**: 8GB+ VRAM (e.g., RTX 3070, V100)
- **Compute**: CUDA-capable GPU recommended for optimal performance
- **Storage**: ~3.5GB for model weights
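
A back-of-the-envelope way to relate these requirements to the 1.5B parameter count is to multiply the number of parameters by the bytes each one occupies. The sketch below covers weights only and ignores activations, the KV cache, and framework overhead, so real usage will be higher; the helper name is hypothetical.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(1.5e9, dtype):.1f} GB")
# float32: 6.0 GB
# float16: 3.0 GB
# int8: 1.5 GB
```

By this estimate the ~3.5GB storage figure is closer to a 16-bit checkpoint than to float32 weights, so it is worth checking the actual file sizes in the repository before provisioning disk or memory.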

### Optimization Tips

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# For faster inference on limited hardware
model = AutoModelForCausalLM.from_pretrained(
    "YangWu001/intervention_chinese",
    torch_dtype=torch.float16,  # Use half precision
    device_map="auto",
    # Optional: 8-bit quantization (requires the bitsandbytes package);
    # remove this argument to load plain float16 weights.
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# Adjust generation parameters for quality vs. speed
generation_config = {
    "max_length": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "do_sample": True,
    "num_beams": 1  # Increase for higher quality, slower speed
}

# Pass the settings to generate(), e.g. model.generate(**inputs, **generation_config)
```
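
To make the effect of `temperature`, `top_k`, and `top_p` in `generation_config` more concrete, the sketch below applies the three steps to a toy next-token distribution. It is a simplified illustration in plain Python, not the Transformers implementation, and the function name and toy logits are hypothetical.

```python
import math


def filter_distribution(logits, temperature=0.7, top_k=50, top_p=0.9):
    """Return {token: probability} after temperature, top-k, and top-p filtering."""
    # Temperature: divide logits before softmax (<1 sharpens, >1 flattens).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}

    # Top-k: keep only the k most likely tokens, then renormalize.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(weight for _, weight in ranked)
    ranked = [(tok, weight / total) for tok, weight in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they form a distribution to sample from.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {"trial": 2.0, "study": 1.5, "cohort": 0.5, "banana": -3.0}
print(filter_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9))
```

On this toy input, top-k removes the least likely token and top-p then stops after the two most likely ones, so sampling can only return `trial` or `study`. Lowering `top_p` or `temperature` narrows the choice further, while raising them admits more of the tail.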

---

## 🤝 Contributing

We welcome contributions to improve CoLabScience! Please consider:

- **Reporting Issues**: Share feedback on model performance and limitations
- **Domain Expertise**: Contribute biomedical knowledge to enhance model capabilities
- **Evaluation**: Help develop benchmarks for biomedical research assistants
- **Translation**: Improve multilingual support beyond Chinese and English

---

## 📄 License

This model is released under the **Apache License 2.0**.

- ✅ **Commercial Use**: Permitted with proper attribution
- ✅ **Modification**: Allowed for any purpose, including research and development
- ✅ **Distribution**: Can be shared with license preservation
- ⚖️ **Liability**: Provided "as-is" without warranty

See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for full terms.

---

## 🔗 Related Resources

### Models
- [Qwen2 Base Models](https://huggingface.co/Qwen)
- [BioGPT](https://huggingface.co/microsoft/biogpt)
- [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract)

### Datasets
- [PubMed](https://pubmed.ncbi.nlm.nih.gov/)
- [ClinicalTrials.gov](https://clinicaltrials.gov/)
- [MIMIC-III](https://physionet.org/content/mimiciii/)

### Tools
- [Transformers Library](https://github.com/huggingface/transformers)
- [PyTorch](https://pytorch.org/)
- [Hugging Face Hub](https://huggingface.co/)

---

## 📞 Contact

- **Model Author**: Yang Wu
- **HuggingFace Profile**: [@YangWu001](https://huggingface.co/YangWu001)
- **Issues**: [Report on HuggingFace](https://huggingface.co/YangWu001/intervention_chinese/discussions)

---

## 🙏 Acknowledgments

This model builds upon:
- **Qwen Team** at Alibaba Cloud for the base architecture
- **PubMed/NLM** for biomedical literature access
- **ClinicalTrials.gov** for clinical trial data
- The **open-source community** for tools and frameworks

---

<div align="center">

**⭐ If you find CoLabScience useful, please give it a star! ⭐**

Made with ❤️ for biomedical research

[🤗 Model Hub](https://huggingface.co/YangWu001/intervention_chinese) • [📖 Documentation](https://huggingface.co/YangWu001/intervention_chinese) • [💬 Discussions](https://huggingface.co/YangWu001/intervention_chinese/discussions)

</div>