Initialize project; model provided by the ModelHub XC community
Model: khazarai/BioGenesis-ToT Source: Original Platform
137
README.md
Normal file
@@ -0,0 +1,137 @@
---
library_name: transformers
tags:
- sft
- reasoning
- unsloth
license: apache-2.0
language:
- en
metrics:
- accuracy
base_model:
- unsloth/Qwen3-1.7B
pipeline_tag: text-generation
---

# Model Card for BioGenesis-ToT

- **Overall Success Rate**:
  - khazarai/BioGenesis-ToT: **51.45**
  - Qwen/Qwen3-1.7B: **46.82**

- **Benchmark**: [emre/TARA_Turkish_LLM_Benchmark](https://huggingface.co/datasets/emre/TARA_Turkish_LLM_Benchmark)

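To put the two overall success rates in perspective, the short sketch below computes the absolute and relative difference between them. It uses only the two numbers reported above; the variable names are illustrative, and the score scale is assumed to be the one the benchmark reports.

```python
# Overall success rates copied from the results listed above.
fine_tuned = 51.45  # khazarai/BioGenesis-ToT
base = 46.82        # Qwen/Qwen3-1.7B

# Absolute difference, in the benchmark's own score units.
absolute_gain = round(fine_tuned - base, 2)

# Relative difference, as a percentage of the base model's score.
relative_gain_pct = round((fine_tuned - base) / base * 100, 2)

print(f"Absolute gain: {absolute_gain} points")
print(f"Relative gain: {relative_gain_pct}%")
```
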
BioGenesis-ToT is a fine-tuned version of Qwen3-1.7B, optimized for mechanistic reasoning and explanatory understanding in biology.
It was trained on the [moremilk/ToT-Biology](https://huggingface.co/datasets/moremilk/ToT-Biology) dataset, a reasoning-rich collection of biology questions that emphasizes why and how processes occur rather than simply what happens.

The model targets the following capabilities:
- Structured biological explanation generation
- Logical and causal reasoning
- ToT-style step-by-step reasoning in scientific contexts
- Interdisciplinary biological analysis (e.g., bioengineering, medicine, ecology)

## Uses

### 🚀 Intended Use

- Educational and scientific explanation generation
- Biological reasoning and tutoring applications
- Model interpretability research
- Generating training data for reasoning-focused LLMs


### ⚠️ Limitations

- Not a replacement for expert biological judgment
- May occasionally over-generalize or oversimplify complex phenomena
- Reasoning quality is limited to biological contexts; the model was not trained for creative writing or coding


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the tokenizer and the fine-tuned model onto the first GPU.
tokenizer = AutoTokenizer.from_pretrained("khazarai/BioGenesis-ToT")
model = AutoModelForCausalLM.from_pretrained(
    "khazarai/BioGenesis-ToT",
    device_map={"": 0},
)

question = """
Describe the composition of the plasma membrane and explain how its structure relates to its function of selective permeability.
"""

messages = [
    {"role": "user", "content": question},
]

# Build the prompt with the chat template; enable_thinking=True keeps the reasoning mode on.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Stream the generated tokens to stdout as they are produced.
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to(model.device),
    max_new_tokens=2200,
    do_sample=True,  # explicitly enable sampling so temperature/top_p/top_k apply
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```

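The snippet above only streams the output to the console. If you decode the generated text into a string instead, you may want to separate the reasoning trace from the final answer. The sketch below is one way to do that, assuming the decoded text wraps its reasoning in `<think>` and `</think>` tags as the Qwen3 base model does in thinking mode. The helper name `split_thinking` and the sample string are illustrative and not part of this model's API.

```python
def split_thinking(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumes the reasoning, if present, ends with `end_tag`.
    Returns an empty reasoning string when the tag is absent.
    """
    if end_tag not in decoded:
        return "", decoded.strip()
    # Split on the last closing tag so the answer is everything after it.
    reasoning, answer = decoded.rsplit(end_tag, 1)
    # Drop the opening tag, if any, and surrounding whitespace.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Hand-written sample standing in for real model output.
sample = (
    "<think>\nPhospholipids form a bilayer...\n</think>\n\n"
    "The plasma membrane is a phospholipid bilayer."
)
reasoning, answer = split_thinking(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Splitting on the last closing tag keeps the helper tolerant of a reasoning trace that itself mentions the tag.
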
## 🧪 Dataset: moremilk/ToT-Biology

The ToT-Biology dataset emphasizes mechanistic understanding and explanatory reasoning within biology.
It’s designed to help AI models develop interpretable, step-by-step reasoning abilities for complex biological systems.

It spans a wide range of biological subdomains:
- Foundational biology: Cell biology, genetics, evolution, and ecology
- Advanced topics: Systems biology, synthetic biology, computational biophysics
- Applied domains: Medicine, agriculture, bioengineering, and environmental science

Dataset features include:

- 🧩 Logical reasoning styles: deductive, inductive, abductive, causal, and analogical
- 🧠 Problem-solving techniques: decomposition, elimination, systems thinking, trade-off analysis
- 🔬 Real-world problem contexts: experiment design, pathway mapping, and data interpretation
- 🌍 Practical relevance: bridging theoretical reasoning and applied biological insight
- 🎓 Educational focus: for both AI training and human learning in scientific reasoning


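This card does not describe how dataset records were turned into training text. As a rough sketch only, the helper below shows one way a question, its reasoning, and its answer could be packed into chat messages with the reasoning wrapped in `<think>` tags. The function name and the `question`, `reasoning`, and `answer` fields are hypothetical; check the dataset's actual columns before adapting it.

```python
def to_chat_example(question: str, reasoning: str, answer: str) -> list[dict]:
    """Build a two-turn chat example from one record.

    The three arguments are hypothetical field names, not confirmed
    columns of moremilk/ToT-Biology.
    """
    # Put the reasoning inside <think> tags, followed by the final answer.
    assistant = f"<think>\n{reasoning.strip()}\n</think>\n\n{answer.strip()}"
    return [
        {"role": "user", "content": question.strip()},
        {"role": "assistant", "content": assistant},
    ]


# Hand-written record used purely for illustration.
example = to_chat_example(
    question="Why do cells need a selectively permeable membrane?",
    reasoning="Cells must keep internal conditions stable while exchanging materials.",
    answer="It lets cells control what enters and leaves.",
)
print(example[1]["content"])
```
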
## 🧭 Objective

This fine-tuning project aims to build an interpretable reasoning model capable of:

- Explaining biological mechanisms clearly and coherently
- Demonstrating transparent, step-by-step thought processes
- Applying logical reasoning techniques to biological and interdisciplinary problems
- Supporting educational and research use cases where reasoning transparency matters


## Citation

**BibTeX:**
```bibtex
@model{khazarai/BioGenesis-ToT,
  title = {BioGenesis-ToT: A Fine-Tuned Model for Explanatory Biological Reasoning},
  author = {Rustam Shiriyev},
  year = {2025},
  publisher = {Hugging Face},
  base_model = {Qwen3-1.7B},
  dataset = {moremilk/ToT-Biology},
  license = {MIT}
}
```