---
library_name: transformers
tags:
- sft
- reasoning
- unsloth
license: apache-2.0
language:
- en
metrics:
- accuracy
base_model:
- unsloth/Qwen3-1.7B
pipeline_tag: text-generation
---

# Model Card for BioGenesis-ToT


![General Benchmark Comparison Chart](benchmark/BioGenesis-ToT.png)

- **Overall Success Rate**:
  - khazarai/BioGenesis-ToT: **51.45**
  - Qwen/Qwen3-1.7B: **46.82**
 
- **Benchmark**: [emre/TARA_Turkish_LLM_Benchmark](https://huggingface.co/datasets/emre/TARA_Turkish_LLM_Benchmark)
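To put the two scores in perspective, the short sketch below computes the absolute and relative gain over the base model. It assumes both numbers are reported on the same 0-100 scale, which the card does not state explicitly; the variable names are illustrative only.

```python
# Scores copied from the list above (assumed to share a 0-100 scale).
finetuned_score = 51.45  # khazarai/BioGenesis-ToT
base_score = 46.82       # Qwen/Qwen3-1.7B

# Absolute gain in score points and relative gain over the base model.
absolute_gain = finetuned_score - base_score
relative_gain = absolute_gain / base_score * 100

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1f}%")
```

Under that assumption the fine-tuned model is ahead by 4.63 points, roughly a 9.9% relative improvement on this benchmark.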


BioGenesis-ToT is a fine-tuned version of Qwen3-1.7B, optimized for mechanistic reasoning and explanatory understanding in biology.
The model was trained on the [moremilk/ToT-Biology](https://huggingface.co/datasets/moremilk/ToT-Biology) dataset, a reasoning-rich collection of biology questions that emphasizes why and how processes occur rather than simply what happens.
 
The model demonstrates strong capabilities in:
- Structured biological explanation generation
- Logical and causal reasoning
- Step-by-step reasoning in scientific contexts, in the style of the ToT-Biology dataset
- Interdisciplinary biological analysis (e.g., bioengineering, medicine, ecology)

## Uses

### 🚀 Intended Use

- Educational and scientific explanation generation
- Biological reasoning and tutoring applications
- Model interpretability research
- Generating training data for reasoning-focused LLMs


### ⚠️ Limitations

- Not a replacement for expert biological judgment
- May occasionally over-generalize or simplify complex phenomena
- Reasoning quality is limited to biological contexts; the model was not trained for creative writing or coding


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "khazarai/BioGenesis-ToT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map={"": 0},  # place the whole model on GPU 0
)

question = (
    "Describe the composition of the plasma membrane and explain how its "
    "structure relates to its function of selective permeability."
)

messages = [{"role": "user", "content": question}]

# Render the chat template as plain text; enable_thinking=True requests
# the model's step-by-step reasoning mode.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Tokenize the prompt and move it to the same device as the model.
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Stream generated tokens to stdout as they are produced.
_ = model.generate(
    **inputs,
    max_new_tokens=2200,
    do_sample=True,  # explicit, so temperature/top_p/top_k take effect
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
```
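Because the example enables thinking mode, the generated text will usually contain the model's reasoning followed by its final answer. The sketch below shows one way to separate the two in already-decoded text. It assumes the fine-tuned model keeps the `<think>` ... `</think>` delimiters used by the Qwen3 base model, which this card does not confirm; `split_reasoning` is a hypothetical helper, not part of `transformers`.

```python
def split_reasoning(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumes the reasoning is wrapped in <think> ... </think>; if the closing
    tag is missing, the whole text is treated as the answer.
    """
    head, sep, tail = decoded.partition(end_tag)
    if not sep:
        return "", decoded.strip()
    # Drop the opening tag (if present) and surrounding whitespace.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Hand-written sample standing in for real model output.
sample = (
    "<think>\nPhospholipids are amphipathic.\n</think>\n\n"
    "The membrane is a phospholipid bilayer."
)
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

To use it with real output, generate without the streamer, decode the new tokens with `tokenizer.decode`, and pass the resulting string to the helper.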


## 🧪 Dataset: moremilk/ToT-Biology

The ToT-Biology dataset emphasizes mechanistic understanding and explanatory reasoning within biology.
It’s designed to help AI models develop interpretable, step-by-step reasoning abilities for complex biological systems.

It spans a wide range of biological subdomains:
- Foundational biology: Cell biology, genetics, evolution, and ecology
- Advanced topics: Systems biology, synthetic biology, computational biophysics
- Applied domains: Medicine, agriculture, bioengineering, and environmental science

Dataset features include:

- 🧩 Logical reasoning styles: deductive, inductive, abductive, causal, and analogical
- 🧠 Problem-solving techniques: decomposition, elimination, systems thinking, and trade-off analysis
- 🔬 Real-world problem contexts: experiment design, pathway mapping, and data interpretation
- 🌍 Practical relevance: bridging theoretical reasoning and applied biological insight
- 🎓 Educational focus: for both AI training and human learning in scientific reasoning


## 🧭 Objective

This fine-tuning project aims to build an interpretable reasoning model capable of:

- Explaining biological mechanisms clearly and coherently
- Demonstrating transparent, step-by-step thought processes
- Applying logical reasoning techniques to biological and interdisciplinary problems
- Supporting educational and research use cases where reasoning transparency matters


## Citation

**BibTeX:**
```bibtex
@misc{khazarai/BioGenesis-ToT,
  title      = {BioGenesis-ToT: A Fine-Tuned Model for Explanatory Biological Reasoning},
  author     = {Rustam Shiriyev},
  year       = {2025},
  publisher  = {Hugging Face},
  base_model = {Qwen3-1.7B},
  dataset    = {moremilk/ToT-Biology},
  license    = {Apache-2.0}
}
```