Initialize project; model provided by the ModelHub XC community

Model: khazarai/BioGenesis-ToT Source: Original Platform
2026-04-10 17:58:59 +08:00
commit 66e40cf6e0
13 changed files with 152043 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,137 @@
+---
+library_name: transformers
+tags:
+- sft
+- reasoning
+- unsloth
+license: apache-2.0
+language:
+- en
+metrics:
+- accuracy
+base_model:
+- unsloth/Qwen3-1.7B
+pipeline_tag: text-generation
+---
+
+# Model Card for BioGenesis-ToT
+
+
+![General Benchmark Comparison Chart](benchmark/BioGenesis-ToT.png)
+
+- **Overall Success Rate**:
+  - khazarai/BioGenesis-ToT: **51.45**
+  - Qwen/Qwen3-1.7B: **46.82**
+ 
+- **Benchmark**: [emre/TARA_Turkish_LLM_Benchmark](https://huggingface.co/datasets/emre/TARA_Turkish_LLM_Benchmark)
+
+
+BioGenesis-ToT is a fine-tuned version of Qwen3-1.7B, optimized for mechanistic reasoning and explanatory understanding in biology.
+It was fine-tuned on the [moremilk/ToT-Biology](https://huggingface.co/datasets/moremilk/ToT-Biology) dataset, a reasoning-rich collection of biology questions that emphasizes why and how processes occur rather than simply what happens.
+ 
+The model demonstrates strong capabilities in:
+- Structured biological explanation generation
+- Logical and causal reasoning
+- Step-by-step (ToT-style) reasoning in scientific contexts
+- Interdisciplinary biological analysis (e.g., bioengineering, medicine, ecology)
+
+## Uses
+
+### 🚀 Intended Use
+
+- Educational and scientific explanation generation
+- Biological reasoning and tutoring applications
+- Model interpretability research
+- Generating training data for reasoning-focused LLMs
+
+
+### ⚠️ Limitations
+
+- Not a replacement for expert biological judgment
+- May occasionally over-generalize or simplify complex phenomena
+- Reasoning quality is tuned for biological contexts; the model was not trained for creative writing or coding
+
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+
+model_id = "khazarai/BioGenesis-ToT"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+# Place the whole model on the first CUDA device (GPU 0).
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map={"": 0},
+)
+
+question = (
+    "Describe the composition of the plasma membrane and explain how its "
+    "structure relates to its function of selective permeability."
+)
+
+# Build a chat-formatted prompt; enable_thinking=True asks the Qwen3 template
+# for a reasoning trace before the final answer.
+messages = [{"role": "user", "content": question}]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+    enable_thinking=True,
+)
+
+# Move inputs to the model's device instead of hard-coding "cuda".
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+# Stream tokens to stdout as they are generated, skipping the prompt.
+_ = model.generate(
+    **inputs,
+    max_new_tokens=2200,
+    do_sample=True,  # sampling must be on for temperature/top_p/top_k to apply
+    temperature=0.6,
+    top_p=0.95,
+    top_k=20,
+    streamer=TextStreamer(tokenizer, skip_prompt=True),
+)
+```
+
+
+## 🧪 Dataset: moremilk/ToT-Biology
+
+The ToT-Biology dataset emphasizes mechanistic understanding and explanatory reasoning within biology.
+It’s designed to help AI models develop interpretable, step-by-step reasoning abilities for complex biological systems.
+
+It spans a wide range of biological subdomains:
+- Foundational biology: Cell biology, genetics, evolution, and ecology
+- Advanced topics: Systems biology, synthetic biology, computational biophysics
+- Applied domains: Medicine, agriculture, bioengineering, and environmental science
+
+Dataset features include:
+
+- 🧩 Logical reasoning styles — deductive, inductive, abductive, causal, and analogical
+- 🧠 Problem-solving techniques — decomposition, elimination, systems thinking, trade-off analysis
+- 🔬 Real-world problem contexts — experiment design, pathway mapping, and data interpretation
+- 🌍 Practical relevance — bridging theoretical reasoning and applied biological insight
+- 🎓 Educational focus — for both AI training and human learning in scientific reasoning
+
+
+## 🧭 Objective
+
+This fine-tuning project aims to build an interpretable reasoning model capable of:
+
+- Explaining biological mechanisms clearly and coherently
+- Demonstrating transparent, step-by-step thought processes
+- Applying logical reasoning techniques to biological and interdisciplinary problems
+- Supporting educational and research use cases where reasoning transparency matters
+
+
+## Citation
+
+**BibTeX:**
+```bibtex
+@misc{khazarai/BioGenesis-ToT,
+  title     = {BioGenesis-ToT: A Fine-Tuned Model for Explanatory Biological Reasoning},
+  author    = {Rustam Shiriyev},
+  year      = {2025},
+  publisher = {Hugging Face},
+  base_model = {Qwen3-1.7B},
+  dataset   = {moremilk/ToT-Biology},
+  license   = {Apache-2.0}
+}
+```
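Note on the README's quick-start snippet: because it sets `enable_thinking = True`, the streamed output contains the model's reasoning trace followed by the final answer. The sketch below shows one way to separate the two once the output has been decoded to a string. It assumes the fine-tuned model keeps the Qwen3 convention of wrapping the trace in `<think>` and `</think>`, which the README does not state; `split_thinking` is a hypothetical helper name, not part of `transformers`.

```python
def split_thinking(decoded: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumes the reasoning trace, if present, ends with `close_tag`
    (the Qwen3 convention; an assumption for this fine-tune).
    """
    head, sep, tail = decoded.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", decoded.strip()
    # Drop the opening tag (if any) and surrounding whitespace from the trace.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Hand-written sample string, not real model output.
sample = (
    "<think>\nLipids are amphipathic.\n</think>\n\n"
    "The membrane is a phospholipid bilayer."
)
reasoning, answer = split_thinking(sample)
print(answer)
```

Splitting on the decoded string keeps the helper independent of tokenizer internals. If the model emits no closing tag (for example, when thinking is disabled), the full text is returned as the answer.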