ModelHub XC 64846bfbb3 Initialize project; model provided by the ModelHub XC community
Model: prithivMLmods/Capricornus-MoT-1.7B-Supreme1
Source: Original Platform
2026-05-19 11:50:45 +08:00

---
license: apache-2.0
datasets:
- open-r1/Mixture-of-Thoughts
- nvidia/OpenCodeReasoning
language:
- en
base_model:
- prithivMLmods/Qwen3-1.7B-ft-bf16
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- math
- science
- moe
- code
---
![77.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/gwzyOkgaFO6AbSqT1_vMf.png)
# **Capricornus-MoT-1.7B-Supreme1**
> **Capricornus-MoT-1.7B-Supreme1** is a **high-precision, multi-domain expert model** fine-tuned from **Qwen3-1.7B**, built for **code generation**, **mathematical reasoning**, **scientific analysis**, and **open technical inference**. Trained on the **Mixture of Thoughts (MoT)** dataset with combined expert clusters in **code, math, and science**, and enhanced with an **Open Code Reasoning** dataset, it delivers powerful symbolic and structured outputs in a wide range of STEM and reasoning domains.
> [!note]
> GGUF: [https://huggingface.co/prithivMLmods/Capricornus-MoT-1.7B-Supreme1-GGUF](https://huggingface.co/prithivMLmods/Capricornus-MoT-1.7B-Supreme1-GGUF)
---
## **Key Features**
1. **Multi-Expert MoT Fine-Tuning**
Fine-tuned on the **Mixture of Thoughts** dataset combining **code**, **math**, and **science** expert clusters, with added **Open Code Reasoning** for step-wise technical problem-solving and advanced symbolic thinking.
2. **Unified STEM Intelligence**
Excels in algebra, calculus, scientific reasoning, and code logic—ideal for complex multi-step tasks, simulations, and educational applications.
3. **Advanced Code & Math Generation**
Produces robust, readable code (Python, JavaScript, C++) with inline reasoning and debugging. It can also solve symbolic math and scientific problems with clearly presented steps.
4. **Structured Output Proficiency**
Generates content in **Markdown**, **LaTeX**, **JSON**, and **YAML**—tailored for auto-documentation, data structuring, academic formats, and more.
5. **Multilingual Support**
Handles technical prompts across **20+ languages** and adapts well to mixed-language code and STEM contexts for a global audience.
6. **Efficient 1.7B Inference Engine**
Optimized for performance-to-power ratio: runs smoothly on consumer GPUs (e.g., 8 to 16 GB VRAM), with strong results on symbolic tasks.
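
As a rough sanity check on the VRAM figure in item 6, the weight footprint can be estimated from the parameter count and the storage dtype. This is a minimal sketch: the 1.7e9 parameter count is taken from the model name as a nominal value, the helper name `weight_memory_gib` is hypothetical, and activations and the KV cache are not counted, so real usage will be higher.

```python
# Rough weight-memory estimate; activations and KV cache are NOT included.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Return the approximate size of the model weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)


# Nominal parameter count taken from the model name (an assumption).
NUM_PARAMS = 1.7e9

for dtype in ("float32", "bfloat16", "int8"):
    print(f"{dtype:>8}: {weight_memory_gib(NUM_PARAMS, dtype):.2f} GiB")
```

Under these assumptions the bf16 weights come to a little over 3 GiB, which leaves headroom for the KV cache on the GPUs mentioned above.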
---
## **Quickstart with Transformers**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Capricornus-MoT-1.7B-Supreme1"

# Load the model; dtype and device placement are chosen automatically.
# Note: device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the code and solve the equation: Write a Python function to solve 2x + 3 = 11, and explain each step."
messages = [
    {"role": "system", "content": "You are an expert in math, code, and science reasoning."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template as plain text.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so that only newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
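
The decoded `response` may contain the model's reasoning trace as well as its final answer. The sketch below separates the two, assuming the model emits Qwen3-style `<think>...</think>` delimiters; that is an assumption about this fine-tune, not something the card states. The helper name `split_reasoning` and the sample string are illustrative only.

```python
import re

# Assumed delimiter format (Qwen3-style); adjust if the model differs.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer).

    If no <think> block is found, the reasoning part is empty and the
    whole response is returned as the answer.
    """
    match = THINK_PATTERN.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block counts as the answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer


# Hand-written sample, not real model output.
sample = "<think>2x = 11 - 3 = 8, so x = 4.</think>\nThe solution is x = 4."
reasoning, answer = split_reasoning(sample)
print(answer)
```

In the quickstart, this would be applied as `split_reasoning(response)` after decoding.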
---
## **Intended Use**
* Symbolic problem-solving in mathematics and science
* Intelligent code generation, analysis, and debugging
* Academic research assistants and structured STEM tutors
* Multilingual, structured output generation for documentation
* Ideal for developers, educators, and edge deployment in technical domains
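
For the structured-output use case, downstream code usually has to pull the structured payload out of the surrounding prose before using it. The following is a minimal sketch for JSON, assuming the model wraps its JSON in a Markdown code fence (or returns bare JSON). The helper name `extract_json` and the sample response are hypothetical.

```python
import json
import re

# Matches a fenced block optionally labelled "json"; `{3} means three backticks.
JSON_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(response: str):
    """Parse the first fenced JSON block, or the whole response if unfenced.

    Raises json.JSONDecodeError when the candidate text is not valid JSON.
    """
    match = JSON_FENCE.search(response)
    candidate = match.group(1) if match else response
    return json.loads(candidate.strip())


# Hand-written sample response, not real model output.
fence = "`" * 3
sample = f'Here is the result:\n{fence}json\n{{"equation": "2x + 3 = 11", "x": 4}}\n{fence}'
data = extract_json(sample)
print(data["x"])
```

Letting `json.loads` raise on malformed output keeps failures visible, which matters because a small model will not always produce valid JSON.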
---
## **Limitations**
* May not match performance of larger models on long-form generative or creative tasks
* The limited context window constrains processing of large datasets or long documents
* Focused on STEM reasoning—free-form dialogue and general conversation are secondary
* Complex chaining tasks might require manual prompt engineering
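
One common way to work within a limited context window is to split long inputs into overlapping chunks and process them one at a time. The sketch below illustrates the idea. The function name `chunk_text` is hypothetical, and whitespace splitting stands in for the model's real tokenizer, so actual token counts will differ.

```python
from typing import Callable, List


def chunk_text(
    text: str,
    max_tokens: int,
    overlap: int = 0,
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Split text into chunks of at most max_tokens tokens, with optional overlap.

    The default tokenizer is whitespace splitting, a stand-in for the
    model's real tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    tokens = tokenize(text)
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        # Stop once the current chunk reaches the end of the token list.
        if start + max_tokens >= len(tokens):
            break
    return chunks


document = "one two three four five six seven eight nine ten"
for chunk in chunk_text(document, max_tokens=4, overlap=1):
    print(chunk)
```

The overlap repeats a few tokens across chunk boundaries so that a sentence cut at a boundary still appears whole in one chunk. In practice `max_tokens` should also leave room for the prompt template and the generated output.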
---
## **References**
1. [Qwen2.5 Technical Report (2024)](https://arxiv.org/pdf/2412.15115)
2. [YaRN: Efficient Context Window Extension of Large Language Models](https://arxiv.org/pdf/2309.00071)
3. [open-r1/Mixture-of-Thoughts](https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts)
4. [Open Code Reasoning Dataset](https://huggingface.co/datasets/nvidia/OpenCodeReasoning)