ModelHub XC 6e000433a6 Initialize project; model provided by the ModelHub XC community
Model: prithivMLmods/Cetus-Qwen3_4B-GeneralThought
Source: Original Platform
2026-05-19 11:50:41 +08:00

---
license: apache-2.0
datasets:
- GeneralReasoning/GeneralThought-430K
base_model:
- prithivMLmods/Qwen3-4B-ft-bf16
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
- text-generation-inference
- code
- math
---
![2.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/KsOstOMOTnO7oWVdPycA3.png)
# Cetus-Qwen3\_4B-GeneralThought
> Cetus-Qwen3\_4B-GeneralThought is a fine-tuned variant of the Qwen3-4B architecture, trained on the GeneralThought-430K dataset to enhance broad-spectrum reasoning, logical coherence, and structured multi-domain problem solving. This model is optimized for general-purpose tasks including instruction following, technical question answering, and reasoning-based generation across diverse knowledge fields.

> [!note]
> GGUF: https://huggingface.co/prithivMLmods/Cetus-Qwen3_4B-GeneralThought-Q4_K_M-GGUF
## Key Features
1. Broad Reasoning with GeneralThought-430K
   Trained on GeneralThought-430K, a curated dataset of about 430,000 samples spanning:
   * Mathematical and logical reasoning
   * Scientific and factual QA
   * Multistep instructions and problem decomposition
   * Abstract and applied reasoning tasks
2. Multi-Domain Task Versatility
   Handles use cases across STEM, humanities, code reasoning, and general knowledge workflows with consistency and structure.
3. Structured Output Control
   Produces well-formatted answers in Markdown, LaTeX, JSON, and tabular formats, suitable for documentation, education, and technical reporting.
4. Instruction-Following with Multi-Step Fidelity
   Follows detailed prompts involving layered reasoning or procedural guidance while keeping each step coherent with the previous one.
5. Multilingual and Cross-Cultural Understanding
   Supports over 20 languages for global comprehension tasks and technical translation in education and public sector applications.
6. Efficient Qwen3-4B Base
   Balances capability against computational cost, which makes it suitable for deployment on consumer-grade GPUs and scalable services.
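When the structured-output feature is used to request JSON, downstream code usually has to pull the payload out of a free-form reply. The following is a minimal sketch, assuming the model either wraps JSON in a Markdown code fence or emits it inline with surrounding prose; `extract_json` is a hypothetical helper written for illustration and is not part of the model or of `transformers`.

```python
import json
import re

# Build the fence marker programmatically so this example nests cleanly in Markdown.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(reply: str):
    """Return the first JSON object or array found in a model reply, or None.

    Hypothetical helper: assumes JSON appears in a fenced block or inline.
    """
    # Prefer the contents of fenced blocks, then fall back to the whole reply.
    candidates = [block.strip() for block in FENCE_RE.findall(reply)]
    candidates.append(reply)

    decoder = json.JSONDecoder()
    for text in candidates:
        for index, char in enumerate(text):
            if char not in "{[":
                continue
            try:
                # raw_decode parses one JSON value starting at `index`
                # and ignores any trailing text after it.
                value, _end = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue
    return None
```

Returning `None` instead of raising lets the caller decide whether to retry the prompt or fall back to plain text.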
## Quickstart with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Cetus-Qwen3_4B-GeneralThought"

# Load the model and tokenizer. torch_dtype="auto" uses the checkpoint's dtype;
# device_map="auto" places weights on available devices (requires `accelerate`).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics in simple terms."
messages = [
    {"role": "system", "content": "You are a general-purpose reasoning assistant trained on GeneralThought-430K."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the
# assistant turn marker so generation starts at the reply.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate at most 512 new tokens.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
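Qwen3-family models can emit their reasoning inside a `<think>...</think>` block before the final answer. Whether this fine-tune does so, and whether the tags survive decoding, is an assumption here rather than something this card states. If the decoded `response` does contain such a block, a small helper can separate the two parts. `split_thinking` is a hypothetical name used only for this sketch.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_thinking(response: str) -> tuple[str, str]:
    """Split a decoded reply into (reasoning, answer).

    Assumes reasoning, if present, ends at the last closing think tag.
    Returns an empty reasoning string when no closing tag is found.
    """
    head, separator, tail = response.rpartition(THINK_CLOSE)
    if not separator:
        # No closing tag: treat the whole reply as the answer.
        return "", response.strip()
    # Remove the opening tag if it was kept in the decoded text.
    reasoning = head.replace(THINK_OPEN, "", 1).strip()
    return reasoning, tail.strip()
```

With the Quickstart variables, `reasoning, answer = split_thinking(response)` would leave `answer` holding only the user-facing text.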
## Intended Use
* General reasoning and educational Q\&A
* Technical concept explanation and summarization
* Structured content generation in Markdown, LaTeX, and JSON
* Code and logic support in instruction-rich workflows
* Multi-language academic and public knowledge tools
## Limitations
* Not optimized for purely creative or fictional content
* Smaller context window compared to frontier models
* May be sensitive to ambiguous or poorly structured prompts
* Can occasionally hallucinate in niche or adversarial scenarios
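Because of the limited context window, long conversations may need trimming before they are passed to the chat template. The sketch below keeps the system message and the most recent turns that fit within a token budget. `trim_history`, `count_tokens`, and `budget` are hypothetical names; the right budget depends on the deployed model's actual context length, which this card does not state.

```python
from typing import Callable


def trim_history(
    messages: list[dict],
    count_tokens: Callable[[str], int],
    budget: int,
) -> list[dict]:
    """Keep system messages plus the newest turns that fit within `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(count_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    # Walk from the newest message backwards and stop at the first one
    # that would exceed the budget, so the kept turns stay contiguous.
    for message in reversed(rest):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    # Restore chronological order.
    return system + kept[::-1]
```

With the Quickstart tokenizer, `count_tokens` could be `lambda s: len(tokenizer(s).input_ids)`. The chat template adds a few tokens per message, so it is safer to leave some headroom in `budget` and to subtract the planned `max_new_tokens`.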
## References
1. Qwen2.5 Technical Report: [https://arxiv.org/pdf/2412.15115](https://arxiv.org/pdf/2412.15115)
2. YaRN: Efficient Context Window Extension of Large Language Models: [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
3. GeneralThought-430K dataset: `GeneralReasoning/GeneralThought-430K` (the dataset ID listed in this card's metadata)