Initialize project; model provided by the ModelHub XC community

Model: prithivMLmods/Cetus-Qwen3_4B-GeneralThought Source: Original Platform
2026-05-19 11:50:41 +08:00
commit 6e000433a6
14 changed files with 152310 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,113 @@
+---
+license: apache-2.0
+datasets:
+- GeneralReasoning/GeneralThought-430K
+base_model:
+- prithivMLmods/Qwen3-4B-ft-bf16
+language:
+- en
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- moe
+- text-generation-inference
+- code
+- math
+---
+
+![2.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/KsOstOMOTnO7oWVdPycA3.png)
+
+# Cetus-Qwen3\_4B-GeneralThought
+
+> Cetus-Qwen3\_4B-GeneralThought is a fine-tuned variant of Qwen3-4B, trained on the GeneralThought-430K dataset to improve broad reasoning, logical coherence, and structured multi-domain problem solving. It targets general-purpose tasks including instruction following, technical question answering, and reasoning-based generation across diverse knowledge fields.
+
+
+> [!note]
+> GGUF: https://huggingface.co/prithivMLmods/Cetus-Qwen3_4B-GeneralThought-Q4_K_M-GGUF
+
+## Key Features
+
+1. Broad Reasoning with GeneralThought-430K
+   Trained on GeneralThought-430K, a curated 430,000-sample dataset spanning:
+
+   * Mathematical and logical reasoning
+   * Scientific and factual QA
+   * Multistep instructions and problem decomposition
+   * Abstract and applied reasoning tasks
+
+2. Multi-Domain Task Versatility
+   Equipped to handle use cases across STEM, humanities, code reasoning, and general knowledge workflows with consistency and structure.
+
+3. Structured Output Control
+   Outputs well-formatted answers in Markdown, LaTeX, JSON, and tabular formats, suitable for documentation, education, and technical reporting.
+
+4. Instruction-Following with Multi-Step Fidelity
+   Capable of following detailed prompts involving layered reasoning or procedural guidance with high step-to-step coherence.
+
+5. Multilingual and Cross-Cultural Understanding
+   Supports over 20 languages for global comprehension tasks and technical translation in education and public sector applications.
+
+6. Efficient Qwen3-4B Base
+   Balances capability and computational cost, making it suitable for deployment on consumer-grade GPUs and scalable services.
+
+## Quickstart with Transformers
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "prithivMLmods/Cetus-Qwen3_4B-GeneralThought"
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+prompt = "Explain the concept of entropy in thermodynamics in simple terms."
+
+messages = [
+    {"role": "system", "content": "You are a general-purpose reasoning assistant trained on GeneralThought-430K."},
+    {"role": "user", "content": prompt}
+]
+
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
+
+## Intended Use
+
+* General reasoning and educational Q&A
+* Technical concept explanation and summarization
+* Structured content generation in Markdown, LaTeX, and JSON
+* Code and logic support in instruction-rich workflows
+* Multi-language academic and public knowledge tools
+
+## Limitations
+
+* Not optimized for purely creative or fictional content
+* Smaller context window compared to frontier models
+* May be sensitive to ambiguous or poorly structured prompts
+* Can occasionally hallucinate in niche or adversarial scenarios
+
+## References
+
+1. Qwen2.5 Technical Report – [https://arxiv.org/pdf/2412.15115](https://arxiv.org/pdf/2412.15115)
+2. YaRN: Context Window Extension – [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
+3. GeneralThought-430K Dataset – GeneralReasoning/GeneralThought-430K (dataset ID as listed in this model card's metadata)
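
Note on the README's quickstart: `model.generate` returns each sequence as the prompt tokens followed by the newly generated tokens, which is why the quickstart slices every output by `len(input_ids)` before decoding. The sketch below isolates that slicing step on plain Python lists so it can be checked without downloading the model. The helper name `strip_prompt_tokens` and the token IDs are made up for illustration and are not part of the README or of Transformers.

```python
from typing import List, Sequence


def strip_prompt_tokens(
    prompt_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Drop the leading prompt tokens from each generated sequence.

    Hypothetical helper: mirrors the list comprehension in the quickstart,
    assuming each output starts with its own (possibly padded) prompt.
    """
    return [
        list(out[len(prompt):])
        for prompt, out in zip(prompt_ids, output_ids)
    ]


# Toy token IDs (invented for illustration, not real vocabulary entries).
prompts = [[101, 7, 8]]
outputs = [[101, 7, 8, 42, 43]]  # the prompt followed by two new tokens

new_tokens = strip_prompt_tokens(prompts, outputs)
print(new_tokens)  # [[42, 43]]
```

In the quickstart the same slicing is applied to tensors rather than lists, and the trimmed IDs are then passed to `tokenizer.batch_decode`, so only the model's answer is printed rather than the prompt plus the answer.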