Initialize project; model provided by the ModelHub XC community

Model: arcee-ai/Meraj-Mini Source: Original Platform
2026-05-15 09:28:54 +08:00
commit d556f6dd94
15 changed files with 151871 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,40 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,134 @@
+---
+license: apache-2.0
+language:
+- ar
+- en
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+pipeline_tag: text2text-generation
+library_name: transformers
+tags:
+- qwen
+- text-generation-inference
+---
+
+<div align="center">
+  <img src="https://i.ibb.co/CmPSSpq/Screenshot-2024-10-06-at-9-45-06-PM.png" alt="Arcee Meraj Mini" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
+</div>
+
+Following the release of [Arcee Meraj](https://meraj.arcee.ai/), our enterprise's globally top-performing Arabic LLM, we are thrilled to unveil Arcee Meraj Mini. This open-source model, meticulously fine-tuned from [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), is expertly designed for both Arabic and English. This model has undergone rigorous evaluation across multiple benchmarks in both languages, demonstrating top-tier performance in Arabic and competitive results in English. Arcee Meraj Mini’s primary objective is to enhance Arabic capabilities while maintaining robust English language proficiency. Benchmark results confirm that Arcee Meraj Mini excels in Arabic, with English performance comparable to leading models — perfectly aligning with our vision for balanced bilingual strength.
+
+## Technical Details
+Below is an overview of the key stages in Meraj Mini’s development:
+
+1. **Data Preparation:** We filter candidate samples from diverse English and Arabic sources to ensure high-quality data. Some of the selected English datasets are translated into Arabic to increase the quantity of Arabic samples and improve the model’s quality in bilingual performance. Then, new [Direct Preference Optimization (DPO)](https://arxiv.org/pdf/2305.18290) datasets are continuously prepared, filtered, and translated to maintain a fresh and diverse dataset that supports better generalization across domains.
+2. **Initial Training:** We train the Qwen2.5 model with 7 billion parameters using these high-quality datasets in both languages. This allows the model to handle diverse linguistic patterns from over 500 million tokens, ensuring strong performance in Arabic and English tasks.
+3. **Iterative Training and Post-Training:** Successive rounds of training and post-training refine the model, improving its accuracy and adaptability so that it performs well across varied tasks and language contexts.
+4. **Evaluation:** We train and evaluate 15 different variants to explore optimal configurations, assessing them on both Arabic and English benchmarks and leaderboards. This step ensures the model is robust in handling both general and domain-specific tasks.
+5. **Final Model Creation:** We select the best-performing variant and use the [MergeKit](https://arxiv.org/pdf/2403.13257) library to merge the configurations, resulting in the final Arcee Meraj Mini model. This model is not only optimized for language understanding but also serves as a starting point for domain adaptation in different areas.
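+
+As a rough illustration of the preference objective behind the DPO datasets mentioned in step 1, the following is a minimal sketch of the per-pair DPO loss from the linked paper. The function name `dpo_loss`, the `beta` default, and the example log-probabilities are assumptions made for illustration; this card does not state the training code or hyperparameters used for Meraj Mini.
+
+```python
+import math
+
+
+def dpo_loss(policy_chosen_logp, policy_rejected_logp,
+             ref_chosen_logp, ref_rejected_logp, beta=0.1):
+    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))).
+
+    Each argument is the summed log-probability of a full response under
+    either the policy being trained or the frozen reference model.
+    `beta` is a hypothetical default, not a value taken from this model card.
+    """
+    chosen_ratio = policy_chosen_logp - ref_chosen_logp
+    rejected_ratio = policy_rejected_logp - ref_rejected_logp
+    z = beta * (chosen_ratio - rejected_ratio)
+    # Numerically stable softplus(-z), which equals -log(sigmoid(z)).
+    if z >= 0:
+        return math.log1p(math.exp(-z))
+    return -z + math.log1p(math.exp(z))
+
+
+# Illustrative values: the policy prefers the chosen answer more than the reference does.
+loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)
+```
+
+When the policy and the reference assign identical log-probabilities, the loss equals log 2; it falls below that as the policy shifts probability toward the chosen response.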
+
+With this process, Arcee Meraj Mini is crafted to be more than just a general-purpose language model—it’s an adaptable tool, ready to be fine-tuned for specific industries and applications, empowering users to extend its capabilities for domain-specific tasks.
+## Capabilities and Use Cases
+
+Arcee Meraj Mini can handle a wide range of language tasks, including the following:
+
+1. **Arabic Language Understanding**: Arcee Meraj Mini excels in general language comprehension, reading comprehension, and common-sense reasoning, all tailored to the Arabic language, providing strong performance in a variety of linguistic tasks.
+
+2. **Cultural Adaptation**: The model ensures content creation that goes beyond linguistic accuracy, incorporating cultural nuances to align with Arabic norms and values, making it suitable for culturally relevant applications.
+
+3. **Education**: It enables personalized, adaptive learning experiences for Arabic speakers by generating high-quality educational content across diverse subjects, enhancing the overall learning journey.
+
+4. **Mathematics and Coding**: With robust support for mathematical reasoning and problem-solving, as well as code generation in Arabic, Arcee Meraj Mini serves as a valuable tool for developers and professionals in technical fields.
+
+5. **Customer Service**: The model facilitates the development of advanced Arabic-speaking chatbots and virtual assistants, capable of managing customer queries with a high degree of natural language understanding and precision.
+
+6. **Content Creation**: Arcee Meraj Mini generates high-quality Arabic content for various needs, from marketing materials and technical documentation to creative writing, ensuring impactful communication and engagement in the Arabic-speaking world.
+
+## Quantized GGUF
+
+GGUF quantizations are available here:
+- [Meraj-Mini-GGUF](https://huggingface.co/MaziyarPanahi/Meraj-Mini-GGUF)
+
+
+## How to
+This model uses the ChatML prompt template:
+
+```
+<|im_start|>system
+{System}
+<|im_end|>
+<|im_start|>user
+{User}
+<|im_end|>
+<|im_start|>assistant
+{Assistant}
+```
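+
+For readers who want to assemble the prompt string by hand, here is a minimal sketch that mirrors the `chat_template` shipped in `tokenizer_config.json`. The helper name `build_chatml_prompt` is hypothetical. Note that the shipped template places `<|im_end|>` directly after the message content rather than on its own line.
+
+```python
+def build_chatml_prompt(messages, add_generation_prompt=True):
+    """Render a list of {"role", "content"} dicts as a ChatML prompt string.
+
+    Hypothetical helper for illustration; in practice
+    `tokenizer.apply_chat_template` does this for you.
+    """
+    parts = []
+    for message in messages:
+        # Each turn is wrapped as <|im_start|>role\ncontent<|im_end|>\n
+        parts.append(
+            "<|im_start|>" + message["role"] + "\n"
+            + message["content"] + "<|im_end|>\n"
+        )
+    if add_generation_prompt:
+        # Leave the assistant turn open so the model writes the reply.
+        parts.append("<|im_start|>assistant\n")
+    return "".join(parts)
+
+
+prompt = build_chatml_prompt([
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "مرحبا، كيف حالك؟"},
+])
+```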
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+# Option 1: use a pipeline as a high-level helper.
+messages = [
+    {"role": "user", "content": "مرحبا، كيف حالك؟"},
+]
+pipe = pipeline("text-generation", model="arcee-ai/Meraj-Mini")
+# The pipeline applies the chat template to the message list before generating.
+result = pipe(messages)
+print(result[0]["generated_text"])
+
+# Option 2: load the tokenizer and model directly.
+tokenizer = AutoTokenizer.from_pretrained("arcee-ai/Meraj-Mini")
+model = AutoModelForCausalLM.from_pretrained("arcee-ai/Meraj-Mini")
+```
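+
+Once the tokenizer and model are loaded, generating a reply could look like the sketch below. The function name `generate_reply` and the `max_new_tokens` default are assumptions for illustration. The sketch passes `use_cache=True` explicitly because the shipped `config.json` sets `use_cache` to `false`, and it loads the weights in `bfloat16` to match the checkpoint's `torch_dtype`.
+
+```python
+def generate_reply(messages, model_id="arcee-ai/Meraj-Mini", max_new_tokens=256):
+    """Generate one assistant reply for a list of chat messages.
+
+    Hypothetical helper; imports are local so that defining the function
+    does not require the heavy dependencies to be loaded.
+    """
+    import torch
+    from transformers import AutoModelForCausalLM, AutoTokenizer
+
+    tokenizer = AutoTokenizer.from_pretrained(model_id)
+    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+
+    # Render the messages with the ChatML template and open the assistant turn.
+    input_ids = tokenizer.apply_chat_template(
+        messages, add_generation_prompt=True, return_tensors="pt"
+    )
+    with torch.no_grad():
+        output_ids = model.generate(
+            input_ids, max_new_tokens=max_new_tokens, use_cache=True
+        )
+    # Drop the prompt tokens and decode only the newly generated ones.
+    new_tokens = output_ids[0, input_ids.shape[-1]:]
+    return tokenizer.decode(new_tokens, skip_special_tokens=True)
+
+
+# Example call (downloads roughly 15 GB of weights on first use):
+# print(generate_reply([{"role": "user", "content": "مرحبا، كيف حالك؟"}]))
+```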
+
+## Evaluations
+#### Open Arabic LLM Leaderboard (OALL) Benchmarks
+Arcee Meraj Mini outperforms state-of-the-art models on most of the Open Arabic LLM Leaderboard (OALL) benchmarks and achieves the highest average score among the compared models, highlighting its effectiveness on Arabic-language content.
+<div align="center">
+  <img src="https://i.ibb.co/LQ0z7fH/Screenshot-2024-10-15-at-2-53-45-PM.png" alt="Arcee Meraj Mini Open Arabic LLM Leaderboard (OALL) - table 1" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 80%; height: auto;">
+</div>
+<div align="center">
+  <img src="https://i.ibb.co/fM6VQR7/Screenshot-2024-10-15-at-2-53-55-PM.png" alt="Arcee Meraj Mini Open Arabic LLM Leaderboard (OALL) - table 2" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 80%; height: auto;">
+</div>
+
+#### Translated MMLU
+We use the multilingual MMLU dataset, as distributed through the LM Evaluation Harness repository, to compare the multilingual strength of different models. Arcee Meraj Mini outperforms the other state-of-the-art models on this benchmark.
+<div align="center">
+  <img src="https://i.ibb.co/dfwW1W5/W-B-Chart-10-15-2024-2-07-12-PM.png" alt="Arcee Meraj Mini Translated MMLU" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 80%; height: auto;">
+</div>
+
+#### English Benchmarks
+Arcee Meraj Mini performs comparably to state-of-the-art models, demonstrating how the model retains its English language knowledge and capabilities while learning Arabic.
+
+<div align="center">
+  <img src="https://i.ibb.co/mTcLFzt/W-B-Chart-10-15-2024-2-15-57-PM.png" alt="Arcee Meraj Mini Winogrande" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 80%; height: auto;">
+</div>
+
+<div align="center">
+  <img src="https://i.ibb.co/GRBjjGN/W-B-Chart-10-15-2024-2-17-34-PM.png" alt="Arcee Meraj Mini Arc Challenge" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 80%; height: auto;">
+</div>
+
+<div align="center">
+  <img src="https://i.ibb.co/98s0qTf/W-B-Chart-10-15-2024-2-17-46-PM.png" alt="Arcee Meraj Mini TruthfulQA" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 80%; height: auto;">
+</div>
+
+<div align="center">
+  <img src="https://i.ibb.co/yqvRK3L/W-B-Chart-10-15-2024-2-17-57-PM.png" alt="Arcee Meraj Mini GSM8K" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 80%; height: auto;">
+</div>
+
+## Model Usage
+
+For a detailed explanation of the model's capabilities, architecture, and applications, please refer to our blog post: https://blog.arcee.ai/arcee-meraj-mini-2/
+
+To test the model directly, you can try it out using this Google Colab notebook: https://colab.research.google.com/drive/1hXXyNM-X0eKwlZ5OwqhZfO0U8CBq8pFO?usp=sharing
+
+## Acknowledgements
+
+We are grateful to the open-source AI community for their continuous contributions and to the Qwen team for their foundational efforts on the Qwen2.5 model series.
+
+## Future Directions
+
+As we release Arcee Meraj Mini to the public, we invite researchers, developers, and businesses to engage with the model, particularly in enhancing support for the Arabic language and fostering domain adaptation. We are committed to advancing open-source AI technology and welcome the community to explore, contribute to, and build upon it.
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,24 @@
+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,29 @@
+{
+  "_name_or_path": "arcee-train/Arcee-Qwwen2.5-English-Arabic",
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 3584,
+  "initializer_range": 0.02,
+  "intermediate_size": 18944,
+  "max_position_embeddings": 131072,
+  "max_window_layers": 28,
+  "model_type": "qwen2",
+  "num_attention_heads": 28,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 4,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.45.1",
+  "use_cache": false,
+  "use_mrope": false,
+  "use_sliding_window": false,
+  "vocab_size": 152064
+}
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text2text-generation", "allow_remote": true}
--- a/merges.txt
+++ b/merges.txt
--- a/model-00001-of-00004.safetensors
+++ b/model-00001-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e2bb743e0916cf2e57ab7b321f22f7332f465a8060f87b430adb7283614797b8
+size 4976698776
--- a/model-00002-of-00004.safetensors
+++ b/model-00002-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:53f2b4abe66fd8be4736f5a13a52e42b2c79c0b7fb9877f92b68d7124b858cb4
+size 4932751032
--- a/model-00003-of-00004.safetensors
+++ b/model-00003-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:287e987cf95374c484c83b8de49746768a6fd17cc26f8915206b1883cab48702
+size 4991495808
--- a/model-00004-of-00004.safetensors
+++ b/model-00004-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9e366ae6d76edb8e349d93efbf75c8fcba4808a61dfa34865addfa238d5e94a6
+size 330326240
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,31 @@
+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,207 @@
+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
--- a/vocab.json
+++ b/vocab.json