---
library_name: transformers
tags:
- text-generation-inference
- code
- grpo
- math
- RL
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
---

![as.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/uMfcuixpsyl3mYbz0YACN.png)

# **Lota-Carinae-Open-GRPO**

> **Lota-Carinae-Open-GRPO** is a **chain-of-thought reasoning model** fine-tuned from **Qwen2.5-1.5B-Instruct** with **Group Relative Policy Optimization (GRPO)**, a reinforcement learning method. It is designed for solving **mathematical problems** in both **English** and **Chinese**, combining stepwise reasoning with a small footprint. It is suited to educational tools, math tutoring systems, and logic-intensive assistants.

## **Key Features**

1. **Chain-of-Thought Math Reasoning**
   Fine-tuned with GRPO to improve the generation of intermediate steps, **Lota-Carinae-Open-GRPO** shows its work, which makes its answers easier to interpret and verify.

2. **Bilingual Proficiency (English + Chinese)**
   Understands and explains math problems in **English** and **Simplified Chinese**, so it can serve multilingual classrooms and applications.

3. **Compact yet Intelligent**
   Despite its **1.5B-parameter** size, it is tuned for arithmetic, algebra, geometry, word problems, and logic puzzles, with GRPO fine-tuning used to strengthen its reasoning.

4. **Structured Step-by-Step Computation**
   Produces coherent, human-readable step-by-step solutions, making complex problems easier to follow and learn from.

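
For readers unfamiliar with GRPO, its central idea is that several completions are sampled for the same prompt and each one's reward is scored relative to the rest of its group. The sketch below illustrates only that normalization step in plain Python. It is not this model's training code: the function name, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and standard deviation.

    `rewards` holds one scalar reward per completion sampled for the same prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled solutions to one math prompt:
# 1.0 = correct final answer, 0.0 = incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above their group's average receive a positive advantage and are reinforced; those below average receive a negative one. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.
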
## **Quickstart with Transformers**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Monoceros-QwenM-1.5B"  # (Update with new repo name if applicable)

# Load the model and tokenizer; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Solve: A train travels 180 km in 3 hours. What is its average speed?"
messages = [
    {"role": "system", "content": "You are a helpful tutor skilled in solving math problems with step-by-step explanations."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the assistant turn marker.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so that only the newly generated answer is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## **Intended Use**

- **Math Tutoring Bots**: Step-by-step assistants for learners from basic to intermediate levels.
- **Bilingual Educational Apps**: Math learning in **English** and **Chinese**, improving access and comprehension.
- **STEM Reasoning Tools**: Supports science, technology, engineering, and logical thinking tasks.
- **RL-Enhanced Lightweight LLMs**: Trained with **GRPO** and small enough for embedded or resource-constrained deployments (mobile, web, or on-device).

## **Limitations**

1. **Domain Focused**: Primarily optimized for mathematical reasoning; general-purpose tasks may yield reduced quality.

2. **Model Scale**: At 1.5B parameters, it may not match the depth of larger models on complex or abstract problems.

3. **Inherited Biases**: Because it builds on Qwen2.5-1.5B-Instruct, it may retain biases from pretraining. Use it with care in sensitive contexts.

4. **Prompt Sensitivity**: Structured, math-specific prompts deliver the most accurate results.
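
Since results depend on prompt structure, one way to keep prompts consistent is to build the chat messages through a small helper. The sketch below is illustrative only: `build_math_messages`, the `SYSTEM_PROMPTS` table, the "Solve step by step" prefixes, and the Chinese system prompt (an assumed translation of the English one used in the Quickstart) are not part of the model or its repository.

```python
# Hypothetical system prompts; the English one mirrors the Quickstart,
# the Chinese one is an assumed translation of it.
SYSTEM_PROMPTS = {
    "en": "You are a helpful tutor skilled in solving math problems with step-by-step explanations.",
    "zh": "你是一位乐于助人的数学老师，擅长用逐步讲解的方式解答数学问题。",
}

# Hypothetical instruction prefixes that ask for stepwise reasoning.
PREFIXES = {
    "en": "Solve step by step: ",
    "zh": "请逐步解答：",
}


def build_math_messages(problem, language="en"):
    """Return chat messages that frame `problem` as a structured math prompt."""
    if language not in SYSTEM_PROMPTS:
        raise ValueError(f"Unsupported language: {language!r}")
    problem = problem.strip()
    if not problem:
        raise ValueError("The problem statement is empty.")
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[language]},
        {"role": "user", "content": PREFIXES[language] + problem},
    ]


messages = build_math_messages(
    "  A train travels 180 km in 3 hours. What is its average speed? "
)
```

The returned list has the same shape as the `messages` variable in the Quickstart, so it can be passed directly to `tokenizer.apply_chat_template`.
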