Initialize project; model provided by the ModelHub XC community

Model: ericflo/Llama-3.2-3B-COT Source: Original Platform
2026-04-26 20:54:46 +08:00
commit 29466a5cfb
29 changed files with 7135 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,142 @@
+---
+license: apache-2.0
+base_model:
+- meta-llama/Llama-3.2-3B
+tags:
+- llama-3.2
+- thought-chain
+- instruction-finetuning
+- transformers
+library_name: transformers
+pipeline_tag: text-generation
+---
+
+# Thought-Ranked Llama 3.2 3B
+
+## Model Description
+
+This model is a fine-tuned version of Meta's Llama 3.2 3B (Base), trained to generate a thought process before producing its answer. It underwent 4 rounds of fine-tuning using a thought-chain ranking approach.
+(Weekend project, with only a few hundred training steps.)
+
+### Training Process
+
+1. **Initial Generation**: For each training sample, the model generates multiple thought chains by prefixing different thought tokens: `<thought>{char}</thought>` for each character in `[a-zA-Z0-9]`. Each thought chain is allowed up to 128 tokens.
+
+2. **Answer Generation**: Following each thought chain, the model generates a complete answer with up to 2048 tokens.
+
+3. **Ranking & Selection**: An external LLM ranking system evaluates the quality of answers without seeing the thought processes, creating a ranking of the most effective thought patterns.
+
+4. **Final Training**: The model is then trained on the highest-ranked thought-answer pairs, learning to generate the most effective thought patterns autonomously.
+
+### Key Features
+
+- **Thought Chain Generation**: The model has learned to generate explicit thought processes before providing answers
+- **Greedy Decoding**: Uses greedy decoding (temperature 0.0) for both thought generation and final answers
+- **Length Parameters**:
+  - Thought chains: Up to 128 tokens
+  - Final answers: Up to 2048 tokens
+
+### Model Architecture
+
+- Base model: Llama 3.2 3B (Base)
+- Architecture: Transformer-based language model
+- Parameters: ~3.2 billion
+- Training Strategy: Supervised Fine-Tuning (SFT) with thought-chain ranking
+
+## Intended Use
+
+This model is designed for tasks that benefit from explicit reasoning chains, including but not limited to:
+- Problem-solving
+- Mathematical reasoning
+- Logical deduction
+- Step-by-step explanations
+- Complex decision making
+
+### Out-of-Scope Uses
+
+- Direct deployment without safety measures
+- Applications requiring guaranteed accuracy
+- Critical decision-making without human oversight
+- Tasks requiring capabilities beyond the base Llama 3.2 3B model
+
+## Training Details
+
+### Training Data
+
+The model was trained using:
+- Sample questions paired with multiple thought variations
+- Thought chains generated using systematic character prefixes
+- Rankings derived from LLM evaluation of answer quality
+
+### Training Procedure
+
+1. **Thought Generation Phase**
+   - Generated 62 variations of thoughts per sample (a-z, A-Z, 0-9)
+   - Sampled with temperature=0.0
+   - Maximum thought length: 128 tokens
+
+2. **Answer Generation Phase**
+   - Generated completions following each thought chain
+   - Maximum answer length: 2048 tokens
+   - Sampled with temperature=0.0
+
+3. **Ranking Phase**
+   - External LLM evaluated answer quality
+   - Ranking performed without access to thought chains
+   - Selected highest-performing thought-answer pairs
+
+4. **Final Training Phase**
+   - Fine-tuned on best-performing thought-answer combinations
+   - 4 complete rounds of training
+
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("ericflo/Llama-3.2-3B-COT")
+tokenizer = AutoTokenizer.from_pretrained("ericflo/Llama-3.2-3B-COT")
+
+# Example usage: tokenize the prompt and append the assistant turn marker
+prompt = "Solve this math problem: 2x + 3 = 7"
+inputs = tokenizer.apply_chat_template(
+    [{"role": "user", "content": prompt}],
+    add_generation_prompt=True, return_dict=True, return_tensors="pt",
+)
+
+# Generate the thought chain and answer greedily, matching the training setup
+output = model.generate(
+    **inputs,
+    do_sample=False, max_new_tokens=128 + 2048,  # greedy; thought + answer budgets
+)
+
+response = tokenizer.decode(output[0])
+```
+
+## Limitations
+
+- Limited to the capabilities of the base Llama 3.2 3B model
+- May generate thought chains that are not always optimal
+- Performance depends on the quality of the LLM ranking system used during training
+- Training process may not capture all possible effective thought patterns
+- Limited by the context window of the base model
+
+## Ethical Considerations
+
+- The model inherits biases from the base Llama 3.2 3B model
+- Generated thought chains should be reviewed for accuracy and appropriateness
+- The model's reasoning process should not be relied upon for critical decisions without human verification
+- Users should implement appropriate content filtering and safety measures
+
+## Citation
+
+If you use this model in your research, please cite:
+
+```bibtex
+@misc{thought-ranked-llama,
+  title={Thought-Ranked Llama 3.2: Fine-tuning Language Models with Ranked Thought Chains},
+  author={Eric Florenzano},
+  year={2024},
+  howpublished={\url{https://huggingface.co/ericflo/Llama-3.2-3B-COT}}
+}
+```
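
The thought-generation and ranking steps described in the README above could be sketched roughly as follows. This is a minimal illustration, not the author's training code: the helper names `thought_prefixes` and `select_best`, the choice to seed each chain with `<thought>` plus a single character, and the numeric scores standing in for the external LLM ranker are all assumptions made for the example.

```python
import string

# The 62 seed characters named in the README: a-z, A-Z, 0-9.
SEED_CHARS = string.ascii_lowercase + string.ascii_uppercase + string.digits


def thought_prefixes(open_tag="<thought>"):
    """Return one thought-chain seed per character.

    Assumption: each seed is the opening tag followed by one character,
    which the model would then continue for up to 128 tokens.
    """
    return [f"{open_tag}{ch}" for ch in SEED_CHARS]


def select_best(candidates, scores):
    """Return the candidate with the highest score.

    `scores` is a hypothetical stand-in for the external LLM ranking,
    which the README says judges answers without seeing the thoughts.
    """
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best_index]


prefixes = thought_prefixes()
best = select_best(["answer A", "answer B", "answer C"], [0.1, 0.9, 0.3])
```

In a real pipeline each seed would be passed to the model twice, once to complete the thought and once to produce the answer, and only the top-ranked thought-answer pair would be kept for the next fine-tuning round.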