Initialize project; model provided by the ModelHub XC community

Model: FutureMa/Qwen3-8B-Drama-Thinking Source: Original Platform
2026-05-14 05:33:56 +08:00
commit 8f1663e427
20 changed files with 3403 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,354 @@
+---
+license: apache-2.0
+base_model: Qwen/Qwen3-8B
+tags:
+  - qwen3
+  - thinking
+  - creative-writing
+  - screenwriting
+  - drama
+  - chain-of-thought
+  - reasoning
+  - ms-swift
+  - full-parameter-finetuning
+datasets:
+  - custom-drama-thinking-dataset
+language:
+  - en
+  - zh
+library_name: transformers
+pipeline_tag: text-generation
+model-index:
+  - name: Qwen3-8B-Drama-Thinking
+    results:
+      - task:
+          type: text-generation
+          name: Creative Script Writing
+        metrics:
+          - type: thinking_depth
+            value: 9.0
+            name: Thinking Depth Score
+          - type: script_format
+            value: 9.0
+            name: Script Format Score
+          - type: dramatic_craft
+            value: 8.5
+            name: Dramatic Craft Score
+---
+
+# Qwen3-8B-Drama-Thinking
+
+This model is a **full parameter fine-tuned** version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) on a custom drama thinking dataset with explicit creative reasoning chains.
+
+## Model Description
+
+- **Base Model**: Qwen3-8B (8 billion parameters)
+- **Training Method**: Full Parameter Fine-tuning (NOT LoRA)
+- **Training Framework**: [ms-swift](https://github.com/modelscope/ms-swift)
+- **Training Data**: Custom Drama Thinking Dataset (6,319 samples, avg ~5,000 tokens)
+- **Specialization**: Screenwriting with explicit `<think>...</think>` creative reasoning
+- **Hardware**: 2x NVIDIA H100 80GB SXM5
+- **Training Time**: 2 hours 46 minutes (3 epochs)
+- **Training Cost**: ~$17.86
+
+## Key Features
+
+### 🎬 Professional Screenwriting Assistant
+
+This model generates dramatic scripts with **explicit creative deliberation**:
+
+- ✅ **Thinking Process Visible**: Uses `<think>...</think>` tags to show internal reasoning
+- ✅ **Deep Character Psychology**: Analyzes motivations, defense mechanisms, subtext
+- ✅ **Structural Planning**: Three-act structure, emotional arcs, pacing decisions
+- ✅ **Visual Storytelling**: Symbolism, atmosphere, cinematographic choices
+- ✅ **Professional Format**: Correct screenplay formatting (scene headers, action lines, dialogue)
+
+### 📊 Performance Comparison
+
+Compared to base Qwen3-8B:
+
+| Metric | Base Model | Fine-Tuned | Improvement |
+|--------|------------|------------|-------------|
+| **Output Length** | 1,071 tokens | 3,874 tokens | **+262%** |
+| **Thinking Depth** | 5/10 | 9/10 | **+80%** |
+| **Creative Reasoning** | 500 tokens | 3,400 tokens | **+580%** |
+| **Craft Analysis** | Generic | Professional | **Qualitative leap** |
+
+### 🎯 Unique Value Proposition
+
+> This is not just a text generator - it's a **creative thinking partner** that externalizes
+> the entire screenwriting process: from title analysis to character psychology to structural
+> planning to final execution.
+
+## Training Details
+
+### Training Configuration
+
+```
+Model:              Qwen/Qwen3-8B
+Template:           qwen3_thinking
+Training Type:      Full Parameter (all 8B parameters)
+Max Length:         8192 tokens (for long thinking chains)
+Batch Size:         1 per device × 2 GPUs
+Gradient Accum:     8 steps (effective batch size: 1 × 2 × 8 = 16)
+Learning Rate:      1e-5
+Epochs:             3
+Optimization:       DeepSpeed ZeRO-3 + Gradient Checkpointing
+                    Liger Kernel, BF16 mixed precision
+Loss Scale:         ignore_empty_think
+GPU Memory:         ~74.62 GB per H100 (stable)
+```
+
+### Dataset Characteristics
+
+- **Samples**: 6,319 dramatic script continuations
+- **Average Length**: ~5,000 tokens per sample
+- **Max Length**: ~6,100 tokens
+- **Format**: Conversations with `<think>...</think>` reasoning tags
+- **Content**:
+  - Script opening scenes (title, description, initial dialogue)
+  - Extensive creative deliberation (3,000+ tokens of thinking)
+  - Script continuation with proper formatting
+- **Style**: Dramatic, emotionally intense scenarios (conflicts, reconciliation, tragedy)
+
+### Training Metrics
+
+- **Final Loss**: 0.844
+- **Average Loss**: 0.978
+- **Loss Trajectory**: 1.602 (start) → 0.82-0.83 (end)
+- **Training Speed**: ~8 seconds/iteration
+- **Total Steps**: 1,185 (395 optimizer steps per epoch × 3 epochs)
+- **Checkpoints**: 5 saved (400, 800, 900, 1000, 1185)
+
+## Usage
+
+### Quick Start (ms-swift)
+
+```bash
+# Install ms-swift
+pip install ms-swift
+
+# Inference (interactive mode)
+swift infer \
+    --ckpt_dir FutureMa/Qwen3-8B-Drama-Thinking \
+    --template qwen3_thinking \
+    --max_new_tokens 4096 \
+    --temperature 0.7
+```
+
+### Python API
+
+```python
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from swift.llm import PtEngine, InferRequest, RequestConfig
+
+# Initialize engine
+engine = PtEngine(
+    model_id_or_path="FutureMa/Qwen3-8B-Drama-Thinking",
+    max_batch_size=1,
+    torch_dtype="bfloat16"
+)
+
+# Create prompt
+prompt = """Title: The Reunion
+Description: Two estranged siblings meet at their childhood home after 20 years.
+
+INT. FAMILY LIVING ROOM - DAY
+
+SARAH (35) stands by the window, looking out at the garden.
+MICHAEL (38) enters, hesitant."""
+
+# Generate (streaming)
+messages = [{'role': 'user', 'content': prompt}]
+request = InferRequest(messages=messages)
+config = RequestConfig(max_tokens=4096, temperature=0.7, stream=True)
+
+for response in engine.infer([request], config)[0]:
+    if response:
+        print(response.choices[0].delta.content, end='', flush=True)
+```
+
+### Using with Transformers (requires adapting checkpoint)
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained(
+    "FutureMa/Qwen3-8B-Drama-Thinking",
+    torch_dtype="bfloat16",
+    device_map="auto"
+)
+
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
+
+messages = [
+    {"role": "system", "content": "You are a creative screenwriter assistant specialized in dramatic storytelling with internal reasoning."},
+    {"role": "user", "content": "Write a scene about forgiveness..."}
+]
+
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+
+## Example Output
+
+**Input Prompt:**
+```
+Title: The Last Apology
+Description: A daughter arrives at her estranged father's deathbed...
+```
+
+**Model Output** (excerpt):
+```
+<think>
+Okay. Let's go. I'm sitting here with the opening of *The Last Apology* fresh in my mind,
+and I need to figure out exactly where this goes next—not just the next scene, but the next
+emotional beat, the next structural pivot...
+
+First, the title: *The Last Apology*. That's not just poetic—it's structural. It tells me
+this story is about delayed recognition, about the finality of words left unsaid...
+
+[3,400 tokens of deep creative analysis including:]
+- Title deconstruction and thematic implications
+- Character psychology analysis
+- Three-act structure planning
+- Visual language and symbolism
+- Multiple narrative paths considered
+- Professional screenwriting techniques
+</think>
+
+INT. HOSPITAL ROOM - NIGHT
+
+ANNA (28), in a wrinkled business suit, hesitates at the doorway.
+
+DAVID (65) lies in bed, breathing labored...
+
+[Script continues with proper formatting]
+```
+
+## Intended Use
+
+### ✅ Recommended Use Cases
+
+1. **Screenwriting Education**: Learn the professional creative thinking process
+2. **Script Ideation**: Generate story frameworks and narrative alternatives
+3. **Story Consulting**: Explore "what if" scenarios with explicit reasoning
+4. **Creative Brainstorming**: Understand decision-making in storytelling
+5. **Draft Development**: Plan structure before execution
+
+### ❌ Not Recommended For
+
+1. **Final Shooting Scripts**: Requires human refinement for production
+2. **Comedy/Action Genres**: Training bias toward dramatic content
+3. **Long-form Series**: Single-pass generation may lack consistency
+4. **Immediate Production**: Dialogue needs naturalization
+
+## Evaluation Results
+
+### Quantitative Metrics (vs. Base Model)
+
+| Aspect | Fine-Tuned | Base Model | Improvement |
+|--------|-------|------------|-------------|
+| **Thinking Depth** | 9/10 | 5/10 | +80% |
+| **Script Format** | 9/10 | 8/10 | +13% |
+| **Dramatic Craft** | 8.5/10 | 8/10 | +6% |
+| **Character Psychology** | 9/10 | 6/10 | +50% |
+| **Decision Transparency** | 9/10 | 5/10 | +80% |
+| **Overall** | 8.1/10 | 6.9/10 | +17% |
+
+> **Note on Methodology:**
+> These metrics were generated using an **LLM-as-a-Judge** framework (Claude) that compared the fine-tuned model against the base model.
+
+### Qualitative Improvements
+
+- ✅ **Professional Voice**: Sounds like an experienced screenwriter
+- ✅ **Structural Thinking**: Explicit three-act planning
+- ✅ **Meta-Awareness**: "This isn't just a script. It's a reckoning."
+- ✅ **Non-Linear Reasoning**: Considers alternatives, backtracks, refines
+- ✅ **Craft-Oriented**: Explains why choices serve the story
+
+## Limitations
+
+1. **Thinking Verbosity**: Generates ~3,400 tokens of thinking (87% of output)
+   - May be excessive for quick tasks
+   - Consider using `max_new_tokens` to limit length
+
+2. **Incomplete Execution**: Token budget consumed by thinking
+   - Many planned scenes not fully generated
+   - May need a 6,000-8,000 token limit for complete scripts
+
+3. **Dialogue Naturalness**: More direct/literary than conversational
+   - Training data style influences output
+   - May need post-processing for natural speech
+
+4. **Training Data Bias**: Skews toward melodramatic scenarios
+   - Less suited for subtle/realistic dialogue
+   - Best for emotionally intense stories
+
+## Training Insights
+
+### What Made This Successful
+
+1. **8192 Token Context**: Essential for capturing full thinking chains
+   - An initial assumption of 2,048 tokens would have truncated the data
+   - Average sample length: ~5,000 tokens
+
+2. **DeepSpeed ZeRO-3**: Required (not optional)
+   - Single H100: Would need ~109-114 GB (OOM on an 80 GB card)
+   - ZeRO-3 sharding: ~74.62 GB per card ✅
+
+3. **Full Parameter Training**: Worth the cost
+   - Deeper capability transfer than LoRA
+   - Better thinking process internalization
+   - Cost: $17.86 (2.8 hours) vs ~$5 for LoRA
+
+4. **Quality Training Data**: 6,319 long-form reasoning examples
+   - Actual creative process in `<think>` tags
+   - High-quality dramatic writing
+
+## Citation
+
+```bibtex
+@misc{qwen3-drama-thinking-2025,
+  author = {FutureMa},
+  title = {Qwen3-8B-Drama-Thinking: Full Parameter Fine-tuning for Creative Screenwriting},
+  year = {2025},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/FutureMa/Qwen3-8B-Drama-Thinking}},
+  note = {Full parameter fine-tuning on 6,319 drama samples with explicit reasoning chains}
+}
+```
+
+## News & Updates
+
+**[2025-12-23]** 🎉 **DramaBench Dataset is now open-source!** Evaluate your drama script generation with our comprehensive 6-dimensional benchmark framework (Format Standards, Narrative Efficiency, Character Consistency, Emotional Depth, Logic Consistency, Conflict Handling).
+- 📊 Dataset: [FutureMa/DramaBench](https://huggingface.co/datasets/FutureMa/DramaBench)
+- 📄 Paper: [arXiv:2512.19012](https://arxiv.org/abs/2512.19012)
+- 🌐 Demo: [dramabench.pages.dev](https://dramabench.pages.dev/)
+
+---
+
+## Acknowledgments
+
+- **Base Model**: [Qwen Team](https://huggingface.co/Qwen) - Qwen3-8B
+- **Training Framework**: [ms-swift](https://github.com/modelscope/ms-swift) - ModelScope SWIFT
+- **Infrastructure**: [Lambda Cloud](https://lambdalabs.com/) - 2x H100 80GB SXM5
+- **Dataset**: Custom Drama Thinking Dataset (6,319 samples)
+
+## Model Card Contact
+
+For questions or feedback:
+- **HuggingFace**: [@FutureMa](https://huggingface.co/FutureMa)
+- **GitHub Issues**: Report via ms-swift repository
+
+---
+
+- **Training Date**: 2025-12-08
+- **Training Duration**: 2h 46m
+- **Model Size**: ~16GB (BF16 precision)
+- **Recommended VRAM**: 16GB+ for inference (BF16 weights alone take ~16GB, so allow extra headroom for the KV cache)