---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- qwen3
- thinking
- creative-writing
- screenwriting
- drama
- chain-of-thought
- reasoning
- ms-swift
- full-parameter-finetuning
datasets:
- custom-drama-thinking-dataset
language:
- en
- zh
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: Qwen3-8B-Drama-Thinking
  results:
  - task:
      type: text-generation
      name: Creative Script Writing
    metrics:
    - type: thinking_depth
      value: 9.0
      name: Thinking Depth Score
    - type: script_format
      value: 9.0
      name: Script Format Score
    - type: dramatic_craft
      value: 8.5
      name: Dramatic Craft Score
---
# Qwen3-8B-Drama-Thinking
This model is a **full-parameter fine-tune** of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), trained on a custom drama thinking dataset with explicit creative reasoning chains.
## Model Description
- **Base Model**: Qwen3-8B (8 billion parameters)
- **Training Method**: Full Parameter Fine-tuning (NOT LoRA)
- **Training Framework**: [ms-swift](https://github.com/modelscope/ms-swift)
- **Training Data**: Custom Drama Thinking Dataset (6,319 samples, avg ~5,000 tokens)
- **Specialization**: Screenwriting with explicit `<think>...</think>` creative reasoning
- **Hardware**: 2x NVIDIA H100 80GB SXM5
- **Training Time**: 2 hours 46 minutes (3 epochs)
- **Training Cost**: ~$17.86
## Key Features
### 🎬 Professional Screenwriting Assistant
This model generates dramatic scripts with **explicit creative deliberation**:
- **Thinking Process Visible**: Uses `<think>...</think>` tags to show internal reasoning
- **Deep Character Psychology**: Analyzes motivations, defense mechanisms, subtext
- **Structural Planning**: Three-act structure, emotional arcs, pacing decisions
- **Visual Storytelling**: Symbolism, atmosphere, cinematographic choices
- **Professional Format**: Correct screenplay formatting (scene headers, action lines, dialogue)
### 📊 Performance Comparison
Compared to base Qwen3-8B:
| Metric | Base Model | Fine-Tuned | Improvement |
|--------|------------|------------|-------------|
| **Output Length** | 1,071 tokens | 3,874 tokens | **+262%** |
| **Thinking Depth** | 5/10 | 9/10 | **+80%** |
| **Creative Reasoning** | 500 tokens | 3,400 tokens | **+580%** |
| **Craft Analysis** | Generic | Professional | **Qualitative leap** |
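
The Improvement column is the relative change from the base model to the fine-tuned model. As a minimal sketch of that arithmetic, using only the figures from the table (the helper name `relative_improvement` is illustrative and not part of any library):

```python
def relative_improvement(base: float, tuned: float) -> int:
    """Relative change from base to tuned, as a whole-number percentage."""
    return round((tuned - base) / base * 100)


# Figures copied from the comparison table.
rows = {
    "Output Length": (1071, 3874),
    "Thinking Depth": (5, 9),
    "Creative Reasoning": (500, 3400),
}

for metric, (base, tuned) in rows.items():
    print(f"{metric}: +{relative_improvement(base, tuned)}%")
```

The same helper applies to the 10-point scores in the evaluation section.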
### 🎯 Unique Value Proposition
> This is not just a text generator - it's a **creative thinking partner** that externalizes
> the entire screenwriting process: from title analysis to character psychology to structural
> planning to final execution.
## Training Details
### Training Configuration
```text
Model: Qwen/Qwen3-8B
Template: qwen3_thinking
Training Type: Full Parameter (all 8B parameters)
Max Length: 8192 tokens (for long thinking chains)
Batch Size: 1 per device × 2 GPUs
Gradient Accum: 8 steps (effective batch size: 16)
Learning Rate: 1e-5
Epochs: 3
Optimization: DeepSpeed ZeRO-3, Gradient Checkpointing, Liger Kernel, BF16 mixed precision
Loss Scale: ignore_empty_think
GPU Memory: ~74.62 GB per H100 (stable)
```
### Dataset Characteristics
- **Samples**: 6,319 dramatic script continuations
- **Average Length**: ~5,000 tokens per sample
- **Max Length**: ~6,100 tokens
- **Format**: Conversations with `<think>...</think>` reasoning tags
- **Content**:
  - Script opening scenes (title, description, initial dialogue)
  - Extensive creative deliberation (3,000+ tokens of thinking)
  - Script continuation with proper formatting
- **Style**: Dramatic, emotionally intense scenarios (conflicts, reconciliation, tragedy)
### Training Metrics
- **Final Loss**: 0.844
- **Average Loss**: 0.978
- **Loss Trajectory**: 1.602 (start) → 0.82-0.83 (end)
- **Training Speed**: ~8 seconds/iteration
- **Total Steps**: 1,185
- **Checkpoints**: 5 saved (400, 800, 900, 1000, 1185)
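
The reported step count follows from the dataset size and the batch settings in the training configuration. The sketch below reproduces that arithmetic; it assumes the final partial batch of each epoch still counts as a step, which is an assumption about the trainer rather than something stated in this card.

```python
import math

samples = 6319          # dataset size
per_device_batch = 1    # batch size per GPU
num_gpus = 2
grad_accum_steps = 8
epochs = 3
seconds_per_step = 8    # approximate training speed reported above

effective_batch = per_device_batch * num_gpus * grad_accum_steps
# Assumption: the last, partially filled batch of an epoch is kept.
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
estimated_hours = total_steps * seconds_per_step / 3600

print(effective_batch, steps_per_epoch, total_steps)
print(f"estimated wall-clock time: {estimated_hours:.1f} h")
```

The estimate of roughly 2.6 hours is slightly below the reported 2 hours 46 minutes, which is consistent with `~8 seconds/iteration` being a rounded figure and with time spent saving checkpoints.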
## Usage
### Quick Start (ms-swift)
```bash
# Install ms-swift
pip install ms-swift
# Inference (interactive mode)
swift infer \
    --ckpt_dir FutureMa/Qwen3-8B-Drama-Thinking \
    --template qwen3_thinking \
    --max_new_tokens 4096 \
    --temperature 0.7
```
### Python API
```python
import os

# Select the GPU before importing swift so the setting takes effect
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
from swift.llm import PtEngine, InferRequest, RequestConfig

# Initialize engine
engine = PtEngine(
    model_id_or_path="FutureMa/Qwen3-8B-Drama-Thinking",
    max_batch_size=1,
    torch_dtype=torch.bfloat16,
)

# Create prompt
prompt = """Title: The Reunion
Description: Two estranged siblings meet at their childhood home after 20 years.
INT. FAMILY LIVING ROOM - DAY
SARAH (35) stands by the window, looking out at the garden.
MICHAEL (38) enters, hesitant."""

# Generate (streaming)
messages = [{'role': 'user', 'content': prompt}]
request = InferRequest(messages=messages)
config = RequestConfig(max_tokens=4096, temperature=0.7, stream=True)

# With stream=True, infer() returns one generator of chunks per request
for response in engine.infer([request], config)[0]:
    if response is None:
        continue
    chunk = response.choices[0].delta.content
    if chunk:  # skip empty chunks
        print(chunk, end='', flush=True)
```
### Using with Transformers (requires adapting checkpoint)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "FutureMa/Qwen3-8B-Drama-Thinking",
    torch_dtype="bfloat16",
    device_map="auto",
)
# The tokenizer is loaded from the base model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "system", "content": "You are a creative screenwriter assistant specialized in dramatic storytelling with internal reasoning."},
    {"role": "user", "content": "Write a scene about forgiveness..."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
# The decoded text includes the prompt followed by the generated continuation
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Example Output
**Input Prompt:**
```
Title: The Last Apology
Description: A daughter arrives at her estranged father's deathbed...
```
**Model Output** (excerpt):
```
<think>
Okay. Let's go. I'm sitting here with the opening of *The Last Apology* fresh in my mind,
and I need to figure out exactly where this goes next—not just the next scene, but the next
emotional beat, the next structural pivot...
First, the title: *The Last Apology*. That's not just poetic—it's structural. It tells me
this story is about delayed recognition, about the finality of words left unsaid...
[3,400 tokens of deep creative analysis including:]
- Title deconstruction and thematic implications
- Character psychology analysis
- Three-act structure planning
- Visual language and symbolism
- Multiple narrative paths considered
- Professional screenwriting techniques
</think>
INT. HOSPITAL ROOM - NIGHT
ANNA (28), in a wrinkled business suit, hesitates at the doorway.
DAVID (65) lies in bed, breathing labored...
[Script continues with proper formatting]
```
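
Because the reasoning and the script arrive in one string, downstream tools usually need to separate them. The following is a minimal sketch of one way to do that with plain string operations; `split_thinking` is a hypothetical helper, not part of ms-swift or Transformers. It assumes at most one `<think>...</think>` block per response and treats a missing closing tag as a generation that was cut off while still thinking.

```python
from typing import Tuple

OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def split_thinking(output: str) -> Tuple[str, str]:
    """Split a model response into (thinking, script)."""
    head, sep, tail = output.partition(CLOSE_TAG)
    if not sep:
        # No closing tag: either there is no thinking block at all,
        # or generation stopped before the thinking block ended.
        if OPEN_TAG in output:
            return output.split(OPEN_TAG, 1)[1].strip(), ""
        return "", output.strip()
    # Drop the opening tag if present (some chat templates emit it in the prompt).
    thinking = head.split(OPEN_TAG, 1)[-1].strip()
    return thinking, tail.strip()


sample = (
    "<think>\nFirst, the title: The Last Apology.\n</think>\n"
    "INT. HOSPITAL ROOM - NIGHT\n"
    "ANNA (28) hesitates at the doorway."
)
thinking, script = split_thinking(sample)
print(script.splitlines()[0])  # INT. HOSPITAL ROOM - NIGHT
```

An empty `script` with a non-empty `thinking` is a signal to retry with a larger token limit.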
## Intended Use
### ✅ Recommended Use Cases
1. **Screenwriting Education**: Learn the professional creative thinking process
2. **Script Ideation**: Generate story frameworks and narrative alternatives
3. **Story Consulting**: Explore "what if" scenarios with explicit reasoning
4. **Creative Brainstorming**: Understand decision-making in storytelling
5. **Draft Development**: Plan structure before execution
### ❌ Not Recommended For
1. **Final Shooting Scripts**: Requires human refinement for production
2. **Comedy/Action Genres**: Training bias toward dramatic content
3. **Long-form Series**: Single-pass generation may lack consistency
4. **Immediate Production**: Dialogue needs naturalization
## Evaluation Results
### Quantitative Metrics (vs. Base Model)
| Aspect | Fine-Tuned | Base Model | Improvement |
|--------|-------|------------|-------------|
| **Thinking Depth** | 9/10 | 5/10 | +80% |
| **Script Format** | 9/10 | 8/10 | +13% |
| **Dramatic Craft** | 8.5/10 | 8/10 | +6% |
| **Character Psychology** | 9/10 | 6/10 | +50% |
| **Decision Transparency** | 9/10 | 5/10 | +80% |
| **Overall** | 8.1/10 | 6.9/10 | +17% |
> **Note on Methodology:**
> These metrics were generated using an **LLM-as-a-Judge** framework (Claude) comparing the fine-tuned model against the base model.
### Qualitative Improvements
- **Professional Voice**: Sounds like an experienced screenwriter
- **Structural Thinking**: Explicit three-act planning
- **Meta-Awareness**: "This isn't just a script. It's a reckoning."
- **Non-Linear Reasoning**: Considers alternatives, backtracks, refines
- **Craft-Oriented**: Explains why choices serve the story
## Limitations
1. **Thinking Verbosity**: Generates ~3,400 tokens of thinking (87% of output)
   - May be excessive for quick tasks
   - Consider using `max_new_tokens` to limit length
2. **Incomplete Execution**: Token budget consumed by thinking
   - Many planned scenes not fully generated
   - May need 6,000-8,000 token limit for complete scripts
3. **Dialogue Naturalness**: More direct/literary than conversational
   - Training data style influences output
   - May need post-processing for natural speech
4. **Training Data Bias**: Skews toward melodramatic scenarios
   - Less suited for subtle/realistic dialogue
   - Best for emotionally intense stories
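
The first two limitations come down to token-budget arithmetic: thinking and script share the same `max_new_tokens` allowance. The sketch below works through the figures reported in this card; `script_budget` is an illustrative helper, and treating ~3,400 thinking tokens as a fixed cost is a simplifying assumption, since the actual length varies per prompt.

```python
thinking_tokens = 3400  # typical thinking length reported in this card
total_tokens = 3874     # average fine-tuned output length from the comparison table

thinking_share = thinking_tokens / total_tokens
print(f"thinking share of output: {thinking_share:.1%}")


def script_budget(max_new_tokens: int, thinking: int = thinking_tokens) -> int:
    """Tokens left for the script once the thinking block has been generated."""
    return max(0, max_new_tokens - thinking)


for limit in (4096, 6000, 8000):
    print(f"max_new_tokens={limit}: about {script_budget(limit)} tokens for the script")
```

Under these assumptions a 4,096-token limit leaves only a few hundred tokens for the script itself, which is why the 6,000-8,000 range is suggested for complete scripts.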
## Training Insights
### What Made This Successful
1. **8192 Token Context**: Essential for capturing full thinking chains
   - Initial assumption of 2048 would have truncated data
   - Average sample length: ~5,000 tokens
2. **DeepSpeed ZeRO-3**: Required (not optional)
   - Single H100: Would need ~109-114 GB (OOM)
   - ZeRO-3 sharding: ~74.62 GB per card ✅
3. **Full Parameter Training**: Worth the cost
   - Deeper capability transfer than LoRA
   - Better thinking process internalization
   - Cost: $17.86 (2.8 hours) vs ~$5 for LoRA
4. **Quality Training Data**: 6,319 long-form reasoning examples
   - Actual creative process in `<think>` tags
   - High-quality dramatic writing
## Citation
```bibtex
@misc{qwen3-drama-thinking-2025,
  author = {FutureMa},
  title = {Qwen3-8B-Drama-Thinking: Full Parameter Fine-tuning for Creative Screenwriting},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FutureMa/Qwen3-8B-Drama-Thinking}},
  note = {Full parameter fine-tuning on 6,319 drama samples with explicit reasoning chains}
}
```
## News & Updates
**[2025-12-23]** 🎉 **DramaBench Dataset is now open-source!** Evaluate your drama script generation with our comprehensive 6-dimensional benchmark framework (Format Standards, Narrative Efficiency, Character Consistency, Emotional Depth, Logic Consistency, Conflict Handling).
- 📊 Dataset: [FutureMa/DramaBench](https://huggingface.co/datasets/FutureMa/DramaBench)
- 📄 Paper: [arXiv:2512.19012](https://arxiv.org/abs/2512.19012)
- 🌐 Demo: [dramabench.pages.dev](https://dramabench.pages.dev/)
---
## Acknowledgments
- **Base Model**: [Qwen Team](https://huggingface.co/Qwen) - Qwen3-8B
- **Training Framework**: [ms-swift](https://github.com/modelscope/ms-swift) - ModelScope SWIFT
- **Infrastructure**: [Lambda Cloud](https://lambdalabs.com/) - 2x H100 80GB SXM5
- **Dataset**: Custom Drama Thinking Dataset (6,319 samples)
## Model Card Contact
For questions or feedback:
- **HuggingFace**: [@FutureMa](https://huggingface.co/FutureMa)
- **GitHub Issues**: Report via ms-swift repository
---
- **Training Date**: 2025-12-08
- **Training Duration**: 2h 46m
- **Model Size**: ~16 GB (BF16 precision)
- **Recommended VRAM**: 16 GB+ for inference