---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- qwen3
- thinking
- creative-writing
- screenwriting
- drama
- chain-of-thought
- reasoning
- ms-swift
- full-parameter-finetuning
datasets:
- custom-drama-thinking-dataset
language:
- en
- zh
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: Qwen3-8B-Drama-Thinking
  results:
  - task:
      type: text-generation
      name: Creative Script Writing
    metrics:
    - type: thinking_depth
      value: 9.0
      name: Thinking Depth Score
    - type: script_format
      value: 9.0
      name: Script Format Score
    - type: dramatic_craft
      value: 8.5
      name: Dramatic Craft Score
---

# Qwen3-8B-Drama-Thinking

This model is a **full-parameter fine-tuned** version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), trained on a custom drama-thinking dataset with explicit creative reasoning chains.

## Model Description

- **Base Model**: Qwen3-8B (8 billion parameters)
- **Training Method**: Full Parameter Fine-tuning (NOT LoRA)
- **Training Framework**: [ms-swift](https://github.com/modelscope/ms-swift)
- **Training Data**: Custom Drama Thinking Dataset (6,319 samples, avg ~5,000 tokens)
- **Specialization**: Screenwriting with explicit `<think>...</think>` creative reasoning
- **Hardware**: 2x NVIDIA H100 80GB SXM5
- **Training Time**: 2 hours 46 minutes (3 epochs)
- **Training Cost**: ~$17.86

## Key Features

### 🎬 Professional Screenwriting Assistant

This model generates dramatic scripts with **explicit creative deliberation**:

- ✅ **Thinking Process Visible**: Uses `<think>...</think>` tags to show internal reasoning
- ✅ **Deep Character Psychology**: Analyzes motivations, defense mechanisms, subtext
- ✅ **Structural Planning**: Three-act structure, emotional arcs, pacing decisions
- ✅ **Visual Storytelling**: Symbolism, atmosphere, cinematographic choices
- ✅ **Professional Format**: Correct screenplay formatting (scene headers, action lines, dialogue)

### 📊 Performance Comparison

Compared to base Qwen3-8B:

| Metric | Base Model | Fine-Tuned | Improvement |
|--------|------------|------------|-------------|
| **Output Length** | 1,071 tokens | 3,874 tokens | **+262%** |
| **Thinking Depth** | 5/10 | 9/10 | **+80%** |
| **Creative Reasoning** | 500 tokens | 3,400 tokens | **+580%** |
| **Craft Analysis** | Generic | Professional | **Qualitative leap** |
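
The percentages in the Improvement column follow from the raw figures as relative change. A minimal sketch that recomputes them, assuming rounding to the nearest whole percent (the helper name `relative_gain` is illustrative and not part of any tooling used for this model):

```python
def relative_gain(base: float, tuned: float) -> int:
    """Relative change from base to tuned, as a whole-number percentage."""
    return round((tuned - base) / base * 100)


# (base, fine-tuned) pairs taken from the table above
rows = {
    "Output Length (tokens)": (1071, 3874),
    "Thinking Depth (/10)": (5, 9),
    "Creative Reasoning (tokens)": (500, 3400),
}

for metric, (base, tuned) in rows.items():
    print(f"{metric}: +{relative_gain(base, tuned)}%")
```

The "Craft Analysis" row is qualitative, so it has no numeric counterpart here.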

### 🎯 Unique Value Proposition

> This is not just a text generator; it is a **creative thinking partner** that externalizes
> the entire screenwriting process, from title analysis through character psychology and
> structural planning to final execution.

## Training Details

### Training Configuration

```
Model:          Qwen/Qwen3-8B
Template:       qwen3_thinking
Training Type:  Full Parameter (all 8B parameters)
Max Length:     8192 tokens (for long thinking chains)
Batch Size:     1 per device × 2 GPUs
Gradient Accum: 8 steps (effective batch size: 16)
Learning Rate:  1e-5
Epochs:         3
Optimization:   DeepSpeed ZeRO-3 + Gradient Checkpointing
                Liger Kernel, BF16 mixed precision
Loss Scale:     ignore_empty_think
GPU Memory:     ~74.62 GB per H100 (stable)
```

### Dataset Characteristics

- **Samples**: 6,319 dramatic script continuations
- **Average Length**: ~5,000 tokens per sample
- **Max Length**: ~6,100 tokens
- **Format**: Conversations with `<think>...</think>` reasoning tags
- **Content**:
  - Script opening scenes (title, description, initial dialogue)
  - Extensive creative deliberation (3,000+ tokens of thinking)
  - Script continuation with proper formatting
- **Style**: Dramatic, emotionally intense scenarios (conflicts, reconciliation, tragedy)

### Training Metrics

- **Final Loss**: 0.844
- **Average Loss**: 0.978
- **Loss Trajectory**: 1.602 (start) → 0.82-0.83 (end)
- **Training Speed**: ~8 seconds/iteration
- **Total Steps**: 1,185
- **Checkpoints**: 5 saved (400, 800, 900, 1000, 1185)
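
The step count is consistent with the training configuration. A quick check, assuming each epoch rounds up to a whole number of optimizer steps and that "iteration" means one optimizer step (both are assumptions about the trainer, not statements from the training logs):

```python
import math

samples = 6319          # dataset size
per_device_batch = 1
num_gpus = 2
grad_accum = 8
epochs = 3
seconds_per_step = 8    # approximate figure from the metrics above

effective_batch = per_device_batch * num_gpus * grad_accum
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
est_hours = total_steps * seconds_per_step / 3600

print(effective_batch)        # 16
print(steps_per_epoch)        # 395
print(total_steps)            # 1185
print(f"{est_hours:.2f} h")   # 2.63 h
```

The estimate of roughly 2.6 hours sits close to the reported 2 hours 46 minutes. The gap may come from checkpoint saving and startup overhead, though that is a guess.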

## Usage

### Quick Start (ms-swift)

```bash
# Install ms-swift
pip install ms-swift

# Inference (interactive mode)
swift infer \
    --ckpt_dir FutureMa/Qwen3-8B-Drama-Thinking \
    --template qwen3_thinking \
    --max_new_tokens 4096 \
    --temperature 0.7
```

### Python API

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import PtEngine, InferRequest, RequestConfig

# Initialize engine
engine = PtEngine(
    model_id_or_path="FutureMa/Qwen3-8B-Drama-Thinking",
    max_batch_size=1,
    torch_dtype="bfloat16"
)

# Create prompt
prompt = """Title: The Reunion
Description: Two estranged siblings meet at their childhood home after 20 years.

INT. FAMILY LIVING ROOM - DAY

SARAH (35) stands by the window, looking out at the garden.
MICHAEL (38) enters, hesitant."""

# Generate (streaming)
messages = [{'role': 'user', 'content': prompt}]
request = InferRequest(messages=messages)
config = RequestConfig(max_tokens=4096, temperature=0.7, stream=True)

for response in engine.infer([request], config)[0]:
    if response:
        print(response.choices[0].delta.content, end='', flush=True)
```

### Using with Transformers (requires adapting checkpoint)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "FutureMa/Qwen3-8B-Drama-Thinking",
    torch_dtype="bfloat16",
    device_map="auto"
)

# The tokenizer is loaded from the base model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "system", "content": "You are a creative screenwriter assistant specialized in dramatic storytelling with internal reasoning."},
    {"role": "user", "content": "Write a scene about forgiveness..."}
]

# Render the chat template to a prompt string, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
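
Because the reasoning and the script arrive in one string, downstream code usually needs to separate them. A minimal sketch of one way to do that, assuming the decoded text contains literal `<think>` and `</think>` tags as in the example output below (the helper `split_thinking` is illustrative and not part of ms-swift or Transformers):

```python
import re
from typing import Tuple

# Non-greedy match so only the first thinking block is captured
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> Tuple[str, str]:
    """Return (thinking, script) from generated text.

    If no closed <think>...</think> block is found, thinking is empty
    and the whole text is returned as the script.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    thinking = match.group(1).strip()
    script = text[match.end():].strip()
    return thinking, script


sample = "<think>\nPlan the reunion beat.\n</think>\n\nINT. HOSPITAL ROOM - NIGHT"
thinking, script = split_thinking(sample)
print(thinking)  # Plan the reunion beat.
print(script)    # INT. HOSPITAL ROOM - NIGHT
```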

## Example Output

**Input Prompt:**
```
Title: The Last Apology
Description: A daughter arrives at her estranged father's deathbed...
```

**Model Output** (excerpt):
```
<think>
Okay. Let's go. I'm sitting here with the opening of *The Last Apology* fresh in my mind,
and I need to figure out exactly where this goes next—not just the next scene, but the next
emotional beat, the next structural pivot...

First, the title: *The Last Apology*. That's not just poetic—it's structural. It tells me
this story is about delayed recognition, about the finality of words left unsaid...

[3,400 tokens of deep creative analysis including:]
- Title deconstruction and thematic implications
- Character psychology analysis
- Three-act structure planning
- Visual language and symbolism
- Multiple narrative paths considered
- Professional screenwriting techniques
</think>

INT. HOSPITAL ROOM - NIGHT

ANNA (28), in a wrinkled business suit, hesitates at the doorway.

DAVID (65) lies in bed, breathing labored...

[Script continues with proper formatting]
```

## Intended Use

### ✅ Recommended Use Cases

1. **Screenwriting Education**: Learn a professional creative thinking process
2. **Script Ideation**: Generate story frameworks and narrative alternatives
3. **Story Consulting**: Explore "what if" scenarios with explicit reasoning
4. **Creative Brainstorming**: Understand decision-making in storytelling
5. **Draft Development**: Plan structure before execution

### ❌ Not Recommended For

1. **Final Shooting Scripts**: Output requires human refinement before production
2. **Comedy/Action Genres**: Training data is biased toward dramatic content
3. **Long-form Series**: Single-pass generation may lack consistency
4. **Immediate Production**: Dialogue needs naturalization

## Evaluation Results

### Quantitative Metrics (vs. Base Model)

| Aspect | Fine-Tuned | Base Model | Improvement |
|--------|------------|------------|-------------|
| **Thinking Depth** | 9/10 | 5/10 | +80% |
| **Script Format** | 9/10 | 8/10 | +13% |
| **Dramatic Craft** | 8.5/10 | 8/10 | +6% |
| **Character Psychology** | 9/10 | 6/10 | +50% |
| **Decision Transparency** | 9/10 | 5/10 | +80% |
| **Overall** | 8.1/10 | 6.9/10 | +17% |

> **Note on Methodology:**
> These metrics were generated using an **LLM-as-a-Judge** framework (Claude) comparing the fine-tuned model against the base model.

### Qualitative Improvements

- ✅ **Professional Voice**: Reads like an experienced screenwriter
- ✅ **Structural Thinking**: Explicit three-act planning
- ✅ **Meta-Awareness**: "This isn't just a script. It's a reckoning."
- ✅ **Non-Linear Reasoning**: Considers alternatives, backtracks, refines
- ✅ **Craft-Oriented**: Explains why choices serve the story

## Limitations

1. **Thinking Verbosity**: Generates ~3,400 tokens of thinking (87% of output)
   - May be excessive for quick tasks
   - Consider using `max_new_tokens` to limit length

2. **Incomplete Execution**: Token budget consumed by thinking
   - Many planned scenes not fully generated
   - May need 6,000-8,000 token limit for complete scripts

3. **Dialogue Naturalness**: More direct/literary than conversational
   - Training data style influences output
   - May need post-processing for natural speech

4. **Training Data Bias**: Skews toward melodramatic scenarios
   - Less suited for subtle/realistic dialogue
   - Best for emotionally intense stories
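
The first two limitations interact: thinking is generated first, so when the token limit is reached mid-thought, little or no script is produced. A small sketch for detecting that case from the raw text, so a caller can retry with a larger `max_new_tokens` (the function name and status labels are illustrative, not part of any library):

```python
def generation_status(text: str) -> str:
    """Classify raw output by how far generation got.

    "thinking_truncated": <think> was opened but never closed
    "no_script":          thinking closed, but nothing followed it
    "ok":                 some script text follows the thinking block
    """
    opened = "<think>" in text
    closed = "</think>" in text
    if opened and not closed:
        return "thinking_truncated"
    if closed and not text.split("</think>", 1)[1].strip():
        return "no_script"
    return "ok"


print(generation_status("<think>Still planning the second act"))  # thinking_truncated
print(generation_status("<think>Plan</think>\n"))                 # no_script
print(generation_status("<think>Plan</think>\nINT. ROOM - DAY"))  # ok
```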

## Training Insights

### What Made This Successful

1. **8192 Token Context**: Essential for capturing full thinking chains
   - A 2,048-token limit (the initial assumption) would have truncated the data
   - Average sample length: ~5,000 tokens

2. **DeepSpeed ZeRO-3**: Required (not optional)
   - Single H100: Would need ~109-114 GB (OOM)
   - ZeRO-3 sharding: ~74.62 GB per card ✅

3. **Full Parameter Training**: Worth the cost
   - Deeper capability transfer than LoRA
   - Better thinking process internalization
   - Cost: $17.86 (2.8 hours) vs ~$5 for LoRA

4. **Quality Training Data**: 6,319 long-form reasoning examples
   - Actual creative process in `<think>` tags
   - High-quality dramatic writing
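
The cost figure implies an hourly rate that can be backed out from the reported training time. A rough calculation, assuming billing proportional to wall-clock time on the two-GPU instance (the rates below are derived here, not quoted from a price list):

```python
cost_usd = 17.86
hours = 2 + 46 / 60     # reported training time: 2 h 46 min
num_gpus = 2

instance_rate = cost_usd / hours
gpu_rate = instance_rate / num_gpus

print(f"${instance_rate:.2f} per instance-hour")  # $6.46 per instance-hour
print(f"${gpu_rate:.2f} per GPU-hour")            # $3.23 per GPU-hour
```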

## Citation

```bibtex
@misc{qwen3-drama-thinking-2025,
  author = {FutureMa},
  title = {Qwen3-8B-Drama-Thinking: Full Parameter Fine-tuning for Creative Screenwriting},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FutureMa/Qwen3-8B-Drama-Thinking}},
  note = {Full parameter fine-tuning on 6,319 drama samples with explicit reasoning chains}
}
```

## News & Updates

**[2025-12-23]** 🎉 **The DramaBench dataset is now open source!** Evaluate your drama script generation with our comprehensive 6-dimensional benchmark framework (Format Standards, Narrative Efficiency, Character Consistency, Emotional Depth, Logic Consistency, Conflict Handling).

- 📊 Dataset: [FutureMa/DramaBench](https://huggingface.co/datasets/FutureMa/DramaBench)
- 📄 Paper: [arXiv:2512.19012](https://arxiv.org/abs/2512.19012)
- 🌐 Demo: [dramabench.pages.dev](https://dramabench.pages.dev/)

---

## Acknowledgments

- **Base Model**: [Qwen Team](https://huggingface.co/Qwen) - Qwen3-8B
- **Training Framework**: [ms-swift](https://github.com/modelscope/ms-swift) - ModelScope SWIFT
- **Infrastructure**: [Lambda Cloud](https://lambdalabs.com/) - 2x H100 80GB SXM5
- **Dataset**: Custom Drama Thinking Dataset (6,319 samples)

## Model Card Contact

For questions or feedback:
- **HuggingFace**: [@FutureMa](https://huggingface.co/FutureMa)
- **GitHub Issues**: Report via ms-swift repository

---

**Training Date**: 2025-12-08
**Training Duration**: 2h 46m
**Model Size**: ~16GB (BF16 precision)
**Recommended VRAM**: 16GB+ for inference