Initialize project; model provided by the ModelHub XC community
Model: npc-worldwide/TinyTimV1 Source: Original Platform
# TinyTimV1: Fine-tuning TinyLlama on Finnegans Wake

A project exploring the fine-tuning of TinyLlama-1.1B on James Joyce's *Finnegans Wake* to generate Joyce-inspired text.
## Overview

This project fine-tunes the TinyLlama-1.1B-Chat model on the complete text of James Joyce's *Finnegans Wake*, producing a language model that generates text in Joyce's distinctive experimental style. The model learns to imitate the complex wordplay, neologisms, and stream-of-consciousness narrative techniques characteristic of Joyce's final work.
## Files

- `process_wake.py` - Preprocesses the raw text, removes page numbers, and splits it into manageable chunks
- `fine_tune_joyce.py` - Main training script using Hugging Face Transformers
- `text_gen.py` - Text-generation script for the fine-tuned model
- `finn_wake.txt` - Complete text of Finnegans Wake (1.51 MB)
- `finn_wake.csv` - Processed dataset in CSV format
- `finn_wake_dataset/` - Tokenized dataset directory
## Usage

### 1. Data Preprocessing

```bash
python process_wake.py
```

This removes page numbers and splits the text into 100-word chunks for training.
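The preprocessing step above can be sketched in a few lines of Python. The helper names below are illustrative, not the actual contents of `process_wake.py`:

```python
# Sketch of the preprocessing step: strip lines that contain only a page
# number, then split the remaining text into fixed-size word chunks.
import re

def clean_text(raw: str) -> str:
    # Drop lines that are nothing but a page number.
    lines = [ln for ln in raw.splitlines() if not re.fullmatch(r"\s*\d+\s*", ln)]
    # Re-join and normalize whitespace.
    return " ".join(" ".join(lines).split())

def chunk_words(text: str, size: int = 100) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Tiny demonstration with a 5-word chunk size.
sample = "riverrun, past Eve and Adam's\n3\nfrom swerve of shore to bend of bay"
chunks = chunk_words(clean_text(sample), size=5)
print(chunks[0])  # → "riverrun, past Eve and Adam's"
```

The same two functions applied to `finn_wake.txt` with `size=100` would yield the 100-word chunks described above.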
### 2. Fine-tuning

```bash
python fine_tune_joyce.py
```

Fine-tunes TinyLlama on the processed dataset for 3 epochs on CPU.
### 3. Text Generation

```bash
python text_gen.py
```

Generates Joyce-inspired text using the fine-tuned model.
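A generation call matching the sampling settings listed under Model Details might look like the sketch below. The model directory `./tinytim_model` is an assumption about where the fine-tuned weights were saved, not a path from the repository:

```python
# Sketch of text generation with the fine-tuned model, using the sampling
# settings from Model Details (temperature 0.7, top-k 50, top-p 0.95).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "./tinytim_model"  # assumption: where the fine-tuned model was saved

def generate(prompt: str, max_new_tokens: int = 100) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("ae left to go to ireland and found a fairy"))
```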
## Model Details

- Base Model: TinyLlama-1.1B-Chat-v1.0
- Training Data: Finnegans Wake (~1.5 MB of text)
- Training Parameters:
  - 3 epochs
  - Batch size: 1
  - Max sequence length: 128 tokens
- Generation Settings:
  - Temperature: 0.7
  - Top-k: 50, Top-p: 0.95
## Example Output

Input: "ae left to go to ireland and found a fairy"

The model generates text continuing in Joyce's experimental style, with invented words, Irish references, and complex linguistic play.
## Requirements

- transformers
- datasets
- pandas
- torch

## Installation

```bash
pip install transformers datasets pandas torch
```
## Notes

- Training was performed on CPU due to resource constraints
- Model checkpoints are saved every 500 steps
- Training can be resumed from saved checkpoints