Initialize project; model provided by the ModelHub XC community

Model: harpertoken/harpertokenConvFT Source: Original Platform
2026-05-07 21:48:08 +08:00
commit ee7a2e9189
10 changed files with 150502 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,106 @@
 ---
 license: mit
 language:
 - en
 base_model:
 - gpt2
 tags:
 - text-generation-inference
 - conversational-ai
 - gpt2
 metrics:
 - perplexity
 - bleu
 - f1
 library_name: transformers
 ---
 # HarpertokenConvFT
 ## Model Details
 - **Model Name**: HarpertokenConvFT
 - **Base Model**: gpt2
 - **Model Type**: GPT-2-based conversational AI model
 - **Max Sequence Length**: 1024 tokens
 ## Intended Use
 Generates human-like responses for chatbots, virtual assistants, and dialogue systems.
 ## Training Data
 The model was fine-tuned on the DailyDialog dataset, featuring:
 - **Training Examples**: 11,118
 - **Validation Examples**: 1,000
 - **Test Examples**: 1,000
 ## Dataset Characteristics
 - **Description**: A high-quality, multi-turn dialogue dataset covering everyday topics.
 - **Features**: Includes dialogues, dialogue-act (communication intention) labels, and emotion annotations.
 - **Citation**:
  ```
  @InProceedings{li2017dailydialog,
      author = {Li, Yanran and Su, Hui and Shen, Xiaoyu and Li, Wenjie and Cao, Ziqiang and Niu, Shuzi},
      title = {DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset},
      booktitle = {Proceedings of The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017)},
      year = {2017}
  }
  ```
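A causal language model such as GPT-2 consumes one flat token sequence, so a multi-turn dialogue has to be serialized before fine-tuning. The sketch below shows one common way to do that, ending each turn with GPT-2's `<|endoftext|>` token. The helper name `flatten_dialogue`, the sample turns, and the separator scheme are illustrative assumptions; this card does not say how the dialogues were actually preprocessed.

```python
# Hypothetical preprocessing sketch; not the author's documented pipeline.
EOS_TOKEN = "<|endoftext|>"  # GPT-2's end-of-text token (see special_tokens_map.json)


def flatten_dialogue(turns, eos_token=EOS_TOKEN):
    """Join the utterances of one dialogue into a single training string.

    Each turn is stripped of surrounding whitespace and followed by the
    EOS token so the model can learn where an utterance ends.
    Empty turns are skipped.
    """
    cleaned = [turn.strip() for turn in turns if turn.strip()]
    return "".join(turn + eos_token for turn in cleaned)


# Made-up sample dialogue in the style of everyday conversation.
sample_turns = [
    " Hello, how are you? ",
    "I'm fine, thanks. And you?",
    "",
    "Pretty good.",
]
flattened = flatten_dialogue(sample_turns)
print(flattened)
```

Other schemes (speaker prefixes, a dedicated separator token) are equally plausible; the point is only that every turn boundary must be visible to the model in the flattened text.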
 ## Training Configuration
 - **Learning Rate**: 2e-5
 - **Batch Size**: 8
 - **Number of Epochs**: 3
 - **Weight Decay**: 0.01
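To make these hyperparameters concrete, the sketch below derives how many optimizer steps they imply for the 11,118 training examples. The dictionary keys mirror the keyword names of `transformers.TrainingArguments`, but whether the author trained with `Trainer` is an assumption, as are the single-device setup, the absence of gradient accumulation, and keeping the final partial batch.

```python
import math

# Values taken from the model card above.
NUM_TRAIN_EXAMPLES = 11_118
hyperparameters = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(
    NUM_TRAIN_EXAMPLES / hyperparameters["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * hyperparameters["num_train_epochs"]

print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
```

Under those assumptions this works out to 1,390 steps per epoch and 4,170 steps in total, a useful figure when sizing a learning-rate schedule or warmup.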
 ## Ethical Considerations
 This model may reflect biases or limitations inherited from the GPT-2 base model and the DailyDialog dataset. Use caution in sensitive contexts, where it could produce biased or inappropriate responses.
 ## How to Use
 ### Using the Model Directly
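 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 # Load model and tokenizer
 model = AutoModelForCausalLM.from_pretrained("harpertoken/harpertokenConvFT")
 tokenizer = AutoTokenizer.from_pretrained("harpertoken/harpertokenConvFT")
 # Prepare input
 input_text = "Hello, how are you?"
 inputs = tokenizer(input_text, return_tensors="pt")
 # Generate a response. The length cap of 50 new tokens is an illustrative value;
 # the pad token is passed explicitly so generate() does not fall back to EOS.
 outputs = model.generate(
     **inputs,
     max_new_tokens=50,
     pad_token_id=tokenizer.pad_token_id,
 )
 # Decode the full sequence (the prompt followed by the generated continuation)
 response = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(response)
 ```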
 ### Using the Terminal
 Run the provided script to generate responses:
 ```bash
 python3 generate_response.py --input "Hello, how are you?"
 ```
 ### Using the API
 **Check API Status:**
 ```bash
 curl http://localhost:8000/status
 ```
 **Generate a Response:**
 ```bash
 curl -X POST http://localhost:8000/chat -H "Content-Type: application/json" -d '{"input_text": "Hello, how are you?"}'
 ```
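The same call can be made from Python with only the standard library. This is a minimal sketch that reuses the URL, header, and `input_text` payload from the curl example above; the function names `build_chat_request` and `chat` are hypothetical, and it assumes the server replies with a JSON body, whose schema this README does not document.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_chat_request(input_text, base_url=API_BASE_URL):
    """Build the POST request for /chat without sending it."""
    payload = json.dumps({"input_text": input_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(input_text, base_url=API_BASE_URL, timeout=30):
    """Send the request and return the decoded JSON body as-is.

    The response schema is not documented in this README, so no
    particular field is extracted here.
    """
    request = build_chat_request(input_text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server; calling chat() does.
request = build_chat_request("Hello, how are you?")
print(request.get_method(), request.full_url)
```

With the API server running locally, `chat("Hello, how are you?")` sends the request and returns whatever JSON the endpoint produces.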
 ### Using FastAPI Documentation
 Interact with the API via the browser at:
 [http://localhost:8000/docs#/default/generate_response_chat_post](http://localhost:8000/docs#/default/generate_response_chat_post)
 ## Related Models
 - **harpertokenConvAI**: [https://huggingface.co/harpertoken/harpertokenConvAI](https://huggingface.co/harpertoken/harpertokenConvAI) - DistilBERT-based model for question answering. Note: This is not the base model for harpertokenConvFT due to incompatible architectures (DistilBERT vs GPT-2).
 - **Base Model**: This model is fine-tuned from GPT-2 ([gpt2](https://huggingface.co/gpt2)).
 ## Model Differences
 harpertokenConvFT is a GPT-2 model for conversational AI, while harpertokenConvAI is a DistilBERT model for question answering. The two have different architectures, tokenizers, and parameters, so neither can be fine-tuned from the other. For authoritative details, refer to each model's config.json file.
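As a small illustration of checking the configuration, the sketch below parses excerpts of the `config.json` and `added_tokens.json` files from this commit and confirms two things: the architecture fields name GPT-2, and the vocabulary size of 50258 equals the stock GPT-2 vocabulary plus the one added `[PAD]` token. The excerpts are embedded as strings so the example runs without downloading anything; the variable names are illustrative.

```python
import json

# Excerpts copied from the config.json and added_tokens.json in this commit.
CONFIG_EXCERPT = """
{
  "architectures": ["GPT2LMHeadModel"],
  "model_type": "gpt2",
  "n_positions": 1024,
  "vocab_size": 50258
}
"""
ADDED_TOKENS = '{"[PAD]": 50257}'

GPT2_BASE_VOCAB_SIZE = 50257  # size of the stock GPT-2 vocabulary

config = json.loads(CONFIG_EXCERPT)
added_tokens = json.loads(ADDED_TOKENS)

# The architecture fields identify the model family.
is_gpt2 = (
    config["model_type"] == "gpt2"
    and "GPT2LMHeadModel" in config["architectures"]
)

# One added [PAD] token explains the vocabulary growing from 50257 to 50258.
expected_vocab_size = GPT2_BASE_VOCAB_SIZE + len(added_tokens)
vocab_matches = config["vocab_size"] == expected_vocab_size

print(f"GPT-2 architecture: {is_gpt2}")
print(f"vocab size consistent with added tokens: {vocab_matches}")
```

The same comparison applied to a DistilBERT config would fail on `model_type`, which is the quickest way to see that the two models are not interchangeable.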
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,3 @@
 {
  "[PAD]": 50257
 }
--- a/config.json
+++ b/config.json
@@ -0,0 +1,39 @@
 {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "torch_dtype": "float32",
  "transformers_version": "4.28.1",
  "use_cache": true,
  "vocab_size": 50258
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
 {
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "4.28.1"
 }
--- a/merges.txt
+++ b/merges.txt
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:45611cc0dea39b30a63c2052b120c79577e876d77199a660d93ee79928317bc3
 size 510362736
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,6 @@
 {
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "[PAD]",
  "unk_token": "<|endoftext|>"
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,9 @@
 {
  "add_prefix_space": false,
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 1024,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
 }
--- a/vocab.json
+++ b/vocab.json