Initialize project; model provided by the ModelHub XC community

Model: aisquared/dlite-v2-1_5b Source: Original Platform
2026-06-08 09:29:20 +08:00
commit a1b6f96dfe
12 changed files with 150737 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,34 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,131 @@
 ---
 license: apache-2.0
 datasets:
 - aisquared/databricks-dolly-15k
 language:
 - en
 library_name: transformers
 ---
 # Model Card for `dlite-v2-1.5b`
 <!-- Provide a quick summary of what the model is/does. -->
 AI Squared's `dlite-v2-1.5b` is a large language 
 model derived from OpenAI's 1.5B-parameter [GPT-2 XL](https://huggingface.co/gpt2-xl) model and fine-tuned on a corpus of 15k records
 ([Databricks' "Dolly 15k" Dataset](https://huggingface.co/datasets/aisquared/databricks-dolly-15k)) to help it exhibit chat-based capabilities.
 Just like [Databricks' Dolly V2 models](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm),
 `dlite-v2-1.5b` (and all other members of the `dlite-v2` family) is licensed for both **research and commercial use.** We are extremely grateful 
 for the work that Databricks has done to create the `databricks-dolly-15k` dataset, for without it we would not be able to create and release this
 model under such an open and permissive license.
 While `dlite-v2-1.5b` is **not a state-of-the-art model**, we believe that the level of interactivity that can be achieved on such a small model that is trained so cheaply
 is important to showcase, as it continues to demonstrate that creating powerful AI capabilities may be much more accessible than previously thought. 
 ### Model Description
 <!-- Provide a longer summary of what this model is. -->
 - **Developed by:** AI Squared, Inc.
 - **Shared by:** AI Squared, Inc.
 - **Model type:** Large Language Model
 - **Language(s) (NLP):** EN
 - **License:** Apache v2.0
 - **Finetuned from model:** GPT-2
 ## Bias, Risks, and Limitations
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 **`dlite-v2-1.5b` is not a state-of-the-art language model.** `dlite-v2-1.5b` is an experimental technology, and as with any experimental technology, 
 AI Squared urges potential users of this technology to test its capabilities thoroughly before usage.
 Furthermore, the model can sometimes exhibit undesired behaviors. Some of these behaviors include,
 but are not limited to: factual inaccuracies, biases, offensive responses, toxicity, and hallucinations.
 Just as with any other LLM, we advise users of this technology to exercise good judgment when applying this technology.
 ## Usage
 To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` and `accelerate` libraries installed.
 From your terminal, run:
 ```bash
 pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"
 ```
 The instruction following pipeline can be loaded using the `pipeline` function as shown below.  This loads a custom `InstructionTextGenerationPipeline` 
 found in the model repo [here](https://huggingface.co/aisquared/dlite-v2-1_5b/blob/main/instruct_pipeline.py), which is why `trust_remote_code=True` is required.
 Including `torch_dtype=torch.bfloat16` is generally recommended if this type is supported in order to reduce memory usage.  It does not appear to impact output quality.
 It is also fine to remove it if there is sufficient memory.
 ```python
 from transformers import pipeline
 import torch
 generate_text = pipeline(model="aisquared/dlite-v2-1_5b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
 ```
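As a rough illustration of why a 16-bit dtype reduces memory usage, the sketch below estimates the weight footprint from the `n_embd`, `n_layer`, `vocab_size`, and `n_positions` values in this repo's `config.json`. The helper `gpt2_param_count` is a hypothetical name introduced here, and the estimate covers learnable weights only (no activations, attention-mask buffers, or key/value cache), so treat it as a lower bound rather than a measured figure.

```python
# Values taken from config.json in this repo.
n_embd, n_layer, vocab_size, n_positions = 1600, 48, 50260, 1024


def gpt2_param_count(d: int, layers: int, vocab: int, positions: int) -> int:
    """Count learnable parameters of a GPT-2 style model with tied input/output embeddings.

    Hypothetical helper for illustration; not part of this repo.
    """
    # Token embeddings plus learned position embeddings.
    embeddings = vocab * d + positions * d
    # Per block: two LayerNorms (4d), attention projections (3d^2 + 3d and d^2 + d),
    # and the MLP (4d^2 + 4d and 4d^2 + d), which sums to 12d^2 + 13d.
    per_block = 12 * d * d + 13 * d
    final_layer_norm = 2 * d
    return embeddings + layers * per_block + final_layer_norm


params = gpt2_param_count(n_embd, n_layer, vocab_size, n_positions)
bytes_per_dtype = {"float32": 4, "bfloat16": 2}
for name, size in bytes_per_dtype.items():
    print(f"{name}: {params * size / 1e9:.2f} GB for {params:,} parameters")
```

Halving the bytes per parameter halves the weight footprint, which is the saving the `torch_dtype=torch.bfloat16` argument is after.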
 You can then use the pipeline to answer instructions:
 ```python
 res = generate_text("Who was George Washington?")
 print(res)
 ```
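The pipeline does not pass the raw instruction to the model; it first wraps it in a fixed prompt. The sketch below mirrors the template layout as it appears in `instruct_pipeline.py` in this commit, so you can inspect what the model actually sees. `build_prompt` is a hypothetical helper introduced for illustration, not a function from the repo.

```python
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = (
    "Below is an instruction that describes a task, along with any additional context. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the intro / instruction / response-key layout."""
    # The prompt ends with the response key so that the model's completion is the response itself.
    return f"{INTRO_BLURB}\n{INSTRUCTION_KEY}\n{instruction}\n{RESPONSE_KEY}\n"


prompt = build_prompt("Who was George Washington?")
print(prompt)
```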
 Alternatively, if you prefer to not use `trust_remote_code=True` you can download [instruct_pipeline.py](https://huggingface.co/aisquared/dlite-v2-1_5b/blob/main/instruct_pipeline.py),
 store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer:
 ```python
 from instruct_pipeline import InstructionTextGenerationPipeline
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 tokenizer = AutoTokenizer.from_pretrained("aisquared/dlite-v2-1_5b", padding_side="left")
 model = AutoModelForCausalLM.from_pretrained("aisquared/dlite-v2-1_5b", device_map="auto", torch_dtype=torch.bfloat16)
 generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
 ```
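When the response and end keys cannot be resolved to single token IDs, the pipeline's `postprocess` step decodes the whole sequence and falls back to regular expressions to isolate the answer. The sketch below reproduces that fallback on its own, using the same two patterns. `extract_response` and the sample strings are made up for illustration.

```python
import re
from typing import Optional


def extract_response(decoded: str) -> Optional[str]:
    """Return the text between '### Response:' and '### End', if present."""
    m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", decoded, flags=re.DOTALL)
    if m is None:
        # Generation may hit the token limit before "### End" is produced;
        # in that case take everything after the response key.
        m = re.search(r"#+\s*Response:\s*(.+)", decoded, flags=re.DOTALL)
    return m.group(1).strip() if m else None


complete = "### Instruction:\nSay hi\n### Response:\nHello there!\n### End"
truncated = "### Instruction:\nSay hi\n### Response:\nHello the"
print(extract_response(complete))
print(extract_response(truncated))
```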
 ### Model Performance Metrics
 We present the results from various model benchmarks on the EleutherAI LLM Evaluation Harness for all models in the DLite family.
 Model results are sorted by mean score, ascending, to provide an ordering. These metrics further show that none of the DLite models are
 state of the art; rather, they suggest that chat-like behaviors in LLMs can be trained almost independently of model size.
 | Model         |   arc_challenge |   arc_easy |    boolq |   hellaswag |   openbookqa |     piqa |   winogrande |
 |:--------------|----------------:|-----------:|---------:|------------:|-------------:|---------:|-------------:|
 | dlite-v2-124m |        0.199659 |   0.447811 | 0.494801 |    0.291675 |        0.156 | 0.620239 |     0.487766 |
 | gpt2          |        0.190273 |   0.438131 | 0.487156 |    0.289185 |        0.164 | 0.628945 |     0.51618  |
 | dlite-v1-124m |        0.223549 |   0.462542 | 0.502446 |    0.293268 |        0.17  | 0.622416 |     0.494081 |
 | gpt2-medium   |        0.215017 |   0.490741 | 0.585933 |    0.333101 |        0.186 | 0.676279 |     0.531176 |
 | dlite-v2-355m |        0.251706 |   0.486111 | 0.547401 |    0.344354 |        0.216 | 0.671926 |     0.52723  |
 | dlite-v1-355m |        0.234642 |   0.507576 | 0.600306 |    0.338478 |        0.216 | 0.664309 |     0.496448 |
 | gpt2-large    |        0.216724 |   0.531566 | 0.604893 |    0.363971 |        0.194 | 0.703482 |     0.553275 |
 | dlite-v1-774m |        0.250853 |   0.545875 | 0.614985 |    0.375124 |        0.218 | 0.698041 |     0.562747 |
 | dlite-v2-774m |        0.269625 |   0.52904  | 0.613761 |    0.395937 |        0.256 | 0.691513 |     0.566693 |
 | gpt2-xl       |        0.25     |   0.582912 | 0.617737 |    0.400418 |        0.224 | 0.708379 |     0.583268 |
 | dlite-v1-1_5b |        0.268771 |   0.588384 | 0.624159 |    0.401414 |        0.226 | 0.708379 |     0.584846 |
 | dlite-v2-1_5b |        0.289249 |   0.565657 | 0.601223 |    0.434077 |        0.272 | 0.703482 |     0.588003 |
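To make the ordering concrete, the sketch below recomputes the mean score for the three largest rows of the table. It assumes "mean score" means the unweighted average of the seven task columns, which the text above does not state explicitly.

```python
from statistics import mean

# Three rows copied from the table above, in column order
# (arc_challenge, arc_easy, boolq, hellaswag, openbookqa, piqa, winogrande).
scores = {
    "gpt2-xl":       [0.25, 0.582912, 0.617737, 0.400418, 0.224, 0.708379, 0.583268],
    "dlite-v1-1_5b": [0.268771, 0.588384, 0.624159, 0.401414, 0.226, 0.708379, 0.584846],
    "dlite-v2-1_5b": [0.289249, 0.565657, 0.601223, 0.434077, 0.272, 0.703482, 0.588003],
}

# Assumption: an unweighted mean across tasks.
mean_scores = {model: mean(values) for model, values in scores.items()}
ranking = sorted(mean_scores, key=mean_scores.get)  # ascending, as in the table
for model in ranking:
    print(f"{model}: {mean_scores[model]:.4f}")
```

Under that assumption the three rows come out in the same order as they appear in the table.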
 ### Limitations
 *DLite is an experimental technology and is not designed for use in any environment without significant testing and safety consideration.
 Furthermore, the model can sometimes exhibit undesired behaviors. Some of these behaviors include, but are not limited to: factual
 inaccuracies, biases, offensive responses, toxicity, and hallucinations. Just as with any other LLM, we advise users of this technology
 to exercise good judgment when applying this technology.*
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_aisquared__dlite-v2-1_5b)
 | Metric              | Value |
 |---------------------|-------|
 | Avg.                | 30.03 |
 | ARC (25-shot)       | 32.59 |
 | HellaSwag (10-shot) | 53.98 |
 | MMLU (5-shot)       | 24.93 |
 | TruthfulQA (0-shot) | 38.77 |
 | Winogrande (5-shot) | 54.7  |
 | GSM8K (5-shot)      | 0.23  |
 | DROP (3-shot)       | 5.04  |
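As a quick consistency check, the "Avg." row can be reproduced from the seven benchmark values listed above. This is a minimal sketch that assumes the average is a plain arithmetic mean of those seven rows.

```python
# Benchmark values copied from the leaderboard table above.
leaderboard = {
    "ARC (25-shot)": 32.59,
    "HellaSwag (10-shot)": 53.98,
    "MMLU (5-shot)": 24.93,
    "TruthfulQA (0-shot)": 38.77,
    "Winogrande (5-shot)": 54.7,
    "GSM8K (5-shot)": 0.23,
    "DROP (3-shot)": 5.04,
}

# Assumption: "Avg." is the unweighted mean of the seven rows.
average = sum(leaderboard.values()) / len(leaderboard)
print(f"{average:.2f}")
```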
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,5 @@
 {
  "### End": 50257,
  "### Instruction:": 50258,
  "### Response:\n": 50259
 }
--- a/config.json
+++ b/config.json
@@ -0,0 +1,47 @@
 {
  "_name_or_path": "gpt2-xl",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "custom_pipelines": {
    "text-generation": {
      "impl": "instruct_pipeline.InstructionTextGenerationPipeline",
      "pt": "AutoModelForCausalLM",
      "tf": "TFAutoModelForCausalLM"
    }
  },
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1600,
  "n_head": 25,
  "n_inner": null,
  "n_layer": 48,
  "n_positions": 1024,
  "output_past": true,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "torch_dtype": "float16",
  "transformers_version": "4.25.1",
  "use_cache": false,
  "vocab_size": 50260
 }
--- a/instruct_pipeline.py
+++ b/instruct_pipeline.py
@@ -0,0 +1,160 @@
 import re
 import numpy as np
 from transformers import Pipeline, PreTrainedTokenizer
 INSTRUCTION_KEY = "### Instruction:"
 RESPONSE_KEY = "### Response:"
 END_KEY = "### End"
 INTRO_BLURB = (
    "Below is an instruction that describes a task, along with any additional context. Write a response that appropriately completes the request."
 )
 # This is the prompt that is used for generating responses using an already trained model.  It ends with the response
 # key, where the job of the model is to provide the completion that follows it (i.e. the response itself).
 PROMPT_FOR_GENERATION_FORMAT = """{intro}
 {instruction_key}
 {instruction}
 {response_key}
 """.format(
    intro=INTRO_BLURB,
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
 )
 def get_special_token_id(tokenizer: PreTrainedTokenizer, key: str) -> int:
    """Gets the token ID for a given string that has been added to the tokenizer as a special token.
    When training, we configure the tokenizer so that the sequences like "### Instruction:" and "### End" are
    treated specially and converted to a single, new token.  This retrieves the token ID each of these keys map to.
    Args:
        tokenizer (PreTrainedTokenizer): the tokenizer
        key (str): the key to convert to a single token
    Raises:
        ValueError: if more than one ID was generated
    Returns:
        int: the token ID for the given key
    """
    token_ids = tokenizer.encode(key)
    if len(token_ids) > 1:
        raise ValueError(f"Expected only a single token for '{key}' but found {token_ids}")
    return token_ids[0]
 class InstructionTextGenerationPipeline(Pipeline):
    def __init__(
        self, *args, do_sample: bool = True, max_new_tokens: int = 256, top_p: float = 0.92, top_k: int = 0, **kwargs
    ):
        super().__init__(*args, do_sample=do_sample, max_new_tokens=max_new_tokens, top_p=top_p, top_k=top_k, **kwargs)
    def _sanitize_parameters(self, return_instruction_text=False, **generate_kwargs):
        preprocess_params = {}
        # newer versions of the tokenizer configure the response key as a special token.  newer versions still may
        # append a newline to yield a single token.  find whatever token is configured for the response key.
        tokenizer_response_key = next(
            (token for token in self.tokenizer.additional_special_tokens if token.startswith(RESPONSE_KEY)), None
        )
        response_key_token_id = None
        end_key_token_id = None
        if tokenizer_response_key:
            try:
                response_key_token_id = get_special_token_id(self.tokenizer, tokenizer_response_key)
                end_key_token_id = get_special_token_id(self.tokenizer, END_KEY)
                # Ensure generation stops once it generates "### End"
                generate_kwargs["eos_token_id"] = end_key_token_id
            except ValueError:
                pass
        forward_params = generate_kwargs
        postprocess_params = {
            "response_key_token_id": response_key_token_id,
            "end_key_token_id": end_key_token_id,
            "return_instruction_text": return_instruction_text,
        }
        return preprocess_params, forward_params, postprocess_params
    def preprocess(self, instruction_text, **generate_kwargs):
        prompt_text = PROMPT_FOR_GENERATION_FORMAT.format(instruction=instruction_text)
        inputs = self.tokenizer(
            prompt_text,
            return_tensors="pt",
        )
        inputs["prompt_text"] = prompt_text
        inputs["instruction_text"] = instruction_text
        return inputs
    def _forward(self, model_inputs, **generate_kwargs):
        input_ids = model_inputs["input_ids"]
        attention_mask = model_inputs.get("attention_mask", None)
        generated_sequence = self.model.generate(
            input_ids=input_ids.to(self.model.device),
            attention_mask=attention_mask,
            pad_token_id=self.tokenizer.pad_token_id,
            **generate_kwargs,
        )[0].cpu()
        instruction_text = model_inputs.pop("instruction_text")
        return {"generated_sequence": generated_sequence, "input_ids": input_ids, "instruction_text": instruction_text}
    def postprocess(self, model_outputs, response_key_token_id, end_key_token_id, return_instruction_text):
        sequence = model_outputs["generated_sequence"]
        instruction_text = model_outputs["instruction_text"]
        # The response will be set to this variable if we can identify it.
        decoded = None
        # If we have token IDs for the response and end, then we can find the tokens and only decode between them.
        if response_key_token_id and end_key_token_id:
            # Find where "### Response:" is first found in the generated tokens.  Considering this is part of the
            # prompt, we should definitely find it.  We will return the tokens found after this token.
            response_pos = None
            response_positions = np.where(sequence == response_key_token_id)[0]
            if len(response_positions) == 0:
                pass
            else:
                response_pos = response_positions[0]
            if response_pos is not None:
                # Next find where "### End" is located.  The model has been trained to end its responses with this
                # sequence (or actually, the token ID it maps to, since it is a special token).  We may not find
                # this token, as the response could be truncated.  If we don't find it then just return everything
                # to the end.  Note that even though we set eos_token_id, we still see this token at the end.
                end_pos = None
                end_positions = np.where(sequence == end_key_token_id)[0]
                if len(end_positions) > 0:
                    end_pos = end_positions[0]
                decoded = self.tokenizer.decode(sequence[response_pos + 1 : end_pos]).strip()
        else:
            # Otherwise we'll decode everything and use a regex to find the response and end.
            fully_decoded = self.tokenizer.decode(sequence)
            # The response appears after "### Response:".  The model has been trained to append "### End" at the
            # end.
            m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", fully_decoded, flags=re.DOTALL)
            if m:
                decoded = m.group(1).strip()
            else:
                # The model might not generate the "### End" sequence before reaching the max tokens.  In this case,
                # return everything after "### Response:".
                m = re.search(r"#+\s*Response:\s*(.+)", fully_decoded, flags=re.DOTALL)
                if m:
                    decoded = m.group(1).strip()
        if return_instruction_text:
            return {"instruction_text": instruction_text, "generated_text": decoded}
        return decoded
--- a/merges.txt
+++ b/merges.txt
--- a/pytorch_model.bin
+++ b/pytorch_model.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:5a2ba12acb87aff387e4b0c41a399acde9d210813c2a843bcc92a36ccbb053fd
 size 3165775133
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,11 @@
 {
  "additional_special_tokens": [
    "### End",
    "### Instruction:",
    "### Response:\n"
  ],
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,10 @@
 {
  "add_prefix_space": false,
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "model_max_length": 1024,
  "name_or_path": "gpt2-xl",
  "special_tokens_map_file": null,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
 }
--- a/training_args.bin
+++ b/training_args.bin
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7419979b2c153a41b2f0f96d5d6a8e1a6c584c5eef9ab0e20084ec471e6c2681
 size 4539
--- a/vocab.json
+++ b/vocab.json