Initialize project; model provided by the ModelHub XC community
Model: facebook/galactica-6.7b Source: Original Platform
47
.gitattributes
vendored
Normal file
@@ -0,0 +1,47 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
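The `.gitattributes` patterns above decide which files in this commit are stored as Git LFS pointers. The sketch below is a rough approximation only: it uses Python's `fnmatch` on bare file names, which is not identical to Git's own pattern matching (path-based patterns such as `saved_model/**/*` are deliberately left out). The helper name `lfs_matches` and the pattern subset are illustrative choices, not part of the repository.

```python
from fnmatch import fnmatchcase

# A subset of the basename-only patterns from .gitattributes (assumption:
# fnmatch is close enough to Git's matching for these simple globs).
LFS_PATTERNS = ["*.bin", "*.bin.*", "*.safetensors", "*.pt", "*.h5", "*.onnx", "*.gguf*"]

# File names that appear in this commit.
REPO_FILES = [
    "config.json",
    "generation_config.json",
    "pytorch_model-00001-of-00002.bin",
    "pytorch_model-00002-of-00002.bin",
    "pytorch_model.bin.index.json",
    "tokenizer.json",
]


def lfs_matches(filename, patterns=LFS_PATTERNS):
    """Return every pattern that matches the given file name."""
    return [p for p in patterns if fnmatchcase(filename, p)]


for name in REPO_FILES:
    hits = lfs_matches(name)
    status = "LFS via " + ", ".join(hits) if hits else "regular Git blob"
    print(f"{name}: {status}")
```

Under these assumptions, `pytorch_model.bin.index.json` is caught by `*.bin.*`, which is consistent with that small JSON file appearing in this commit as an LFS pointer rather than as plain text.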
177
README.md
Normal file
@@ -0,0 +1,177 @@
---
license: cc-by-nc-4.0
tags:
- galactica

widget:
- text: "The Transformer architecture [START_REF]"
- text: "The Schwarzschild radius is defined as: \\["
- text: "A force of 0.6N is applied to an object, which accelerates at 3m/s. What is its mass? <work>"
- text: "Lecture 1: The Ising Model\n\n"
- text: "[START_I_SMILES]"
- text: "[START_AMINO]GHMQSITAGQKVISKHKNGRFYQCEVVRLTTETFYEVNFDDGSFSDNLYPEDIVSQDCLQFGPPAEGEVVQVRWTDGQVYGAKFVASHPIQMYQVEFEDGSQLVVKRDDVYTLDEELP[END_AMINO] ## Keywords"
inference: false
---

# GALACTICA 6.7B (standard)

Model card from the original [repo](https://github.com/paperswithcode/galai/blob/main/docs/model_card.md)

Following [Mitchell et al. (2018)](https://arxiv.org/abs/1810.03993), this model card provides information about the GALACTICA model, how it was trained, and the intended use cases. Full details about how the model was trained and evaluated can be found in the [release paper](https://galactica.org/paper.pdf).

## Model Details

The GALACTICA models are trained on a large-scale scientific corpus. The models are designed to perform scientific tasks, including but not limited to citation prediction, scientific QA, mathematical reasoning, summarization, document generation, molecular property prediction and entity extraction. The models were developed by the Papers with Code team at Meta AI to study the use of language models for the automatic organization of science. We train models with sizes ranging from 125M to 120B parameters. Below is a summary of the released models:

| Size        | Parameters  |
|:-----------:|:-----------:|
| `mini`      | 125 M       |
| `base`      | 1.3 B       |
| `standard`  | 6.7 B       |
| `large`     | 30 B        |
| `huge`      | 120 B       |

## Release Date

November 2022

## Model Type

Transformer-based architecture in a decoder-only setup with a few modifications (see paper for more details).

## Paper & Demo

[Paper](https://galactica.org/paper.pdf) / [Demo](https://galactica.org)

## Model Use

The primary intended users of the GALACTICA models are researchers studying language models applied to the scientific domain. We also anticipate the model will be useful for developers who wish to build scientific tooling. However, we caution against production use without safeguards given the potential of language models to hallucinate.

The models are made available under a non-commercial CC BY-NC 4.0 license. More information about how to use the model can be found in the README.md of this repository.

## Training Data

The GALACTICA models are trained on 106 billion tokens of open-access scientific text and data. This includes papers, textbooks, scientific websites, encyclopedias, reference material, knowledge bases, and more. We tokenize different modalities to provide a natural language interface for different tasks. See the README.md for more information, and the paper for full details on the training data.
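
As a rough illustration of what a "natural language interface" over several modalities can look like, the sketch below collects the prompt formats that appear in this card's widget examples. The dictionary keys and the helper `protein_prompt` are hypothetical names introduced here for illustration; only the prompt strings and special tokens come from the card itself.

```python
# Prompt strings copied from the widget examples in this model card.
# The keys are illustrative labels, not official task names.
PROMPTS = {
    "citation": "The Transformer architecture [START_REF]",
    "latex": "The Schwarzschild radius is defined as: \\[",
    "reasoning": (
        "A force of 0.6N is applied to an object, which accelerates at 3m/s. "
        "What is its mass? <work>"
    ),
    "document": "Lecture 1: The Ising Model\n\n",
    "smiles": "[START_I_SMILES]",
}


def protein_prompt(sequence: str, heading: str = "Keywords") -> str:
    """Wrap an amino-acid sequence in the tokens used by the widget example."""
    return f"[START_AMINO]{sequence}[END_AMINO] ## {heading}"


if __name__ == "__main__":
    for task, prompt in PROMPTS.items():
        print(f"{task}: {prompt!r}")
    # "GHMQSITAGQ" is a shortened placeholder sequence for demonstration only.
    print(protein_prompt("GHMQSITAGQ"))
```

Any of these strings can be passed as `input_text` in the usage examples in this card.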

## How to use

Find below some example scripts showing how to use the model with `transformers`:

## Using the PyTorch model

### Running the model on a CPU

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, OPTForCausalLM

# Load the tokenizer and the model weights (full precision, on CPU)
tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b")

# [START_REF] prompts the model to predict a citation
input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate a continuation and decode it back to text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
# device_map="auto" lets accelerate place the weights on the available devices
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto")

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU using different precisions

#### FP16

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
# Load the weights in half precision to roughly halve memory use versus float32
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto", torch_dtype=torch.float16)

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

#### INT8

<details>
<summary> Click to expand </summary>

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
# Quantize the weights to 8-bit at load time via bitsandbytes.
# Note: newer transformers releases may prefer passing
# quantization_config=BitsAndBytesConfig(load_in_8bit=True) instead.
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto", load_in_8bit=True)

input_text = "The Transformer architecture [START_REF]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

## Performance and Limitations

The model outperforms several existing language models on a range of knowledge probes, reasoning, and knowledge-intensive scientific tasks. This also extends to general NLP tasks, where GALACTICA outperforms other open-source general language models. That being said, we note a number of limitations in this section.

As with other language models, GALACTICA is often prone to hallucination, and training on a high-quality academic corpus does not prevent this, especially for less popular and less cited scientific concepts. There are no guarantees of truthful output when generating from the model. This extends to specific modalities such as citation prediction. While GALACTICA's citation behaviour approaches the ground truth citation behaviour with scale, the model continues to exhibit a popularity bias at larger scales.

In addition, we evaluated the model on several types of benchmarks related to stereotypes and toxicity. Overall, the model exhibits substantially lower toxicity rates compared to other large language models. That being said, the model continues to exhibit bias on certain measures (see the paper for details). We therefore recommend care when using the model for generations.

## Broader Implications

GALACTICA can potentially be used as a new way to discover academic literature. We also expect a lot of downstream use for application to particular domains, such as mathematics, biology, and chemistry. In the paper, we demonstrated several examples of the model acting as an alternative to standard search tools. We expect a new generation of scientific tools to be built upon large language models such as GALACTICA.

We encourage researchers to investigate beneficial and new use cases for these models. That being said, it is important to be aware of the current limitations of large language models. Researchers should pay attention to common issues such as hallucination and biases that could emerge from using these models.

## Citation

```bibtex
@inproceedings{GALACTICA,
    title={GALACTICA: A Large Language Model for Science},
    author={Ross Taylor and Marcin Kardas and Guillem Cucurull and Thomas Scialom and Anthony Hartshorn and Elvis Saravia and Andrew Poulton and Viktor Kerkez and Robert Stojnic},
    year={2022}
}
```
32
config.json
Normal file
@@ -0,0 +1,32 @@
{
  "_name_or_path": "/content/standard",
  "_remove_final_layer_norm": false,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "architectures": [
    "OPTForCausalLM"
  ],
  "attention_dropout": 0.1,
  "enable_bias": true,
  "bos_token_id": 0,
  "do_layer_norm_before": true,
  "dropout": 0.1,
  "eos_token_id": 2,
  "ffn_dim": 16384,
  "hidden_size": 4096,
  "init_std": 0.02,
  "layer_norm_elementwise_affine": true,
  "layerdrop": 0.0,
  "learned_embeddings": true,
  "max_position_embeddings": 2048,
  "model_type": "opt",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "pad_token_id": 1,
  "scale_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.21.0.dev0",
  "use_cache": true,
  "vocab_size": 50000,
  "word_embed_proj_dim": 4096
}
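The values in `config.json` are enough for a back-of-the-envelope parameter count, which helps sanity-check the "6.7 B" label and the memory needed in `float16`. The sketch below is an estimate under stated assumptions, not the library's own accounting: it assumes biases on every linear layer (`enable_bias` is true), no embedding projection (`word_embed_proj_dim` equals `hidden_size`), an output layer tied to the input embedding, and a learned position table with 2 extra rows (an assumption about the OPT implementation in `transformers`). The function name `estimate_opt_params` is hypothetical.

```python
# Values copied from config.json above.
CONFIG = {
    "hidden_size": 4096,
    "ffn_dim": 16384,
    "num_hidden_layers": 32,
    "vocab_size": 50000,
    "max_position_embeddings": 2048,
}


def estimate_opt_params(cfg, position_offset=2):
    """Estimate the parameter count of an OPT-style decoder from its config."""
    h, f = cfg["hidden_size"], cfg["ffn_dim"]
    attention = 4 * (h * h + h)              # q, k, v, out projections with biases
    feed_forward = (h * f + f) + (f * h + h) # two linear layers with biases
    layer_norms = 2 * (2 * h)                # two LayerNorms (weight and bias) per layer
    per_layer = attention + feed_forward + layer_norms

    token_embedding = cfg["vocab_size"] * h  # assumed tied with the output layer
    # position_offset=2 is an assumption about the learned position table size
    position_embedding = (cfg["max_position_embeddings"] + position_offset) * h
    final_layer_norm = 2 * h

    return (
        cfg["num_hidden_layers"] * per_layer
        + token_embedding
        + position_embedding
        + final_layer_norm
    )


total = estimate_opt_params(CONFIG)
print(f"{total:,} parameters, about {total * 2 / 1e9:.1f} GB in float16")
```

Under these assumptions the estimate comes to roughly 6.66 billion parameters, in line with the `standard` row of the size table, and about 13.3 GB of weights at two bytes per parameter.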
1
configuration.json
Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "pad_token_id": 1,
  "transformers_version": "4.27.0.dev0"
}
3
pytorch_model-00001-of-00002.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d9a696389c643c059e525d9da978a070b781d487d73d6f3358ecce47857658cc
size 9958533335
3
pytorch_model-00002-of-00002.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:06485fffcd88f3305a91b1a3afead1242432bb48593fcd3b6fa5a11b75b9505a
size 3765964810
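The two weight shards are committed as Git LFS pointer files: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. As a minimal sketch (the helper name `parse_lfs_pointer` is hypothetical), the pointers above can be parsed to find the total download size of the checkpoint:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the two shard files in this commit.
SHARDS = {
    "pytorch_model-00001-of-00002.bin": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:d9a696389c643c059e525d9da978a070b781d487d73d6f3358ecce47857658cc\n"
        "size 9958533335\n"
    ),
    "pytorch_model-00002-of-00002.bin": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:06485fffcd88f3305a91b1a3afead1242432bb48593fcd3b6fa5a11b75b9505a\n"
        "size 3765964810\n"
    ),
}

total_bytes = sum(parse_lfs_pointer(text)["size"] for text in SHARDS.values())
print(f"{total_bytes:,} bytes (about {total_bytes / 1e9:.2f} GB)")
```

The two shards add up to about 13.7 GB, so that much free disk space is needed before any copy is loaded into memory.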
3
pytorch_model.bin.index.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f677915b51a73f14a4343f5e1dac189aa6f1b1863c45bca247b6801586377cc
size 45091
1
special_tokens_map.json
Normal file
@@ -0,0 +1 @@
{}
99996
tokenizer.json
Normal file
File diff suppressed because it is too large
5
tokenizer_config.json
Normal file
@@ -0,0 +1,5 @@
{
  "name_or_path": "/content/tokenizer",
  "special_tokens_map_file": "/content/tokenizer/special_tokens_map.json",
  "tokenizer_class": "PreTrainedTokenizerFast"
}