Initialize project; model provided by the ModelHub XC community
Model: stockmark/gpt-neox-japanese-1.4b Source: Original Platform
67
README.md
Normal file
@@ -0,0 +1,67 @@
---
license: mit
language:
- ja
library_name: transformers
pipeline_tag: text-generation
tags:
- gpt_neox
- gpt-neox
- japanese
inference:
  parameters:
    max_new_tokens: 32
    do_sample: false
    repetition_penalty: 1.1
---

# stockmark/gpt-neox-japanese-1.4b

This repository provides a GPT-NeoX-based model with 1.4B parameters, pre-trained on a Japanese corpus of about 20B tokens. The model was developed by [Stockmark Inc.](https://stockmark.co.jp/).

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use torch.bfloat16 on GPUs that support it (e.g. A100) and torch.float16 on older-generation GPUs
torch_dtype = torch.bfloat16 if torch.cuda.is_available() and hasattr(torch.cuda, "is_bf16_supported") and torch.cuda.is_bf16_supported() else torch.float16

# Load the model and tokenizer; device_map="auto" places the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained("stockmark/gpt-neox-japanese-1.4b", device_map="auto", torch_dtype=torch_dtype)
tokenizer = AutoTokenizer.from_pretrained("stockmark/gpt-neox-japanese-1.4b")

# Prompt: "Natural language processing is"
inputs = tokenizer("自然言語処理は", return_tensors="pt").to(model.device)
with torch.no_grad():
    tokens = model.generate(
        **inputs,
        max_new_tokens=128,
        repetition_penalty=1.1
    )

# Decode the generated token IDs back to text
output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
```

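The dtype choice in the snippet above also determines how much memory the weights need. The following is a back-of-the-envelope sketch only: it assumes exactly 1.4e9 parameters (the README gives the rounded figure "1.4B") and counts weights alone, ignoring activations, the KV cache, and framework overhead. The helper name `weight_memory_gb` is made up for this illustration.

```python
# Rough weight-only memory estimate for a 1.4B-parameter model.
# Assumption: 1.4e9 parameters (rounded figure from this README).
NUM_PARAMS = 1.4e9

# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype):
    """Return the approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

Under these assumptions, half precision (either `float16` or `bfloat16`) needs roughly half the weight memory of `float32`; actual usage during generation will be higher.
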
## Example:

- LoRA tuning: https://huggingface.co/stockmark/gpt-neox-japanese-1.4b/blob/main/notebooks/LoRA.ipynb

## Training dataset

- Japanese Web Corpus (ja): 8.6B tokens (This dataset will not be released.)
- Wikipedia (ja): 0.88B tokens
- CC100 (ja): 10.5B tokens

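As a quick sanity check, the per-corpus counts listed above can be summed to confirm the "about 20B tokens" figure and to see each corpus's share. This is a small illustrative calculation using only the numbers given in this README; the variable names are arbitrary.

```python
# Token counts in billions, copied from the training dataset list.
corpus_tokens_b = {
    "Japanese Web Corpus (ja)": 8.6,
    "Wikipedia (ja)": 0.88,
    "CC100 (ja)": 10.5,
}

# Total size of the pre-training corpus in billions of tokens.
total_b = sum(corpus_tokens_b.values())
print(f"total: {total_b:.2f}B tokens")  # total: 19.98B tokens

# Share of each corpus in the total.
for name, tokens_b in corpus_tokens_b.items():
    print(f"{name}: {tokens_b / total_b:.1%}")
```
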
## Training setting

- Trained using HuggingFace Trainer and DeepSpeed (ZeRO-2)
- 8 A100 GPUs (40GB) at ABCI
- Mixed Precision (BF16)

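For readers who want to set up something similar, here is a minimal sketch of a DeepSpeed configuration that matches the bullets above (ZeRO stage 2, BF16 mixed precision). It is an assumption-based illustration, not the configuration actually used for this model, which is not published in this README. The `"auto"` values rely on the Hugging Face Trainer integration filling them in from its own arguments.

```python
import json

# Hypothetical DeepSpeed config: ZeRO stage 2 with BF16 mixed precision.
# Not the authors' actual file; batch-size values are left to the HF Trainer.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Serialize to JSON, the format DeepSpeed config files use.
ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)
```

Saved to a file (for example a hypothetical `ds_config.json`), such a config can be passed to the Trainer through the `deepspeed` argument of `TrainingArguments`, with the training script launched on all 8 GPUs by the `deepspeed` launcher.
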
## License

[The MIT license](https://opensource.org/licenses/MIT)

## Developed by

[Stockmark Inc.](https://stockmark.co.jp/)

## Author

[Takahiro Omi](https://huggingface.co/omitakahiro)