Initialize project; model provided by the ModelHub XC community

Model: stockmark/gpt-neox-japanese-1.4b Source: Original Platform
2026-06-08 22:03:15 +08:00
commit b737a3bd5d
11 changed files with 100224 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,67 @@
+---
+license: mit
+language:
+- ja
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- gpt_neox
+- gpt-neox
+- japanese
+inference:
+  parameters:
+    max_new_tokens: 32
+    do_sample: false
+    repetition_penalty: 1.1
+---
+
+# stockmark/gpt-neox-japanese-1.4b
+
+This repository provides a GPT-NeoX-based model with 1.4B parameters, pre-trained on a Japanese corpus of about 20B tokens. The model was developed by [Stockmark Inc.](https://stockmark.co.jp/).
+
+## How to use
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Use torch.bfloat16 on GPUs that support it (e.g. A100) and torch.float16 on older-generation GPUs
+torch_dtype = torch.bfloat16 if torch.cuda.is_available() and hasattr(torch.cuda, "is_bf16_supported") and torch.cuda.is_bf16_supported() else torch.float16
+
+model = AutoModelForCausalLM.from_pretrained("stockmark/gpt-neox-japanese-1.4b", device_map="auto", torch_dtype=torch_dtype)
+tokenizer = AutoTokenizer.from_pretrained("stockmark/gpt-neox-japanese-1.4b")
+
+inputs = tokenizer("自然言語処理は", return_tensors="pt").to(model.device)
+with torch.no_grad():
+    tokens = model.generate(
+        **inputs,
+        max_new_tokens=128,
+        repetition_penalty=1.1
+    )
+    
+output = tokenizer.decode(tokens[0], skip_special_tokens=True)
+print(output)
+```
+
+## Example
+
+- LoRA tuning: https://huggingface.co/stockmark/gpt-neox-japanese-1.4b/blob/main/notebooks/LoRA.ipynb
+
+## Training dataset
+- Japanese Web Corpus (ja): 8.6B tokens (This dataset will not be released.)
+- Wikipedia (ja): 0.88B tokens
+- CC100 (ja): 10.5B tokens
+
+## Training setting
+- Trained using HuggingFace Trainer and DeepSpeed (ZeRO-2)
+- 8 A100 GPUs (40GB) at ABCI
+- Mixed Precision (BF16)
+
+## License
+[The MIT license](https://opensource.org/licenses/MIT)
+
+## Developed by
+[Stockmark Inc.](https://stockmark.co.jp/)
+
+## Author
+[Takahiro Omi](https://huggingface.co/omitakahiro)