---
license: cc-by-sa-4.0
language:
- cs
pipeline_tag: text-generation
widget:
- text: '# ABBA # 1900'
  example_title: ABBA Rhyme Schema
- text: '# ABAB # 1920'
  example_title: ABAB Rhyme Schema
- text: '# AABB # 1900'
  example_title: AABB Rhyme Schema
- text: '# AABCCB # 1880'
  example_title: AABCCB Rhyme Schema
base_model:
- lchaloupsky/czech-gpt2-oscar
---

### Czech Poetry GPT

A GPT-2 model fine-tuned on Czech poetry from the corpusCzechVerse GitHub project by the
Institute of Czech Literature, Czech Academy of Sciences.

https://github.com/versotym/corpusCzechVerse

Trained using GPT-Czech-Poet:

https://github.com/jinymusim/GPT-Czech-Poet

https://arxiv.org/abs/2407.12790

## Usage

Use it like any other GPT-2 style model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("jinymusim/gpt-czech-poet")
model = AutoModelForCausalLM.from_pretrained("jinymusim/gpt-czech-poet")

# Input poem start: header line, then the first characters of the poem
poet_start = "# AABB # 1900\nD"
poet_start = poet_start.strip()
tokenized_poet_start = tokenizer.encode(poet_start, return_tensors='pt')

# Generate a continuation of the prompt
out = model.generate(tokenized_poet_start,
                     max_length=256,
                     do_sample=True,
                     top_k=50,
                     early_stopping=True,
                     pad_token_id=tokenizer.pad_token_id,
                     eos_token_id=tokenizer.eos_token_id)

# Decode the generated poem
decoded_cont = tokenizer.decode(out[0], skip_special_tokens=True)

print(decoded_cont)
```
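
The widget examples and the usage snippet all start with a header of the form `# AABB # 1900`. The helper below is a minimal sketch for building such prompts. It assumes the first field is a rhyme scheme written in letters and the second is a year, which is how the examples read but is not stated explicitly in this card. The name `build_prompt` and its parameters are hypothetical and are not part of the model or the GPT-Czech-Poet code.

```python
def build_prompt(rhyme_scheme: str, year: int, first_letters: str = "") -> str:
    """Build a prompt shaped like the widget examples, e.g. '# AABB # 1900'.

    Assumption: the header is '# <rhyme scheme> # <year>', optionally followed
    by a newline and the first characters of the poem.
    """
    scheme = rhyme_scheme.strip().upper()
    if not scheme.isalpha():
        raise ValueError(f"rhyme scheme must contain only letters, got {rhyme_scheme!r}")
    header = f"# {scheme} # {year}"
    # Append the optional poem start on its own line, as in the usage snippet
    return f"{header}\n{first_letters}" if first_letters else header


# Print the header for two of the rhyme schemes from the widget examples
for scheme, year in [("ABBA", 1900), ("AABCCB", 1880)]:
    print(build_prompt(scheme, year))
```

The returned string can be passed to `tokenizer.encode(...)` in place of the hard-coded `poet_start` above.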