---
library_name: transformers
base_model:
- canopylabs/orpheus-3b-0.1-ft
---

# Malaysian canopylabs/orpheus-3b-0.1-ft

Finetuned [canopylabs/orpheus-3b-0.1-ft](https://huggingface.co/canopylabs/orpheus-3b-0.1-ft) on standard Malay with a small amount of Mandarin.

## Training session

Finetuned on [Mesolitica/TTS](https://huggingface.co/datasets/mesolitica/TTS) so that the model can generate Malay speech with a small amount of Mandarin.


## How we train

1. LoRA on all attention and MLP projections plus the embedding and output layers: `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]`.
2. Rank 128 with alpha 256, i.e. a LoRA scale (alpha / rank) of 2.0, but the adapter is merged with a ratio of 1.5.
3. Multipacking with a proper SDPA causal mask to prevent contamination between packed documents, plus correct per-document position ids.
4. Chunked cross-entropy loss to reduce memory usage.
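
The rank and alpha settings in item 2 give LoRA's usual scale factor, alpha / rank = 256 / 128 = 2.0, which multiplies the low-rank update `B @ A` during training. The sketch below shows what merging with a ratio of 1.5 would mean for a single weight matrix, assuming the ratio simply takes the place of the 2.0 scale when the update is folded into the base weight (our reading; the training code may apply it differently). It uses plain PyTorch with random matrices, and `merge_lora` is a hypothetical helper, not a `peft` API.

```python
import torch

def merge_lora(base_weight, lora_A, lora_B, scale):
    """Hypothetical helper: fold a LoRA update into a base weight, W' = W + scale * (B @ A).

    base_weight: (out_features, in_features)
    lora_A:      (rank, in_features)
    lora_B:      (out_features, rank)
    """
    return base_weight + scale * (lora_B @ lora_A)

rank, alpha = 128, 256
train_scale = alpha / rank  # 2.0: scale applied to B @ A during training
merge_scale = 1.5           # assumption: the merge ratio replaces alpha / rank

torch.manual_seed(0)
out_features, in_features = 256, 256
W = torch.randn(out_features, in_features, dtype=torch.float64)
A = torch.randn(rank, in_features, dtype=torch.float64) * 0.01
B = torch.randn(out_features, rank, dtype=torch.float64) * 0.01

W_as_trained = merge_lora(W, A, B, train_scale)  # what the adapter did in training
W_merged = merge_lora(W, A, B, merge_scale)      # what gets saved after merging

# Ratio of the merged update to the trained update: 1.5 / 2.0.
delta_ratio = ((W_merged - W).norm() / (W_as_trained - W).norm()).item()
print(round(delta_ratio, 3))  # 0.75
```

Under this reading the merged model keeps 75% of the update strength seen in training. With the `peft` library, the training side would roughly correspond to `LoraConfig(r=128, lora_alpha=256, target_modules=[...])` using the module list from item 1.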

Training runs are logged on Wandb at https://wandb.ai/huseinzol05/malay-orpheus-3b-0.1-ft-lora-128/workspace?nw=nwuserhuseinzol05
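
Item 3 packs several training samples into one sequence. Two things keep them independent: a block-diagonal causal mask, so a token only attends to earlier tokens of its own sample, and position ids that restart at 0 for each sample. A minimal sketch, assuming each packed row is described by a list of sample lengths; `packed_mask_and_positions` is a hypothetical helper, and passing the boolean mask as `attn_mask` to `torch.nn.functional.scaled_dot_product_attention` is one way to apply it, not necessarily what the training code does.

```python
import torch
import torch.nn.functional as F

def packed_mask_and_positions(lengths):
    """Hypothetical helper: attention mask and position ids for one packed row.

    lengths: lengths of the samples packed into the row, in order.
    Returns (mask, position_ids); mask[i, j] is True when token i may attend to token j.
    """
    total = sum(lengths)
    # Sample index of every token, e.g. lengths [3, 2] -> [0, 0, 0, 1, 1].
    doc_ids = torch.repeat_interleave(torch.arange(len(lengths)), torch.tensor(lengths))
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    causal = torch.ones(total, total, dtype=torch.bool).tril()
    mask = same_doc & causal  # block-diagonal causal mask
    # Position ids restart at 0 at the start of every sample.
    position_ids = torch.cat([torch.arange(n) for n in lengths])
    return mask, position_ids

lengths = [3, 2, 4]
mask, position_ids = packed_mask_and_positions(lengths)
print(position_ids.tolist())  # [0, 1, 2, 0, 1, 0, 1, 2, 3]

# Shapes are (batch, heads, seq_len, head_dim); the (seq_len, seq_len) mask broadcasts.
torch.manual_seed(0)
seq_len = sum(lengths)
q, k, v = (torch.randn(1, 2, seq_len, 8) for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Because the mask is block-diagonal, the attention output for each sample matches what a plain causal attention would produce if that sample were run on its own.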
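
Item 4 targets the largest activation at the output, the `(tokens, vocab)` logits tensor, which is big here because the vocabulary includes the audio tokens (the audio token ids in the example below start at 128266). One common way to chunk the loss, sketched here as an assumption about the approach, is to project and score the hidden states one slice at a time and wrap each slice in activation checkpointing, so each slice's logits are recomputed during backward instead of being kept for the whole sequence. `chunked_ce_loss` is a hypothetical helper; `torch.utils.checkpoint.checkpoint` and `F.cross_entropy` are standard PyTorch.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def _chunk_loss(hidden_chunk, lm_head_weight, label_chunk):
    # Logits exist only for this slice: (chunk_size, vocab_size).
    logits = hidden_chunk @ lm_head_weight.T
    return F.cross_entropy(logits, label_chunk, ignore_index=-100, reduction="sum")

def chunked_ce_loss(hidden, lm_head_weight, labels, chunk_size=1024):
    """Hypothetical helper: mean cross-entropy over tokens whose label is not -100.

    hidden: (N, D) final hidden states, lm_head_weight: (V, D), labels: (N,).
    Checkpointing recomputes each slice's logits during backward, so the full
    (N, V) logits tensor is never held in memory.
    """
    total = hidden.new_zeros(())
    for start in range(0, hidden.size(0), chunk_size):
        end = start + chunk_size
        total = total + checkpoint(
            _chunk_loss, hidden[start:end], lm_head_weight, labels[start:end],
            use_reentrant=False,
        )
    n_valid = (labels != -100).sum().clamp(min=1)
    return total / n_valid

# Toy sizes; float64 so the result can be compared closely with the unchunked loss.
torch.manual_seed(0)
N, D, V = 50, 16, 100
hidden = torch.randn(N, D, dtype=torch.float64, requires_grad=True)
weight = torch.randn(V, D, dtype=torch.float64, requires_grad=True)
labels = torch.randint(0, V, (N,))
labels[:5] = -100  # e.g. prompt tokens that should not contribute to the loss

loss = chunked_ce_loss(hidden, weight, labels, chunk_size=16)
loss.backward()
```

Without the `checkpoint` call, autograd would still save every slice's softmax output for the backward pass, so most of the memory saving would disappear.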

## Example

Load the model and generate speech:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from snac import SNAC
import torch
import IPython.display as ipd

def redistribute_codes(row):
    # Keep only whole 7-token frames.
    row_length = row.size(0)
    new_length = (row_length // 7) * 7
    trimmed_row = row[:new_length]
    # Audio codes are stored as token ids offset by 128266.
    code_list = [t - 128266 for t in trimmed_row]

    # Each 7-token frame holds 1 code for SNAC layer 1, 2 for layer 2 and 4 for layer 3;
    # slot k of a frame carries an extra offset of k * 4096.
    layer_1 = []
    layer_2 = []
    layer_3 = []
    for i in range(len(code_list) // 7):
        layer_1.append(code_list[7*i][None])
        layer_2.append(code_list[7*i+1][None] - 4096)
        layer_3.append(code_list[7*i+2][None] - (2*4096))
        layer_3.append(code_list[7*i+3][None] - (3*4096))
        layer_2.append(code_list[7*i+4][None] - (4*4096))
        layer_3.append(code_list[7*i+5][None] - (5*4096))
        layer_3.append(code_list[7*i+6][None] - (6*4096))

    with torch.no_grad():
        codes = [torch.concat(layer_1),
                 torch.concat(layer_2),
                 torch.concat(layer_3)]
        for i in range(len(codes)):
            # Clamp invalid (negative) codes to 0 and add a batch dimension.
            codes[i][codes[i] < 0] = 0
            codes[i] = codes[i][None]

        # Decode the three code layers into a 24 kHz waveform.
        audio_hat = snac_model.decode(codes)
        return audio_hat.cpu()[0, 0]

snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")
snac_model = snac_model.to("cuda")

tokenizer = AutoTokenizer.from_pretrained('mesolitica/Malaysian-orpheus-3b-0.1-ft')
model = AutoModelForCausalLM.from_pretrained(
    'mesolitica/Malaysian-orpheus-3b-0.1-ft', torch_dtype=torch.bfloat16
).cuda()

speakers = [
    'Husein',
    'Shafiqah Idayu',
    'Anwar Ibrahim',
    'KP'
]

speaker = speakers[0]
text = 'Nama saya Husein, saya tak suka nasi ayam dan tak suka mandi, Xiàn zài wǒ yǒu bing chilling Wǒ hěn xǐ huān bing chilling.'
# Prompt format used by this model: special tokens wrap "{speaker}: {text}",
# and the trailing tokens cue the model to generate audio tokens.
prompt = f'<custom_token_3><|begin_of_text|>{speaker}: {text}<|eot_id|><custom_token_4><custom_token_5><custom_token_1>'
input_ids = tokenizer(prompt, add_special_tokens=False, return_tensors='pt').to('cuda')

with torch.no_grad():
    generated_ids = model.generate(
        **input_ids,
        max_new_tokens=1200,
        do_sample=True,
        temperature=0.9,
        top_p=0.8,
        repetition_penalty=1.1,
        num_return_sequences=1,
        eos_token_id=128258,  # stop generating at token id 128258
    )

# Keep only the newly generated tokens and decode them to audio.
row = generated_ids[0, input_ids['input_ids'].shape[1]:]
y_ = redistribute_codes(row)
ipd.Audio(y_, rate=24000)
```

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/5e73316106936008a9ee6523/NIOtl7B6Myw1eBd5Lf76l.wav"></audio>
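
The magic numbers in `redistribute_codes` encode how each SNAC frame is laid out. Every frame is 7 consecutive tokens; after subtracting the offset 128266, the token in slot k of the frame carries an extra offset of k * 4096, and the 7 slots are routed to SNAC's three code layers as 1, 2 and 4 codes. The pure-Python sketch below mirrors that routing without the model or a GPU, so the layout can be checked in isolation. `split_frame_tokens` and `pack_frame` are hypothetical helpers; `pack_frame` is our inverse reconstructed from the decoding code, not code from the training repository.

```python
AUDIO_TOKEN_OFFSET = 128266  # subtracted in redistribute_codes
CODEBOOK_SIZE = 4096         # per-slot offset step in redistribute_codes

# SNAC layer (0, 1, 2) that each of the 7 slots in a frame belongs to.
SLOT_TO_LAYER = [0, 1, 2, 2, 1, 2, 2]

def split_frame_tokens(tokens):
    """Hypothetical helper: route audio token ids into three SNAC code layers."""
    usable = len(tokens) // 7 * 7  # drop a trailing partial frame
    layers = [[], [], []]
    for idx in range(usable):
        slot = idx % 7
        code = tokens[idx] - AUDIO_TOKEN_OFFSET - slot * CODEBOOK_SIZE
        layers[SLOT_TO_LAYER[slot]].append(code)
    return layers

def pack_frame(c1, c2a, c2b, c3a, c3b, c3c, c3d):
    """Hypothetical inverse for one frame: layer codes -> 7 token ids."""
    codes_by_slot = [c1, c2a, c3a, c3b, c2b, c3c, c3d]
    return [AUDIO_TOKEN_OFFSET + slot * CODEBOOK_SIZE + c
            for slot, c in enumerate(codes_by_slot)]

frame_a = pack_frame(10, 20, 21, 30, 31, 32, 33)
frame_b = pack_frame(11, 22, 23, 34, 35, 36, 37)
tokens = frame_a + frame_b + [AUDIO_TOKEN_OFFSET]  # stray extra token gets trimmed
layers = split_frame_tokens(tokens)
print(layers)  # [[10, 11], [20, 21, 22, 23], [30, 31, 32, 33, 34, 35, 36, 37]]
```

Unlike `redistribute_codes`, this sketch neither clamps negative codes to 0 nor calls SNAC; it only shows which layer each generated token ends up in.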

## Source code

Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/orpheus