Initialize project; model provided by the ModelHub XC community
Model: AI-ModelScope/Biggie-SmoLlm-0.15B-Base Source: Original Platform
41
.gitattributes
vendored
Normal file
@@ -0,0 +1,41 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
smolbiggie_bf16.gguf filter=lfs diff=lfs merge=lfs -text
Biggie_SmolLM_0.15B_Base_bf16.gguf filter=lfs diff=lfs merge=lfs -text
Biggie_SmolLM_0.15B_Base_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
biggie-smollm-checkpoint-twitter-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
biggie_groked_int8_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
old-biggie-smollm-checkpoint-twitter-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
3
Biggie_SmolLM_0.15B_Base_bf16.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fa567e7fe3fb07549881f8cb50172dd3fd6a15ada0e8e9b007fd4a04ad254180
size 362923584
368
README.md
Normal file
@@ -0,0 +1,368 @@
---
base_model: HuggingFaceTB/SmolLM-135M
datasets:
- LDJnr/Capybara
inference:
  parameters:
    model_file: biggie_groked_int8_q8_0.gguf
    temperature: 1
license: mit
---

### TINY Frankenstein of [SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) upped to 0.18b
Use this frankenbase for training.
Sorry for the mislabelling: the model has 181M parameters (0.18B), not 0.15B.
I did not expect this repo to blow up, and now all the training scripts depend on it.
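
The 0.15B versus 0.18B question can be checked by hand from the values in this repo's `config.json`. The sketch below assumes the standard bias-free Llama layout (head dimension = hidden size / attention heads, two RMSNorm weights per layer plus one final norm); the helper name `llama_param_count` is made up for illustration. Counting the output projection as its own matrix lands at roughly 181M, while sharing it with the input embedding (what `tie_word_embeddings: true` implies) lands at roughly 152M, which may be where the two labels come from.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, num_heads, num_kv_heads, tie_embeddings):
    """Count weights of a bias-free Llama-style decoder (illustrative helper)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim
    embed = vocab_size * hidden_size
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj and down_proj
    mlp = 3 * hidden_size * intermediate_size
    # input and post-attention RMSNorm weights
    norms = 2 * hidden_size
    total = embed + num_layers * (attn + mlp + norms) + hidden_size  # + final norm
    if not tie_embeddings:
        total += embed  # separate output projection
    return total


# Values copied from config.json in this commit
cfg = dict(vocab_size=49152, hidden_size=576, intermediate_size=1536,
           num_layers=35, num_heads=9, num_kv_heads=3)
tied = llama_param_count(**cfg, tie_embeddings=True)
untied = llama_param_count(**cfg, tie_embeddings=False)
print(f"tied: {tied:,}  untied: {untied:,}")
```
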

* ## CITE WORK FROM THIS HF PAGE AND [@cognitivecompai](https://huggingface.co/ehartford)'s OPTIMIZER ON YOUR FUTURE PAPERS OR I WILL DRAG YOUR ORG ON TWITTER LIKE I DID WITH COHERE LOL (we're cool now btw, visited them :)
* https://github.com/cognitivecomputations/grokadamw
* https://github.com/SakanaAI/evolutionary-model-merge/
* https://huggingface.co/blog/smollm

>>[!TIP]🐧 If you're impatient, get the trained checkpoint file that runs on 1 CPU core:
>>
>>wget https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/resolve/main/biggie_groked_int8_q8_0.gguf
>>
>>Make sure to install the latest llama.cpp first; it's easy on Linux & Mac:
>>
>> git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j

Now for the magic trained finetune that runs at insane speeds:

The settings are very finicky, so be careful with your experimentation.
```bash
./llama-cli -fa -b 512 -ctv q8_0 -ctk q8_0 --min-p 0.3 --top-p 0.85 --keep -1 \
-p "You are a NASA JPL Scientists. Human: I want to bring my cat to mars." \
--in-prefix "<|im_start|>Human:" --reverse-prompt "Human:" \
-m biggie_groked_int8_q8_0.gguf -co -cnv \
-c 1024 -n 700 --temp 1.5 -ngl 0 -t 1
```
Yup, that's no GPU, 1 CPU core.

This base model was built via semi-automated continuous merging to figure out the recipe.
The model is more coherent.

The temperature, min-p, and other sampling settings need to be adjusted, but even at the default setting (temp 0) it was coherent for the first 100 tokens.
Amazing option for further training. And this is a merge of the base, not the instruct!
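
To make the `--temp`, `--min-p` and `--top-p` flags from the command above less abstract, here is a rough pure-Python sketch of what each one does to a next-token distribution. It is a simplified illustration, not llama.cpp's code: the function names are made up, and the real sampler chain applies its filters in its own order with more edge-case handling.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Keep token ids whose probability is at least min_p * (top probability)."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely token ids whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


probs = [0.5, 0.3, 0.12, 0.08]       # toy distribution over four tokens
print(min_p_filter(probs, 0.3))      # cutoff is 0.3 * 0.5 = 0.15
print(top_p_filter(probs, 0.85))
```

With `--min-p 0.3` the cutoff scales with the model's confidence: when one token dominates, almost everything else is dropped, which is one plausible reason a tiny model can stay readable at a temperature as high as 1.5.
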

## 🧠 What's Really Going Down Here?

We're talking about a convergence of a whole bunch of stuff; more papers will be written about this:

1. **Evolutionary Merging**:
2. **BitNet Integration**:
3. **Experimental GrokAdamW Optimizer**:

## Prior work, from last week

Credits for the optimizer go to [@cognitivecompai](https://github.com/cognitivecomputations/grokadamw) for laying the groundwork with the original GrokAdamW optimizer.

## LET'S TRY OUT THE EXPERIMENTAL GROKKED FINETUNE:

```bash
wget https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/resolve/main/biggie_groked_int8_q8_0.gguf
```
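
If you want to be sure the download is intact, you can compare it against the Git LFS pointer that this commit records for `biggie_groked_int8_q8_0.gguf` (SHA-256 `c94683f4...` and size 163636960 bytes). A minimal sketch, assuming the file served at the URL above is byte-identical to the one tracked here, which this page does not confirm; `verify_download` is a made-up helper name.

```python
import hashlib
import os


def verify_download(path, expected_sha256, expected_size):
    """Return True if the file at `path` matches the given size and SHA-256."""
    if not os.path.isfile(path) or os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so the whole file never sits in memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


if __name__ == "__main__":
    # oid and size copied from the LFS pointer in this commit
    ok = verify_download(
        "biggie_groked_int8_q8_0.gguf",
        "c94683f4bda200ab06337ec237ab7953012d428b449e085563cd7a7a2b865658",
        163636960,
    )
    print("checksum OK" if ok else "checksum mismatch or file missing")
```
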

Yes, we will be talking with a 164 MB file that runs at 160 tokens per second on a single CPU core.
## You read all of that correctly: yes, 1 CPU core, 160 tps https://x.com/nisten/status/1819752034305970649
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/nTNISjByBkN7bJZzuOvOw.png)

## 🚀 Run it with NO GPU and only one CPU core, using these settings
```bash
./llama-cli -fa -b 512 -ctv q8_0 -ctk q8_0 --min-p 0.3 --top-p 0.85 --keep -1 -p "You are a NASA JPL Scientists. Human: I want to bring my cat to mars." -m biggie_groked_int8_q8_0.gguf -co -cnv --in-prefix "<|im_start|>Human:" --reverse-prompt "Human:" -c 1024 -n 512 --temp 1.5 -ngl 0 -t 1
```

## 🏋️ Training Tutorial, MAKE YOUR OWN BIGGIE_SMOlLM

Clone the repo like you're stealing code from the future:
```bash
git clone https://github.com/nisten/grokadamw
cd grokadamw
```

Fire up the training script and watch the magic happen:
```bash
python smoltrainer.py
```

## 💻 Do it from scratch yourself
Install the secret sauce (dependencies):
```bash
pip install torch transformers datasets tqdm accelerate
```

Make a file named `meow.py`, paste in this code, and then run it with `python meow.py`:

```python
import torch
import torch.nn as nn
import logging
from datasets import load_dataset, Dataset
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from torch.cuda.amp import autocast
import warnings
from tqdm import tqdm

warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

MODEL_NAME = "nisten/Biggie-SmoLlm-0.15B-Base"
MAX_LENGTH = 2048
BATCH_SIZE = 8
LEARNING_RATE = 2e-4
MAX_STEPS = 3000
GRADIENT_ACCUMULATION_STEPS = 2
NUM_WARMUP_STEPS = 30
OUTPUT_DIR = "./capybara_finetuned_results"

torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

class GrokAdamW(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2,
                 alpha_init=0.98, lamb=2.0, gamma=0.1, grokking_signal_fns=None,
                 grokking_signal_decay_rate=0.1, gradient_clipping=1.0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay,
                        alpha_init=alpha_init, lamb=lamb, gamma=gamma,
                        grokking_signal_fns=grokking_signal_fns,
                        grokking_signal_decay_rate=grokking_signal_decay_rate,
                        gradient_clipping=gradient_clipping)
        super(GrokAdamW, self).__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            grokking_signal = self._compute_grokking_signal(group)
            for i, p in enumerate(group['params']):
                if p.grad is None:
                    continue
                grad = p.grad

                # Element-wise clamp of the gradient (not a norm-based clip)
                if group['gradient_clipping'] > 0:
                    grad = torch.clamp(grad, -group['gradient_clipping'], group['gradient_clipping'])

                state = self.state[p]

                if len(state) == 0:
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                    state['grok_ema'] = torch.zeros_like(p, memory_format=torch.preserve_format)

                exp_avg, exp_avg_sq, grok_ema = state['exp_avg'], state['exp_avg_sq'], state['grok_ema']
                beta1, beta2 = group['betas']

                state['step'] += 1

                # beta1 shrinks with the parameter's index i inside its group
                layer_beta1 = beta1 * (1 - group['gamma'])**i

                # EMA of gradients whose decay depends on the grokking signal
                alpha = group['alpha_init'] * torch.exp(torch.tensor(-group['grokking_signal_decay_rate'] * grokking_signal))
                grok_ema.mul_(alpha).add_(grad, alpha=1 - alpha)
                grok_grad = grad + group['lamb'] * grok_ema

                exp_avg.mul_(layer_beta1).add_(grok_grad, alpha=1 - layer_beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grok_grad, grok_grad, value=1 - beta2)

                denom = exp_avg_sq.sqrt().add_(group['eps'])
                step_size = group['lr']

                # Decoupled (AdamW-style) weight decay
                if group['weight_decay'] != 0:
                    p.data.mul_(1 - group['lr'] * group['weight_decay'])

                p.addcdiv_(exp_avg, denom, value=-step_size)

        return loss

    def _compute_grokking_signal(self, group):
        if group['grokking_signal_fns'] is None:
            return 0.0

        signals = []
        for fn in group['grokking_signal_fns']:
            try:
                signal = fn()
                if signal is not None:
                    signals.append(signal)
            except Exception as e:
                logger.warning(f"Error in grokking_signal_fn: {e}. Ignoring this function.")

        if not signals:
            return 0.0

        return sum(signals) / len(signals)

def format_capybara_prompts(examples):
    # Flatten each multi-turn conversation into "Human: ... / Assistant: ..." text
    texts = []
    for conversation in examples['conversation']:
        formatted_text = ""
        for turn in conversation:
            if 'input' in turn:
                formatted_text += f"Human: {turn['input']}\n\n"
            if 'output' in turn:
                formatted_text += f"Assistant: {turn['output']}\n\n"
        texts.append(formatted_text.strip())
    return {"text": texts}

class CustomTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.grokking_signal = 0.0

    # num_items_in_batch is accepted (and ignored) because newer transformers
    # releases pass it to compute_loss/training_step; older ones do not.
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        loss_fct = nn.CrossEntropyLoss()
        loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
        return (loss, outputs) if return_outputs else loss

    def training_step(self, model, inputs, num_items_in_batch=None):
        model.train()
        inputs = self._prepare_inputs(inputs)

        with autocast(dtype=torch.bfloat16):
            loss = self.compute_loss(model, inputs)

        if self.args.gradient_accumulation_steps > 1:
            loss = loss / self.args.gradient_accumulation_steps

        loss.backward()

        # The latest (scaled) training loss is what the optimizer reads as its signal
        self.grokking_signal = loss.item()

        return loss.detach()

def grokking_signal_fn():
    # `trainer` is the module-level global assigned inside main()
    return trainer.grokking_signal

def main():
    logger.info(f"🚀 Initializing {MODEL_NAME} finetuning with GrokAdamW")

    try:
        config = AutoConfig.from_pretrained(MODEL_NAME)
        tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
    except Exception as e:
        logger.error(f"❌ Failed to load model or tokenizer: {str(e)}")
        return

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        model.config.pad_token_id = model.config.eos_token_id

    logger.info("📚 Loading Capybara dataset")
    try:
        capybara_dataset = load_dataset("LDJnr/Capybara", split="train")
        capybara_dataset = capybara_dataset.map(format_capybara_prompts, batched=True, remove_columns=capybara_dataset.column_names)
    except Exception as e:
        logger.error(f"❌ Failed to load Capybara dataset: {str(e)}")
        return

    logger.info(f"📊 Capybara dataset size: {len(capybara_dataset)}")

    def tokenize_function(examples):
        return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=MAX_LENGTH)

    logger.info("🔢 Tokenizing dataset")
    tokenized_dataset = capybara_dataset.map(tokenize_function, batched=True, remove_columns=capybara_dataset.column_names)

    logger.info("🏋️ Setting up the training arguments")
    training_args = TrainingArguments(
        output_dir=OUTPUT_DIR,
        num_train_epochs=3,  # overridden by max_steps below
        per_device_train_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        learning_rate=LEARNING_RATE,
        weight_decay=0.01,
        bf16=True,
        logging_steps=10,
        save_steps=300,
        save_total_limit=10,
        dataloader_num_workers=4,
        warmup_steps=NUM_WARMUP_STEPS,
        gradient_checkpointing=True,
        evaluation_strategy="steps",  # newer transformers releases call this eval_strategy
        eval_steps=300,
        max_steps=MAX_STEPS,
        fp16=False,
        optim="adamw_hf",  # unused: the GrokAdamW instance below is passed to the Trainer
        lr_scheduler_type="cosine",
        load_best_model_at_end=True,
        metric_for_best_model="loss",
        greater_is_better=False,
    )

    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    optimizer = GrokAdamW(
        model.parameters(),
        lr=LEARNING_RATE,
        betas=(0.9, 0.999),
        eps=1e-8,
        weight_decay=0.01,
        alpha_init=0.98,
        lamb=2.0,
        gamma=0.1,
        grokking_signal_fns=[grokking_signal_fn],
        grokking_signal_decay_rate=0.1,
        gradient_clipping=1.0
    )

    logger.info("🏃 Initializing Trainer with GrokAdamW")
    global trainer
    trainer = CustomTrainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
        # Evaluation reuses the first 1000 training examples (no held-out split)
        eval_dataset=tokenized_dataset.select(range(min(1000, len(tokenized_dataset)))),
        data_collator=data_collator,
        optimizers=(optimizer, None),
    )

    logger.info("🔥 Starting the training with GrokAdamW")
    try:
        trainer.train()
    except Exception as e:
        logger.error(f"❌ Training failed: {str(e)}")
        return

    logger.info("💾 Saving the model")
    try:
        trainer.save_model(OUTPUT_DIR)
    except Exception as e:
        logger.error(f"❌ Failed to save model: {str(e)}")

    logger.info("🎉 Finetuning with GrokAdamW completed!")

if __name__ == "__main__":
    main()
```
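
Read off the `step` method above, one update for the $i$-th parameter tensor $\theta_i$ of a group, with gradient $g$, can be summarized as follows. This is a transcription of the code in this README only, not a statement about the upstream grokadamw repository, and the symbols are picked here for illustration: $s$ is the averaged grokking signal (the last training loss in this script), $\rho$ is `grokking_signal_decay_rate`, $c$ is `gradient_clipping`, $\eta$ is `lr`, $w$ is `weight_decay`, $\lambda$ is `lamb`, and $e$ is the `grok_ema` buffer.

$$
\begin{aligned}
g &\leftarrow \operatorname{clamp}(g,\,-c,\,c) \\
\alpha &= \alpha_{\text{init}}\, e^{-\rho s} \\
e &\leftarrow \alpha\, e + (1-\alpha)\, g \\
\hat{g} &= g + \lambda\, e \\
\beta_{1,i} &= \beta_1\,(1-\gamma)^{i} \\
m &\leftarrow \beta_{1,i}\, m + (1-\beta_{1,i})\, \hat{g} \\
v &\leftarrow \beta_2\, v + (1-\beta_2)\, \hat{g}^{2} \\
\theta_i &\leftarrow (1-\eta w)\,\theta_i - \eta\, \frac{m}{\sqrt{v}+\epsilon}
\end{aligned}
$$

Two things stand out compared with plain AdamW: there is no bias correction on $m$ or $v$, and a larger loss $s$ makes $\alpha$ smaller, so the slow gradient average $e$ follows recent gradients more closely while the loss is still high.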
🚀 Now go forth and train, accelerate that code!

> **Note:** You'll need about 14GB of VRAM. If you have 8GB, change to batch size 4.
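
Halving the batch size also halves how many tokens go into each optimizer step. The arithmetic below uses the constants from the script (`MAX_LENGTH = 2048`, every sequence padded to that length, `GRADIENT_ACCUMULATION_STEPS = 2`). Raising the accumulation steps to 4 to compensate is a suggestion made here, not something the author prescribes, and `tokens_per_optimizer_step` is a made-up helper.

```python
MAX_LENGTH = 2048  # same value as in the training script


def tokens_per_optimizer_step(batch_size, grad_accum_steps, max_length=MAX_LENGTH):
    """Padded tokens consumed per optimizer step on a single device."""
    return batch_size * grad_accum_steps * max_length


default = tokens_per_optimizer_step(batch_size=8, grad_accum_steps=2)
halved = tokens_per_optimizer_step(batch_size=4, grad_accum_steps=2)
compensated = tokens_per_optimizer_step(batch_size=4, grad_accum_steps=4)
print(default, halved, compensated)
```
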

Results will appear in `./capybara_finetuned_results`

---

### Author

**Nisten Tahiraj**
🏢 [rakun.ai](https://rakun.ai)
📍 Toronto, Canada

---
Happy training!
<video controls autoplay muted src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/WCLhKzZWbrLo8BETGaKvI.qt"></video>
3
biggie_groked_int8_q8_0.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c94683f4bda200ab06337ec237ab7953012d428b449e085563cd7a7a2b865658
size 163636960
29
config.json
Normal file
@@ -0,0 +1,29 @@
{
  "_name_or_path": "/Users/n/hf/SmolLM-135M",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "hidden_act": "silu",
  "hidden_size": 576,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 9,
  "num_hidden_layers": 35,
  "num_key_value_heads": 3,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 49152
}
1
configuration.json
Normal file
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
48901
merges.txt
Normal file
File diff suppressed because it is too large
3
old-biggie-smollm-checkpoint-twitter-q8_0.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ede63599dc5381385ee91cf74c17b48fad06532697ee6d5196f737de01a5592
size 193718304
3
pytorch_model.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8868a6657a83f51c2b8d5d88cf48f96cdf0631feed5582a152cbcc77cfae4299
size 269149770
42
special_tokens_map.json
Normal file
@@ -0,0 +1,42 @@
{
  "additional_special_tokens": [
    "<|endoftext|>",
    "<|im_start|>",
    "<|im_end|>",
    "<repo_name>",
    "<reponame>",
    "<file_sep>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<jupyter_script>",
    "<empty_output>"
  ],
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
98249
tokenizer.json
Normal file
File diff suppressed because it is too large
167
tokenizer_config.json
Normal file
@@ -0,0 +1,167 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<repo_name>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "<reponame>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "5": {
      "content": "<file_sep>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "6": {
      "content": "<filename>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "7": {
      "content": "<gh_stars>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "8": {
      "content": "<issue_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "9": {
      "content": "<issue_comment>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "10": {
      "content": "<issue_closed>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "11": {
      "content": "<jupyter_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "12": {
      "content": "<jupyter_text>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "13": {
      "content": "<jupyter_code>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "14": {
      "content": "<jupyter_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "15": {
      "content": "<jupyter_script>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "16": {
      "content": "<empty_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<|endoftext|>",
    "<|im_start|>",
    "<|im_end|>",
    "<repo_name>",
    "<reponame>",
    "<file_sep>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<jupyter_script>",
    "<empty_output>"
  ],
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "model_max_length": 1000000000000000019884624838656,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
}
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long