Initialize project; model provided by the ModelHub XC community

Model: marin-community/marin-8b-base Source: Original Platform
2026-05-07 20:06:52 +08:00
commit e35f376326
12 changed files with 2374 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,53 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bin.* filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zstandard filter=lfs diff=lfs merge=lfs -text
+*.tfevents* filter=lfs diff=lfs merge=lfs -text
+*.db* filter=lfs diff=lfs merge=lfs -text
+*.ark* filter=lfs diff=lfs merge=lfs -text
+**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
+**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
+**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
+ 
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.gguf* filter=lfs diff=lfs merge=lfs -text
+*.ggml filter=lfs diff=lfs merge=lfs -text
+*.llamafile* filter=lfs diff=lfs merge=lfs -text
+*.pt2 filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+
+model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,225 @@
+---
+license: apache-2.0
+datasets:
+- allenai/dolmino-mix-1124
+- allenai/olmo-mix-1124
+- bigcode/starcoderdata
+- EleutherAI/proof-pile-2
+- hltcoe/megawika
+- mlfoundations/dclm-baseline-1.0
+- HuggingFaceTB/finemath
+- marin-community/ar5iv-noproblem-markdown
+- marin-community/ar5iv-warning-markdown
+- marin-community/datashop-science-qa
+- marin-community/stackexchange-markdown
+- marin-community/wikipedia-markdown
+# REMINDER: the instruct model should add dependencies on the instruct datasets and the base model.
+language:
+- en
+tags:
+- text-generation
+---
+
+<img alt="Marin Logo" src="https://huggingface.co/datasets/marin-community/blog-images/resolve/main/marin-boat.jpg" width="96" style="margin-left:auto; margin-right:auto; display:block">
+
+
+# Model Card for Marin 8B
+
+This is the model card for the Marin 8B Base model. [The Marin Project](https://marin.community) is a collaborative effort to develop open-source foundation models.
+
+## Datasets
+
+### Datasets used in Marin 8B Base
+
+Marin 8B Base was trained on a variety of datasets:
+
+- [Nemotron-CC](https://data.commoncrawl.org/contrib/Nemotron/Nemotron-CC/index.html)
+- [DCLM Baseline](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0)
+- [Starcoder Data](https://huggingface.co/datasets/bigcode/starcoderdata)
+- [Proofpile 2](https://huggingface.co/datasets/EleutherAI/proof-pile-2)
+- [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) 3+
+- [Dolma](https://huggingface.co/datasets/allenai/dolma), including their versions of:
+  - [MegaWika](https://huggingface.co/datasets/hltcoe/megawika)
+  - [peS2o](https://huggingface.co/datasets/allenai/peS2o)
+  - (And most of the rest of it)
+- [Dolmino-Mix-1124](https://huggingface.co/datasets/allenai/dolmino-mix-1124), including their versions of:
+    - [FLAN](https://arxiv.org/abs/2109.01652)
+    - [CodeSearchNet](https://arxiv.org/abs/1909.09436) (with OWM Filter)
+    - [GSM8K](https://arxiv.org/pdf/2110.14168v1)
+    - [MetaMath](https://arxiv.org/abs/2309.12284)
+    - [MathCoder2 Synthetic](https://arxiv.org/abs/2310.03731)
+
+
+And some new datasets:
+
+- [Marin Markdownified StackExchange](https://huggingface.co/datasets/marin-community/stackexchange-markdown)
+- [Marin Markdownified Wikipedia](https://huggingface.co/datasets/marin-community/wikipedia-markdown)
+- [Marin Markdownified Ar5iv (No Problem)](https://huggingface.co/datasets/marin-community/ar5iv-noproblem-markdown)
+- [Marin Markdownified Ar5iv (Warnings)](https://huggingface.co/datasets/marin-community/ar5iv-warning-markdown)
+- [Marin Datashop Science QA](https://huggingface.co/datasets/marin-community/datashop-science-qa)
+
+The first three are licensed per their original licenses. The fourth is licensed under CC-BY-SA 4.0.
+
+### Datasets used in Marin 8B Instruct
+
+Marin 8B Instruct is currently an SFT-only model. It was trained on the following datasets:
+
+- [TIGER-Lab/AceCode-89K](https://huggingface.co/datasets/TIGER-Lab/AceCode-89K)
+- [bespokelabs/Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k)
+- [cognitivecomputations/dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) (includes both nonreasoning and reasoning subsets)
+- [tuenguyen/dolphin_r1_reasoning](https://huggingface.co/datasets/tuenguyen/dolphin_r1_reasoning)
+- [facebook/natural_reasoning](https://huggingface.co/datasets/facebook/natural_reasoning)
+- [open-r1/OpenThoughts-114k-math](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math)
+- [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk)
+- [allenai/tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture)
+- [PrimeIntellect/verifiable-math-problems](https://huggingface.co/datasets/PrimeIntellect/verifiable-math-problems)
+
+It is quite likely that we will release improved versions of this model in the future.
+
+## Checkpoints
+
+We release a large number of checkpoints.
+
+### Base Model Checkpoints
+
+Main Page: [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base)
+
+| Name              | Training Tokens | Link                                                                                                       |
+|-------------------|-----------------|------------------------------------------------------------------------------------------------------------|
+| `main`            | 12.7T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/main)            |
+| `kestrel`         | 2.7T            | [kestrel](https://huggingface.co/marin-community/marin-8b-base/tree/kestrel)                               |
+| `ocelot`          | 3.78T           | [ocelot](https://huggingface.co/marin-community/marin-8b-base/tree/ocelot)                                 |
+| `jellyfish`       | 4.78T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/jellyfish)       |
+| `phoenix`         | 11.1T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/phoenix)         |
+| `starling`        | 12.4T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/starling)        |
+| `deeper-starling` | 12.7T           | [marin-community/marin-8b-base](https://huggingface.co/marin-community/marin-8b-base/tree/deeper-starling) |
+
+`main` currently refers to `deeper-starling`.
+This may change in the future, but we will maintain compatibility at the architecture and tokenizer level, 
+so the model will remain drop-in compatible with existing tooling.
+If you require a specific checkpoint, please use the `revision` argument.
+
+### Instruct Model Checkpoints
+
+Main Page: [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct)
+
+| Name                    | SFT Tokens | Link                                                                                                                     |
+|-------------------------|------------|--------------------------------------------------------------------------------------------------------------------------|
+| `main`                  | 5.3B       | [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct/tree/deeper-starling-05-15) |
+| `deeper-starling-05-15` | 5.3B       | [marin-community/marin-8b-instruct](https://huggingface.co/marin-community/marin-8b-instruct/tree/deeper-starling-05-15) |
+
+`main` currently refers to `deeper-starling-05-15`. This may change in the future, though we will maintain model compatibility. If you require a specific checkpoint, please use the `revision` argument.
+
+## Installation
+
+Marin 8B uses the [Llama architecture](https://arxiv.org/abs/2302.13971) and as such should
+work out-of-the-box with the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library
+and any other library that supports the Llama architecture.
+
+
+We use a variant of the Llama 3 tokenizer: [stanford-crfm/marin-tokenizer](https://huggingface.co/stanford-crfm/marin-tokenizer/).
+
+## Inference
+
+You can use Marin with the standard Hugging Face Transformers library:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Load the model weights and the matching tokenizer from the Hub.
+marin = AutoModelForCausalLM.from_pretrained("marin-community/marin-8b-base")
+tokenizer = AutoTokenizer.from_pretrained("marin-community/marin-8b-base")
+
+# Tokenize a single prompt; token_type_ids are not needed for this model.
+message = ["The Marin wind is"]
+inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
+
+# Sample up to 100 new tokens with top-k and nucleus (top-p) sampling.
+response = marin.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
+print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
+```
+
+We released a number of checkpoints of this model. To load a specific checkpoint, simply add the argument `revision`:
+
+```python
+marin = AutoModelForCausalLM.from_pretrained("marin-community/marin-8b-base", revision="deeper-starling")
+```
+
+### Model Description
+
+- **Developed by:** The Marin team at Stanford CRFM.
+- **Model type:** a Transformer-style autoregressive language model.
+- **Knowledge Cutoff:** ~July 2024
+- **Language(s) (NLP):** English
+- **License:** The code and model are released under Apache 2.0.
+- **Contact:** `dlwh at stanford.edu`
+
+### Model Sources
+
+- **Project Page:** https://marin.community
+- **Repositories:**
+    - Core repo (data and experiment management): https://github.com/marin-community/marin
+    - Training code: https://github.com/stanford-crfm/levanter
+- **Retrospective:** https://marin.readthedocs.io/en/latest/reports/marin-8b-retro.html
+- **W&B Logs:** [Marin 8B](https://wandb.ai/stanford-mercury/marin/reports/Tootsie-8B---VmlldzoxMTY3MzU3OA)
+
+
+## Evaluation
+
+
+### Base Model Results
+
+We ran a suite of standard benchmarks to compare our model with [Llama 3.1 8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) and the open-source 7-8B models [OLMo 2 7B](https://huggingface.co/allenai/OLMo-2-1124-7B) and [MAP NEO 7B](https://huggingface.co/m-a-p/neo_7b).
+For all benchmarks, we used [LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) with the default setup for each task. (These numbers may differ from reported results due to differences in setup. LM Eval Harness is usually somewhat stricter than other harnesses.)
+
+| Model | Average | AGI Eval LSAT-AR | ARC Challenge | ARC Easy | BBH | BoolQ | CommonSense QA | COPA | GPQA | GSM8K | HellaSwag (10-shot) | HellaSwag (0-shot) | LAMBADA (OpenAI) | MMLU Pro | MMLU (5-shot) | MMLU (0-shot) | OpenBookQA | PIQA | WinoGrande | WSC |
+|-------|---------|-----------------|---------------|----------|-----|-------|----------------|------|------|-------|---------------------|------------------|---------------|----------|------------|------------|-----------|------|------------|-----|
+| Marin 8B Base  <br/>(Deeper Starling) | **66.6** | 20.9 | **63.1** | **86.5** | **50.6** | **85.9** | 79.1 | **92.0** | 30.3 | 61.3 | **83.6** | **82.3** | **74.7** | **36.5** | **67.6** | **65.9** | 44.2 | **84.4** | **74.5** | 82.1 |
+| Llama 3.1 Base | 65.3 | 20.4 | 58.9 | 85.8 | 46.4 | 84.2 | 75.2 | **92.0** | **32.3** | 56.8 | 81.9 | 79.4 | **74.7** | 33.3 | 66.4 | 65.5 | 45.8 | 82.9 | 74.4 | 83.5 |
+| OLMo 2 Base | 64.9 | 17.4 | 60.7 | 85.0 | 44.4 | 85.5 | 75.4 | 89.0 | 26.8 | **67.6** | 81.7 | 80.5 | 73.1 | 30.6 | 63.9 | 61.9 | **46.2** | 82.5 | 74.3 | **86.1** |
+| MAP NEO 7B | 59.5 | **23.0** | 52.0 | 81.1 | 42.4 | 84.7 | **81.7** | 82.0 | 27.8 | 48.0 | 73.3 | 72.5 | 64.6 | 25.2 | 58.2 | 56.4 | 39.4 | 79.0 | 66.1 | 73.3 |
+
+Marin 8B Base has the highest average (66.6) and the best or tied-best score on 13 of the 19 tasks.
+
+
+## Model Details
+
+Please see [our technical retrospective](https://marin.readthedocs.io/en/latest/reports/marin-8b-retro.html) for more details on the pretraining process.
+
+### Architecture Details
+
+- **Architecture:** Llama 3 8B
+- **Hidden size:** 4096
+- **Feedforward size:** 14336
+- **Number of layers:** 32
+- **Number of attention heads:** 32
+- **Number of KV heads:** 8
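+
+As a rough sanity check, the parameter count can be estimated from these values. The sketch below assumes the standard Llama layer layout (query/key/value/output projections, a gated MLP with three weight matrices, two RMSNorm vectors per layer, no biases) and takes `vocab_size` (128256) and the untied embeddings from `config.json`; it is an estimate, not an inspection of the released weights.
+
+```python
+# Values from the architecture list above; vocab size is taken from config.json.
+hidden = 4096
+ffn = 14336
+layers = 32
+heads = 32
+kv_heads = 8
+vocab = 128256
+
+head_dim = hidden // heads  # 128, matches "head_dim" in config.json
+
+# Attention: q and o projections are hidden x hidden; k and v are hidden x (kv_heads * head_dim).
+attn = 2 * hidden * heads * head_dim + 2 * hidden * kv_heads * head_dim
+# Gated MLP (assumed): gate, up, and down projections.
+mlp = 3 * hidden * ffn
+# Two RMSNorm weight vectors per layer (assumed).
+norms = 2 * hidden
+per_layer = attn + mlp + norms
+
+# Untied input embeddings and output head, plus the final norm.
+embed = vocab * hidden
+total = 2 * embed + layers * per_layer + hidden
+
+print(f"{total:,}")  # 8,030,261,248
+```
+
+At 4 bytes per parameter this comes to roughly 32.1 GB, which is close to the combined size of the four safetensors shards.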
+
+### Tokenizer Details
+
+Marin 8B uses a variant of the Llama 3 tokenizer: [stanford-crfm/marin-tokenizer](https://huggingface.co/stanford-crfm/marin-tokenizer/). It has the same vocabulary but bundles a chat template into the base tokenizer for convenience.
+
+### Training Phases
+
+#### Pre-training Phases
+
+- *Kestrel (DCLM WSD-S Phase)*: DCLM+StarCoder+Proofpile2 using [WSD-S](https://arxiv.org/abs/2410.05192) (0->2.7T tokens)
+- *Ocelot (DCLM WSD Phase)*: Increased batch size, using WSD. (2.7T->3.78T tokens)
+- *Jellyfish (First Cooldown)*: Higher quality data (~Dolmino+Fine Math). (3.78T->4.78T tokens)
+- *Phoenix (Reheated)*: Rapid rewarming + [Nemotron-CC](https://arxiv.org/abs/2412.02595) (plus [Starcoder](https://huggingface.co/datasets/bigcode/starcoderdata)). (4.78T->11.1T tokens)
+- *Starling (Second Cooldown)*: Another cooldown. We followed a similar process to the first cooldown, but added a few new datasets. (11.1T->12.4T tokens)
+- *Deeper Starling*: Somewhat more pretraining. (12.4T->12.7T tokens)
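+
+The ranges above are cumulative, so the number of tokens seen in each phase is the difference between consecutive endpoints. A minimal sketch of that arithmetic, using only the endpoints listed above (in trillions of tokens):
+
+```python
+# Cumulative token counts (in trillions) at the end of each phase, from the list above.
+cumulative = [
+    ("Kestrel", 2.7),
+    ("Ocelot", 3.78),
+    ("Jellyfish", 4.78),
+    ("Phoenix", 11.1),
+    ("Starling", 12.4),
+    ("Deeper Starling", 12.7),
+]
+
+per_phase = {}
+previous = 0.0
+for name, total in cumulative:
+    # Round to two decimals to hide floating-point noise in the subtraction.
+    per_phase[name] = round(total - previous, 2)
+    previous = total
+
+for name, tokens in per_phase.items():
+    print(f"{name}: {tokens}T")
+```
+
+By this count, Phoenix is the longest phase at about 6.32T tokens, roughly half of the 12.7T total.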
+
+All released pre-training checkpoints except Kestrel use an exponential moving average of the model weights.
+
+#### SFT Phase
+
+SFT was comparatively simple, consisting of only one phase for 5.3B tokens.
+
+## Bias, Risks, and Limitations
+
+Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from Marin, as from any LLM, are often inaccurate, so responses should be verified.
+
+Marin 8B has not undergone any safety tuning or evaluation. We strongly recommend that users use this model with caution and consider the risks when applying this technology.
+In particular, this model is not intended for fully autonomous use.
+
+## Model Card Contact
+For errors in this model card, please open an issue in this repository. For technical inquiries, please contact `dlwh at stanford.edu`.
+
+## Acknowledgements
+
+The compute for this model was generously provided by Google's [TPU Research Cloud](https://sites.research.google/trc/about/).
--- a/config.json
+++ b/config.json
@@ -0,0 +1 @@
+{"vocab_size": 128256, "max_position_embeddings": 4096, "hidden_size": 4096, "intermediate_size": 14336, "num_hidden_layers": 32, "num_attention_heads": 32, "num_key_value_heads": 8, "hidden_act": "silu", "initializer_range": 0.02, "rms_norm_eps": 1e-05, "pretraining_tp": 1, "use_cache": true, "rope_theta": 500000, "rope_scaling": {"factor": 8.0, "low_freq_factor": 1.0, "high_freq_factor": 4.0, "original_max_position_embeddings": 8192, "rope_type": "llama3"}, "attention_bias": false, "attention_dropout": 0.0, "mlp_bias": false, "head_dim": 128, "return_dict": true, "output_hidden_states": false, "output_attentions": false, "torchscript": false, "torch_dtype": null, "use_bfloat16": false, "tf_legacy_loss": false, "pruned_heads": {}, "tie_word_embeddings": false, "chunk_size_feed_forward": 0, "is_encoder_decoder": false, "is_decoder": false, "cross_attention_hidden_size": null, "add_cross_attention": false, "tie_encoder_decoder": false, "max_length": 20, "min_length": 0, "do_sample": false, "early_stopping": false, "num_beams": 1, "num_beam_groups": 1, "diversity_penalty": 0.0, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "typical_p": 1.0, "repetition_penalty": 1.0, "length_penalty": 1.0, "no_repeat_ngram_size": 0, "encoder_no_repeat_ngram_size": 0, "bad_words_ids": null, "num_return_sequences": 1, "output_scores": false, "return_dict_in_generate": false, "forced_bos_token_id": null, "forced_eos_token_id": null, "remove_invalid_values": false, "exponential_decay_length_penalty": null, "suppress_tokens": null, "begin_suppress_tokens": [128000, 128001], "architectures": ["LlamaForCausalLM"], "finetuning_task": null, "id2label": {"0": "LABEL_0", "1": "LABEL_1"}, "label2id": {"LABEL_0": 0, "LABEL_1": 1}, "tokenizer_class": null, "prefix": null, "bos_token_id": 128000, "pad_token_id": null, "eos_token_id": 128001, "sep_token_id": null, "decoder_start_token_id": 128000, "task_specific_params": null, "problem_type": null, "_name_or_path": "", 
"_attn_implementation_autoset": false, "transformers_version": "4.51.3", "model_type": "llama"}
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/model-00001-of-00004.safetensors
+++ b/model-00001-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7d489975e373b2511d4b19ae60a388383f43eac731797618b63db5ff1f87924a
+size 9831465704
--- a/model-00002-of-00004.safetensors
+++ b/model-00002-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3974f53217c73f51de651bf00d7e28d73157b886356cf13de1fb391494acf27f
+size 9865007800
--- a/model-00003-of-00004.safetensors
+++ b/model-00003-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1f6ec3a7cac62c6b0229b48bdf0655b01522fbf29e478278cce1ab9b0fb94179
+size 8221912272
--- a/model-00004-of-00004.safetensors
+++ b/model-00004-of-00004.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a910404f1e332ec92e8410e5eacd6436ed3482ec78357c4bd84415fcd2cb1238
+size 4202692840
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,16 @@
+{
+  "bos_token": {
+    "content": "<|begin_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|end_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json