Initialize project; model provided by the ModelHub XC community

Model: Joaoffg/SHARE-14B-Base-2604
Source: Original Platform
ModelHub XC
2026-05-30 22:21:26 +08:00
commit e1e5ff8cf6
15 changed files with 249829 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
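
As a rough illustration of what these rules do, the sketch below checks whether a path would be routed to Git LFS. It covers only a handful of the patterns above and approximates gitattributes matching with Python's `fnmatch` on the file's basename. That is an assumption that holds for slash-free patterns but is not a full reimplementation of Git's matching rules. The helper name `routed_to_lfs` is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A few of the slash-free patterns listed in the .gitattributes above.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.pt", "*.onnx", "*tfevents*"]


def routed_to_lfs(path: str) -> bool:
    """Approximate check: slash-free patterns are compared against the basename."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for candidate in ["model-00001-of-00006.safetensors", "config.json"]:
        print(candidate, routed_to_lfs(candidate))
```

Under these rules the weight shards are stored as LFS pointers, while small text files such as `config.json` are committed directly.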

176
LICENSE Normal file

@@ -0,0 +1,176 @@
~~~
Generated on: 2026-02-16 13:33:15.715000+00:00
License ID: 02a257f4-5c16-41c6-9db1-d2b86954ea90
License Template Version: e8502289197accc4ddd023f0fc234ca26062a9f1
~~~
### **Social-Humanities AI For Research and Education RAIL-MS**
Licensed Artifact(s):
- Model
- Source Code
**Section I: PREAMBLE**
This RAIL License is generally applicable to the Artifact(s) identified above.
For valuable consideration, You and Licensor agree as follows:
**1. Definitions**
(a) “**Application**” refers to a sequence of instructions or statements written in machine code language, including object code (that is the product of a compiler), binary code (data using a two-symbol system) or an intermediate language (such as register transfer language).
(b) “**Artifact**” refers to a software application (in either binary or source code format), Model, and/or Source Code, in accordance with what is specified above as the “Licensed Artifact”.
(c) “**Contribution**” means any work, including any modifications or additions to an Artifact, that is intentionally submitted to Licensor for inclusion or incorporation in the Artifact directly or indirectly by the rights owner. For the purposes of this definition, “**submitted**” means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing, sharing and improving the Artifact, but excluding communication that is conspicuously marked or otherwise designated in writing by the contributor as "**Not a Contribution.**"
(d) "**Contributor**" means Licensor or any other individual or legal entity that creates or owns a Contribution that is added to or incorporated into an Artifact or its Derivative.
(e) **“Data”** means a collection of information and/or content extracted from the dataset used with a given Model, including to train, pretrain, or otherwise evaluate the Model. The Data is not licensed under this License.
(f) **“Derivative**” means a work derived from or based upon an Artifact, and includes all modified versions of such Artifact.
(g) **“Distribution”** means any transmission, reproduction, publication or other sharing of an Artifact or Derivative to a third party, including providing a hosted service incorporating the Artifact, which is made available by electronic or other remote means - e.g. API-based or web access.
(h) “**Harm**” includes but is not limited to physical, mental, psychological, financial and reputational damage, pain, or loss.
(i) "**License**" means the terms and conditions for use, reproduction, and Distribution as defined in this document.
(j) “**Licensor**” means the rights owner (by virtue of creation or documented transfer of ownership) or entity authorized by the rights owner (e.g., exclusive licensee) that is granting the rights in this License.
(k) “**Model**” means any machine-learning based assembly or assemblies (including checkpoints), consisting of learnt weights, parameters (including optimizer states), corresponding to the model architecture as embodied in the Source Code.
(l) **“Output”** means the results of operating a Model as embodied in informational content resulting therefrom.
(m) “**Permitted Purpose**” means for non-commercial scientific research, development, and education only. "Non-commercial" means use explicitly not intended for or directed towards commercial advantage or monetary compensation.
(n) “**Source Code**” means any collection of text written using human-readable programming language, including the code and scripts used to define, run, load, benchmark or evaluate a Model or any component thereof, and/or used to prepare data for training or evaluation, if any. Source Code includes any accompanying documentation, tutorials, examples, etc, if any. For clarity, the term “Source Code” as used in this License includes any and all Derivatives of such Source Code.
(o) “**Third Parties**” means individuals or legal entities that are not under common control with Licensor or You.
(p) **“Use”** includes accessing, using, copying, modifying, and/or distributing an Artifact; in connection with a Model as Artifact, Use also includes creating content, fine-tuning, updating, running, training, evaluating and/or re-parametrizing such Model.
(q) "**You**" (or "**Your**") means an individual or legal entity receiving and exercising permissions granted by this License and/or making use of the Artifact for permitted purposes and in any permitted field of use, including usage of the Artifact in an end-use application - e.g. chatbot, translator, image generator, etc.
**Section II: INTELLECTUAL PROPERTY RIGHTS**
Both copyright and patent grants may apply to the Artifact. The Artifact is subject to additional terms as described in Section III below, which govern the use of the Artifact in the event that Section II is held unenforceable or inapplicable.
**2. Grant of Copyright License**. Conditioned upon compliance with Section III below and subject to the terms and conditions of this License, each Contributor hereby grants to You, only in connection with the Permitted Purpose, a worldwide, non-exclusive, royalty-free copyright license to reproduce, use, publicly display, publicly perform, sublicense, and distribute the Artifact and Derivatives thereof.
**3. Grant of Patent License**. Conditioned upon compliance with Section III below and subject to the terms and conditions of this License, and only where and as applicable, each Contributor hereby grants to You, only in connection with the Permitted Purpose, a worldwide, non-exclusive, royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, and use the Artifact where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Artifact to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Artifact and/or a Contribution incorporated within the Artifact constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License in connection with the Artifact shall terminate as of the date such litigation is asserted or filed.
Licensor and Contributor each have the right to grant the licenses above.
**Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION**
**4. Use-based restrictions.** The restrictions set forth in Attachment A are mandatory Use-based restrictions. Therefore You may not Use the Artifact in violation of such restrictions. You may Use the Artifact only subject to this License. You shall require all of Your users who use the Artifact or its Derivative to comply with the terms of this paragraph and only for the Permitted Purpose.
**5. The Output You Generate with a Model (as Artifact).** Except as set forth herein, Licensor claims no ownership rights in the Output You generate. You are accountable for the Output You generate and its subsequent uses. However, Your use of the Output is strictly subject to the Use Restrictions in Attachment A. For the avoidance of doubt, You may not use the Output to contravene any provision stated in this License, including the prohibitions on model distillation and training data extraction.
**6. Distribution and Redistribution**. You may host for Third Party remote access purposes (e.g. software-as-a-service), reproduce and distribute copies of the Artifact or its Derivatives in any medium, with or without modifications, provided that You meet the following conditions:
1. Use-based restrictions in paragraph 4 MUST be included as a condition precedent to effect any type of legal agreement (e.g. a license) governing the use and/or distribution of the Artifact or its Derivatives, and You shall give such notice to any subsequent Third Party recipients;
2. You shall give any Third Party recipients of the Artifact or its Derivatives a copy of this License;
3. You shall cause any modified files to carry prominent notices stating that You changed the files;
4. You shall retain all copyright, patent, trademark, and attribution notices excluding those notices that do not pertain to any part of the Artifact or its Derivatives.
5. You and any Third Party recipients of the Artifact or its Derivative shall adhere to the Permitted Purpose.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions with respect to paragraph **6.1.,** to govern the use, reproduction, or Distribution of Your modifications, or for any Derivative, **provided that** Your use, reproduction, and Distribution of the Artifact or its Derivative otherwise complies with the conditions stated in this License. In other words, the Use-based restrictions in Attachment A form the minimum set of terms for You to license to Third Parties any Artifact or its Derivative, but You may add more restrictive terms if You deem it necessary.
**Section IV: OTHER PROVISIONS**
**7. Updates and Runtime Restrictions.** To the maximum extent permitted by law, Licensor reserves the right to restrict (remotely or otherwise) usage of the Artifact in violation of this License or update the Artifact through electronic means.
**8. Trademarks and related.** Nothing in this License permits You to make use of Licensor's trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between the parties; and any rights not expressly granted herein are reserved by the Licensors.
**9. Disclaimer of Warranty**. Unless required by applicable law or agreed to in writing, Licensor provides the Artifact (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using the Artifact, and assume any risks associated with Your exercise of permissions under this License.
**10. Limitation of Liability**. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Artifact (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
**11.** If any provision of this License is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.
**12.** **Term and Termination.** The term of this License will commence upon the earlier of (a) Your acceptance of this License or (b) accessing the Artifact; and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Licensor may terminate this License if You are in breach of any term or condition of this Agreement. Upon termination of this Agreement, You shall delete and cease use of the Artifact. Section 10 shall survive the termination of this License.
END OF TERMS AND CONDITIONS
**Attachment A**
### **USE RESTRICTIONS**
You agree not to use the Artifact or its Derivatives in any of the following ways:
1. Discrimination
(a) To discriminate or exploit individuals or groups based on legally protected characteristics and/or vulnerabilities.
(b) For purposes of administration of justice, law enforcement, immigration, or asylum processes, such as predicting that a natural person will commit a crime or the likelihood thereof.
(c) To engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, or other essential goods and services.
2. Military
(a) For weaponry or warfare.
3. Legal
(a) To engage or enable fully automated decision-making that adversely impacts a natural person's legal rights without expressly and intelligibly disclosing the impact to such natural person and providing an appeal process.
(b) To engage or enable fully automated decision-making that creates, modifies or terminates a binding, enforceable obligation between entities; whether these include natural persons or not.
(c) In any way that violates any applicable national, federal, state, local or international law or regulation.
4. Disinformation
(a) To create, present or disseminate verifiably false or misleading information for economic gain or to intentionally deceive the public, including creating false impersonations of natural persons.
(b) To synthesize or modify a natural person's appearance, voice, or other individual characteristics, unless prior informed consent of said natural person is obtained.
(c) To autonomously interact with a natural person, in text or audio format, unless disclosure and consent is given prior to interaction that the system engaging in the interaction is not a natural person.
(d) To defame or harm a natural person's reputation, such as by generating, creating, promoting, or spreading defamatory content (statements, images, or other content).
(e) To generate or disseminate information (including - but not limited to - images, code, posts, articles), and place the information in any public context without expressly and intelligibly disclaiming that the information and/or content is machine generated.
5. Privacy
(a) To utilize personal information to infer additional personal information about a natural person, including but not limited to legally protected characteristics, vulnerabilities or categories; unless informed consent from the data subject to collect said inferred personal information for a stated purpose and defined duration is received.
(b) To generate or disseminate personal identifiable information that can be used to harm an individual or to invade the personal privacy of an individual.
6. Health
(a) To provide medical advice or make clinical decisions without necessary (external) accreditation of the system; unless the use is (i) in an internal research context with independent and accountable oversight and/or (ii) with medical professional oversight that is accompanied by any related compulsory certification and/or safety/quality standard for the implementation of the technology.
7. Research
(a) In connection with any academic dishonesty, including submitting any informational content or output of a Model as Your own work in any academic setting.
8. Malware
(a) To generate and/or disseminate malware (including - but not limited to - ransomware) or any other content to be used for the purpose of Harming electronic systems.
9. General
(a) To intentionally deceive or mislead others, including failing to appropriately disclose to end users any known dangers of your system.
10. Model Development and Data Integrity
(a) To use the Artifact, its Derivatives, or any Output to directly or indirectly train, pre-train, fine-tune, or evaluate any other machine learning model or artificial intelligence system (including, but not limited to, model distillation or the generation of synthetic data for training purposes).
(b) To intentionally interact with, query, or prompt the Model for the purpose of discovering, extracting, reconstructing, reverse-engineering, or reproducing the Data (as defined in Section 1(e)) or any specific texts or information used to train, pre-train, or evaluate the Model.

200
README.md Normal file

@@ -0,0 +1,200 @@
---
datasets:
- allenai/peS2o
language:
- en
- nl
license: other
license_name: rail-share
license_link: LICENSE
metrics:
- perplexity
library_name: transformers
pipeline_tag: text-generation
---
# Model Card for SHARE-14B
SHARE-14B (Social-Humanities AI for Research and Education) is a 14-billion-parameter decoder-only causal language model pretrained exclusively on content relevant to the social sciences and humanities (SSH). It is intended as a domain-specific base model for SSH research and education, and is designed to be used through the MIRROR interface, which surfaces token-level surprisal rather than generating new text.
More information can be found in the paper [SHARE: Social-Humanities AI for Research and Education](https://huggingface.co/papers/2604.11152).
**Note:** This is an intermediate checkpoint released after ~15% of planned pretraining (96B tokens of a target ~630B). It is a base (pretrained-only) model with no SFT, DPO, or RLHF. This base model is not suitable for chat applications.
## Model Details
### Model Description
SHARE-14B is the first causal language model fully pretrained by and for the SSH disciplines. It mirrors the Phi-4 14B architecture but uses a custom 50,000-token BPE tokenizer trained on the SHARE corpus, and is pretrained exclusively on a curated SSH dataset drawn from Wikipedia, Project Gutenberg, PeS2o, and CORE. On a custom SSH Cloze benchmark, the current checkpoint achieves performance close to Phi-4 14B (0.796 vs 0.818 prior-corrected accuracy) while having seen roughly 100× fewer training tokens.
- **Developed by:** João Gonçalves, Sonia de Jager, Petr Knoth, David Pride, Nick Jelicic
- **Funded by:** NVIDIA Academic Grant; NWO-SURF Small Compute Grant (EINF-15690); Dutch Research Council (NWO) VENI grant VI.Veni.221S.154
- **Model type:** Decoder-only transformer causal language model (Phi-4 architecture)
- **Language(s) (NLP):** Primarily English, with a smaller proportion of Dutch
- **License:** Custom Responsible AI License (RAIL-SHARE) — non-commercial, no model distillation, restricted text generation use
### Model Sources
- **Repository:** https://github.com/Joaoffg/SHARE
- **Paper:** [SHARE: Social-Humanities AI for Research and Education](https://arxiv.org/abs/2604.11152)
- **Contact:** ferreiragoncalves@eshcc.eur.nl
## Uses
### Direct Use
SHARE-14B is intended primarily as a base model deployed through the MIRROR interface for SSH researchers, educators, and students. Through MIRROR, the model is used to compute token-level surprisal and entropy on user-written texts in order to:
- Identify typos, stylistic anomalies, and possible factual mistakes in academic writing
- Highlight innovative or unexpected contributions in scholarly texts
- Surface disciplinary biases and norms encoded in SSH literature
- Support reflective revision of student and scholarly writing in the SSH
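
The two quantities named above can be sketched in a few lines. This is a minimal illustration, not MIRROR's implementation: it assumes you already have one vector of next-token logits from the model, and the function names and toy logits are made up for the example. Values are in nats.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def surprisal_and_entropy(logits, observed_id):
    """Return (surprisal of the observed token, entropy of the distribution)."""
    log_probs = log_softmax(logits)
    surprisal = -log_probs[observed_id]
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    return surprisal, entropy


# Toy logits over a 4-token vocabulary (invented for illustration).
toy_logits = [2.0, 0.5, 0.1, -1.0]
likely, _ = surprisal_and_entropy(toy_logits, observed_id=0)
unlikely, _ = surprisal_and_entropy(toy_logits, observed_id=3)
print(likely < unlikely)  # the low-logit token is the more surprising one
```

High surprisal with low entropy means the model was confident and the text disagreed. High surprisal with high entropy means the model had no strong expectation to begin with.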
### Downstream Use
Potential downstream uses include perplexity-based analyses of SSH texts, domain-specific text classification, and research on the structure and biases of SSH scholarly discourse. Downstream use is governed by the RAIL-SHARE license (non-commercial; no distillation).
### Out-of-Scope Use
- Commercial applications of any kind (forbidden by license)
- Model distillation into other models (forbidden by license)
- Unconstrained text generation, especially in academic contexts where it could enable student or faculty fraud
- STEM, biomedical, mathematical, or coding tasks — the model was deliberately not trained on these domains
- Use as a chat assistant — the model is base-pretrained only, with no SFT or alignment
- Multilingual applications outside of English and (to a lesser extent) Dutch
- Any safety-critical decision-making
## Bias, Risks, and Limitations
SHARE-14B inherits the systemic biases present in the open-access English-language SSH scholarship it was trained on. As illustrated in the paper, terms associated with non-Western scholarship (e.g. "African" in the context of locations of knowledge production) can register as unexpected, reflecting the field's existing imbalances rather than properties of the topics themselves.
Other limitations and risks:
- **English-dominant data**, which is a meaningful constraint for SSH fields where multilingual scholarship matters
- **Intermediate checkpoint:** only ~15% of planned pretraining is complete, so capabilities will continue to evolve
- **Causal interpretation effect:** because each token's surprisal is conditioned on the preceding tokens, an early mistake in a text propagates and can mask later anomalies
- **Reading and reviewing shortcuts:** the tool could be misused to shortcut careful reading of academic work
- **No alignment or safety tuning** has been applied — the model is released as a base model
### Recommendations
Users should treat MIRROR outputs as prompts for reflection rather than authoritative judgments. Surprisal does not equal correctness, and unexpectedness can signal innovation as readily as error. When using MIRROR for revision, work from the beginning of the text to mitigate the propagation of earlier surprisal into later tokens. Researchers should be aware of the model's biases toward dominant SSH discourses and read its outputs critically. Use of SHARE for direct text generation is discouraged.
## Training Details
### Training Data
The training corpus combines three SSH-focused subsets:
- **Wikipedia** (English and Dutch): articles selected by traversing the category tree from SSH-relevant main topic classifications using PetScan and extracted with WikiExtractor
- **Project Gutenberg:** books filtered by SSH-relevant Library of Congress Classes (B, C, D, G, H, J, K, L, M, N)
- **Academic publications:** drawn from PeS2o and CORE, filtered using AllenAI's Field of Science (FoS) classifier to retain SSH disciplines (Art, Business, Economics, Geography, Education, History, Law, Linguistics, Philosophy, Political Science, Psychology, Sociology), plus additional materials provided through agreements with publishers including Open Humanities Press
The full corpus is on the order of tens of billions of tokens. See the technical report for details on filtering and selection.
### Training Procedure
#### Preprocessing
Raw data preprocessing was carried out exclusively on EU servers. A custom BPE tokenizer with a 50,000-token vocabulary was trained on the full SHARE corpus.
#### Training Hyperparameters
- **Training regime:** Mixed precision with FlashAttention-2, torch.compile, Liger Kernel, sequence packing, and FSDP
- **Architecture:** Phi-4 14B (decoder-only transformer)
- **Context length:** 4096 tokens
- **Warm-up steps:** 2000
- **Learning rate:** Manually monitored and adjusted between 5-day Snellius runs (started at 1.58e-4, adjusted to 1e-4 for the second run), motivated by concerns that cosine decay underutilizes data fed in later pretraining stages
- **Weight decay:** 0.1
#### Speeds, Sizes, Times
Training was initiated on Saturn Cloud using 8× NVIDIA A100 80GB GPUs for 167 hours under FSDP, then continued on the Dutch supercomputer Snellius using 5 nodes of 4× H100 GPUs (20 GPUs total) for approximately 225 hours. As of this checkpoint, the model has been trained on 96 billion tokens (~15% of the planned ~630B-token compute-optimal target across 2 epochs of the data mix).
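
The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below only restates the numbers quoted above. Treating every GPU as busy for the full wall-clock duration is an assumption, so the GPU-hour totals are upper-bound estimates rather than measured values.

```python
# Figures quoted in the paragraph above.
tokens_seen = 96e9
tokens_target = 630e9
a100_gpus, a100_hours = 8, 167
h100_gpus, h100_hours = 5 * 4, 225  # 5 nodes x 4 GPUs

progress = tokens_seen / tokens_target
a100_gpu_hours = a100_gpus * a100_hours
h100_gpu_hours = h100_gpus * h100_hours

print(f"progress: {progress:.1%}")          # progress: 15.2%
print(f"A100 GPU-hours: {a100_gpu_hours}")  # A100 GPU-hours: 1336
print(f"H100 GPU-hours: {h100_gpu_hours}")  # H100 GPU-hours: 4500
```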
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
- **Perplexity comparison:** Erasmus University Rotterdam research output abstracts from Q3 and Q4 2025, out of distribution from the training data
- **SSH Cloze benchmark:** 275 SSH abstracts published in Q1 2026 (25 per Web of Science field across 11 SSH disciplines), constructed by selecting sentences with equivalent-token decisions (e.g. positive/negative, higher/lower) where SSH knowledge is required to predict the correct token
#### Factors
- Scientific domain (FoS classifier categories)
- Faculty affiliation of authors at Erasmus University Rotterdam (used as an ecological-validity check)
#### Metrics
- Log-perplexity difference relative to Phi-4 (lower means better SHARE fit)
- Raw and prior-corrected accuracy on the SSH Cloze benchmark (prior correction accounts for models guessing the more frequent token)
### Results
On the SSH Cloze benchmark, SHARE-14B achieves 77.1% raw accuracy and 79.6% prior-corrected accuracy at the 96B-token checkpoint. This is close to Phi-4 14B (81.8% / 81.8%) despite Phi-4 being trained on roughly 9.8 trillion tokens, and clearly above OLMo-2 13B at the 168B-token Step-20k checkpoint (74.9% / 73.8%) and fully trained Pythia-12B (67.3% / 61.5%).
Perplexity analyses show that the gap between SHARE-14B and Phi-4 is consistently smaller for SSH fields (Art, Education, Sociology) than for STEM fields (Biology, Engineering, Medicine), indicating the intended SSH specialization. At the faculty level, the same pattern holds: Erasmus MC (medical) shows the largest gap, while SSH-focused faculties show the smallest.
#### Summary
SHARE-14B at 15% of training is already substantially more capable than the smaller SHARE-4B (evaluation perplexity 5.26 vs 11.94) and approaches the performance of Phi-4 on SSH-relevant token prediction at a small fraction of its training cost.
## Model Examination
Memorization probes using deterministic generation from texts in the pretraining corpus — including data seen most recently — show that SHARE-14B does not reproduce copyrighted content. The few instances of memorization observed correspond only to disclaimers and standard headers. Early experiments with instruction-tuned variants further suggest that, because the training data deliberately excludes domains such as cybersecurity, biological weapons, and CSAM, classical safety risks are limited; the model also tends to default to harm-reducing framings when prompted with SSH-relevant harmful queries.
## Environmental Impact
- **Hardware Type:** 8× NVIDIA A100 80GB (Saturn Cloud) and 20× NVIDIA H100 (5 nodes × 4 GPUs, Snellius supercomputer)
- **Hours used:** ~167 hours on A100s + ~225 hours on H100s for the current checkpoint
- **Cloud Provider:** Saturn Cloud (initial phase) and SURF / Snellius supercomputer (current phase)
- **Compute Region:** United States (Saturn Cloud, initial phase only); Netherlands (Snellius)
- **Carbon Emitted:** Not yet precisely measured for the 14B model.
The project applied Chinchilla scaling laws to budget compute and used efficiency techniques (mixed precision, torch.compile, FlashAttention-2, Liger Kernel, gradient checkpointing) to reduce energy use.
## Citation
**BibTeX:**
```bibtex
@misc{gonçalves2026sharesocialhumanitiesairesearch,
title={SHARE: Social-Humanities AI for Research and Education},
author={João Gonçalves and Sonia de Jager and Petr Knoth and David Pride and Nick Jelicic},
year={2026},
eprint={2604.11152},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.11152},
}
```
**APA:**
Gonçalves, J., de Jager, S., Knoth, P., Pride, D., & Jelicic, N. (2026). SHARE: Social-humanities AI for research and education. arXiv. https://arxiv.org/abs/2604.11152
## Privacy statement
Personal data, such as author names, may be included in the training documents for SHARE. We rely on legitimate interest as the legal basis for processing this data under the EU's GDPR. The full privacy statement can be consulted here: https://surfdrive.surf.nl/s/gFnxgL6f5jer8yy
## Glossary
- **SSH:** Social Sciences and Humanities
- **MIRROR:** Model Interface for Reflective Research Output Revisions — the user interface that displays per-token surprisal from SHARE rather than generating text
- **Surprisal:** Negative log probability of an observed token under the model
- **Prior-corrected accuracy:** Cloze accuracy adjusted to discount correct guesses arising from token frequency priors
- **FoS:** Field of Science (AllenAI classifier used for disciplinary labelling)
- **RAIL:** Responsible AI License
## More Information
This model is released as part of an intermediate technical report and is intended to invite feedback from the SSH and ML communities. Companion resources include the SHARE-4B model and the MIRROR interface.
## Model Card Authors
João Gonçalves
## Model Card Contact
ferreiragoncalves@eshcc.eur.nl

32
config.json Normal file

@@ -0,0 +1,32 @@
{
"architectures": [
"Phi3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 0,
"dtype": "bfloat16",
"embd_pdrop": 0.0,
"eos_token_id": 1,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 17920,
"max_position_embeddings": 16384,
"model_type": "phi3",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 10,
"original_max_position_embeddings": 16384,
"pad_token_id": 3,
"partial_rotary_factor": 1.0,
"resid_pdrop": 0.0,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 250000,
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": true,
"vocab_size": 50000
}
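
The configuration above is enough to reproduce the parameter count reported in the checkpoint index (`total_parameters: 14143902720`). The sketch below assumes the layer layout suggested by the weight names in that index (fused `qkv_proj` and `gate_up_proj`, no biases) plus a single final normalization weight, which is an assumption about the `phi3` architecture rather than something shown in this commit.

```python
# Values copied from the config.json above.
hidden_size = 5120
intermediate_size = 17920
num_layers = 40
num_heads = 40
num_kv_heads = 10
vocab_size = 50000

head_dim = hidden_size // num_heads   # 128
kv_dim = num_kv_heads * head_dim      # 1280 (grouped-query attention)

qkv_proj = hidden_size * (hidden_size + 2 * kv_dim)  # fused Q, K, V
o_proj = hidden_size * hidden_size
gate_up_proj = hidden_size * 2 * intermediate_size   # fused gate and up
down_proj = intermediate_size * hidden_size
layer_norms = 2 * hidden_size                        # input + post-attention

per_layer = qkv_proj + o_proj + gate_up_proj + down_proj + layer_norms
embeddings = vocab_size * hidden_size
lm_head = vocab_size * hidden_size   # separate, since tie_word_embeddings is false
final_norm = hidden_size             # assumed: one final norm weight vector

total = num_layers * per_layer + embeddings + lm_head + final_norm
print(total)      # 14143902720
print(total * 2)  # 28287805440 bytes at 2 bytes per bfloat16 weight
```

Both printed values match the `total_parameters` and `total_size` fields of the index, which supports the assumed layout.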

7
generation_config.json Normal file

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 0,
"eos_token_id": 1,
"pad_token_id": 3,
"transformers_version": "4.57.6"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2b8ac78ec41f20587971c8123ed7cec26fe90c236a0ae042ba4617e672f31d30
size 4732645848


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8aa8c3f087bee2b0d4ff6507fcafa05db3924f4f52c885c17dc07c935a6c87c2
size 4771169088


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b1a6856a60f97f38ff4d97271c120cba0dbf1451f914ab2cc27bc56917398bd
size 4771169120


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:670ede0487603702e0014b0a5b5ca384f561c8c07e56cbeff3d042294899e8da
size 4771169120


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:55e6cb5c14bf718133fa40827dbca92348980afce6f50b961ccaabd0c0ffa439
size 4771169120

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c7340abdc3b0f0c40c46e979fc80182f2d69003756d6244bee27849677e762b0
size 4470511728

@@ -0,0 +1,251 @@
{
"metadata": {
"total_parameters": 14143902720,
"total_size": 28287805440
},
"weight_map": {
"lm_head.weight": "model-00006-of-00006.safetensors",
"model.embed_tokens.weight": "model-00001-of-00006.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.mlp.gate_up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.0.self_attn.qkv_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.mlp.gate_up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.1.self_attn.qkv_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.10.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.mlp.gate_up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.10.self_attn.qkv_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.mlp.gate_up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.11.self_attn.qkv_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.mlp.gate_up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.12.self_attn.qkv_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.mlp.gate_up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.13.self_attn.qkv_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.14.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.mlp.gate_up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.14.self_attn.qkv_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.mlp.gate_up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.15.self_attn.qkv_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.mlp.gate_up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.16.self_attn.qkv_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.mlp.gate_up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.17.self_attn.qkv_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.mlp.gate_up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.18.self_attn.qkv_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.input_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.mlp.gate_up_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.19.self_attn.qkv_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.mlp.gate_up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.2.self_attn.qkv_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.20.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.mlp.gate_up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.20.self_attn.qkv_proj.weight": "model-00003-of-00006.safetensors",
"model.layers.21.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.mlp.gate_up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.21.self_attn.qkv_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.mlp.gate_up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.22.self_attn.qkv_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.mlp.gate_up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.23.self_attn.qkv_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.mlp.gate_up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.24.self_attn.qkv_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.mlp.gate_up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.25.self_attn.qkv_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.input_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.mlp.gate_up_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.26.self_attn.qkv_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.27.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.mlp.gate_up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.27.self_attn.qkv_proj.weight": "model-00004-of-00006.safetensors",
"model.layers.28.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.mlp.gate_up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.28.self_attn.qkv_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.mlp.gate_up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.29.self_attn.qkv_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.mlp.gate_up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.3.self_attn.qkv_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.30.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.30.mlp.gate_up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.30.self_attn.qkv_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.31.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.31.mlp.gate_up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.31.self_attn.qkv_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.32.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.32.mlp.gate_up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.32.self_attn.qkv_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.33.input_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.33.mlp.gate_up_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.33.self_attn.qkv_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.34.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.34.mlp.gate_up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.34.self_attn.qkv_proj.weight": "model-00005-of-00006.safetensors",
"model.layers.35.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.35.mlp.gate_up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.35.self_attn.qkv_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.36.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.36.mlp.gate_up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.36.self_attn.qkv_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.37.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.37.mlp.gate_up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.37.self_attn.qkv_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.38.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.38.mlp.gate_up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.38.self_attn.qkv_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.39.input_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.39.mlp.gate_up_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.39.self_attn.qkv_proj.weight": "model-00006-of-00006.safetensors",
"model.layers.4.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.mlp.gate_up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.4.self_attn.qkv_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.input_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.mlp.gate_up_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.5.self_attn.qkv_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.mlp.gate_up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.6.self_attn.qkv_proj.weight": "model-00001-of-00006.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.mlp.gate_up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.7.self_attn.qkv_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.mlp.gate_up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.8.self_attn.qkv_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.input_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.mlp.gate_up_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
"model.layers.9.self_attn.qkv_proj.weight": "model-00002-of-00006.safetensors",
"model.norm.weight": "model-00006-of-00006.safetensors"
}
}
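
The index metadata can be compared with the six Git LFS pointer sizes listed earlier in this commit. The sketch below assumes those six pointers are the six `model-0000N-of-00006.safetensors` shards (the page does not show their file names) and that any surplus over `total_size` is per-file safetensors header overhead; both are assumptions, not facts stated by the source.

```python
# Sizes copied from the six LFS pointer files, in the order they appear.
# Assumption: they correspond to the six safetensors shards.
shard_sizes = [
    4732645848,
    4771169088,
    4771169120,
    4771169120,
    4771169120,
    4470511728,
]

# Values copied from the "metadata" block of the index.
total_parameters = 14143902720
total_size = 28287805440

on_disk = sum(shard_sizes)
overhead = on_disk - total_size               # bytes not accounted for by tensor data
bytes_per_param = total_size / total_parameters

print(on_disk)          # 28287834024
print(overhead)         # 28584
print(bytes_per_param)  # 2.0, consistent with bfloat16 storage
```

A small positive overhead (roughly 28 KB across six files here) is what one would expect if each shard carries a short JSON header in addition to raw tensor bytes; a negative or very large difference would point to a missing or mismatched shard.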

30
special_tokens_map.json Normal file

@@ -0,0 +1,30 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

249036
tokenizer.json Normal file

File diff suppressed because it is too large.

44
tokenizer_config.json Normal file

@@ -0,0 +1,44 @@
{
"added_tokens_decoder": {
"0": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"1": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"3": {
"content": "<pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"extra_special_tokens": {},
"model_max_length": 1000000000000000019884624838656,
"pad_token": "<pad>",
"tokenizer_class": "PreTrainedTokenizerFast",
"unk_token": "<unk>"
}
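
Two details in the tokenizer files are worth checking against `config.json`: the special-token IDs, and `model_max_length`. The sketch below uses only values copied from this commit. It assumes, to the best of my understanding of the `transformers` library, that the very large `model_max_length` is the `int(1e30)` placeholder written when no limit was set, in which case a caller may prefer to cap inputs at `max_position_embeddings`. The variable names are illustrative.

```python
# ID -> content, copied from "added_tokens_decoder" in tokenizer_config.json.
added_tokens = {0: "<s>", 1: "</s>", 2: "<unk>", 3: "<pad>"}

# Copied from config.json / generation_config.json.
model_ids = {"bos_token_id": 0, "eos_token_id": 1, "pad_token_id": 3}
max_position_embeddings = 16384

# Copied from the top-level keys of tokenizer_config.json.
tokenizer_tokens = {"bos_token": "<s>", "eos_token": "</s>", "pad_token": "<pad>"}
model_max_length = 1000000000000000019884624838656

# Each ID the model uses should decode to the token the tokenizer names.
mismatches = [
    name
    for name, content in tokenizer_tokens.items()
    if added_tokens[model_ids[name + "_id"]] != content
]
print(mismatches)  # [] when the model and tokenizer agree

# Assumption: int(1e30) marks "no limit configured" rather than a real length.
length_is_unset = model_max_length == int(1e30)
effective_max_length = (
    max_position_embeddings
    if length_is_unset
    else min(model_max_length, max_position_embeddings)
)
print(length_is_unset, effective_max_length)  # True 16384
```

Because the tokenizer does not enforce a usable limit here, truncation to 16384 tokens has to be requested explicitly by the caller if it is wanted.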