---
language:
- en
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text2text-generation
---

## Janus
(Built with Meta Llama 3)

For the version with the PoS tag visit [Janus (PoS)](https://huggingface.co/ChangeIsKey/llama3-janus-pos).

### Model Details
- **Model Name**: Janus
- **Version**: 1.0
- **Developers**: Pierluigi Cassotti, Nina Tahmasebi
- **Affiliation**: University of Gothenburg
- **License**: MIT
- **GitHub Repository**: [Historical Word Usage Generation](https://github.com/ChangeIsKey/historical-word-usage-generation)
- **Paper**: [Sense-specific Historical Word Usage Generation](https://doi.org/10.1162/tacl_a_00761) (TACL, 2025)
- **Contact**: pierluigi.cassotti@gu.se

### Model Description
Janus is a fine-tuned **Llama 3 8B** model designed to generate historically and semantically accurate word usages. It takes a word, a sense definition, and a year as input, and produces example sentences that reflect how the word was used in that sense during the specified period. This model is particularly useful for **semantic change detection**, **historical NLP**, and **linguistic research**.

### Intended Use
- **Semantic Change Detection**: Investigating how word meanings evolve over time.
- **Historical Text Processing**: Enhancing the understanding and modeling of historical texts.
- **Corpus Expansion**: Generating sense-annotated corpora for linguistic studies.

### Training Data
- **Dataset**: Extracted from the **Oxford English Dictionary (OED)**
- **Size**: Over **1.2 million** sense-annotated historical usages
- **Time Span**: **1700 - 2020**
- **Data Format**:
  ```
  <year><|t|><lemma><|t|><definition><|s|><historical usage sentence><|end|>
  ```
- **Janus (PoS) Format**:
  ```
  <year><|t|><lemma><|t|><definition><|p|><PoS><|p|><|s|><historical usage sentence><|end|>
  ```
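
The two formats above can be assembled programmatically. The helper below is a minimal sketch and is not taken from the Janus repository: the name `build_prompt`, its parameters, and the example tag `"adj"` are hypothetical, and it assumes a prompt should stop at `<|s|>` so that the model continues with the usage sentence.

```python
from typing import Optional


def build_prompt(year: int, lemma: str, definition: str, pos: Optional[str] = None) -> str:
    """Assemble a Janus-style prompt ending at <|s|> (hypothetical helper)."""
    prompt = f"{year}<|t|>{lemma}<|t|>{definition}"
    if pos is not None:
        # Janus (PoS) variant: the tag is wrapped in <|p|> markers.
        prompt += f"<|p|>{pos}<|p|>"
    return prompt + "<|s|>"


# One prompt per target year, e.g. to compare usages of a sense across periods.
definition = "Used to emphasize something unpleasant or negative."
prompts = [build_prompt(year, "awful", definition) for year in (1750, 1850, 1950)]
```

Passing `pos` switches to the Janus (PoS) layout; which tag inventory that model expects is not stated here, so check the linked repository before relying on a particular tag.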

### Training Procedure
- **Base Model**: `meta-llama/Meta-Llama-3-8B`
- **Optimization**: **QLoRA** (Quantized Low-Rank Adaptation)
- **Batch Size**: **4**
- **Learning Rate**: **2e-4**
- **Epochs**: **1**

### Model Performance
- **Temporal Accuracy**: Root mean squared error (RMSE) of **~52.7 years** (close to OED ground truth)
- **Semantic Accuracy**: Comparable to OED test data on human evaluations
- **Context Variability**: Low lexical repetition, preserving natural linguistic diversity
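
For reference, RMSE over years can be computed as in the sketch below. It only illustrates the metric: the year values are made up, and how a predicted year is obtained for each generated sentence is not described in this card.

```python
import math
from typing import Sequence


def rmse_years(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error between predicted and target years."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only, not results from the paper.
error = rmse_years([1800, 1900], [1850, 1850])
```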

### Usage Example
#### Generating Historical Usages
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ChangeIsKey/llama3-janus"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Prompt format: <year><|t|><lemma><|t|><definition><|s|>
input_text = "1800<|t|>awful<|t|>Used to emphasize something unpleasant or negative; ‘such a’, ‘an absolute’.<|s|>"
# Move the inputs to the device chosen by device_map="auto" instead of hard-coding "cuda".
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Sampling must be enabled for temperature and top_p to take effect.
output = model.generate(**inputs, do_sample=True, temperature=1.0, top_p=0.9, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
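
The decoded output contains the prompt followed by the generated sentence. If it is decoded with `skip_special_tokens=False` so that the markers stay in the text, the sentence can be cut out as sketched below. `extract_usage` is a hypothetical helper, and whether the markers survive decoding depends on how they are registered in the tokenizer, which is an assumption here.

```python
def extract_usage(decoded: str) -> str:
    """Return the text between the first <|s|> and the next <|end|>, if any."""
    # Everything after the sentence marker is the model's continuation.
    _, sep, continuation = decoded.partition("<|s|>")
    if not sep:
        raise ValueError("no <|s|> marker found in decoded text")
    # Stop at the end marker when the model produced one.
    usage, _, _ = continuation.partition("<|end|>")
    return usage.strip()


# Illustrative string, not real model output.
example = "1800<|t|>awful<|t|>Some definition.<|s|>An illustrative sentence.<|end|>"
sentence = extract_usage(example)
```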

For more examples, see the GitHub repository [Historical Word Usage Generation](https://github.com/ChangeIsKey/historical-word-usage-generation).

### Limitations & Ethical Considerations
- **Historical Bias**: The model may reflect biases present in historical texts.
- **Time Granularity**: The temporal resolution is approximate (RMSE of ~52.7 years).
- **Modern Influence**: Despite fine-tuning, the model may still generate modern phrases in older contexts.
- **Not Trained for Fairness**: The model has not been explicitly trained to be fair or unbiased. It may produce sensitive, outdated, or culturally inappropriate content.

### Citation
If you use Janus, please cite:
```
@article{10.1162/tacl_a_00761,
    author = {Cassotti, Pierluigi and Tahmasebi, Nina},
    title = {Sense-specific Historical Word Usage Generation},
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {13},
    pages = {690-708},
    year = {2025},
    month = {07},
    abstract = {Large-scale sense-annotated corpora are important for a range of tasks but are hard to come by. Dictionaries that record and describe the vocabulary of a language often offer a small set of real-world example sentences for each sense of a word. However, on their own, these sentences are too few to be used as diachronic sense-annotated corpora. We propose a targeted strategy for training and evaluating generative models producing historically and semantically accurate word usages given any word, sense definition, and year triple. Our results demonstrate that fine-tuned models can generate usages with the same properties as real-world example sentences from a reference dictionary. Thus the generated usages will be suitable for training and testing computational models where large-scale sense-annotated corpora are needed but currently unavailable.},
    issn = {2307-387X},
    doi = {10.1162/tacl_a_00761},
    url = {https://doi.org/10.1162/tacl\_a\_00761},
    eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl\_a\_00761/2535111/tacl\_a\_00761.pdf},
}
```