---
base_model: aixonlab/Aether-12b
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- mistral
---

<img src="https://cdn-uploads.huggingface.co/production/uploads/66dcee3321f901b049f48002/jWXtbknuetFdz5fkFn-ey.png" width="800"/>

# Grey-12b

Grey-12b is a merged language model created by combining multiple models using the della_linear merge method, with Aether-12b as the base model.

## Model Details 📊
- Developed by: AIXON Lab
- Model type: Merged Causal Language Model
- Language(s): English (primarily), may support other languages
- License: apache-2.0
- Repository: https://huggingface.co/aixonlab/Grey-12b

## Model Architecture 🏗️
- Base model: aixonlab/Aether-12b
- Parameter count: ~12 billion
- Architecture specifics: Transformer-based language model
- Merge method: della_linear

### Merged Models
1. VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct
   - Weight: 0.33
   - Density: 0.4
2. cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b
   - Weight: 0.77
   - Density: 0.8

## Technical Specifications
- Dtype: float16
- Tokenizer source: base (aixonlab/Aether-12b)
- Merge parameters:
  - Epsilon: 0.05
  - Lambda: 1

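The merge settings listed above can be gathered into a single structure for reference. The sketch below is only an illustration: it assumes a mergekit-style config layout (`merge_method`, `base_model`, `models`, `parameters`, `tokenizer_source`), since della_linear is the name of a merge method in mergekit. This card does not include the actual config file, so the key names are assumptions and only the values come from the card.

```python
import json

# Merge recipe transcribed from the values listed on this card.
# ASSUMPTION: key names follow a mergekit-style layout; they are not
# copied from the authors' actual config file.
recipe = {
    "merge_method": "della_linear",
    "base_model": "aixonlab/Aether-12b",
    "dtype": "float16",
    "tokenizer_source": "base",
    "parameters": {"epsilon": 0.05, "lambda": 1},
    "models": [
        {
            "model": "VAGOsolutions/SauerkrautLM-Nemo-12b-Instruct",
            "parameters": {"weight": 0.33, "density": 0.4},
        },
        {
            "model": "cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b",
            "parameters": {"weight": 0.77, "density": 0.8},
        },
    ],
}

# Sum the per-model weights as listed on the card.
total_weight = sum(m["parameters"]["weight"] for m in recipe["models"])

print(json.dumps(recipe, indent=2))
print(f"sum of weights: {total_weight:.2f}")
```

The listed weights sum to 1.10 rather than 1. Whether they were normalized during the merge is not stated on this card.
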
## Intended Use 🎯
Grey-12b is intended as a general-purpose language model for natural language processing tasks, including but not limited to text generation, question answering, and analysis.

## Ethical Considerations 🤔
As a merged model based on multiple sources, Grey-12b may inherit biases and limitations from its constituent models. Users should be aware of potential biases in generated content and use the model responsibly.

## Performance and Evaluation
Performance metrics and evaluation results for Grey-12b are yet to be determined. Users are encouraged to contribute their findings and benchmarks.

## Limitations and Biases
The model may exhibit biases present in its training data and constituent models. It's crucial to critically evaluate the model's outputs and use them in conjunction with human judgment.

## Additional Information
For more details on the base model and constituent models, please refer to their respective model cards and documentation.

## Acknowledgments 🙏
We acknowledge the contributions of:
- VAGOsolutions for the SauerkrautLM-Nemo-12b-Instruct model
- Cognitive Computations for the dolphin-2.9.3-mistral-nemo-12b model

## How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the merged model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("aixonlab/Grey-12b")
tokenizer = AutoTokenizer.from_pretrained("aixonlab/Grey-12b")

# Tokenize the prompt into a batch of one sequence of token IDs.
prompt = "Once upon a time"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# generate() returns a batch of sequences; max_length counts prompt tokens too.
generated_ids = model.generate(input_ids, max_length=100)
# Decode the first (and only) sequence in the batch.
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
```