license: mit
language: en, kn
metrics: accuracy
pipeline_tag: text-generation
tags: bilingual, kannada, english

(This repo contains the sharded version of the original Ambari-7B model)

Ambari-7B-Base-v0.1 (sharded)

Overview

Ambari-7B-Base-v0.1 is the first bilingual English/Kannada model in the Ambari series, developed and released by Cognitivelab.in. Built on Meta's Llama 2, this 7B-parameter model is the result of the pretraining stage, in which it was trained on approximately 500 million new Kannada tokens.

Usage

To use the Ambari-7B-Base-v0.1 model, you can follow the example code below:

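# Usage
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_id = 'Cognitive-Lab/Ambari-7B-Base-v0.1'
device = "cuda" if torch.cuda.is_available() else "cpu"

# Half precision on GPU roughly halves memory use; CPU falls back to float32
dtype = torch.float16 if device == "cuda" else torch.float32
model = LlamaForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)
tokenizer = LlamaTokenizer.from_pretrained(model_id)

prompt = "ಕನ್ನಡದ ಇತಿಹಾಸವನ್ನು ವಿವರವಾಗಿ ತಿಳಿಸಿ"  # "Explain the history of Kannada in detail"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate; pass the attention mask too, and bound only the new tokens
# (max_length would count the prompt tokens against the limit as well)
with torch.no_grad():
    generate_ids = model.generate(**inputs, max_new_tokens=30)
decoded_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

print(decoded_output)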
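This repo stores the checkpoint as shards. `from_pretrained` in transformers loads sharded checkpoints automatically, so no extra loading code is needed. To inspect which shard holds which weights, the sketch below parses a sharded-checkpoint index. It assumes the standard Hugging Face layout, where a `*.index.json` file has a `weight_map` that maps each parameter name to a shard filename. The exact index filename in this repo and the toy index in the example are assumptions for illustration.

```python
import json
from collections import defaultdict


def group_by_shard(index: dict) -> dict[str, list[str]]:
    """Map each shard filename to the sorted parameter names stored in it.

    `index` is the parsed contents of a sharded-checkpoint index file, whose
    "weight_map" maps parameter name -> shard filename.
    """
    shards: dict[str, list[str]] = defaultdict(list)
    for param_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(param_name)
    # Sort shards and names so the output is deterministic
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def load_index(path: str) -> dict:
    """Read an index file, e.g. model.safetensors.index.json (name assumed)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Toy index for illustration only; a real index lists every parameter
    # and records the checkpoint size in bytes under metadata.total_size.
    toy_index = {
        "metadata": {"total_size": 0},
        "weight_map": {
            "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
            "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
            "lm_head.weight": "model-00002-of-00002.safetensors",
        },
    }
    for shard, names in group_by_shard(toy_index).items():
        print(shard, len(names))
```

With a downloaded copy of the repo, `group_by_shard(load_index(path))` would give the same per-shard view of the real checkpoint.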

Important: This is a base model produced by pretraining only. It serves as a foundation and is not intended for use on its own. We strongly recommend fine-tuning it on your task(s) of interest before deploying it in a production environment. Adapt the code above to your use case, but expect good performance in your application only after fine-tuning.
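A common step when preparing supervised fine-tuning data for a causal language model such as this one is to mask the prompt in the labels. With this step, the loss covers only the response tokens. The sketch below shows one such recipe, not a procedure prescribed by the Ambari authors. It relies on two facts about transformers causal-LM models: labels equal to -100 are ignored by the loss (PyTorch's default `ignore_index`), and labels are shifted by one position inside the model, so they stay aligned with `input_ids`. The helper names, the token ids, and the choice of `pad_id` are hypothetical; the Llama tokenizer has no pad token by default, so one must be chosen.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def build_sft_example(prompt_ids, response_ids, eos_id, max_len):
    """Concatenate prompt + response + EOS and mask the prompt in the labels.

    Returns (input_ids, labels). Only response tokens and EOS carry a
    training signal; prompt positions get IGNORE_INDEX.
    """
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    # Truncate both sequences identically so positions stay aligned
    return input_ids[:max_len], labels[:max_len]


def pad_batch(examples, pad_id):
    """Right-pad (input_ids, labels) pairs to a common length."""
    width = max(len(ids) for ids, _ in examples)
    batch = {"input_ids": [], "labels": [], "attention_mask": []}
    for ids, labels in examples:
        gap = width - len(ids)
        batch["input_ids"].append(ids + [pad_id] * gap)
        # Padding positions are excluded from both the loss and attention
        batch["labels"].append(labels + [IGNORE_INDEX] * gap)
        batch["attention_mask"].append([1] * len(ids) + [0] * gap)
    return batch


if __name__ == "__main__":
    # Made-up token ids; real ones would come from the model's tokenizer
    ex = build_sft_example([1, 10, 11], [20, 21], eos_id=2, max_len=16)
    print(ex)  # ([1, 10, 11, 20, 21, 2], [-100, -100, -100, 20, 21, 2])
```

Masking the prompt keeps the model from being trained to reproduce the instructions themselves. Whether that helps depends on the task, so treat it as a default to evaluate rather than a requirement.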

Description
Model synced from source: fierysurf/Ambari-7B-base-v0.1-sharded