Benchmarks where this merge outperforms the base model are shown in **bold**.
## 💻 Usage
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "paulilioaica/Hugo-7B-slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)

# The "conversational" pipeline applies the model's chat template to the
# messages and generates the assistant's reply.
pipeline = transformers.pipeline(
    "conversational",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(messages, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs)
```
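Note that the `"conversational"` pipeline task was deprecated and later removed from `transformers`. On recent releases, a minimal equivalent sketch uses the `"text-generation"` task and renders the messages with the tokenizer's chat template; the sampling parameters below are simply carried over from the snippet above:

```python
from transformers import AutoTokenizer
import transformers
import torch

model = "paulilioaica/Hugo-7B-slerp"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the chat messages into a single prompt string using the model's
# chat template, ending with the assistant turn marker.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```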