# llama.cpp/examples/speculative

Demonstration of speculative decoding and tree-based speculative decoding techniques.

More info:

- https://github.com/ggerganov/llama.cpp/pull/2926
- https://github.com/ggerganov/llama.cpp/pull/3624
- https://github.com/ggerganov/llama.cpp/pull/5625
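
As a rough illustration of the basic idea, here is a minimal sketch of greedy speculative decoding: a cheap draft model proposes a few tokens, and the target model keeps the longest prefix it agrees with. It does not use the llama.cpp API. `GreedyModel`, `speculative_step`, and `generate` are hypothetical names introduced only for this sketch, and a "model" is assumed to be any function that returns the greedy next token for a context. Tree-based drafting and stochastic acceptance are not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Token  = int;
using Tokens = std::vector<Token>;

// Hypothetical stand-in for a language model: maps a context to its greedy next token.
using GreedyModel = std::function<Token(const Tokens &)>;

// One round of greedy speculative decoding.
// 1. The draft model proposes n_draft tokens autoregressively.
// 2. The target model checks them in order and keeps the longest matching prefix.
// 3. At the first mismatch the target's own token is kept instead;
//    if every drafted token matches, the target contributes one extra token.
// Returns the number of drafted tokens that were accepted.
int speculative_step(const GreedyModel & target, const GreedyModel & draft,
                     Tokens & ctx, int n_draft) {
    assert(n_draft > 0);

    // Draft phase: extend a copy of the context with the draft model's guesses.
    Tokens proposal = ctx;
    for (int i = 0; i < n_draft; ++i) {
        proposal.push_back(draft(proposal));
    }

    // Verify phase. A real implementation would score all drafted positions
    // in a single batched pass of the target model; here it is called per token.
    int n_accepted = 0;
    for (int i = 0; i < n_draft; ++i) {
        const std::size_t pos = ctx.size();   // position of the drafted token being checked
        const Token expected  = target(ctx);  // what the target would emit here
        ctx.push_back(expected);              // the target's token is always the one kept
        if (expected != proposal[pos]) {
            return n_accepted;                // mismatch: discard the rest of the draft
        }
        ++n_accepted;
    }

    // All drafted tokens were accepted: the target adds one more token.
    ctx.push_back(target(ctx));
    return n_accepted;
}

// Repeats speculative steps until the sequence holds n_total tokens.
Tokens generate(const GreedyModel & target, const GreedyModel & draft,
                Tokens ctx, std::size_t n_total, int n_draft) {
    while (ctx.size() < n_total) {
        speculative_step(target, draft, ctx, n_draft);
    }
    ctx.resize(n_total);  // a step may overshoot; trim to the requested length
    return ctx;
}
```

Because the target's token is always the one appended, the output of this greedy variant matches what the target model would produce on its own. The draft model only affects how many target evaluations can be verified per round.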