CUDA full GPU acceleration, KV cache in VRAM

github.com

cross-posted to:
llm@lemmy.jtmn.dev

manitcor@lemmy.intai.tech to Machine Learning - Training | Fine Tuning@lemmy.intai.tech · English · 1 year ago

CUDA full GPU acceleration, KV cache in VRAM by JohannesGaessler · Pull Request #1827 · ggerganov/llama.cpp

This PR adds GPU acceleration for all remaining ggml tensors that didn't yet have it. Especially for long generations this makes a large difference because the KV cache is still CPU only on master ...
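
To give a rough sense of why the KV cache matters more as generations get longer, here is a back-of-the-envelope size estimate. It is only a sketch, not code from the PR: the helper name `kv_cache_bytes` is hypothetical, and the numbers assume a LLaMA-7B-like shape (32 layers, embedding width 4096), 16-bit cache elements, and one key plus one value vector per layer per token.

```python
def kv_cache_bytes(n_layers, n_ctx, n_embd, bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    Assumption: each layer stores one key and one value vector of
    width n_embd for every token position (hence the factor of 2).
    """
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elem


# Assumed LLaMA-7B-like shape: 32 layers, n_embd = 4096, f16 elements.
for n_ctx in (512, 2048):
    size = kv_cache_bytes(n_layers=32, n_ctx=n_ctx, n_embd=4096)
    print(f"n_ctx={n_ctx}: {size / 2**20:.0f} MiB")
```

Under these assumptions the cache grows linearly with context length, so the longer the generation, the more data each new token has to attend over, and the more it matters where that data lives.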
