CUDA full GPU acceleration, KV cache in VRAM

github.com

cross-posted to:
train@lemmy.intai.tech

nii236@lemmy.jtmn.dev [M] to LLM@lemmy.jtmn.dev · English · 1 year ago

CUDA full GPU acceleration, KV cache in VRAM by JohannesGaessler · Pull Request #1827 · ggerganov/llama.cpp

This PR adds GPU acceleration for all remaining ggml tensors that didn't yet have it. This makes a large difference especially for long generations, because the KV cache is still CPU-only on master ...
