Another thing I’d recommend is running kobold.cpp instead of ollama if you want to get into the nitty gritty of llms. Its more customizable and (ultimately) faster on more hardware.
Thats good info for low spec laptops. Thanks for the software recommendation. Need to do some more research on the model you suggested. I think you confused me for the other guy though. Im currently working with a six core ryzen 2600 CPU and a RX 580 GPU. edit- no worries we are good it was still great info for the thinkpad users!
Yeah you should get kobold.cpp’s rocm fork working if you can manage it, otherwise use their vulkan build.
llama 8b at shorter context is probably good for your machine, as it can fit on the 8GB GPU at shorter context, or at least be partially offloaded if its a 4GB one.
I wouldn’t recommend deepseek for your machine. It’s a better fit for older CPUs, as it’s not as smart as llama 8B, and its bigger than llama 8B, but it just runs super fast because its an MoE.
Honestly as small as you can manage.
Again, you will get much better speeds out of “extreme” MoE models like deepseek chat lite: https://huggingface.co/YorkieOH10/DeepSeek-V2-Lite-Chat-Q4_K_M-GGUF/tree/main
Another thing I’d recommend is running kobold.cpp instead of ollama if you want to get into the nitty gritty of llms. Its more customizable and (ultimately) faster on more hardware.
Thats good info for low spec laptops. Thanks for the software recommendation. Need to do some more research on the model you suggested. I think you confused me for the other guy though. Im currently working with a six core ryzen 2600 CPU and a RX 580 GPU. edit- no worries we are good it was still great info for the thinkpad users!
8GB or 4GB?
Yeah you should get kobold.cpp’s rocm fork working if you can manage it, otherwise use their vulkan build.
llama 8b at shorter context is probably good for your machine, as it can fit on the 8GB GPU at shorter context, or at least be partially offloaded if its a 4GB one.
I wouldn’t recommend deepseek for your machine. It’s a better fit for older CPUs, as it’s not as smart as llama 8B, and its bigger than llama 8B, but it just runs super fast because its an MoE.