I don’t have many specific requirements, and GPT4All has mostly been working well for me so far. That said, my latest use case for GPT4All is helping me plan a new Python-based project with examples as code snippets, and it lacks one specific quality-of-life feature: a “Copy Code” button.
There is an open issue on GPT4All’s GitHub, but since there is no guarantee that the feature will ever be implemented, I thought I’d take this opportunity to explore whether there are other tools out there like GPT4All that offer a ChatGPT-like experience in a local environment. I’m neither a professional developer nor a sysadmin, so a lot of self-hosting guides go over my head, which is what drew me to GPT4All in the first place: it’s very accessible to non-developers like me. That said, I’m open to suggestions and willing to learn new skills if that’s what it takes.
I’m running on Linux w/ AMD hardware: Ryzen 7 5800X3D processor + Radeon RX 6750 XT.
Any suggestions? Thanks in advance!
OpenWebUI is a superb front-end and supports just about any backend you can think of (including Ollama for locally hosted LLMs), and it has some really nice features like pipelines that can extend its functionality however you might need. It definitely has the “Copy Code” feature built in, and it outputs markdown for regular documentation purposes.
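If you want to try it, here’s a minimal sketch of a Docker-based setup, adapted from OpenWebUI’s own Docker instructions. It assumes Ollama is already running on the host machine at its default port (11434); adjust accordingly if yours lives elsewhere:

# Run OpenWebUI in Docker, pointing at an Ollama instance on the host machine
$ docker run -d -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data \
    --name open-webui --restart always \
    ghcr.io/open-webui/open-webui:main

Then browse to http://localhost:3000 to use the chat UI.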
OpenWebUI is also my go-to. It works nicely with RunPod’s vLLM template, so I can run local models but also use heavier ones at minimal cost when it suits me.
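For context: as far as I understand, that template is basically vLLM’s OpenAI-compatible server under the hood, so OpenWebUI can talk to it like any other OpenAI endpoint. A roughly equivalent local invocation would look something like this (the model name and context length here are just illustrative placeholders, not what the template ships with):

# Start vLLM's OpenAI-compatible server with a model from Hugging Face
$ vllm serve Qwen/Qwen2.5-14B-Instruct --max-model-len 8192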
Thanks for the tip about OpenWebUI. After watching a video about its features, I want to learn more.
Would you mind sharing a little bit about your setup? For example, do you have a home lab or do you just run OpenWebUI w/ Ollama on a spare laptop or something? I thought I saw some documentation suggesting that this stack can be run on any system, but I’m curious how other people run it in the real world. Thanks!
Sure, I run OpenWebUI in a docker container from my TrueNAS SCALE home server (it’s one of their standard packages, so basically a 1-click install). From there I’ve configured API use with OpenAI, Gemini, Anthropic and DeepSeek (part of my job involves evaluating the performance of these big models for various in-house tasks), along with pipelines for some of our specific workflows and MCP via mcpo.
I previously had my Ollama installation in another Docker container but didn’t like having a big GPU in my NAS box, so I moved it to its own machine. I’m mostly interested in testing small/tiny models there. I again have Ollama running in a Docker container (just the official Docker image), but this time on a bare-metal Debian server, and I configured another OpenWebUI pipeline to point to it (OpenWebUI lets you select which LLM(s) you want to use on a conversation-by-conversation basis, so there’s no problem having a bunch of them hooked up at the same time).
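In case it’s useful, a minimal sketch of that Ollama container, taken from the official image’s docs. The :rocm tag is what you’d want for an AMD GPU like the OP’s, and the device flags assume a standard ROCm setup on the host (the model pulled at the end is just an example):

# Run the official Ollama image with AMD GPU (ROCm) passthrough
$ docker run -d --device /dev/kfd --device /dev/dri \
    -v ollama:/root/.ollama -p 11434:11434 \
    --name ollama ollama/ollama:rocm

# Pull a small model to test with
$ docker exec -it ollama ollama pull llama3.2:3b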
Thank you, this is really helpful to inform my setup!
If you’re looking for a web UI and a simple way to host one yourself, nothing beats the “llama.cpp” project. It includes a “llama-server” program that hosts a simple web server (with a chat webapp) and an OpenAI-compatible API endpoint. It now also supports multimodality (with models that support it), meaning you can, for example, upload an image and ask the assistant to describe it. An example command to set up such a web server would be:
$ llama-server --threads 6 -m /path/to/model.gguf
Or, for multimodality support (like asking an AI to describe an image), use:
$ llama-server --threads 6 --mmproj /path/to/model/mmproj-F16.gguf -m /path/to/model/model.gguf
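Since the server speaks the OpenAI API, you can also query it from any OpenAI-compatible client or plain curl. This assumes the default port 8080 (adjust if you passed --port), and the prompt is just an illustration:

# Ask the locally hosted model a question via the OpenAI-compatible endpoint
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Show me how to read a CSV file in Python."}]}'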
LM Studio, although I’ve never tried the Linux version.
I have. Shipping it only as an AppImage is a weird choice, but it works well.
You can squeeze a lot more performance out of your hardware with a newer framework and a model tailored to your GPU and task.
I’d recommend:
Kobold.cpp ROCm; follow the quick-install guide here: https://github.com/YellowRoseCx/koboldcpp-rocm/?tab=readme-ov-file#quick-linux-install
Download this quantization, which fits nicely in your VRAM pool and is specifically tuned for coding and planning, then select it in Kobold.cpp: https://huggingface.co/mradermacher/Qwen3-14B-Esper3-i1-GGUF/blob/main/Qwen3-14B-Esper3.i1-IQ4_NL.gguf
Use the “corporate” UI in Kobold.cpp in your browser. If that doesn’t work well, Kobold.cpp also works as a generic OpenAI endpoint, which you can access from pretty much any app, like https://openwebui.com/
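For example, once the server is running you can hit its OpenAI-compatible API directly. This sketch assumes Kobold.cpp’s default port of 5001, and the prompt is just an illustration:

# Query Kobold.cpp's OpenAI-compatible endpoint
$ curl http://localhost:5001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Outline a plan for a small Python CLI tool."}]}'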
Page Assist, which runs in your browser and interfaces with Ollama.
VS Code with the open-source Cline extension. Easily the best open option, and it works everywhere. It’s an excellent coding and planning agent; I use it for everything.