So you don’t have to click the link, here’s the full text including links:
Some of my favourite @huggingface models I’ve quantized in the last week (as always, original models are linked in my repo so you can check out any recent changes or documentation!):
@shishirpatil_ gave us gorilla's openfunctions-v2, a great follow-up to their initial models: https://huggingface.co/bartowski/gorilla-openfunctions-v2-exl2
@fanqiwan released FuseChat-7B-VaRM, a fusion of 3 models spanning different architectures and scales: https://huggingface.co/bartowski/FuseChat-7B-VaRM-exl2
@IBM used a new method called LAB (Large-scale Alignment for chatBots) for our first interesting 13B tune in a while: https://huggingface.co/bartowski/labradorite-13b-exl2
@NeuralNovel released several, but I’m a sucker for DPO models, and this one uses their Neural-DPO dataset: https://huggingface.co/bartowski/Senzu-7B-v0.1-DPO-exl2
Locutusque, who has been making the Hercules dataset, released a preview of “Hyperion”: https://huggingface.co/bartowski/hyperion-medium-preview-exl2
@AjinkyaBawase updated his coding models with Code-290k based on DeepSeek 6.7B: https://huggingface.co/bartowski/Code-290k-6.7B-Instruct-exl2
@Weyaxi followed up on the success of Einstein v3 with, you guessed it, v4: https://huggingface.co/bartowski/Einstein-v4-7B-exl2
@WenhuChen with TIGER-Lab released StructLM in 3 sizes for structured knowledge grounding tasks: https://huggingface.co/bartowski/StructLM-7B-exl2
and that’s just the highlights from this past week! If you’d like to see your model quantized and I haven’t noticed it somehow, feel free to reach out :)
Do you do any kind of before/after testing of these to measure performance/accuracy changes? I’ve always wondered if there is some way to generalize the expected performance changes at different quantizations.
You can measure the resulting PPL, but that's only going to get you a sanity check at best. In an ideal world we'd have something like lmsys' chat arena that could compare unquantized vs quantized head to head, but that doesn't exist yet.
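For what it's worth, here's a rough sketch of the kind of PPL sanity check I mean: score the same held-out text with the original model and a quantized copy, then compare the numbers. This is a minimal example using the transformers library; the model repo IDs and the wikitext slice are placeholders, and an actual exl2 comparison would load the quantized weights through ExLlamaV2 rather than transformers.

```python
# Sketch of a before/after perplexity sanity check. Model IDs are placeholders;
# a real exl2 comparison would go through ExLlamaV2 instead of transformers.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer


def perplexity(model_id: str, text: str, max_len: int = 2048, stride: int = 512) -> float:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    seq_len = ids.size(1)

    nlls, prev_end, n_scored = [], 0, 0
    # Sliding window so text longer than the context length still gets scored.
    for begin in range(0, seq_len, stride):
        end = min(begin + max_len, seq_len)
        trg_len = end - prev_end            # only score tokens not seen before
        input_ids = ids[:, begin:end]
        labels = input_ids.clone()
        labels[:, :-trg_len] = -100         # mask out already-scored context
        with torch.no_grad():
            loss = model(input_ids, labels=labels).loss
        nlls.append(loss * trg_len)
        n_scored += trg_len
        prev_end = end
        if end == seq_len:
            break
    return torch.exp(torch.stack(nlls).sum() / n_scored).item()


# Same held-out text for both models -- a small wikitext slice as an example.
test_text = "\n\n".join(
    load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:200]
)

for repo in ("original/model-fp16", "quantized/model-exl2"):  # placeholder repo IDs
    print(repo, perplexity(repo, test_text))
```

Even then, a small PPL delta only tells you the quantized model predicts the same text about as well; it won't surface subtler chat or instruction-following regressions, which is exactly why an arena-style head-to-head would be so valuable.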