PR by Xuan-Son Nguyen for `llama.cpp`:

> This PR provides a big jump in speed for WASM by leveraging SIMD instructions for `qX_K_q8_K` and `qX_0_q8_0` dot product functions.
>
> …
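For context on what the PR speeds up, here is a rough scalar sketch of the kind of block-wise dot product those functions compute. The block size, struct layout, and names below are illustrative assumptions, not the actual llama.cpp definitions; the real kernels (and this PR's WASM versions) vectorize the inner loop with SIMD.

```c
#include <stdint.h>

#define BLOCK_SIZE 32  /* illustrative block size, not llama.cpp's */

/* One quantized block: a shared scale plus small integer values. */
typedef struct {
    float  scale;
    int8_t q[BLOCK_SIZE];
} block_q8_sketch;

/* Dot product of two rows stored as n_blocks quantized blocks:
 * accumulate integer products per block, then apply both scales. */
static float dot_q8_q8_sketch(const block_q8_sketch *a,
                              const block_q8_sketch *b,
                              int n_blocks) {
    float sum = 0.0f;
    for (int i = 0; i < n_blocks; ++i) {
        int32_t acc = 0;
        for (int j = 0; j < BLOCK_SIZE; ++j) {
            acc += (int32_t)a[i].q[j] * (int32_t)b[i].q[j];
        }
        sum += (float)acc * a[i].scale * b[i].scale;
    }
    return sum;
}
```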
This is a quantization function. It’s a fairly “math brained” name, I agree, but the function is called `qX_K_q8_K` because it quantizes a value with a quantization index of X (unknown) to one with a quantization index of 8 (bits), which correlates with the memory usage. The 0 vs. K portions are how it does rounding: 0 means it rounds by equal distribution (without offset), and K means it creates a distribution that is more fine-grained around more common values and coarser around the least common values. E.g., I have a data set with a lot of values between 4 and 5 but not a lot of 10s, so I have, let’s say, 10 brackets between 4 and 5 but only 3 between 5 and 10.
Basically it’s lossy compression of a data set into a specific enumeration (which roughly correlates with size): given 1,000,000 numbers from 1 to 1,000,000, it’s a way of putting their values into a smaller range of numbers based on the q level. How using different functions affects the output of models is more voodoo than anything else. You get better “quality” output from more memory, but quality is a complex metric and doesn’t necessarily map to factual accuracy in the output, just statistical correlation with the model’s data set.
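A minimal sketch of the “0”-style (uniform, offset-free) rounding described above; the function name and the 8-bit target are assumptions for illustration, not llama.cpp’s actual code:

```c
#include <math.h>
#include <stdint.h>

/* Round every value in a block onto an evenly spaced grid defined by a
 * single scale, with no offset ("equal distribution"). Returns the scale
 * so that x[i] is approximately out[i] * scale. */
static float quantize_block_uniform(const float *x, int n, int8_t *out) {
    float amax = 0.0f;                        /* largest magnitude in the block */
    for (int i = 0; i < n; ++i) {
        float a = fabsf(x[i]);
        if (a > amax) amax = a;
    }
    float scale = amax / 127.0f;              /* equal-width steps, centered on 0 */
    float inv   = scale != 0.0f ? 1.0f / scale : 0.0f;
    for (int i = 0; i < n; ++i) {
        out[i] = (int8_t)roundf(x[i] * inv);  /* plain rounding, no offset */
    }
    return scale;
}
```

A K-style scheme would instead spend its levels unevenly, putting more of them where the data is dense, as in the 4-to-5 bracket example above.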
An example of a common quantizer is an analog-to-digital converter: it takes continuous values from a wave that ranges from 0 to 1 and transforms them into digital values of 0 and 1 at a specific sample rate.
Taking a 32-bit float and copying the value into a 32-bit float is an identity quantizer.
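Those two analogies as a throwaway sketch (nothing here comes from llama.cpp; it is just the idea in code):

```c
#include <stdint.h>

/* The ADC analogy: sample a continuous signal in [0, 1] at a fixed rate
 * and keep only a 1-bit level per sample. */
static void adc_1bit(const float *signal, int n_samples, uint8_t *bits) {
    for (int i = 0; i < n_samples; ++i) {
        bits[i] = signal[i] >= 0.5f ? 1 : 0;  /* threshold into 0 or 1 */
    }
}

/* The identity "quantizer": a 32-bit float in, the same 32-bit float out. */
static float quantize_identity(float x) {
    return x;
}
```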
My question is: who is naming these functions `qX_K_q8_K`?
C devs love cryptic names :)
Writing Lisp:
(defun generate-eight-new-magic-numbers-for-system-x ()...)
Writing C:
struct mnums* g8_nmn_sx () {...}
s-exps are the one true syntax and every other syntax was a mistake, I will die on this hill
s-exps is when you fuck your Playstation
🤣