Somewhat tangential to GPU_programming, but I think it's still on topic, as AVX512 code is fundamentally very similar to GPU code.

GCC has made a big advance in its AVX512 code generation: fully masked vectorization, which handles less-than-full vectors (for example, the leftover elements at the end of a loop).
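
To make "fully masked" concrete, here's a rough scalar model of the idea in plain C. It is not GCC's actual output and uses no intrinsics. The names `LANES`, `lane_mask` and `saxpy_masked_model` are made up for illustration. Every iteration handles one 16-lane "vector" under a lane mask, and the last iteration simply gets a mask with fewer bits set instead of falling back to a separate scalar cleanup loop. That's also why it feels GPU-like: it's the same trick as a thread block where the out-of-range threads are predicated off.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: 16 lanes, i.e. sixteen 32-bit floats
   in one 512-bit register. */
#define LANES 16

/* Hypothetical helper: mask with the low `active` bits set,
   saturating at all 16 lanes. */
static uint16_t lane_mask(size_t active)
{
    return (active >= LANES) ? (uint16_t)0xFFFF
                             : (uint16_t)((1u << active) - 1u);
}

/* Scalar model of a fully masked loop computing y[i] += a * x[i].
   The inner loop stands in for a single masked vector operation. */
void saxpy_masked_model(float *y, const float *x, float a, size_t n)
{
    for (size_t base = 0; base < n; base += LANES) {
        /* Full mask for whole vectors, partial mask for the tail. */
        uint16_t mask = lane_mask(n - base);

        for (size_t lane = 0; lane < LANES; ++lane) {
            /* Inactive lanes neither load nor store, so nothing is
               touched past the end of the arrays. */
            if (mask & (1u << lane))
                y[base + lane] += a * x[base + lane];
        }
    }
}
```

With n = 19, the first pass runs with all 16 lanes active and the second with only 3, and nothing past index 18 is read or written. As far as I know, GCC's knob for this kind of partial-vector code is `--param vect-partial-vector-usage`, but check the docs for your GCC version to see what it does on x86.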