Just use something like libsimdpp[1] and you can be sure your code stays vectorized across compiler versions, since the wrapper maps to explicit SIMD intrinsics instead of relying on the auto-vectorizer. As a bonus, this and similar wrapper libraries let you generate code for multiple instruction sets (say SSE2, AVX and NEON) from the same source.
[1]: https://github.com/p12tic/libsimdpp
Just use libsimdpp ( https://github.com/p12tic/libsimdpp ) or any of the myriad similar wrappers. With a modest time investment you get a nearly optimal implementation for multiple instruction sets on any compiler you use.
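For anyone wondering what such a wrapper boils down to, here is a rough sketch of the idea. To be clear, this is not libsimdpp's API: `f32x4`, `load`, `store`, `add` and `add_arrays` are names I made up for illustration, and a real library covers far more operations and targets. It only shows how one source file can pick SSE2, NEON or a scalar fallback at compile time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mini-wrapper (NOT libsimdpp): one 4-lane float type,
// backed by whichever instruction set the compiler is targeting.
#if defined(__SSE2__)
  #include <emmintrin.h>
  struct f32x4 { __m128 v; };
  inline f32x4 load(const float* p)      { return { _mm_loadu_ps(p) }; }
  inline void  store(float* p, f32x4 a)  { _mm_storeu_ps(p, a.v); }
  inline f32x4 add(f32x4 a, f32x4 b)     { return { _mm_add_ps(a.v, b.v) }; }
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  struct f32x4 { float32x4_t v; };
  inline f32x4 load(const float* p)      { return { vld1q_f32(p) }; }
  inline void  store(float* p, f32x4 a)  { vst1q_f32(p, a.v); }
  inline f32x4 add(f32x4 a, f32x4 b)     { return { vaddq_f32(a.v, b.v) }; }
#else
  // Scalar fallback so the same code still builds on other targets.
  struct f32x4 { float v[4]; };
  inline f32x4 load(const float* p) {
      f32x4 r;
      for (int i = 0; i < 4; ++i) r.v[i] = p[i];
      return r;
  }
  inline void store(float* p, f32x4 a) {
      for (int i = 0; i < 4; ++i) p[i] = a.v[i];
  }
  inline f32x4 add(f32x4 a, f32x4 b) {
      f32x4 r;
      for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
      return r;
  }
#endif

// User code is written once against the wrapper and never mentions
// a specific instruction set.
inline void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
    // Main loop: four floats per iteration through the wrapper.
    for (; i + 4 <= n; i += 4) {
        store(out + i, add(load(a + i), load(b + i)));
    }
    // Tail: leftover elements handled one at a time.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Because the vector operations are spelled out explicitly, whether the loop is vectorized no longer depends on what a particular compiler version's auto-vectorizer decides to do.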
Sorry, my post was directed to the parent of your post. Somehow I misclicked somewhere and didn't notice.