Tuesday, 10 September 2013

Store the sum of a __m256 vector without the AVX-to-SSE transition penalty?


Does the following code incur the AVX-to-SSE transition penalty? If so,
how can I store the sum of a __m256 vector without incurring this penalty?
__m256 x_swap = _mm256_permute2f128_ps(x, x, 1); // swap the two 128-bit halves
x = _mm256_add_ps(x, x_swap);
x = _mm256_hadd_ps(x,x);
x = _mm256_hadd_ps(x,x); // now all fields of x contain the sum
float sum;
_mm_store_ss(&sum, _mm256_castps256_ps128(x));
Thank you.
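For reference, here is a minimal sketch of one way to do the reduction, not a definitive answer. As far as I know, the transition penalty only applies when legacy-SSE-encoded instructions run while the upper halves of the YMM registers are dirty. When the code is built with AVX enabled (for example `-mavx` on GCC/Clang), the compiler emits VEX-encoded forms of 128-bit intrinsics such as `_mm_store_ss`, so the snippet above should not trigger it. The helper names `hsum256_ps` and `sum8_avx` are hypothetical, and the `target("avx")` attribute is a GCC/Clang-specific assumption used here so the sketch does not depend on command-line flags.

```c
#include <immintrin.h>

/* Hypothetical helper: horizontal sum of the 8 floats in a __m256.
   Every intrinsic here is compiled in an AVX-enabled function, so the
   128-bit operations are VEX-encoded (vaddps, vhaddps, ...). */
__attribute__((target("avx")))
static inline float hsum256_ps(__m256 x)
{
    __m128 lo = _mm256_castps256_ps128(x);   /* lower 4 floats (a cast, no data movement) */
    __m128 hi = _mm256_extractf128_ps(x, 1); /* upper 4 floats */
    __m128 s  = _mm_add_ps(lo, hi);          /* 4 partial sums */
    s = _mm_hadd_ps(s, s);                   /* 2 distinct partial sums */
    s = _mm_hadd_ps(s, s);                   /* total in every lane */
    return _mm_cvtss_f32(s);                 /* extract lane 0 as a scalar */
}

/* Hypothetical wrapper: takes a pointer so that callers built without
   AVX never pass a __m256 across the call boundary. */
__attribute__((target("avx")))
float sum8_avx(const float *p)
{
    return hsum256_ps(_mm256_loadu_ps(p));   /* unaligned load of 8 floats */
}
```

Calling `sum8_avx` on an array of eight floats returns their sum. If the result is later consumed by code compiled without AVX (legacy SSE encoding), `_mm256_zeroupper()` before leaving the AVX region is the usual way to avoid the penalty; GCC and Clang normally insert `vzeroupper` automatically at the end of functions that use 256-bit registers.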
