Store the sum of a __m256 vector without the AVX-to-SSE transition penalty?
Does the following code incur the AVX-to-SSE transition penalty? If so,
how can I store the sum of a __m256 vector without incurring this penalty?
__m256 x_swap = _mm256_permute2f128_ps(x, x, 1); // swap the two 128-bit halves
x = _mm256_add_ps(x, x_swap);
x = _mm256_hadd_ps(x,x);
x = _mm256_hadd_ps(x,x); // now all fields of x contain the sum
float sum;
_mm_store_ss(&sum, _mm256_castps256_ps128(x));
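For reference, here is a minimal self-contained sketch of the same reduction. As far as I understand it, the penalty comes from executing legacy-SSE-encoded instructions while the upper halves of the YMM registers are dirty, not from using `_mm_*` intrinsics as such. When a function is compiled with AVX enabled, the compiler should emit VEX-encoded forms of the 128-bit operations, so no transition occurs inside it. The names `hsum256_ps` and `sum8` are hypothetical, and the `target("avx")` attribute is a GCC/Clang extension assumed here in place of a global `-mavx` flag.

```c
#include <immintrin.h>

/* Hypothetical helper: horizontal sum of the 8 floats in x.
   Assumption: GCC or Clang, where target("avx") enables AVX code
   generation for this function only. */
__attribute__((target("avx")))
static float hsum256_ps(__m256 x)
{
    __m128 lo = _mm256_castps256_ps128(x);      /* lanes 0..3 */
    __m128 hi = _mm256_extractf128_ps(x, 1);    /* lanes 4..7 */
    __m128 s  = _mm_add_ps(lo, hi);             /* 4 partial sums */
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));     /* 2 partial sums in lanes 0..1 */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1)); /* total in lane 0 */
    return _mm_cvtss_f32(s);
}

/* Hypothetical wrapper so that callers built without AVX never touch
   a __m256 value directly. */
__attribute__((target("avx")))
float sum8(const float *p)
{
    return hsum256_ps(_mm256_loadu_ps(p));
}
```

Calling `sum8` on an array of eight floats returns their sum. Whether any transition remains at the call boundary depends on the caller being built with AVX or on the compiler inserting `vzeroupper`, so checking the generated assembly is the safest way to confirm.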
Thank you.