pmeerw's blog
09 Jan 2012
The following code converts float values (for instance audio samples in the range [-1, 1]) to 16-bit signed integers using ARM NEON intrinsics, assuming n is a multiple of 4.
The vcvtq_s32_f32 intrinsic rounds towards zero, not towards
the nearest integer. In C terms, it behaves like trunc() rather
than lrintf().
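To make the difference concrete, here is a minimal scalar sketch. It relies on the fact that a plain float-to-integer cast in C also truncates towards zero, so it can stand in for what vcvtq_s32_f32 does per lane. The function name convert_truncating is made up for this illustration.

```c
/* Hypothetical helper for illustration: a C cast from float to int
 * discards the fractional part, i.e. it rounds towards zero, which is
 * the same behaviour as one lane of vcvtq_s32_f32. */
static int convert_truncating(float a)
{
    return (int)a;
}
```

With this, 2.7f becomes 2 and -2.7f becomes -2, whereas rounding to the nearest integer would give 3 and -3. For audio samples that is an error of up to one full quantization step.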
To overcome the issue, one could implement:
float a; short b = trunc(a + ((a > 0) ? 0.5 : -0.5));
To get rid of the condition, the trick is to take the sign bit (the MSB of a float) and
OR it into the constant 0.5 before adding the result to a.
In C-like pseudocode, where uint32() and float() reinterpret the bits instead of converting the value:
float a; short b = trunc(a + float((uint32(a) & 0x80000000) | uint32(0.5)));
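The one-liner above is not valid C as written, because a cast converts the value rather than reinterpreting the bits. A runnable scalar sketch of the same idea could look like the following. It assumes float is an IEEE 754 32-bit type, and all function names (float_bits, bits_float, round_s16_branch, round_s16_signbit) are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer.
 * Assumption: float is IEEE 754 binary32 (sign bit is bit 31). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit pattern as a float. */
static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Version with a condition: add +0.5 or -0.5, then truncate. */
static short round_s16_branch(float a)
{
    return (short)(a + ((a > 0.0f) ? 0.5f : -0.5f));
}

/* Branch-free version: copy the sign bit of a onto the constant 0.5. */
static short round_s16_signbit(float a)
{
    uint32_t sign = float_bits(a) & 0x80000000u;       /* sign bit only */
    float half = bits_float(sign | float_bits(0.5f));  /* +0.5 or -0.5 */
    return (short)(a + half);                          /* cast truncates */
}
```

Both versions round halfway cases away from zero (2.5 becomes 3, -2.5 becomes -3), which differs slightly from lrintf() in the default rounding mode, where ties go to the nearest even integer. Neither version checks that the result fits into a short.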
The complete code using ARM NEON intrinsics looks as follows:
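#include <arm_neon.h>

/* Convert n floats in [-1, 1] to 16-bit signed integers; n must be a multiple of 4. */
void conv_s16_from_float(unsigned n, const float *a, short *b) {
    unsigned i;
    const float32x4_t plusone4 = vdupq_n_f32(1.0f);
    const float32x4_t minusone4 = vdupq_n_f32(-1.0f);
    const float32x4_t half4 = vdupq_n_f32(0.5f);
    const float32x4_t scale4 = vdupq_n_f32(32767.0f);
    const uint32x4_t mask4 = vdupq_n_u32(0x80000000);

    for (i = 0; i < n/4; i++) {
        /* load four floats; vld1q_f32 avoids casting the pointer to a vector type */
        float32x4_t v4 = vld1q_f32(a + 4*i);
        /* clamp to [-1, 1], then scale to the 16-bit range */
        v4 = vmulq_f32(vmaxq_f32(vminq_f32(v4, plusone4), minusone4), scale4);
        /* w4 = 0.5 carrying the sign bit of v4 */
        const float32x4_t w4 = vreinterpretq_f32_u32(vorrq_u32(vandq_u32(
            vreinterpretq_u32_f32(v4), mask4), vreinterpretq_u32_f32(half4)));
        /* add +/-0.5, truncate towards zero, narrow from 32 to 16 bit, store */
        vst1_s16(b + 4*i, vmovn_s32(vcvtq_s32_f32(vaddq_f32(v4, w4))));
    }
}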
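For checking the NEON routine on a machine without NEON, or for handling leftover samples when n is not a multiple of 4, a plain scalar counterpart can help. The following is only a sketch under that assumption: the name conv_s16_from_float_ref is hypothetical, and NaN inputs are not handled.

```c
/* Hypothetical scalar reference for conv_s16_from_float(); the name and
 * the function itself are an illustration, not part of the NEON code.
 * Works for any n, not only multiples of 4. NaN inputs are not handled. */
static void conv_s16_from_float_ref(unsigned n, const float *a, short *b)
{
    unsigned i;
    for (i = 0; i < n; i++) {
        float v = a[i];
        /* clamp to [-1, 1] */
        if (v > 1.0f) v = 1.0f;
        if (v < -1.0f) v = -1.0f;
        /* scale to the 16-bit range */
        v *= 32767.0f;
        /* add 0.5 with the sign of v, then truncate via the cast */
        v += (v < 0.0f) ? -0.5f : 0.5f;
        b[i] = (short)v;
    }
}
```

Because the input is clamped before scaling, the result always lies in [-32767, 32767], so the final narrowing to 16 bits cannot overflow.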