pmeerw's blog

Mon, 09 Jan 2012

Convert float-to-int with ARM NEON intrinsics

The code below converts float values (for instance audio samples) to 16-bit signed integers using ARM NEON intrinsics. It assumes n is a multiple of 4.

The vcvtq_s32_f32 instruction rounds towards zero, not towards the nearest integer. In C, the semantics would be trunc() instead of lrintf().

To round to the nearest integer instead (with halves rounded away from zero), one could implement:

float a;
short b = trunc(a + ((a > 0) ? 0.5 : -0.5));

To get rid of the condition, the trick is to take the sign bit (the MSB of a float) and OR it into the bit pattern of the constant 0.5 before adding the result to a. In C-like pseudocode, where uint32() and float() reinterpret the bits rather than convert the value:

float a;
short b = trunc(a + float((uint32(a) & 0x80000000) | uint32(0.5)));
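
The pseudocode above is not valid C as written, since a cast converts the value instead of reinterpreting the bits. A minimal portable sketch of the same idea, assuming 32-bit IEEE 754 floats and using memcpy for the reinterpretation, might look like this (the helper names signed_half and round_to_s16 are made up for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Return 0.5f carrying the sign of a, by copying a's sign bit
 * (bit 31) onto the bit pattern of 0.5f. Assumes IEEE 754 floats. */
static float signed_half(float a) {
    const float half = 0.5f;
    uint32_t a_bits, half_bits;
    float r;
    memcpy(&a_bits, &a, sizeof a_bits);
    memcpy(&half_bits, &half, sizeof half_bits);
    half_bits |= a_bits & UINT32_C(0x80000000);
    memcpy(&r, &half_bits, sizeof r);
    return r;
}

/* Round to nearest, halves away from zero. The cast truncates
 * toward zero, like vcvtq_s32_f32 does. The caller must keep a
 * within the range of short. */
static short round_to_s16(float a) {
    return (short)(a + signed_half(a));
}
```

In scalar code the conditional version is just as good. The branch-free form pays off with SIMD, where all four lanes have to execute the same instructions.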

The complete code using ARM NEON intrinsics looks as follows:

#include <arm_neon.h>

/* Convert n floats to 16-bit signed samples; n must be a multiple of 4. */
void conv_s16_from_float(unsigned n, const float *a, short *b) {
    unsigned i;
    const float32x4_t plusone4 = vdupq_n_f32(1.0f);
    const float32x4_t minusone4 = vdupq_n_f32(-1.0f);
    const float32x4_t half4 = vdupq_n_f32(0.5f);
    const float32x4_t scale4 = vdupq_n_f32(32767.0f);
    const uint32x4_t mask4 = vdupq_n_u32(0x80000000); /* sign bit */
    for (i = 0; i < n/4; i++) {
        float32x4_t v4 = vld1q_f32(a + 4*i);
        /* clamp to [-1, 1], then scale to the 16-bit range */
        v4 = vmulq_f32(vmaxq_f32(vminq_f32(v4, plusone4), minusone4), scale4);
        /* w4 = 0.5 with the sign of v4 */
        const float32x4_t w4 = vreinterpretq_f32_u32(vorrq_u32(vandq_u32(
            vreinterpretq_u32_f32(v4), mask4), vreinterpretq_u32_f32(half4)));
        /* add, truncate to int32, narrow to int16, store */
        vst1_s16(b + 4*i, vmovn_s32(vcvtq_s32_f32(vaddq_f32(v4, w4))));
    }
}
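
To check the NEON routine on a machine without NEON, a scalar reference that performs the same steps (clamp to [-1, 1], scale by 32767, add a signed 0.5, truncate) can be handy. The following is only a sketch; the names s16_from_float_ref and conv_s16_from_float_ref are hypothetical, and unlike the NEON version it accepts any n:

```c
/* Scalar reference for one sample: clamp, scale, round half away
 * from zero, then narrow to 16 bits. */
static short s16_from_float_ref(float x) {
    if (x > 1.0f) x = 1.0f;
    if (x < -1.0f) x = -1.0f;
    x *= 32767.0f;
    x += (x < 0.0f) ? -0.5f : 0.5f;
    return (short)x; /* the cast truncates toward zero */
}

/* Convert n samples from a to b, one at a time. */
static void conv_s16_from_float_ref(unsigned n, const float *a, short *b) {
    unsigned i;
    for (i = 0; i < n; i++)
        b[i] = s16_from_float_ref(a[i]);
}
```

Comparing the output of both routines over a buffer of test samples should give identical results for ordinary finite inputs. NaN handling is not considered here.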

posted at: 13:35