Oct 2012
How to efficiently convert an audio sample in 16-bit signed integer format to a 32-bit float value on an ARM NEON CPU? And how to achieve bit-exact results?
There are several ways to do it in different projects:
s16 -> float |
float -> s16 |
|
PulseAudio | flt = sample / (float) 0x7fff; |
sample = lrintf(clip_flt(flt) * 0x7fff) |
libavresample | flt = sample / (float) (1<<15); |
sample = (s16) clip_s16(lrintf(flt * (1 << 15))); |
RtAudio | flt = (sample + 0.5f) * (1 / 32767.5f); |
sample = (s16) (flt * 32767.5f - 0.5f); |
clip_s16()
saturates a 16-bit short integer (-32768..32767); clip_flt()
returns a float -1.0..1.0.
Observations regarding PulseAudio:
flt_to_s16(s16_to_flt(x)) != x
for x == -32768
x / (float) 0x7fff != x * (1.0f / 0x7fff)
on the other hand
x / (float) (1<<15) == x * (1.0f / (1<<15))
and the second form would allow to avoid division in favour of
multiplication of the inverse; the problem with the first form is a slight deviation for certain input values
lrintf()
rounds according to the current rounding mode which by default is
round-toward-nearest integer, toward-even for tie breaking). For example, this means:
12.3 -> 12 12.5 -> 12 (!) 12.7 -> 13 13.3 -> 13 13.5 -> 14 (!) 13.7 -> 14So .5 values are rounded to an even value.
static void float_to_s16(const float *src, int16_t *dst) { __asm__ __volatile__ ( "vdup.f32 q2, %[two23] \n\t" "vdup.f32 q3, %[scale] \n\t" "vdup.u32 q4, %[mask] \n\t" "vdup.f32 q5, %[mone] \n\t" "vld1.32 {q0}, [%[src]]! \n\t" /* load x */ "vmaxq.f32 q0, q0, q5 \n\t" /* clip at -1.0 */ "vmul.f32 q0, q0, q3 \n\t" /* scale */ "vand.u32 q1, q0, q4 \n\t" /* get sign bit */ "vorr.u32 q1, q1, q2 \n\t" /* put sign on 2^23 */ "vadd.f32 q0, q1, q0 \n\t" /* sgn(x)*2^23 + x ... */ "vsub.f32 q0, q0, q1 \n\t" /* ... - sgn(x)*2^23 */ "vcvt.s32.f32 q0, q0 \n\t" /* convert to int */ "vqmovn.s32 d0, q0 \n\t" /* saturate and narrow */ "vst1.16 {d0}, [%[dst]]! \n\t" : [dst] "+r" (dst), [src] "+r" (src) /* output operands (or input operands that get modified) */ : [scale] "r" (32767.0f), [two23] "r" (8.3886080000e+06f), [mask] "r" (0x80000000), [mone] "r" (-1.0f) /* input operands */ : "memory", "cc", "q0", "q1", "q2", "q3", "q4", "q5" /* clobber list */ ); }Observations:
vmaxq
instruction is needed to match PulseAudio's clipping sematics (clip -1.0f to -32767 instead of -32768); otherwise, vqmovn
takes care of narrowing a 32-bit signed integer to 16-bit and saturation.
vand
) and or-ing the sign to the two23
value.
lrintf()
rounding we have: one extra vand, vor, vadd, vsub
-- not bad!
sample / (float) 0x7fff;
without actually performing the costly division. I am not saying that this makes much sense, but hey, we can
First we observe that a discrepancy (i.e. sample / (float) 0x7fff != sample * (1.0f / 0x7fff)
) occurs when the binary representation of the input value
converted to float ends in 0x4000 (that is, q0 & 0xffff == 0x4000
after the vcvt
instruction). There are 1536 such problematic values over all possible inputs.
static void s16_to_float(const int16_t *src, float *dst) { __asm__ __volatile__ ( "vdup.f32 q1, %[invscale] \n\t" "vdup.u16 q3, %[mask] \n\t" "vdup.u32 q4, %[one] \n\t" "vld1.16 {d0}, [%[src]]! \n\t" /* load x */ "vmovl.s16 q0, d0 \n\t" /* s16 -> s32 */ "vcvt.f32.s32 q0, q0 \n\t" /* s32 -> float */ "vceq.u16 q2, q0, q3 \n\t" /* check for defect */ "vand.u32 q2, q2, q4 \n\t" /* prepare 1 if defect */ "vmul.f32 q0, q0, q1 \n\t" /* multiply by invscale */ "vadd.u32 q0, q0, q2 \n\t" /* correct if defect */ "vst1.32 {q0}, [%[dst]]! \n\t" : [dst] "+r" (dst), [src] "+r" (src) /* output operands (or input operands that get modified) */ : [invscale] "r" (invscale), [mask] "r" (0x4000), [one] "r" (1) /* input operands */ : "memory", "cc", "q0", "q1", "q2", "q3", "q4" /* clobber list */ ); }Observations:
vceq
check for the problem condition; it sets each matching 16-bit word to 0xffff if is 0x4000.
vand
just keeps the LSB in each 32-bit word, hence a 1 indicated the problem condition for each float.
vadd
adds the correction bit to the multiplication result -- making multiplication by the inverse of 0x7fff identical to divsion by 0x7fff.
vceq, vand, vadd
-- not bad!
posted at: 11:17 | path: /programming | permanent link
The Linux phone from Nokia, the N9, is available in Austria -- for 'free' with a 2 year service contract from drei.at.
So far, I'm quite happy with it and there is a no-stress root shell.
posted at: 15:27 | path: / | permanent link
http://www.st.com/internet/evalboard/product/252419.jsp https://github.com/texane/stlink http://jeremyherbert.net/get/stm32f4_getting_started https://www.das-labor.org/trac/browser/microcontroller/src-stm32f4xx/serialUSB https://github.com/nabilt/STM32F4-Discovery-Firmware Programming STM32 F2, F4 ARMs under Linux: A Tutorial from Scratch http://www.triplespark.net/elec/pdev/arm/stm32.html
posted at: 22:50 | path: / | permanent link
I totally suck in electronics and anything electricity related. Current is not my friend. That's why I have to build a crappy NOT circuit using the help of the almighty search engine.
Here is what I built with a BC547 NPN transistor that I had lying around; using a 10 KOhm resistor at the collector and a 1 KOhm resistor at the base.
Let's see if I can use above thing to invert a serial TTL signal to feed into a RS-232 UART.
posted at: 15:29 | path: / | permanent link
We just got two MaKey MaKey's; watch the video and see what it is!
In short, it's an USB HID mouse/keyboard device provided by an Arduino Leonardo (vendor:product 0x2341:0x8036). The Makey Makey firmware detects closed switches on digital input pins using 50 mega ohms pullups.
posted at: 21:32 | path: /fun | permanent link
The following IIO drivers went upstream:
posted at: 10:04 | path: /projects | permanent link