
> Integer tricks and optimizations are pointless.

They’re not pointless; they’re just not the first thing to optimize.

It’s like worrying about cache locality when you have an inherently O(n^2) algorithm and could have an O(n log n) or O(n) one. Fix the biggest problem first.

Once your data layout is good and your CPU isn’t taking a 200 cycle lunch break to chase pointers, then you worry about cycle count and keeping the execution units fed.
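To make the pointer-chasing point concrete, here is a rough sketch in C. The names `Node`, `sum_list`, and `sum_array` are made up for illustration. Both functions compute the same sum, but the list version can't know where the next value lives until the current node has been loaded, while the array version walks contiguous memory with predictable addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linked-list node: each element may live anywhere in memory. */
struct Node {
    int32_t value;
    struct Node *next;
};

/* Pointer chasing: the address of the next element is only known once the
   current node has been loaded, so every cache miss stalls the whole chain. */
int64_t sum_list(const struct Node *head) {
    int64_t total = 0;
    for (const struct Node *n = head; n != NULL; n = n->next) {
        total += n->value;
    }
    return total;
}

/* Contiguous layout: the next address is always the current one plus a
   fixed stride, which is friendly to caches and hardware prefetchers. */
int64_t sum_array(const int32_t *values, size_t count) {
    int64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

How big the gap is depends on the data size and the machine, so treat this as an illustration of the access pattern, not a benchmark.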

That’s when integer tricks can matter. Depending on the microarchitecture, you may have twice as many execution units that can take integer instructions as can take floating-point ones. And those integer instructions (outside of division) tend to have lower latency and higher throughput.
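As one example of the kind of integer trick in question, here is a sketch in C that replaces a division by a fixed divisor with a multiply and a shift. The divisor 10 and the function name `div10` are my own picks for illustration. Optimizing compilers generally do this rewrite themselves when the divisor is a compile-time constant, so writing it by hand mostly matters when they can't see the constant.

```c
#include <stdint.h>

/* Divide a 32-bit unsigned value by 10 without a division instruction.
   0xCCCCCCCD is ceil(2^35 / 10); the product is computed in 64 bits, so it
   cannot overflow, and shifting right by 35 gives floor(x / 10) for every
   32-bit x. */
uint32_t div10(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

The catch is that the magic constant and shift are specific to one divisor, so this only helps when the divisor is fixed or reused many times.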

And if you’re doing SIMD, your integer SIMD instructions can be 2 or 4x higher throughput than float32 if you can use int16 / int8 data.
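The 2x / 4x figure is mostly lane-count arithmetic, sketched below in C. `lanes_per_register` and `add_i16` are hypothetical names, and the sketch assumes the integer and float instructions issue at the same rate per register, which varies by microarchitecture. With a 128-bit register (the SSE2 / NEON width), float32 gets 4 lanes, int16 gets 8, and int8 gets 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of elements of a given width that fit in one SIMD register. */
size_t lanes_per_register(size_t register_bits, size_t element_bits) {
    return register_bits / element_bits;
}

/* A plain element-wise int16 add. A loop of this shape is something a
   compiler may auto-vectorize, processing one register's worth of lanes
   per instruction; whether it does depends on compiler and flags. */
void add_i16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (int16_t)(a[i] + b[i]);
    }
}
```

So `lanes_per_register(128, 16) / lanes_per_register(128, 32)` is 2 and the int8 ratio is 4, which is where the best-case numbers come from.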

So it can very much matter. It’s just usually not the lowest hanging fruit.



> And if you’re doing SIMD, your integer SIMD instructions can be 2 or 4x higher throughput than float32 if you can use int16 / int8 data.

Your float instructions can also have 2x the throughput if you use f16, with no need to restrict yourself to specific divisors.

Even for values that do pack into 8 bits, you rarely have a way to process enough of them at once to actually get more throughput than with wider numbers.

I'm sure there's a program where it very much matters, but my bet is on it not even mildly mattering, and there basically always being a hundred more useful optimizations to work on.


The problem with f16 is that hardware support is still "new" and can't be relied on in consumer-grade CPUs yet.



