Hacker News

These sizes are incredibly impressive. I've been working heavily with Qwen3-TTS (1.7B) recently and actually had to write custom Triton kernels just to get the inference latency down to a manageable level for bulk candidate generation.

I'm really curious: how does the inference speed of these <25MB models look on consumer GPUs? Also, are these models deterministic, or do they have a stochastic nature where you need to generate multiple takes to get the best prosody?
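For context on the second question, here is a minimal sketch of the multi-take workflow I mean, assuming the model samples stochastically. `synthesize` and `prosody_score` are hypothetical placeholders standing in for a real TTS forward pass and a real prosody metric (e.g. an MOS predictor); the point is just the seed-and-select loop.

```python
import random

def synthesize(text: str, seed: int) -> list[float]:
    # Placeholder for a stochastic TTS forward pass: seeded noise as "audio".
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(16)]

def prosody_score(audio: list[float]) -> float:
    # Placeholder heuristic; a real pipeline would use a learned scorer.
    return -abs(sum(audio))

def best_take(text: str, n_takes: int = 4) -> tuple[int, list[float]]:
    # Generate several takes under different seeds, keep the highest-scoring one.
    takes = {seed: synthesize(text, seed) for seed in range(n_takes)}
    best_seed = max(takes, key=lambda s: prosody_score(takes[s]))
    return best_seed, takes[best_seed]
```

If the models are deterministic, the whole loop collapses to a single call, which matters a lot for bulk generation cost.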


