Hacker News

These sizes are incredibly impressive. I've been working heavily with Qwen3-TTS (1.7B) recently and actually had to write custom Triton kernels just to get the inference latency down to a manageable level for bulk candidate generation.

I'm really curious: how does the inference speed of these <25MB models look on consumer GPUs? Also, are these models deterministic, or do they have a stochastic nature where you need to generate multiple takes to get the best prosody?
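For context on the second question, here is a minimal sketch of the multi-take workflow I mean, assuming the model samples stochastically. `synthesize` and `prosody_score` are hypothetical placeholders standing in for a real TTS forward pass and a real prosody metric (e.g. an MOS predictor); the point is just the seed-and-select loop.

```python
import random

def synthesize(text: str, seed: int) -> list[float]:
    # Placeholder for a stochastic TTS forward pass: seeded noise as "audio".
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(16)]

def prosody_score(audio: list[float]) -> float:
    # Placeholder heuristic; a real pipeline would use a learned scorer.
    return -abs(sum(audio))

def best_take(text: str, n_takes: int = 4) -> tuple[int, list[float]]:
    # Generate several takes under different seeds, keep the highest-scoring one.
    takes = {seed: synthesize(text, seed) for seed in range(n_takes)}
    best_seed = max(takes, key=lambda s: prosody_score(takes[s]))
    return best_seed, takes[best_seed]
```

If the models are deterministic, the whole loop collapses to a single call, which matters a lot for bulk generation cost.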


