Hacker News

Isn't that exactly how draft models speed up inference, though? Validating a batch of tokens is significantly faster than generating them.
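To make that concrete, here is a rough sketch of greedy speculative decoding with toy stand-in models. Everything in it is an assumption for illustration: `target_next`, `draft_next`, `target_verify`, and the arithmetic "models" are hypothetical, and the batched verification pass is only simulated with a loop. In a real transformer the target model scores all proposed positions in one forward pass, which is where the speedup comes from.

```python
from typing import List, Sequence, Tuple


def target_next(ctx: Sequence[int]) -> int:
    """Toy 'large' model (hypothetical): greedy next token for a context."""
    return (31 * sum(ctx) + len(ctx)) % 10


def draft_next(ctx: Sequence[int]) -> int:
    """Toy 'small' model (hypothetical): agrees with the target most of the time."""
    tok = target_next(ctx)
    # Deliberately wrong on every 4th context length to force some rejections.
    return (tok + 1) % 10 if len(ctx) % 4 == 3 else tok


def target_verify(ctx: Sequence[int], proposed: Sequence[int]) -> List[int]:
    """Target's greedy token at each proposed position, plus one bonus position.

    Simulated with a loop here; a real model would do this in a single
    batched forward pass over the whole proposed span.
    """
    base = list(ctx)
    return [target_next(base + list(proposed[:i])) for i in range(len(proposed) + 1)]


def speculative_decode(prompt: Sequence[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    out = list(prompt)
    target_calls = 0
    while len(out) - len(prompt) < n_new:
        # 1. Draft model proposes k tokens one at a time (cheap, sequential).
        proposed: List[int] = []
        for _ in range(k):
            proposed.append(draft_next(out + proposed))
        # 2. Target model checks all k positions in one (simulated) pass.
        checks = target_verify(out, proposed)
        target_calls += 1
        # 3. Keep the longest prefix on which draft and target agree.
        accepted = 0
        while accepted < k and proposed[accepted] == checks[accepted]:
            accepted += 1
        out.extend(proposed[:accepted])
        # 4. Append the target's own token at the first mismatch
        #    (or the bonus token if every proposal was accepted).
        out.append(checks[accepted])
    return out[: len(prompt) + n_new], target_calls


def greedy_decode(prompt: Sequence[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


if __name__ == "__main__":
    tokens, calls = speculative_decode([1, 2, 3], n_new=20)
    print(tokens == greedy_decode([1, 2, 3], 20), calls)
```

The output matches plain greedy decoding from the target model token for token, but the target is invoked far fewer times than once per token. How much that helps in practice depends on how often the draft agrees with the target and on how cheap the draft is relative to the target.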

