
In my experience, if you're coding or doing anything else that requires precision, quantizing the KV cache is definitely not worth it.

If you're just chatting or doing less precise things, it's 1000% worth going down to Q8, or sometimes even Q4.
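
To get a rough feel for why the lower bit width hurts precision more, here is a minimal sketch that round-trips synthetic values through a simplified per-block absmax quantizer at 8 and 4 bits and compares the error. Everything in it is an assumption for illustration: the function names, the block size of 32, and the random Gaussian data standing in for cached K/V values. It is not the exact format any particular runtime uses.

```python
import random

def quantize_roundtrip(values, bits, block_size=32):
    """Quantize to signed `bits`-bit integers per block (absmax scaling), then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # One scale per block; fall back to 1.0 for an all-zero block.
        scale = absmax / qmax if absmax > 0 else 1.0
        for v in block:
            q = max(-qmax, min(qmax, round(v / scale)))  # integer code
            out.append(q * scale)                        # dequantized value
    return out

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

random.seed(0)
# Hypothetical stand-in for cached K/V values; real caches are not Gaussian noise.
kv = [random.gauss(0.0, 1.0) for _ in range(4096)]

err_q8 = rms_error(kv, quantize_roundtrip(kv, bits=8))
err_q4 = rms_error(kv, quantize_roundtrip(kv, bits=4))
print(f"8-bit RMS error: {err_q8:.5f}")
print(f"4-bit RMS error: {err_q4:.5f}")
```

In this scheme the 4-bit step size is about 18 times coarser than the 8-bit one (127 levels per side versus 7), so the round-trip error grows accordingly. Whether that matters in practice depends on the task, which is the point above.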
