Personally, I don't think the unit you measure divergence in matters much. Yes, nats are technically superior, but as long as you are consistent, all you really want is a measure of how similar A is to B.
In that sense, I think many explanations of KL divergence are needlessly convoluted.
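To illustrate why the unit is mostly a non-issue: nats and bits differ only in the log base, so the two readings of KL divergence differ by a constant factor of ln(2) and any comparison between distributions comes out the same either way. A minimal sketch (the function name and example distributions are my own, for illustration):

```python
import math

def kl_divergence(p, q, log=math.log):
    # KL(P || Q) = sum over x of p(x) * log(p(x) / q(x)).
    # The `log` argument picks the unit: math.log -> nats, math.log2 -> bits.
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

nats = kl_divergence(p, q)              # natural log, measured in nats
bits = kl_divergence(p, q, math.log2)   # base-2 log, measured in bits

# The two values differ only by the constant factor ln(2) ~ 0.693,
# so rankings of "how similar is A to B" are identical in either unit.
assert abs(nats - bits * math.log(2)) < 1e-12
```

As long as you pick one base and stick with it, the choice changes nothing about which distributions count as closer or further apart.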