I’d contend that you’re optimizing for initial deployment speed rather than over...

I’d contend that you’re optimizing for initial deployment speed rather than overall resilience. Backing off with increasing delays before retrying a dependent service call is a best practice for production services. The fact that you’re seeing this behavior on initial rollout is inconvenient, but it’s also self healing. It might take a bit longer than you like, but if you’re really that impatient, there are workarounds like the one you described.