Achieving Rapid Response Times in Large Scale Systems
[Editor’s note: I know I am writing a blog after ages. Hoping to be more regular going forward and maintain a regular schedule]
Recently, I came across this slide deck from Google’s Jeff Dean about achieving rapid response times in large scale systems. The overall summary of the talk is to describe various load management and balancing techniques Google uses to decrease their response times both in average and high percentile.
Some salient points of this talk:
– Increased fan out (aka larger scale) makes it more challenging to keep the latencies low under high percentiles as more things can fail or be slow. Example: You are trying to serve a request by querying across multiple partitions and you are as fast as the slowest partition. This becomes a serious problem in the high percentile (99th or 99.9th).
What intrigued me about this talk and its conclusion which I will get to below is the emphasis on optimizing for TP99 latency. As we called out in the original Amazon dynamo paper, engineering systems to optimize for TP99 latency is hard and require careful optimization at various level such as good engineering practices (prioritized queues, careful management of background queues). Here Jeff’s talks about similar practices and a few more techniques they used to address large scale variability:
- General good engineering practice to build individual components right (priority queues, better management of background tasks)
- Better load management at the server/service side: by spreading the load across servers, overload protection and migration of hot spots
- Better load balancing at the client side by: sending backup requests to another replica dynamically when you find the original replica is slow (interesting question is when to trigger the backup request but one can imagine picking a reasonable value like TP50), better cancellation schemes of backup requests, and finally giving degraded results.
- Mentioned that backup requests beyond 2 provide diminished returns. I think this totally makes sense and probably due to the power of two choices.
Overall, excited to see the great talk.