Abstract
For decades, the industry has relied on average latency measurements for storage systems, largely because averages are easy to measure. However, modern storage systems have many layers of caching and optimization that affect latency for better and worse. Because latency in these systems varies widely, often spanning several orders of magnitude, an average conveys little useful information. High dynamic range (HDR) histograms address this problem by capturing the full latency distribution for analysis and overall performance improvement.
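For illustration, here is a minimal sketch using the open-source HdrHistogram library for Java; the bounds, iteration count, and the placeholder `doSomeIo` are our own assumptions, not taken from the paper. It records individual latencies and then reads the tail percentiles that an average would hide.

```java
import org.HdrHistogram.Histogram;

public class LatencyExample {
    public static void main(String[] args) {
        // Track values from 1 ns up to 1 hour, at 3 significant decimal digits.
        Histogram histogram = new Histogram(3_600_000_000_000L, 3);

        // Record each observed operation latency in nanoseconds.
        for (int i = 0; i < 100_000; i++) {
            long start = System.nanoTime();
            doSomeIo();
            histogram.recordValue(System.nanoTime() - start);
        }

        // Percentiles expose the tail that an average hides.
        System.out.printf("median = %d ns%n", histogram.getValueAtPercentile(50.0));
        System.out.printf("p99.9  = %d ns%n", histogram.getValueAtPercentile(99.9));
        System.out.printf("max    = %d ns%n", histogram.getMaxValue());
    }

    private static void doSomeIo() {
        // Placeholder for an actual storage operation.
    }
}
```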
Existing applications and operating systems rarely record histograms. For these, black-box tracing can measure the time between events or function calls, and the resulting data can be recorded into time-series databases for analysis, correlation, and trending; the core measurement pattern is sketched below.
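Black-box tracing itself is usually done with OS facilities such as DTrace or eBPF, but the underlying measurement, timestamping matched entry/exit events and emitting the delta to a sink, can be sketched in a few lines. Everything here is a hypothetical illustration: the class name, the `Callable` wrapper, and the sink (which would in practice write a point to a time-series database) are ours.

```java
import java.util.concurrent.Callable;
import java.util.function.LongConsumer;

public class BlackBoxTimer {
    // Hypothetical sink: in practice this would write a timestamped point
    // to a time-series database.
    private final LongConsumer tsdbSink;

    public BlackBoxTimer(LongConsumer tsdbSink) {
        this.tsdbSink = tsdbSink;
    }

    // Measure the wall-clock time between entering and leaving a call,
    // mirroring what an entry/exit probe pair would capture.
    public <T> T timed(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            tsdbSink.accept(System.nanoTime() - start);
        }
    }
}
```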
Unfortunately, tracing can suffer from probe effects that increase CPU utilization. The best new systems avoid this overhead by building HDR histograms into their internal measurements. We discuss how to implement such measurements while keeping overhead low.
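One concrete pattern for such internal measurements, again a sketch assuming the HdrHistogram Java library (the class structure and reporting cadence are our own), is its Recorder class, which separates a cheap recording path on measurement threads from a periodic reporting path on a background thread.

```java
import org.HdrHistogram.Histogram;
import org.HdrHistogram.Recorder;

public class InternalMetrics {
    // Recorder supports lock-free recording from measurement threads
    // while a background thread safely samples interval histograms.
    private final Recorder recorder = new Recorder(3);

    // Hot path: constant-time record, no locks held.
    public void recordLatency(long nanos) {
        recorder.recordValue(nanos);
    }

    // Reporting path, run periodically (e.g. once a minute): swap out the
    // accumulated interval histogram and publish its percentiles.
    public void report() {
        Histogram interval = recorder.getIntervalHistogram();
        System.out.printf("p50=%dns p99=%dns max=%dns%n",
                interval.getValueAtPercentile(50.0),
                interval.getValueAtPercentile(99.0),
                interval.getMaxValue());
    }
}
```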