fio's histogram output mode records per-I/O latency distributions rather than only aggregate statistics. This lets you compute accurate percentiles from the raw data, which is essential for the measurement methodology described in the cloud latency article. The steps below build fio from source with an extended maximum latency range (the default caps out at ~16 seconds; we push it to ~9 minutes for long-tail measurements), then run it against a Ceph/RBD cluster and process the output.
Dependencies
dnf update && dnf install -y git python numpy python2-pandas gcc librbd1-devel
Build fio from source
The histogram latency range is compiled in, so we need to patch stat.h before building:
git clone https://github.com/axboe/fio.git && cd fio
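# Increase max latency to ~9 minutes (default cap is ~16 seconds).
# The pattern only matches if the value is still 19; otherwise stat.h is left unchanged.
sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19$/#define FIO_IO_U_PLAT_GROUP_NR 24/' stat.h
grep -n 'FIO_IO_U_PLAT_GROUP_NR' stat.h  # confirm the new value before building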
make fio
cd tools/hist
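To sanity-check the ~16 second and ~9 minute figures, the following Python sketch computes the largest latency the histogram can resolve for a given group count. It assumes 6 bits (64 buckets) per group and microsecond latency units, which is how the bucket layout in stat.h appears to work for a default of 19 groups; check your own tree, since these constants may differ. The helper names max_latency_usec and bucket_count are hypothetical and not part of fio.

```python
# Assumed constant (verify against FIO_IO_U_PLAT_BITS in your stat.h).
PLAT_BITS = 6  # 2**6 = 64 buckets per group


def max_latency_usec(group_nr, plat_bits=PLAT_BITS):
    """Upper edge of the last histogram group, in microseconds.

    Groups 0 and 1 each span 2**plat_bits values and every later group
    doubles the range, so the last group ends at 2**(plat_bits + group_nr - 1).
    """
    return 2 ** (plat_bits + group_nr - 1)


def bucket_count(group_nr, plat_bits=PLAT_BITS):
    """Total number of histogram buckets (groups times buckets per group)."""
    return group_nr * 2 ** plat_bits


if __name__ == "__main__":
    for nr in (19, 24):
        secs = max_latency_usec(nr) / 1e6
        print(f"GROUP_NR={nr}: {bucket_count(nr)} buckets, "
              f"max ~{secs:.1f} s ({secs / 60:.1f} min)")
```

Under these assumptions, 19 groups top out at about 16.8 seconds and 24 groups at about 536.9 seconds, or roughly 8.9 minutes, which matches the figures quoted above.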
Run against a Ceph/RBD cluster
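The key flags for histogram output are --write_hist_log (the log file name prefix), --log_hist_msec (the interval between histogram samples, in milliseconds), and --log_hist_coarseness (how coarsely bins are merged in each logged sample). The example below uses CBT-style parameters: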
../../fio --ioengine=rbd --clientname=admin \
--pool=cbt-librbdfio --rbdname=cbt-librbdfio-$(hostname -s)-0 \
--invalidate=0 --rw=write --numjobs=1 --bs=4M --size=2048M \
--name=librbdfio-$(hostname -s)-0 --runtime=60s \
--write_hist_log=z --log_hist_msec=1000 --log_hist_coarseness=3
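To get a feel for how much data these settings produce, here is a rough Python sketch. It relies on fio's documented behaviour that each increment of --log_hist_coarseness halves the number of bins written, and it assumes 64 buckets per group and one sample per --log_hist_msec interval for the full runtime. The function names logged_bins and samples_logged are hypothetical, and the real log may contain slightly more or fewer samples.

```python
def logged_bins(group_nr, coarseness, buckets_per_group=64):
    """Approximate number of bins per logged histogram sample.

    Each coarseness step merges adjacent bins, halving the count.
    """
    if not 0 <= coarseness <= 6:
        raise ValueError("documented range for log_hist_coarseness is 0 to 6")
    return (group_nr * buckets_per_group) >> coarseness


def samples_logged(runtime_s, log_hist_msec):
    """Approximate number of histogram samples over a run."""
    if log_hist_msec <= 0:
        raise ValueError("log_hist_msec must be positive")
    return (runtime_s * 1000) // log_hist_msec


if __name__ == "__main__":
    # Values from the command above, with the patched group count of 24.
    print("bins per sample:", logged_bins(24, 3))
    print("samples per run:", samples_logged(60, 1000))
```

With the patched build and the flags shown, this estimate gives 192 bins per sample and about 60 samples for the 60 second run.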
Parse histogram logs
The resulting z_clat_hist.1.log file is processed with fiologparser_hist.py, which ships with fio in tools/hist:
# Example log file from the cloud latency measurements:
# curl -L -O https://cronburg.com/fio/z_clat_hist.1.log
./fiologparser_hist.py z_clat_hist.1.log
The companion notebooks Weighted-Percentiles and Histogram-Accuracy walk through the statistical analysis of this output.
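As a rough illustration of the idea behind weighted percentiles, the Python sketch below computes a percentile from histogram bins by treating each bin's count as the weight of a representative latency value. This is a simple nearest-rank calculation without interpolation, not necessarily the method used by fiologparser_hist.py or the notebooks, and the bin values and counts are made-up example data.

```python
from itertools import accumulate


def weighted_percentile(values, weights, pct):
    """Smallest value whose cumulative weight reaches pct percent of the total."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("total weight must be positive")
    target = total * pct / 100.0
    running = accumulate(w for _, w in pairs)
    for (value, _), cumulative in zip(pairs, running):
        if cumulative >= target:
            return value
    return pairs[-1][0]


if __name__ == "__main__":
    # Hypothetical data: bin midpoints in microseconds and I/O counts per bin.
    bin_mids = [100, 200, 400, 800, 1600]
    counts = [50, 30, 15, 4, 1]
    for pct in (50, 90, 99):
        print(f"p{pct}: {weighted_percentile(bin_mids, counts, pct)} usec")
```

Because every I/O in a bin is reported at the same representative value, the accuracy of the result is bounded by the bin width, which is why the coarseness setting matters for tail percentiles.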