Running fio with histogram output

fio's histogram output mode records latency distributions as histograms rather than only aggregate statistics. This lets you compute accurate percentiles from the distribution data, which is essential for the measurement methodology described in the cloud latency article. The steps below build fio from source with an extended maximum latency range (the default caps out at about 16 seconds; we raise it to about 9 minutes for long-tail measurements), run it against a Ceph/RBD cluster, and process the output.

Dependencies

dnf update && dnf install -y git python numpy python2-pandas gcc librbd1-devel

Build fio from source

The histogram latency range is compiled in, so we need to patch stat.h before building:

git clone https://github.com/axboe/fio.git && cd fio

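# Increase max latency to ~9 minutes (default cap is ~16 seconds).
# The sed pattern only matches if the current value is 19, so confirm the change afterwards:
sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19$/#define FIO_IO_U_PLAT_GROUP_NR 24/' stat.h
grep '^#define FIO_IO_U_PLAT_GROUP_NR' stat.h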

make fio
cd tools/hist
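
The arithmetic behind the two limits can be checked with a short sketch. It assumes the bucket layout of fio versions that record latencies in microseconds, with FIO_IO_U_PLAT_BITS equal to 6 (64 buckets per group), so that the top group ends at 2^(FIO_IO_U_PLAT_BITS + FIO_IO_U_PLAT_GROUP_NR - 1) - 1 microseconds. The helper name max_latency_usec is hypothetical and not part of fio; check the comments in your own stat.h if your version differs.

```python
# Assumption: 6 index bits per group (64 buckets), latencies in microseconds.
PLAT_BITS = 6


def max_latency_usec(group_nr, plat_bits=PLAT_BITS):
    """Largest latency (in usec) representable with group_nr bucket groups."""
    return (1 << (plat_bits + group_nr - 1)) - 1


if __name__ == "__main__":
    for group_nr in (19, 24):
        usec = max_latency_usec(group_nr)
        seconds = usec / 1e6
        print(f"GROUP_NR={group_nr}: {usec} usec "
              f"= {seconds:.1f} s = {seconds / 60:.2f} min")
```

Under these assumptions, 19 groups top out just under 17 seconds and 24 groups just under 9 minutes, which is where the figures quoted above come from.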

Run against a Ceph/RBD cluster

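Three flags control histogram output: --write_hist_log sets the log file name prefix, --log_hist_msec sets the logging interval in milliseconds, and --log_hist_coarseness sets the bin resolution (each increment halves the number of bins). The example below uses parameters in the style of CBT (the Ceph Benchmarking Tool):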

../../fio --ioengine=rbd --clientname=admin \
  --pool=cbt-librbdfio --rbdname=cbt-librbdfio-$(hostname -s)-0 \
  --invalidate=0 --rw=write --numjobs=1 --bs=4M --size=2048M \
  --name=librbdfio-$(hostname -s)-0 --runtime=60s \
  --write_hist_log=z --log_hist_msec=1000 --log_hist_coarseness=3
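
To get a feel for what --log_hist_coarseness=3 does to the log width, the sketch below estimates the number of bins per histogram sample. It assumes the total bin count is FIO_IO_U_PLAT_GROUP_NR times 64 (which gives the 1216 bins that fio's documentation quotes for the 19-group default) and that each coarseness step halves it. The helper name hist_bins is hypothetical, not a fio API.

```python
# Assumption: 64 buckets per group, as in the unpatched stat.h.
PLAT_VAL = 64


def hist_bins(group_nr, coarseness):
    """Estimated bins per histogram log line for a given coarseness (0..6)."""
    if not 0 <= coarseness <= 6:
        raise ValueError("coarseness must be between 0 and 6")
    # Each coarseness increment merges adjacent bins, halving the count.
    return (group_nr * PLAT_VAL) >> coarseness


if __name__ == "__main__":
    for group_nr in (19, 24):
        for coarseness in (0, 3):
            print(f"GROUP_NR={group_nr}, coarseness={coarseness}: "
                  f"{hist_bins(group_nr, coarseness)} bins")
```

Coarser bins keep the log files small at a one-second logging interval, at the cost of latency resolution within each bin.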

Parse histogram logs

The resulting z_clat_hist.1.log file is processed with fiologparser_hist.py, which ships with fio in tools/hist (the directory entered at the end of the build step):

# Example log file from the cloud latency measurements:
# curl -L -O https://cronburg.com/fio/z_clat_hist.1.log
./fiologparser_hist.py z_clat_hist.1.log

The companion notebooks Weighted-Percentiles and Histogram-Accuracy walk through the statistical analysis of this output.
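
As a rough illustration of the kind of calculation involved, the sketch below computes a percentile from binned counts by walking the cumulative distribution. It is not the algorithm used by fiologparser_hist.py or the notebooks, which may interpolate differently; the function name weighted_percentile and the bin values and counts are made up for the example.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_percentile(bin_values, counts, pct):
    """Return the first bin value whose cumulative count reaches pct percent.

    bin_values: one representative latency per bin, in ascending order.
    counts:     number of I/Os observed in each bin.
    """
    cumulative = list(accumulate(counts))
    if not cumulative or cumulative[-1] == 0:
        raise ValueError("histogram is empty")
    target = pct / 100.0 * cumulative[-1]
    idx = bisect_left(cumulative, target)
    return bin_values[min(idx, len(bin_values) - 1)]


if __name__ == "__main__":
    # Hypothetical data: bin latencies (usec) and I/O counts per bin.
    values = [100, 200, 400, 800, 1600]
    counts = [50, 30, 15, 4, 1]
    for pct in (50, 90, 99.5):
        print(f"p{pct}: {weighted_percentile(values, counts, pct)} usec")
```

Because the result is always a bin's representative value, its accuracy is bounded by the bin width, which is one reason the coarseness setting matters when tail percentiles are the goal.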