While at Delphix, we did a lot of storage benchmarking. The I/O response times of Delphix depend, as one would logically imagine, heavily on the underlying disks. Sure, Delphix can cache a lot (with 1 TB of RAM and 3x compression, that's 3 TB, and that 3 TB can be shared by 10 or 100 copies, making it the equivalent of 30 TB or 300 TB of databases), but there will always be important I/O coming from the storage subsystem.
Now Delphix mainly runs database loads, so the best test for storage that is hooked up to Delphix is to benchmark the storage I/O for a database workload. Two questions arise:
What tool can benchmark the I/O?
What is a typical database I/O workload?
For the tool, the clear answer seems to be fio, which is quite flexible and has an active community; it was started by the renowned Linux I/O maintainer Jens Axboe and is still actively maintained by him.
Now for database workloads there are four types of I/O:
Random single-block reads (typically lookups by index)
Large multiblock reads (typically full table scans)
Small sequential I/Os to transaction log files
Large sequential I/Os to transaction log files (generally when commits are infrequent and the change rate is high)
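These four patterns map naturally onto fio job definitions. As an illustrative sketch (the filename, file size, and runtime below are placeholder values, not what fio.sh actually generates; the block sizes follow the list above):

```shell
# Write an illustrative fio job file covering the four database I/O patterns.
# filename/size/runtime are placeholders for this sketch.
cat > dbio.fio <<'EOF'
[global]
direct=1
runtime=60
time_based
filename=fiodata
size=1g

# random single-block reads (index lookups)
[randread-8k]
rw=randread
bs=8k

# large multiblock reads (full table scans)
[seqread-1m]
rw=read
bs=1m

# small sequential writes to the transaction log
[logwrite-1k]
rw=write
bs=1k

# large sequential writes to the transaction log
[logwrite-128k]
rw=write
bs=128k
EOF
```

fio can run one job at a time from a file like this with its --section option.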
Now in a database, all of these I/Os can be issued concurrently by multiple processes, so we need to benchmark for concurrency as well.
Thus the test matrix is the four types of I/O crossed with different numbers of concurrent users.
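The iteration skeleton of that matrix looks something like the loop below (the test names and user counts mirror the output further down; this is only a sketch of the structure, not fio.sh itself):

```shell
# Iterate the benchmark matrix: each I/O type crossed with a set of
# concurrency levels. The real fio.sh invokes fio at each step.
for test in randread read write; do
  for users in 1 8 16 32 64; do
    echo "test=$test users=$users"
  done
done
```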
Now fio doesn’t come with such prepackaged I/O benchmarks, so I created a script fio.sh to run configurable database I/O benchmarks.
The code and examples are on github at https://github.com/khailey/fio_scripts
The only prerequisite is a compiled binary of the fio command: download the fio source and compile it.
Here is an example of running the script (-b is followed by the full path to the fio binary, -w by a directory in which to create a temporary large file for I/O testing):
$ fio.sh -b `pwd`/fio.opensolaris -w /domain0/fiotest
configuration:
binary=/home/oracle/fiodir/fio.opensolaris
work directory=/domain0/fiotest
output directory=/home/oracle/fiodir
tests=readrand read write
direct=1
seconds=60
megabytes=65536
custom users=-1
custom blocksize=-1
recordsize =8k
filename (blank if multiple files)="filename=fiodata"
size per file of multiple files=""
proceed?
y
CREATE 65536 MB file /home/oracle/fiodir/workdir/fiodata
creating 10 MB seed file of random data
20480+0 records in
20480+0 records out
10485760 bytes (10 MB) copied, 1.17997 seconds, 8.9 MB/s
creating 65536 MB of random data on
............................................................. 64926 MB remaining 300 MB/s 216 seconds left
............................................................. 64316 MB remaining 600 MB/s 107 seconds left
............................................................. 63706 MB remaining 300 MB/s 212 seconds left
............................................................. 63096 MB remaining 300 MB/s 210 seconds left
............................................................. 62486 MB remaining 200 MB/s 312 seconds left
............................................................. 61876 MB remaining 100 MB/s 618 seconds left
............................................................. 61266 MB remaining 35 MB/s 1750 seconds left
............................................................. 60656 MB remaining 300 MB/s 202 seconds left
............................................................. 60046 MB remaining 150 MB/s 400 seconds left
............................................................. 59436 MB remaining 75 MB/s 792 seconds left
............................................................. 58826 MB remaining 75 MB/s 784 seconds left
............................................................. 58216 MB remaining 85 MB/s 684 seconds left
............................................................. 57606 MB remaining 75 MB/s 768 seconds left
............................................................. 56996 MB remaining 75 MB/s 759 seconds left
............................................................. 56386 MB remaining 85 MB/s 663 seconds left
(more output)
test users size MB ms IOPS 50us 1ms 4ms 10ms 20ms 50ms .1s 1s 2s 2s+
read 1 8K r 28.299 0.271 3622 99 0 0 0
read 1 32K r 56.731 0.546 1815 97 1 1 0 0 0
read 1 128K r 78.634 1.585 629 26 68 3 1 0 0
read 1 1M r 91.763 10.890 91 14 61 14 8 0 0
read 8 1M r 50.784 156.160 50 3 25 31 38 2
read 16 1M r 52.895 296.290 52 2 24 23 38 11
read 32 1M r 55.120 551.610 55 0 13 20 34 30
read 64 1M r 58.072 1051.970 58 3 6 23 66 0
randread 1 8K r 0.176 44.370 22 0 1 5 2 15 42 20 10
randread 8 8K r 2.763 22.558 353 0 2 27 30 30 6 1
randread 16 8K r 3.284 37.708 420 0 2 23 28 27 11 6
randread 32 8K r 3.393 73.070 434 1 20 24 25 12 15
randread 64 8K r 3.734 131.950 478 1 17 16 18 11 33
write 1 1K w 2.588 0.373 2650 98 1 0 0 0
write 1 8K w 26.713 0.289 3419 99 0 0 0 0
write 1 128K w 11.952 10.451 95 52 12 16 7 10 0 0 0
write 4 1K w 6.684 0.581 6844 90 9 0 0 0 0
write 4 8K w 15.513 2.003 1985 68 18 10 1 0 0 0
write 4 128K w 34.005 14.647 272 0 34 13 25 22 3 0
write 16 1K w 7.939 1.711 8130 45 52 0 0 0 0 0 0
write 16 8K w 10.235 12.177 1310 5 42 27 15 5 2 0 0
write 16 128K w 13.212 150.080 105 0 0 3 10 55 26 0 2
What we see is:
test – the test being run: randread, read, or write
users – number of concurrent users
size – size of the I/O requests; databases typically request 8K at a time
MB – throughput in MB per second
ms – average latency in milliseconds
min – minimum latency (not shown here)
max – maximum latency (not shown here)
std – standard deviation of latency (not shown here)
IOPS – I/O operations per second
50us 1ms 4ms 10ms 20ms 50ms .1s 1s 2s 2s+ – histogram of latencies: each column counts the I/Os that completed faster than the heading value
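The MB and IOPS columns are tied together by the block size: throughput is roughly IOPS times I/O size. A quick sanity check against the single-user 8K read row above (3622 IOPS, 28.299 MB/s):

```shell
# throughput in MB/s = IOPS * block size in KB / 1024
awk 'BEGIN { printf "%.1f MB/s\n", 3622 * 8 / 1024 }'
# prints 28.3 MB/s, matching the 28.299 MB/s reported for that row
```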
This can also be useful just for testing the storage on your laptop.
This summer I bought a used Mac laptop with a hybrid SSD (SSHD). I had been using a Mac with a factory-installed SSD and expected the hybrid to have similar response times, but once I started using it something was clearly wrong. Before sending it back I wanted some empirical proof, so I ran fio.sh.
Here is the comparison
SSD - came with the Mac
test users size MB ms min max std IOPS
randread 1 8K 32.684 0.234 0.002 9.393 0.144 4183
randread 8 8K 240.703 0.257 0.001 2.516 0.137 30810
randread 16 8K 372.503 0.333 0.001 1.994 0.185 47680
randread 32 8K 478.863 0.520 0.001 5.281 0.294 61294
randread 64 8K 476.948 1.045 0.001 11.564 0.582 61049
SSHD - hybrid SSD installed aftermarket
test users size MB ms min max std IOPS
randread 1 8K 0.533 14.608 0.005 138.783 8.989 68
randread 8 8K 0.767 80.769 0.035 256.965 53.891 98
randread 16 8K 0.801 152.982 0.012 331.538 63.256 102
randread 32 8K 0.810 298.122 0.015 519.073 79.781 103
randread 64 8K 0.796 590.696 0.030 808.146 143.490 101
(full list of SSD vs HSSD on my Macs at https://github.com/khailey/fio_scripts/blob/master/macssd)
The hybrid is atrocious compared to the SSD.
The single-user random read latency is 14.6 ms, which is the speed of a slow HDD. A 7200 RPM HDD should respond in under 10 ms, and a 15K RPM HDD in around 6 ms. The SSD on a two-year-old Mac responds in 0.23 ms.
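To put the gap in numbers, the single-user random-read latencies in the tables above (14.608 ms on the SSHD versus 0.234 ms on the SSD) differ by a factor of roughly 60:

```shell
# ratio of hybrid latency to SSD latency for the 1-user 8K random read rows
awk 'BEGIN { printf "%.0fx slower\n", 14.608 / 0.234 }'
# prints 62x slower
```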
It's nice to just have an easy-to-run script to test out storage. Here is my Linux box:
test users size MB ms min max std IOPS
randread 1 8K r 14.417 0.517 0.005 8.922 0.382 1845
randread 8 8K r 26.497 2.355 0.004 12.668 0.790 3391
randread 16 8K r 24.631 5.069 0.004 15.168 1.080 3152
randread 32 8K r 24.726 10.101 0.005 32.042 2.124 3164
randread 64 8K r 24.899 20.051 0.005 37.782 4.171 3187
On my Linux desktop you can see how the MB/sec throughput maxes out at about 26 MB/sec, and after that latency just goes up proportionally as we add more concurrency.
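That proportional relationship is just Little's Law: concurrency is roughly IOPS times latency. Checking it against the 32-user row above (3164 IOPS at 10.101 ms average latency):

```shell
# Little's Law: outstanding I/Os = IOPS * latency in seconds
awk 'BEGIN { printf "%.0f\n", 3164 * 10.101 / 1000 }'
# prints 32, recovering the 32 concurrent users of that row
```

Once the storage is saturated, adding users can only add queueing, so latency scales with the user count while throughput stays flat.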
The github repository also has R scripts to visualize the data (see the README on github for details on how to generate the graphics).
Here is an explanation of the graphics.
There are a number of factors that are important when benchmarking I/O, such as whether direct I/O is used, the size of the cache on the host running fio, the size of the back-end storage cache, the size of the file used to test I/O, how that file is initialized (with zeros, patterned data, or random data), whether the file system compresses, etc. Check out this blog post for some anomalies and surprises: http://datavirtualizer.com/lies-damned-lies-and-io-statistics/