All of my I/O testing and benchmarking has been geared toward Oracle, so the natural choice for benchmarking I/O would be Orion, the I/O benchmarking tool from Oracle. Unfortunately, Orion has had some problems that have made it too undependable for me to trust. First, there have been annoying problems getting it running, especially on NFS. The problems are easy to resolve once the solutions are known; see my previous blog post on Orion errors.
Orion also has a much more serious issue, at least in some cases: it re-reads the same blocks, covering a much smaller data set than requested. The following strange behavior is with Orion on x86 Solaris, using the orion binary from an 11g distribution. The root of the problem is that Orion seems to revisit the same blocks over and over during its random read testing.
A dtrace script, io.d, was used to trace which blocks Orion was reading. The test file was under /domain.
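#!/usr/sbin/dtrace -s
#pragma D option quiet
/* io.d: print the file offset, in bytes, of every ZFS read on a path containing "/domain" */
::zfs_read:entry
/ strstr((args[0])->v_path, "/domain") != NULL /
{ printf("%lld\n", args[1]->_uio_offset._f); /* args[0] is the vnode, args[1] the uio_t */ }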
Steps:
Create a 96GB file and put its path in mytest.lun.
Modify io.d to filter for /domain.
Make sure no non-Orion I/O is going to the filesystem.
Start io.d, saving its output: ./io.d > blocks-touched.txt
Kick off Orion with:
export LD_LIBRARY_PATH=.
./orion -testname mytest -run advanced -matrix row -num_disks 5 -cache_size 51200 \
-duration 60 -simulate raid0 -write 0 -num_large 0
-run advanced : users can specify customizations
-matrix row : only small random I/O
-num_disks 5 : actual number of physical disks in the test; used to generate a range of loads
-cache_size 51200 : cache size in MB; defines a warmup period
-duration 60 : duration of each data point, in seconds
-simulate raid0 : simulate striping across all the LUNs specified (there is only one LUN in this test)
-write 0 : percentage of I/O that is writes, which is zero in this test
-num_large 0 : maximum outstanding I/Os for large random I/O; there is no large random I/O in this test
Once the test finishes, stop the dtrace script io.d.
—
Example output from a run
ORION VERSION 11.2.0.1.0
Command line:
-testname mytest -run advanced -matrix row -num_disks 5 -cache_size 51200 -duration 60
-simulate raid0 -write 0 -num_large 0
These options enable these settings:
Test: mytest
Small IO size: 8 KB
Large IO size: 1024 KB
IO types: small random IOs, large random IOs
Sequential stream pattern: RAID-0 striping for all streams
Writes: 0%
Cache size: 51200 MB
Duration for each data point: 60 seconds
Small Columns:, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
Large Columns:, 0
Total Data Points: 26
Name: /domain0/group0/external/lun96g Size: 103079215104
1 files found.
Maximum Small IOPS=62700 @ Small=16 and Large=0
Minimum Small Latency=81.81 usecs @ Small=2 and Large=0
Things look wrong right away.
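The average latency is in the hundreds of microseconds (the fastest 60-second data point above averaged 81.81us), yet the file is 96GB, twice the size of the 48GB cache. With random reads spread over a file twice the cache size, at most about half of the reads should be cache hits, so latencies this low suggest that nearly everything is being served from cache.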
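Little's law (average outstanding I/Os = IOPS × average latency) gives a rough cross-check on the latency. A minimal sketch, assuming the "Small=16" label on the peak data point means about 16 small I/Os were in flight on average:

```python
def littles_law_latency_us(outstanding_ios: float, iops: float) -> float:
    """Average latency in microseconds from Little's law: W = L / lambda."""
    return outstanding_ios / iops * 1_000_000


# Assumption: Small=16 at the 62,700 IOPS peak means ~16 outstanding I/Os.
latency_us = littles_law_latency_us(16, 62_700)
print(f"{latency_us:.0f} us")
```

That works out to roughly 255us per read at the peak, again in the hundreds of microseconds and far below what random reads from disk usually take.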
The max throughput was 489MB/s (62,700 IOPS at 8KB each).
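The 489MB/s figure follows directly from the peak IOPS and the 8KB small I/O size. A quick check, assuming MB here means 2^20 bytes:

```python
def throughput_mib_per_s(iops: float, io_size_kib: float) -> float:
    """Convert an IOPS rate at a fixed I/O size (in KB) to MB/s, using 1024-based units."""
    return iops * io_size_kib / 1024


# Peak small-I/O rate from the Orion summary, at the 8 KB small I/O size.
print(f"{throughput_mib_per_s(62_700, 8):.1f} MB/s")
```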
Total blocks read
# wc -l blocks-touched.txt
78954834 blocks-touched.txt
Unique blocks read
# sort blocks-touched.txt | uniq -c | sort -rn > block-count.txt
# wc -l block-count.txt
98305 block-count.txt
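The same totals can be pulled out of the io.d output without the sort/uniq pipeline. A minimal sketch, assuming the trace file holds one decimal offset per line as printed by the dtrace script; the function names are hypothetical:

```python
import sys
from collections import Counter
from typing import Iterable


def count_offsets(lines: Iterable[str]) -> Counter:
    """Count how often each read offset appears in io.d output (one offset per line)."""
    return Counter(int(line) for line in lines if line.strip())


def summarize(counts: Counter) -> tuple[int, int]:
    """Return (total reads, unique offsets), like `wc -l` on the raw and the uniq'd files."""
    return sum(counts.values()), len(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:  # e.g. blocks-touched.txt
        counts = count_offsets(f)
    total, unique = summarize(counts)
    print(f"total reads: {total}, unique offsets: {unique}")
    # The few least frequently read offsets, similar to `tail block-count.txt`
    for offset, hits in sorted(counts.items(), key=lambda kv: kv[1])[:4]:
        print(hits, offset)
```

Run it as `python3 count_offsets.py blocks-touched.txt` (the script name is arbitrary). A Counter keeps one entry per distinct offset, so memory stays small here precisely because so few offsets were touched.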
We hit only 98,305 unique offsets in the file, yet a 96GB file has 12,582,912 unique 8KB offsets.
Those unique blocks total only about 768MB of data (98,305 × 8KB), which is easily cached.
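The arithmetic behind those two numbers, as a small sketch; it assumes every read is an aligned 8KB block, which matches Orion's reported small I/O size but is not something the trace itself checks:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
BLOCK = 8 * 1024  # Orion's small I/O size in this run

file_size = 96 * GIB               # matches the 103079215104 bytes Orion reports
possible_offsets = file_size // BLOCK
unique_seen = 98_305               # from `wc -l block-count.txt`

print(possible_offsets)
print(unique_seen * BLOCK / MIB)
print(f"{unique_seen / possible_offsets:.2%} of the file touched")
```

In other words, the test read well under 1% of the file it was supposedly reading at random.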
Block access frequency
# tail block-count.txt
695 109297664
694 34532360192
693 76259328
693 34558271488
The least frequently hit blocks were hit almost 700 times, and the average was over 800 hits per block. With 78,954,834 block accesses spread over a file of 12,582,912 unique blocks, the average should have been about 6 hits per block.
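To put the gap in numbers, here is a sketch using the counts above. The "expected distinct blocks" line assumes each read picked one 8KB block independently and uniformly at random, which is what a random read test should approximate, not something known about Orion's generator:

```python
import math

total_reads = 78_954_834      # wc -l blocks-touched.txt
observed_unique = 98_305      # wc -l block-count.txt
possible_blocks = 12_582_912  # 8KB offsets in the 96GB file

# Average hits per offset that was actually read (what the trace shows)
observed_avg = total_reads / observed_unique
# Average hits per block if the reads had covered the whole file evenly
uniform_avg = total_reads / possible_blocks
# Expected distinct blocks when each of k reads picks one of N blocks uniformly:
# N * (1 - (1 - 1/N)**k), computed with log1p/expm1 to avoid precision loss
expected_unique = possible_blocks * -math.expm1(
    total_reads * math.log1p(-1 / possible_blocks)
)

print(f"observed {observed_avg:.0f} hits/offset, uniform {uniform_avg:.1f} hits/block")
print(f"expected about {expected_unique:,.0f} distinct blocks, saw {observed_unique:,}")
```

Under that assumption, more than 99% of the file's blocks should have been read at least once, versus under 1% in the trace.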
This may be caused by multiple streams starting from the beginning of the file, or from the same “random” offset, at the start of every 60-second data point. I’m not sure. If that is the case, the only workaround would be to increase the duration enough to ensure that most of the blocks read at the beginning of each data point get kicked out of the cache. If each thread starts at the same location and reads the same set of “random” blocks, then there is no workaround. Ideally I’d want each stream to start from a different random location and read a different set of random blocks.
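The difference between those two scenarios can be shown with a toy model. This is not Orion's code and makes no claim about how Orion seeds its streams; it simply compares hypothetical streams that all replay one pseudo-random sequence against streams that each get their own seed:

```python
import random

BLOCK = 8 * 1024
NUM_BLOCKS = 12_582_912  # 8KB blocks in the 96GB test file


def stream_offsets(seed: int, reads: int) -> list[int]:
    """Hypothetical reader stream: `reads` random 8KB-aligned offsets from its own PRNG."""
    rng = random.Random(seed)
    return [rng.randrange(NUM_BLOCKS) * BLOCK for _ in range(reads)]


def unique_offsets(num_streams: int, reads_per_stream: int, same_seed: bool) -> int:
    """Distinct offsets touched by all streams together."""
    seen: set[int] = set()
    for s in range(num_streams):
        # same_seed=True models every stream replaying the same "random" sequence
        seed = 42 if same_seed else s
        seen.update(stream_offsets(seed, reads_per_stream))
    return len(seen)


print(unique_offsets(16, 10_000, same_seed=True))
print(unique_offsets(16, 10_000, same_seed=False))
```

With a shared seed, adding streams adds no new blocks, so the working set stays the size of one stream's reads and fits easily in cache; with per-stream seeds the working set grows roughly with the number of streams, which is the behavior the test should have.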