Troubleshooting Xen Virtual Machine Disk IO Over-Utilisation on the Hypervisor
In hosted virtualisation environments, disk overuse is an all-too-common issue that you can face day to day, and you are often left guessing which of the many VMs on a Hypervisor is responsible.
We’ve compiled a few of the simplest and most direct ways of pinpointing exactly which pesky VM is the cause. The only thing you need to have installed? Sysstat.
Physical/Underlying Disk Over-Utilisation:
If you monitor your disk IO levels (which you should), you may be alerted to certain disks having critically high levels of input/output utilisation. If this is the case, run iostat with the following options.
- -d – Show the disk report (excludes the CPU report)
- -x – Shows extended stats (the useful ones like %util and io queue size)
- -k – Displays values in kB/s rather than in blocks/s (easier to understand the output)
- 5 – The interval: wait 5 seconds between each report (a positional argument, not a flag)
- 3 – The count: produce 3 reports in total, then exit
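The original command didn’t survive the page formatting, but putting the options above together it would look something like this (note that iostat’s first report shows averages since boot, so pay most attention to the later ones):

```shell
# Extended disk stats (-d -x) in kB/s (-k), every 5 seconds, 3 reports total
iostat -d -x -k 5 3
```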
This will give you an output something like what’s shown below:
As you can see, this node is running fine right now. The far-left column lists the device names: both the physical underlying disks and the virtual devices attached to the VMs. The high-IO culprit will be instantly visible. The problem disks (often in pairs, as each VM has an image volume and a swap volume) will usually sit very close to 100% “%util”, and the various read and write columns will show tell-tale high numbers. Use the above as a reference for a healthy, functioning, production-level Hypervisor.
But wait, I hear you shouting… What good is a dm-x virtual device name? How can I resolve that to the VM name/number? Good question: see below for the next command to make use of.
I won’t dissect this fully, as it is a heavily awk’d lvdisplay. In brief: lvdisplay’s output contains all of the information you need, and the awk picks out the important parts. It starts by pulling the 3rd value of the “LV Name” line, which is the VM’s logical volume name; this includes the VM ID, which you can use to locate the VM later. It then pulls the device number from the 3rd value of the “Block device” line and prefixes it with the text “dm-” to make it a bit more readable.
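The exact command was lost with the page formatting, but based on the description above a reconstruction might look like the following. The field positions are assumptions (older lvdisplay output puts the full LV path on the “LV Name” line; newer LVM versions use “LV Path” instead), so check them against your own lvdisplay output:

```shell
# Hypothetical reconstruction: map each dm-X device to its LV name.
# The dm-X number is the device-mapper minor number, i.e. the part
# after the colon on the "Block device" line (e.g. 253:4 -> dm-4).
lvdisplay | awk '
  /LV Name/      { lv = $3 }               # 3rd field: the LV name/path
  /Block device/ { split($3, num, ":")     # 3rd field: major:minor
                   print "dm-" num[2], lv }'
```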
The output will be something like this:
It is now very easy to tie the suspect dm-x device you found earlier to a much more useful VM ID. If you want a quick fix, issue: