Essential VDI Performance Graphs in the Login VSI Analyzer
Every user of Login VSI knows the “famous” VSImax chart in the Login VSI Analyzer. This chart appears in almost every VDI white paper from the major software and hardware vendors in our industry, and it shows the maximum capacity of your virtualized desktop environment. Beyond VSImax, however, I frequently notice that many of our customers are not very familiar with the other charts in the Login VSI Analyzer. In this blog I would like to share some charts and features of the Login VSI Analyzer that I frequently use after a performance test.
VSImax v4 Detailed Weighted
The VSImax Graph is calculated based on several metrics. To better understand how the individual metrics influence the overall VSImax result, you will need to take a look at the weighted view. In this graph you will see how each individual metric influences the VSImax score in comparison to the other metrics. For those of you that want to read the full story on VSImax calculation, read the blog by my colleague Jordi with a detailed overview of all the calculations.
In the graph below it’s easy to see that NSLD (Notepad Start Load) spiked much more than the other metrics. This is a good starting point to dig deeper into the bottleneck. The next step is to find out why this NSLD metric is so high. Other graphs and external data can help you with this.
VSImax v4 detailed weighted graph
The VSImax detailed graph also contains two specific metrics that cannot be found in any other tab: “File Copy To Server” (FCTS) and “File Copy To Local” (FCTL). If the FCTS score is spiking, this could be an indicator that the network is your bottleneck, while FCTL could be an indicator that your local disk is the problem.
Tip: Use the “Select Averages” option by right-clicking the metric selection overview. This removes the maximum and minimum lines, which makes it much easier to see the impact of the different metrics.
In the main and detailed VSImax charts you only look at the minimum, maximum and average test results. In the scatter chart you can actually see the measurements of all the individual sessions. The scatter chart is very interesting and makes it easy to spot outliers. For example, when a single VM, especially at the beginning of the test, has very high response times while the rest have lower scores, it’s a strong indicator that there was a performance problem with an individual session or VM. The scatter chart is also very useful to determine if you need another VSIshare.
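To make the idea of spotting outliers concrete, here is a minimal sketch in Python. It is not how the Analyzer works internally; it assumes you have exported per-session response times yourself. The session names, the sample values and the `factor` threshold are all made up for illustration.

```python
from statistics import median


def find_outlier_sessions(response_ms, factor=3.0):
    """Return sessions whose median response time exceeds
    `factor` times the median across all sessions.

    `response_ms` maps a (hypothetical) session name to a list of
    response times in milliseconds.
    """
    per_session = {
        name: median(values) for name, values in response_ms.items() if values
    }
    overall = median(per_session.values())
    return sorted(
        name for name, value in per_session.items() if value > factor * overall
    )


# Made-up sample data: one session is clearly slower than the rest.
sample = {
    "session-01": [820, 790, 805],
    "session-02": [4100, 3900, 4300],
    "session-03": [760, 810, 795],
    "session-04": [840, 800, 815],
}

outliers = find_outlier_sessions(sample)
print(outliers)
```

A median is used rather than a mean so that a single slow session does not pull the baseline up and hide itself.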
NFO measures how long it takes to display the File Open dialog. This is a really interesting one: if you redirect the home directory to an H: drive and you see a huge increase in this graph while all the others basically stay the same, it probably means that you have a problem with the location where the H: drive is hosted.
Login VSI calculates the CPU performance by generating a large array of random data and spiking the VM’s CPU for a short period of time. The great thing about this Login VSI activity is that it’s 100% CPU based. There is no IO required for this measurement, so this is the purest CPU measurement that Login VSI does. If this graph goes up, you probably have a CPU bottleneck, so this graph definitely helps you to isolate where the performance problem is coming from.
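The principle of a CPU-only timer can be sketched in a few lines of Python. This is an illustration of the idea, not Login VSI’s actual implementation: the array size, the seed and the choice to sort the data are my own assumptions.

```python
import random
import time


def time_cpu_task(size=200_000, seed=42):
    """Time a task that only uses CPU and memory: build a large
    array of random numbers and sort it. No disk or network IO."""
    rng = random.Random(seed)  # fixed seed so every run does the same work
    start = time.perf_counter()
    data = [rng.random() for _ in range(size)]
    data.sort()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, len(data)


elapsed_ms, count = time_cpu_task()
print(f"CPU task over {count} values took {elapsed_ms:.1f} ms")
```

Because the work is identical on every run, a rising duration over the course of a test points to less CPU time being available to the VM.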
If it’s not CPU and we want to take a look at storage, we can go to the IO tab in the Analyzer. The IO metric is actually not used for the VSImax calculation but we do measure it by writing a couple of random blocks to the %temp% directory of the Login VSI users. This IO measurement is a great way to determine, without external performance data, if the storage that hosts the %temp% directory is the bottleneck.
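As a rough illustration of this kind of measurement, the sketch below times a few random blocks being written to the temp directory. It is not Login VSI’s own code; the block count, the block size and the `io_probe_` file prefix are assumptions chosen for the example.

```python
import os
import tempfile
import time


def time_temp_writes(blocks=4, block_size=64 * 1024):
    """Write a few blocks of random bytes to a file in the temp
    directory and return (elapsed milliseconds, bytes written)."""
    # Generate the random data before starting the clock, so the
    # timing covers the write path and not the random generator.
    payloads = [os.urandom(block_size) for _ in range(blocks)]
    fd, path = tempfile.mkstemp(prefix="io_probe_")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for payload in payloads:
                handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to storage
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        written = os.path.getsize(path)
    finally:
        os.remove(path)  # clean up the probe file
    return elapsed_ms, written


io_ms, written = time_temp_writes()
print(f"Wrote {written} bytes in {io_ms:.2f} ms")
```

On Windows, `tempfile` follows the TEMP and TMP environment variables, so the file typically lands in the user’s %temp% directory.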
But the increasing latency of the graph doesn’t tell the complete story of Storage IO vs CPU, because storage latency can be affected by the CPU. If the CPU chart is not increasing but IO latency is dramatically increasing during the test, then you probably have a storage bottleneck. If you see that CPU increases, a logical effect is that IO latency also increases (but it will be much less erratic), which means it’s probably a CPU bottleneck.
In short: if CPU and IO latency both go up, then it’s probably a CPU bottleneck. If only IO latency goes up, it’s most likely a storage bottleneck.
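The rule of thumb above can be written down as a small Python function. This is only a sketch: the way a trend is measured (first quarter versus last quarter of the test) and the 50% threshold are my own assumptions, and the sample series are made up.

```python
def relative_increase(series):
    """Relative growth from the mean of the first quarter of the
    series to the mean of the last quarter. Assumes positive values."""
    quarter = max(1, len(series) // 4)
    head = sum(series[:quarter]) / quarter
    tail = sum(series[-quarter:]) / quarter
    return (tail - head) / head


def classify_bottleneck(cpu_ms, io_ms, threshold=0.5):
    """Apply the rule of thumb: CPU up (with or without IO) points
    to CPU; only IO up points to storage."""
    cpu_up = relative_increase(cpu_ms) > threshold
    io_up = relative_increase(io_ms) > threshold
    if cpu_up:
        return "probably CPU"
    if io_up:
        return "most likely storage"
    return "inconclusive"


# Made-up response times (ms) sampled over the course of a test.
cpu_flat = [100, 102, 98, 101, 99, 100, 103, 100]
cpu_rising = [100, 110, 130, 160, 200, 250, 300, 360]
io_rising = [10, 12, 15, 20, 30, 45, 60, 80]

print(classify_bottleneck(cpu_flat, io_rising))
print(classify_bottleneck(cpu_rising, io_rising))
```

Treat the result as a pointer for where to look next, not as a verdict; host-level data is still the best way to confirm it.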
NOTE: The IO score can be impacted by settings in the image. Think for example about your AV solution that is scanning all the files.
Another interesting graph in the Login VSI Analyzer is AppStart. This graph shows the start times of all the applications during a Login VSI test. You can use the AppStart graph to better understand application launches and how they are influenced by different settings. It’s important to note that AppStart is not a very good indicator of system saturation (unlike VSImax), because we often see that application start times are fast while the overall responsiveness of the VM is very sluggish.
It’s also interesting to note that you will usually see a high latency value and a low latency value for each application. A high latency value is typically seen the first time the application is started within the VM. Once an application has been started and you close it, it will be cached in memory on the Windows VM. The second time the application is started, the latency is much lower. It’s totally normal to have such large differences for the same application. On your own laptop or PC you will see the same behavior: the second time you launch an app, it’s much quicker.
The final graph I use on a regular basis is “LogonTimer”. This graph basically shows how long it took from the start of Login VSI’s logon script until the first application has started. In the example screenshot below you can see that the logon time is pretty evenly spread out, as the workload grows, and evenly ranges from 8 to 16 seconds. In typical enterprise environments you will see that login time is not that quick. In the real world I’ve seen a lot of cases where 3 minutes was normal. This is not very user-friendly so you can use the LogonTimer graph to tune and tweak your environment policy settings to see if you can lower your logon time.
If you really want to deep-dive into where the performance bottleneck is coming from, you can also import external performance data from the host level (e.g. from VMware ESXtop, XenServer or Microsoft Perfmon) into the Login VSI Analyzer.
In the screenshot below I imported the host total CPU utilization time from ESXtop into the Analyzer. The orange line clearly shows that CPU increases over time to 100% (see the right axis), and that VSImax was hit well before that 100% CPU number. Typically, VSImax is already hit when the CPU is at 80% to 90% VM utilization.
External data in the Login VSI Analyzer
You can import your own external data in the Analyzer by going to File > Import and by clicking on External Data. You can use any comma separated text file (CSV) as long as it contains a time stamp and a value.
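Before importing, it can help to check that every row of your file really contains a time stamp and a numeric value. The Python sketch below does that for a two-column CSV. The column names `timestamp` and `value`, the ISO 8601 time format and the sample rows are assumptions for illustration; check what your own export and the Analyzer actually use.

```python
import csv
import io
from datetime import datetime


def check_external_data(text):
    """Parse CSV text with 'timestamp' and 'value' columns and
    return a list of (datetime, float) rows. Raises ValueError or
    KeyError if a row is missing a field or cannot be parsed."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        stamp = datetime.fromisoformat(row["timestamp"])
        value = float(row["value"])
        rows.append((stamp, value))
    return rows


# Made-up sample: host CPU utilization sampled every 30 seconds.
sample_csv = (
    "timestamp,value\n"
    "2024-01-01T10:00:00,35.5\n"
    "2024-01-01T10:00:30,48.0\n"
    "2024-01-01T10:01:00,71.2\n"
)

rows = check_external_data(sample_csv)
print(len(rows), max(value for _, value in rows))
```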
Below you will find some useful ESXtop metrics (Perfmon and XenServer have similar metrics):
- Avg % Util
- Avg % Processor Time
- Avg IO Commands/sec
- Avg IO Reads/sec
- Avg IO Writes/sec
- Avg MBytes Read/sec
- Avg MBytes Write/sec
- Avg Guest Latency
- Network Related
NOTE: When using Microsoft Perfmon, don’t use the metrics of the host partition; make sure you choose the hypervisor metrics.
Ready to start analyzing some test results?
There is probably a lot more data that you can get out of the Login VSI Analyzer, but these are the graphs that I use most often when I analyze results from VDI performance tests for our customers. I hope that this information will help you to better understand your virtualized desktop environment. If you need any help, I am more than happy to assist. You can reach me on Twitter via @jaspergeelen or drop me an email at firstname.lastname@example.org
This blog by my colleague Mark about lesser known tips for your VDI performance tests might also be interesting for you.
Happy VDI performance testing!