Fluctuation in VDI Test Results you can solve with Login VSI
As a test engineer at Login VSI, I perform many tests each day. To give these results some scientific value, I usually repeat each test at least 10 times to make sure there is little variance between runs with exactly the same settings. Some of our customers struggle to get consistent results in their own environment, and they frequently ask me: “How is it possible to get a different result when repeating a Login VSI test with identical settings?” In this blog, I will explain the most common reasons why this happens and share some best practices for solving it.
VDI is complex
First, it’s important to realize that VDI is complex. VDI is basically a big stack of many moving components running at very high utilization. You can compare it to a highway where the cars have only a little room between them. The highway is at maximum capacity, and on most days that’s fine because no one is braking. But it only takes one person braking unexpectedly, say because they saw a beautiful butterfly, and that small action on a heavily utilized highway can cause major havoc. The same butterfly effect applies to VDI environments: a small change can cause a VDI traffic jam and different test results with Login VSI.
The VDI stack consists of many different components
Some common things that have an impact are hardware and software changes, antivirus scans, backup jobs and scheduled tasks. It’s also important to remember that the VDI workload is unique to each user, and no other workload that runs in the datacenter comes close to it. VDI has tens of thousands of processes running on hundreds of VMs, with much higher utilization of infrastructure resources. The IO pattern (footprint) is also very different from that of typical applications: server application back ends are usually steady compared to VDI.
So you can imagine that with VDI being so dynamic and so intense, one run gives you very good performance and takes much longer to hit VSImax than the next. You will need to remove these differences as much as possible to get reliable and consistent test results. Here are four common best practices for doing so:
Distributed Resource Scheduler (DRS)
DRS is typically enabled in environments where a specific host can become very heavily utilized and VMs request a lot of vCPU, memory or other resources. With DRS, VMs can be moved dynamically to other hosts (e.g. with vMotion). DRS sounds great, but it is not very useful in combination with Login VSI. Of course there are ways to configure DRS, but if you configure it incorrectly you can start one test with, say, sixty users on one server and ninety on the other, and in the next test the distribution is completely different. When Login VSI puts stress on a system that wants to balance load by moving VMs back and forth between servers, DRS goes crazy, because we stress all individual machines and our simulated users act like real users. Sometimes they are busy and sometimes they are slow, and DRS just thinks: “Hey, this machine is slow, I will quickly move it over there with the other slow machines.” A short while later, that user becomes very active because it starts watching a video. The best way to get consistent test results is to disable DRS completely. But if you really want to frustrate DRS, Login VSI is your #1 tool ;-).
Starting too many sessions
To get more stable results with Login VSI, we recommend that you don’t start your tests with too many sessions. For example, if you start a test with 200 users and you hit VSImax at 100, we recommend running the test again with 120 or 130 users, which is much closer to the VSImax limit. Why? Because the overhead of all the VMs trying to handle 200 sessions also weighs on your actual VSImax. So if you lower your test from 200 simulated sessions to only 130, your VSImax will probably go up as well, from 100 to 120. If you did the same with actual users, going from 200 to 130, they would also be a lot happier, so that is the density you most likely want to aim for. I know it’s a bit more expensive, but there is always a trade-off between cost and performance, and Login VSI tries to help you visualize it.
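The rerun arithmetic from the example above can be written down as a small helper. This is only a sketch: the function name `rerun_session_range` and the 20% to 30% margins are assumptions derived from the single 100 to 120/130 example in this post, not an official Login VSI rule.

```python
def rerun_session_range(vsimax, low_pct=20, high_pct=30):
    """Return (low, high) session counts for a rerun, as integers.

    low_pct and high_pct are assumed margins above the measured VSImax;
    they come from the 100 -> 120/130 example, not from a Login VSI rule.
    """
    if vsimax <= 0:
        raise ValueError("vsimax must be a positive number of sessions")
    # Integer arithmetic avoids floating-point rounding surprises.
    low = vsimax * (100 + low_pct) // 100
    high = vsimax * (100 + high_pct) // 100
    return low, high


first_run_sessions = 200   # sessions launched in the first test
measured_vsimax = 100      # VSImax reported by that test
low, high = rerun_session_range(measured_vsimax)
print(f"Launched {first_run_sessions}, VSImax {measured_vsimax}: "
      f"rerun with {low} to {high} sessions")
```

Adjust the margins to whatever keeps the launched session count just above the VSImax you expect in your own environment.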
Simulated users by Login VSI
Overcommitting
Overcommitting is great for scenarios where you have a high peak of utilization but just don’t have enough memory. But overcommitting memory has a very high performance impact, and severely overcommitting the system, by 10% to 20%, is something you should avoid whenever possible. It’s OK if you do it once per week, or on an occasion when everybody is working at the same time, but if you run heavily overcommitted on VDI during regular production hours, it can cause a lot of performance fluctuations. The first day it’s great, and the second day your end users start complaining loudly. It’s just not a best practice, and that’s why we also recommend disabling overcommitment during your Login VSI tests for consistent results.
Rebooting
Rebooting should be an essential part of every performance test. If you want to do scientific performance testing with valid and consistent results, you have to take the same steps every time, and a reboot should be one of these steps. What do you usually need to reboot? Your hypervisor hosts (VMware or Hyper-V) and of course your Login VSI launchers. Your launchers are typically on a separate infrastructure, but by rebooting them too you make sure that everything is reset and in the same state as for the first test. It’s also important to give your machines some time (a cooldown) to get back online before you start another test. On average, VMware takes about 10 to 20 minutes after the last desktop has started before all desktops are stabilized and back to normal. Within this timeframe it will e.g. divide the memory and do some optimization. Rebooting is very important: if you skip it, you might see fluctuations in the results.
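To make the "same steps every time" idea concrete, here is a minimal sketch of a repeatable test cycle. Everything in it is hypothetical: `run_repeatable_tests`, `reboot_all` and `run_test` are placeholder names for whatever tooling you use to reboot hosts and launchers and to start a Login VSI test, and the sketch assumes `reboot_all` only returns once the last desktop has started, so the cooldown begins at the right moment.

```python
import time


def run_repeatable_tests(run_test, reboot_all, repetitions=10,
                         cooldown_seconds=20 * 60, sleep=time.sleep):
    """Run the same test several times with identical preparation steps.

    run_test and reboot_all are placeholders supplied by the caller;
    they are not part of any Login VSI API.
    """
    results = []
    for _ in range(repetitions):
        reboot_all()              # hosts and launchers, same as every run
        sleep(cooldown_seconds)   # let the desktops stabilize first
        results.append(run_test())
    return results


# Dry run with stand-in functions, so nothing is really rebooted.
log = []

def fake_reboot():
    log.append("reboot")

def fake_sleep(seconds):
    log.append(f"cooldown {seconds:.0f}s")

def fake_test():
    log.append("test")
    return 120.0   # made-up VSImax value

results = run_repeatable_tests(fake_test, fake_reboot,
                               repetitions=2, sleep=fake_sleep)
print(log)
print(results)
```

Passing `sleep` in as a parameter is a design choice that lets you rehearse the sequence without actually waiting 20 minutes between runs.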
Low variance = high-quality test
When you repeat a Login VSI test approximately 10 times and the results stay within 5% of each other, your performance is pretty good and you don’t have to worry. If you see a large variation in test results, 10% or higher, you probably do not have enough resources or you launched too many sessions, and this is something to look into. By lowering the variance in your Login VSI test results, you are actually improving the overall performance reliability of your environment.
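If you keep the VSImax value of each run, the 5% and 10% thresholds above are easy to check. The sketch below is one possible reading: it assumes "variance" means the spread between the best and worst run relative to the average, and the function names and sample numbers are made up for illustration.

```python
from statistics import mean


def relative_spread(vsimax_results):
    """Return (max - min) / mean of the runs, as a percentage."""
    if len(vsimax_results) < 2:
        raise ValueError("need at least two test runs to measure variation")
    avg = mean(vsimax_results)
    return (max(vsimax_results) - min(vsimax_results)) / avg * 100


def classify(vsimax_results, ok_pct=5.0, warn_pct=10.0):
    """Label a series of runs using the 5% and 10% thresholds."""
    spread = relative_spread(vsimax_results)
    if spread <= ok_pct:
        return "consistent"
    if spread >= warn_pct:
        return "investigate"
    return "borderline"


# Ten made-up VSImax results from repeating the same test.
runs = [118, 120, 121, 119, 120, 122, 119, 120, 121, 120]
print(f"{relative_spread(runs):.1f}% spread -> {classify(runs)}")
```

Other definitions, such as the standard deviation relative to the mean, would work as well. What matters is that you use the same measure for every series of tests.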
I hope that you enjoyed reading this blog. If you have any questions about testing with Login VSI, please do not hesitate to drop me an email at firstname.lastname@example.org