Best Practices in Virtualization Testing: Small-scale vs. full-scale testing
Deploying and maintaining VDI is a challenge for every company. There are so many decisions to make, and the pace of change seems to be accelerating. Because we are all constantly trying to make the best use of constrained resources, a question we regularly get here at Login VSI concerns scaling tests. It goes something like: "If I want to find out how many servers I need, I simply test how many users I can put on one server, then divide the number of users I need to support by that number. Right?"
My answer to this is a definite "Njet, no way, negative." A complete environment behaves fundamentally differently from one or two test machines. The most obvious reason is your storage system. You are very likely using shared storage, which means that as you add servers, at some point you will run out of storage resources. Without storage optimization solutions, that point may come sooner than you would like.
People new to VDI might assume that capacity scales linearly with the number of servers. While this typically holds for the first two or three servers, after that scaling quickly becomes non-linear. The graph below illustrates this point. The blue line shows the expected number of desktops for the total number of servers. While desktop performance is fine at 400 users, we can clearly see that at 500 users, simply adding an extra server won't meet our needs.
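To see why multiplying a single-server result does not hold up, consider a deliberately simplified model: every server adds the same compute capacity, but all servers share one storage system with a fixed IOPS budget. The function names and all figures below (100 users per server, 9,000 IOPS, 20 IOPS per user) are hypothetical and are not taken from the graph. Real environments also degrade gradually as storage latency climbs, rather than hitting the hard ceiling this `min()` model assumes.

```python
import math


def servers_needed_naive(target_users: int, users_per_server: int) -> int:
    """The single-server extrapolation: divide the target by one server's result."""
    return math.ceil(target_users / users_per_server)


def supported_users(servers: int, users_per_server: int,
                    storage_iops: int, iops_per_user: int) -> int:
    """Users the whole environment can carry when all servers share one storage system.

    Compute capacity grows with each added server, but the shared storage
    budget does not, so the result is capped by whichever limit is hit first.
    """
    compute_limit = servers * users_per_server
    storage_limit = storage_iops // iops_per_user
    return min(compute_limit, storage_limit)


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    USERS_PER_SERVER = 100
    STORAGE_IOPS = 9000
    IOPS_PER_USER = 20
    target = 500

    n = servers_needed_naive(target, USERS_PER_SERVER)
    print(f"Naive plan: {n} servers for {target} users")
    # Compare the linear expectation with the storage-capped capacity.
    for servers in range(1, n + 2):
        linear = servers * USERS_PER_SERVER
        actual = supported_users(servers, USERS_PER_SERVER, STORAGE_IOPS, IOPS_PER_USER)
        print(f"{servers} servers: expected {linear}, supported {actual}")
```

With these assumed figures, the naive plan calls for five servers to host 500 users, yet the environment tops out at 450 users from the fifth server onward, so adding servers alone never reaches the target. Only a test that loads the shared storage at full scale reveals this.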
Beyond storage, complex environments often contain other shared components, such as:
- Broker / Load balancers
- Backend databases
- Application servers or application virtualization techniques
The only way to truly validate that all of these components work correctly together is to test at full scale. Testing at full capacity can help answer questions such as:
- Is my environment stable and performing well at the desired capacity?
- Are the backend systems handling the load?
- Can my environment handle logon storms?
- What happens if one of my datacenters fails?
- What happens if one of my servers (or racks) fails?
Over the years, we have seen plenty of environments that our customers designed which looked great on paper but simply didn't work in practice once tested at full scale. For example, I witnessed an environment where so much power was drawn from the grid during a Login VSI test that the VDI proof-of-concept rack was disconnected from power. Because this happened in a test environment, it was not a real problem, but imagine the panic if it had been discovered in production with thousands of real users.
In another example, at a different company, the storage device was misconfigured to use only a limited amount of CPU. Initial tests with a few users showed no problems. But as soon as we ramped the tests up to about 400 or 500 users, the environment came to a complete halt. After identifying and resolving the issue, we were able to host the desired number of users on the environment and move to the pilot phase.
My advice: when you start performance testing, it is of course a good idea to begin with just one machine. Make sure it works and that you get the best performance out of it. Right after this initial test, start increasing the test size to full scale. The same applies to environments already in production: at some point you are likely to make a change, and whenever it affects any of the components mentioned above, I highly recommend testing at scale so you know exactly what level of performance to expect.
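The approach above can be written down as a simple stepped test plan. The sketch below is a hypothetical helper, not part of Login VSI: it starts with a single-server baseline and then steps up in equal increments to the full target user count. The default of four steps is an arbitrary assumption; choose whatever granularity helps you spot where performance starts to bend.

```python
def ramp_up_steps(users_per_server: int, target_users: int, steps: int = 4) -> list[int]:
    """Hypothetical test plan: a one-server baseline, then evenly spaced
    increments that always end at the full target user count."""
    if users_per_server <= 0 or target_users <= 0 or steps < 1:
        raise ValueError("users_per_server, target_users and steps must be positive")

    # Step 1: validate and tune a single server first.
    sizes = [min(users_per_server, target_users)]
    for i in range(1, steps + 1):
        # Ceiling division in integers: the i-th fraction of the target.
        size = (target_users * i + steps - 1) // steps
        # Skip any step that would not be larger than the previous test.
        if size > sizes[-1]:
            sizes.append(size)
    return sizes


if __name__ == "__main__":
    # Hypothetical environment: 100 users per server, 2,000 users in total.
    for users in ramp_up_steps(100, 2000):
        print(f"Run test with {users} users")
```

For the hypothetical 2,000-user environment this yields tests at 100, 500, 1,000, 1,500 and 2,000 users. The last step is always the full target, because the point of the exercise is to see the shared components under real load.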