Meltdown & Spectre test results and lessons learned (April 2018)
We started the year with news of new vulnerabilities, which were quickly met with patches, a.k.a. mitigations. Initial reports warned of potential performance impacts, but we find actual measurements to be a more interesting and helpful indication of what to expect.
When the Meltdown mitigations were released, Login VSI testing showed an impact of between 0.5% and 14%, depending on which Windows operating system you are running. That isn’t so bad, is it? Testing for Spectre, on the other hand, has taken longer, because these patches have to be delivered by OS, hypervisor and platform vendors. Now that they are starting to be released, we are seeing a much higher performance impact on Server Based Compute and VDI solutions. Looking ahead, we expect this to get worse before it gets better.
In the first quarter of 2018, Login VSI offered a free trial of Login VSI for load testing, called the ‘Meltdown & Spectre Emergency Edition’, in exchange for evidence of the performance impacts. Many trial customers tested with their OS patches for Meltdown and saw a manageable performance impact, but were unaware that further impacts could follow once their application, hypervisor and hardware patches were applied.
Some of our customers who regularly use Login VSI to test the performance impact of patches have reported that a fully patched system shows an impact of up to 40%. For example, one customer that hosted 100 sessions per server can now deliver only 60 sessions with a comparable experience before running out of compute resources. Had they not tested, they would have faced a significant increase in help desk tickets or, even worse, lost productivity.
Other customers did not see quite as high a performance impact, which leads us to believe, well… that it depends.
Let’s look at some of the trends we are seeing in our market.
4 GENERAL FACTORS, PERFORMANCE PILLARS
When we look at all the patches and their associated impacts, there are many factors to note; however, the four most influential are the operating system, the processor, the deployment type and the applications.
THE BAD NEWS IS THAT THERE ARE SOME COMBINATIONS OF OS, HYPERVISOR AND HARDWARE THAT DO DEMONSTRATE A BIG IMPACT
THE GOOD NEWS IS THAT THERE ARE SOME COMBINATIONS OF OS, HYPERVISOR AND HARDWARE THAT DO NOT DEMONSTRATE A BIG IMPACT
You may be scratching your head after those statements, so let’s dive in and look at the trends we’ve observed so far.
The Operating System
Observation 1: Older client and server operating systems will exhibit more of an impact than their newer versions.
The more often an operating system must transition between User Mode and Kernel (Privileged) Mode, the higher the performance impact will be, because the mitigations make each of those transitions more expensive. In more current operating systems, more work happens in User Mode, which requires fewer transitions and therefore carries less of a performance impact.
The Processor
Observation 2: Older processors will exhibit more of an impact than new processors.
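Newer processors have features that allow the OS to apply the mitigations more efficiently, for example the PCID (Process-Context Identifier) feature, which lets the OS avoid flushing the TLB on every transition into the kernel.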
The Deployment Type
Observation 3: Session-based workloads exhibit more of an impact than desktop-based workloads.
Because of the roles they serve, server operating systems do more work in privileged kernel space, so when we use Remote Desktop Services for published applications or published desktops, the server OS has to make more transitions between User Mode and Kernel Mode. Client operating systems such as Windows 7 and Windows 10 do more in User Mode and therefore require fewer transitions.
The Applications
Observation 4: I/O-intensive applications may experience more of an impact than others.
When it comes to performance, applications can be a bit of a wildcard. Because there are so many different types of applications, performance testing becomes a vital part of knowing what to expect. Like operating systems, applications can cause many User Mode to Kernel Mode transitions, which adds to the performance impact. Reads and writes to storage are one example of application activity that causes a high number of these transitions, so storage-I/O-intensive applications carry a larger performance penalty.
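To make the User Mode/Kernel Mode point concrete, the Python sketch below reads the same 1 MiB file with different buffer sizes and counts the read calls. On POSIX systems each os.read() call is one read(2) system call, so smaller reads mean more trips into the kernel for the same data, and each trip costs more once the mitigations are applied. The helper names are hypothetical, and the sketch counts calls rather than timing them, since timings vary too much between systems to be meaningful here.

```python
import os
import tempfile


def make_test_file(size_bytes: int) -> str:
    """Create a temporary file of the given size and return its path (caller removes it)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size_bytes)
    return path


def count_read_calls(path: str, chunk_size: int) -> int:
    """Read the whole file with unbuffered os.read() calls and count them.

    On POSIX systems each os.read() is one read(2) system call, i.e. one
    User Mode -> Kernel Mode -> User Mode round trip.
    """
    # O_BINARY only exists on Windows; it prevents text-mode translation there.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    calls = 0
    try:
        while True:
            calls += 1
            if not os.read(fd, chunk_size):  # empty bytes means end of file
                break
    finally:
        os.close(fd)
    return calls


if __name__ == "__main__":
    path = make_test_file(1024 * 1024)  # 1 MiB of data
    try:
        for chunk in (4 * 1024, 64 * 1024, 1024 * 1024):
            print(f"{chunk:>8}-byte reads -> {count_read_calls(path, chunk)} read calls")
    finally:
        os.remove(path)
```

Because each call returns at most chunk_size bytes, 4 KiB reads of a 1 MiB file need at least 257 calls (256 that return data plus one that reports end of file), while a single 1 MiB buffer needs at least 2. An application doing its storage I/O in many small requests therefore crosses into the kernel far more often for the same amount of data.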
What to expect from patches
Let’s look at the different patches and how much of an impact they carry. It is important to note that there are more than a few vulnerabilities in the same class as Meltdown and Spectre. Some of these patches will be simple operating system patches, and others will be more complex, requiring patches at multiple layers of the performance stack. For example, Spectre will have patches at the operating system layer and at the server hardware layer (processor microcode). Individually, the Spectre patches may not have a great impact, but collectively they pack a punch.
For more information about the status of patches from many operating system, hypervisor, application and hardware vendors, we’ve found the following website to be helpful: https://meltdownattack.com/
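If you also run Linux hosts, the kernel reports its own view of each vulnerability and the active mitigation under /sys/devices/system/cpu/vulnerabilities, with one text file per issue (for example meltdown and spectre_v2). The interface appeared in kernel 4.15 and was backported to several stable and distribution kernels. The Python sketch below simply collects those status lines; the function name is hypothetical, and the exact strings vary by kernel version. On Windows, Microsoft’s SpeculationControl PowerShell module (Get-SpeculationControlSettings) serves a similar purpose.

```python
from __future__ import annotations

from pathlib import Path

# Linux exposes one small text file per CPU vulnerability in this directory.
VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"


def read_mitigation_status(vuln_dir: str = VULN_DIR) -> dict[str, str]:
    """Return {vulnerability name: kernel status line}, or {} if unsupported."""
    directory = Path(vuln_dir)
    if not directory.is_dir():
        # Older kernels (or non-Linux systems) do not expose this directory.
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(directory.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    status = read_mitigation_status()
    if not status:
        print("No sysfs vulnerability reporting on this system")
    for name, state in status.items():
        print(f"{name:20} {state}")
```

A status such as "Mitigation: PTI" for meltdown means the kernel-side mitigation is active, while "Vulnerable" means it is not; remember that this covers only the kernel’s view, not your hypervisor or applications.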
Looking at performance from the Login VSI perspective
Login VSI measures performance in two main ways. The first is a measure of scalability and efficiency called VSImax: the number of sessions a server can host before the user experience becomes unacceptable. The second, VSIbase, measures the optimal responsiveness of the user experience, that is, the best-case performance for a given session. We’ll look at how the patches affect performance through these two lenses.
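When comparing a patched run with an unpatched baseline, it helps to express both metrics as "percent worse", keeping in mind that they point in opposite directions: a higher VSImax is better, while VSIbase, assuming it is reported as a response time in milliseconds as in Login VSI results, is better when lower. The sketch below is a hypothetical helper for your own test records, not part of the Login VSI product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VsiResult:
    """One test run (hypothetical container, not a Login VSI API)."""
    vsimax_sessions: float  # sessions before UX becomes unacceptable; higher is better
    vsibase_ms: float       # assumed: best-case response time in ms; lower is better


def patch_impact(before: VsiResult, after: VsiResult) -> dict[str, float]:
    """Express both metrics as 'percent worse' after patching (positive = degradation)."""
    return {
        # Fewer sessions per server is worse, so measure the drop.
        "vsimax_loss_pct": (before.vsimax_sessions - after.vsimax_sessions)
        * 100 / before.vsimax_sessions,
        # A slower best-case response is worse, so measure the increase.
        "vsibase_slowdown_pct": (after.vsibase_ms - before.vsibase_ms)
        * 100 / before.vsibase_ms,
    }
```

Using the customer example from earlier, a VSImax drop from 100 to 60 sessions gives a vsimax_loss_pct of 40.0, the 40% impact quoted above. Keeping the two numbers separate matters: a patch can cost density while leaving best-case responsiveness nearly unchanged, or the reverse.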
The Meltdown patches were some of the earliest to be released. Our initial testing showed the following:
As you can see, more modern operating systems carried less of a performance impact when the Meltdown patches were applied.
There has been a lot of change with the Spectre patches, and we expect them to keep changing as the vulnerabilities are more completely addressed and as the vendors involved work to optimize the performance of their patches. As expected, the initial performance results are quite alarming. Let’s look at some of the results we have seen in the field. Note that these performance numbers do not include the impact of patched applications.
NOTE: The most important takeaway is this: when you consider the 4 factors and apply patches to all layers of the performance stack, you’ll land somewhere on this best-case to worst-case spectrum. Be aware that in a worst-case scenario, a fully patched system could set you back 40% in density. Based on that, you can decide what to do next.
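A density loss is paid on every server, so its effect on hardware requirements is easy to underestimate. The sketch below turns a density loss into a server count, assuming a hypothetical population of 1,000 concurrent users and the 100-to-60 sessions-per-server example from earlier; the function names are illustrative, not part of any Login VSI tool.

```python
import math


def patched_density(sessions_per_server: int, density_loss_pct: int) -> int:
    """Sessions per server that remain after a whole-percent density loss."""
    if not 0 <= density_loss_pct < 100:
        raise ValueError("density_loss_pct must be between 0 and 99")
    # Integer arithmetic avoids floating-point surprises; round down because
    # a partial session cannot be hosted.
    return sessions_per_server * (100 - density_loss_pct) // 100


def servers_needed(users: int, sessions_per_server: int, density_loss_pct: int = 0) -> int:
    """Servers required to host `users` concurrent sessions."""
    density = patched_density(sessions_per_server, density_loss_pct)
    if density == 0:
        raise ValueError("no sessions fit on a server at this density loss")
    return math.ceil(users / density)


if __name__ == "__main__":
    users = 1000  # hypothetical user population
    print("unpatched:", servers_needed(users, 100))       # 10 servers
    print("worst case:", servers_needed(users, 100, 40))  # 17 servers
```

In this example the worst case turns 10 servers into 17, which is why it pays to measure your own combination of OS, processor, deployment type and applications before rolling out the patches.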
GOOD NEWS! You can get some of your performance back
One key point to note is that while you may be losing considerable performance, you can win some of it back by optimizing the server OS and client OS. The VMware OS Optimization Tool (OSOT) and the Citrix Optimizer are two great tools that will help you find the best optimizations for performance. In some cases, we’ve seen optimizing a previously unoptimized system recover all of the performance lost to the Meltdown and Spectre patches.
For more information on VMware OSOT: https://www.loginvsi.com/blog/846-optimizing-desktop-images-with-vmware-osot
For more information on Citrix Optimizer: https://www.loginvsi.com/blog/843-optimizing-desktop-images-with-the-citrix-optimizer
How Login VSI can help:
Of course, Login VSI is here to help you determine the impact of these mitigations. We’ve already helped hundreds of enterprises around the world find their best performance, and this is no exception.
Also note that our Professional Services team is here to help you get up and running as fast as possible, so you don’t have to wait to find out how much the Spectre and Meltdown patches are impacting you.