Tackling Storage Performance Issues in Virtualized Environments
November 14, 2024
An exclusive article by Brian Martynowicz, Leee Jeffries, and Nicholas Campa
Virtualization is a cornerstone technology that enables scalability, flexibility, and cost savings. However, as organizations scale their virtualized infrastructures, they often encounter performance bottlenecks that impact the end-user experience.
One such bottleneck that has recently come to the forefront is storage performance, particularly in relation to profile management solutions like FSLogix.
Over the past few months, I’ve observed a recurring issue across several large-scale enterprise accounts we work with. These organizations are experiencing degraded user experiences due to misconfigurations and storage performance issues with profile management systems. If these enterprises face such challenges, many others likely do too.
Together with my colleague and long-time EUC community expert, Leee Jeffries, I aim to shed light on this issue and provide actionable insights to help you optimize your virtualized environments.
Understanding FSLogix and Profile Management
FSLogix is a profile container solution acquired by Microsoft that enhances profile management in virtualized environments. It redirects user profiles to a network location, allowing faster logins and a consistent user experience across sessions. By encapsulating user profiles in VHDX (Virtual Hard Disk) files, FSLogix minimizes profile load times and reduces the complexity associated with roaming profiles.
However, the benefits of FSLogix can be negated by improper configurations and suboptimal storage performance. Since user profiles are stored on network storage rather than local disks, any latency or I/O bottlenecks in the storage subsystem can significantly impact the user experience. This is where many organizations falter, as they underestimate the importance of storage performance in their virtual desktop infrastructure (VDI) deployments.
Figure 1. A sample configuration of profile management via FSLogix.
The Impact of Storage Performance on User Experience
When an end-user reports that their virtual machine is “slow” or “sluggish,” pinpointing the root cause can be challenging. User feedback is inherently subjective and can vary based on individual perceptions and tolerance levels. In virtualized environments, even minor delays in accessing profile data can lead to noticeable performance degradation from the user’s perspective.
The latency introduced by accessing profile data over the network—especially when the storage is not optimized for the workload—can affect everyday operations like opening documents, loading applications, and interacting with the desktop environment. As enterprises scale up, these issues can become more pronounced, leading to widespread user dissatisfaction and reduced productivity.
Measuring User Experience: The EUX Score
To objectively assess and quantify the user experience, we leverage Login Enterprise’s EUX (End-User Experience) score. This metric provides a comprehensive view of system performance from the user’s perspective by measuring various key performance indicators (KPIs) that correlate with common user operations.
Determining the EUX Score
The EUX score is calculated from a series of timers representing user actions. These timers measure application responsiveness, keyboard input processing, CPU-intensive tasks, and storage I/O latency. The score also accounts for application and session failures, penalizing the overall score when such events occur.
Here are the core components of the EUX score:
- My Documents I/O Score: Measures disk read and write operations (mostly sequential) in the ‘My Documents’ folder with caching disabled. This assesses IOPS (Input/Output Operations Per Second) and latency.
- Local AppData I/O Score: Evaluates disk read and write operations (mostly random) in the ‘Local AppData’ folder with caching disabled, focusing on IOPS and latency.
- CPU Score: Performs a series of mixed CPU operations to determine how many can be completed within a fixed period.
- Mixed CPU I/O Score: Executes a mix of cached and non-cached operations, including compression and decompression tasks.
- Generic Application and User Input Score: Launches a proprietary, purpose-built text editor that simulates the behavior of applications like Microsoft Office. It measures the time from application start to readiness for user input and evaluates typing speed by measuring characters per second.
By analyzing these metrics, we can objectively determine where performance bottlenecks exist and how they impact the user experience.
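To make the idea of a storage I/O timer concrete, here is a minimal sketch that times small synchronous writes in a folder and reports average latency and IOPS. It is not how Login Enterprise computes its I/O scores; the function name, the 4 KB block size, and the use of fsync to push writes past the OS cache are all assumptions made for illustration.

```python
import os
import tempfile
import time


def measure_write_latency(directory, ops=50, block_size=4096):
    """Time small synchronous writes in `directory` and return latency stats.

    Hypothetical helper for illustration only; not the EUX algorithm.
    """
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(ops):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this write to the storage device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    total = sum(latencies)
    return {
        "ops": ops,
        "avg_latency_ms": 1000 * total / ops,
        "max_latency_ms": 1000 * max(latencies),
        "iops": ops / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Point this at a folder inside the redirected profile to compare
    # profile storage against a local disk.
    print(measure_write_latency(tempfile.gettempdir()))
```

Running the same function against a local folder and a folder that lives in the profile container gives a rough, like-for-like view of how much latency the network storage adds.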
The Evolving Bottleneck: From CPU and Memory to Storage Performance
Historically, performance bottlenecks in virtualized environments have shifted as technology and infrastructure components have evolved. In the past, CPU and memory were often the primary constraints. As hypervisors and virtualization technologies matured, optimizations in these areas reduced their impact on performance.
Today, we see a resurgence of storage performance as a critical bottleneck, mainly due to the increased reliance on profile management solutions like FSLogix. As user profiles and data grow in size and complexity, the demand on storage subsystems intensifies. Network storage systems must handle high IOPS with low latency to prevent performance degradation.
This shift highlights the importance of continually monitoring and optimizing all components of the virtualized environment. As one bottleneck is addressed, another may emerge, necessitating a holistic approach to performance management.
Case Study: Mitigating Storage Bottlenecks in a Fortune 1000 Finance Company
Let’s explore a real-world example of how addressing storage performance can significantly improve the user experience. Shoutout to Nicholas Campa.
Background: We worked with a Fortune 1000 company designing its Citrix solution using Azure compute components. During the testing phase, they noticed that users were experiencing sluggish performance, particularly during login and when accessing applications.
Challenge: Initial assessments pointed to storage performance issues related to their FSLogix profile containers. The storage subsystem was not provisioned to handle the IOPS and latency requirements of their user profiles.
Solution: By leveraging the EUX score and detailed metrics provided by Login Enterprise, we pinpointed the storage performance as the primary bottleneck. The organization upgraded its storage solution to a higher-performance option, offering better IOPS and lower latency. Additionally, they optimized their FSLogix configuration based on best practices.
Figure 2. Real results from our testing. Remember that individual mileage may vary depending on your user workflows and technology combinations.
Outcome: The improvements significantly increased the EUX score, reflecting a smoother and more responsive user experience. The organization mitigated the storage bottleneck before moving into production, avoiding potential user dissatisfaction and productivity losses. They also calculated the associated cost of migrating to the new storage configuration on a per-user basis, which helped justify the investment to stakeholders.
Collaborating with Experts: Insights from Leee Jeffries
To delve deeper into the technical aspects and best practices for optimizing storage performance and FSLogix configurations, I’ve partnered with Leee Jeffries. You may know Leee for his extensive contributions and influence in the End-User Computing (EUC) community.
Figure 3. In addition to being a stellar technologist, he’s also quite nice.
User experience is the cornerstone of any End-User-Computing solution, irrespective of whether you are working on a physical machine or a virtual desktop; you must complete your tasks as efficiently and quickly as possible. A consultant is responsible for making the solutions provided to end users seamlessly merge into their day-to-day work.
The challenge has always been the way this is performed. FSLogix has been key for a long time, and Outlook caching has been the main driver for the wide adoption of VHD profile-based solutions. The emergence of Exchange Online triggered this adoption, and later, Microsoft decided to purchase FSLogix, an intelligent decision.
When implementing FSLogix on a standard file server such as a clustered file server, it’s important to track your disk queue length during user migrations; this will provide insight into how well those servers are coping with the storage requests being asked of them. If your disk queue is continuously high and above a value of 2, storage operations will be queued, impacting user profile performance and login times.
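As a rough sketch of that check, the function below flags a run of disk queue length samples that stays above 2. The threshold of 2 comes from the guidance above; the function name and the choice of five consecutive samples as "continuously high" are assumptions, and the samples themselves would come from whatever monitoring you already collect (for example, the Windows "Avg. Disk Queue Length" performance counter).

```python
def sustained_high_queue(samples, threshold=2.0, min_consecutive=5):
    """Return True if at least `min_consecutive` back-to-back samples
    exceed `threshold`.

    `min_consecutive` is an illustrative assumption; tune it to your
    sampling interval.
    """
    run = 0
    for value in samples:
        # Extend the current run on a high reading, reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Made-up readings: a short spike, then a sustained backlog.
    readings = [0.4, 3.1, 0.8, 2.5, 2.9, 3.4, 4.0, 2.2, 0.9]
    print(sustained_high_queue(readings))
```

Requiring several consecutive high readings avoids alerting on a single spike, which matters during migrations when brief bursts are expected.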
If you are deploying FSLogix on Microsoft Azure Files, there are a few things to remember. The total IOPS available for the FSLogix file share is based on the size of the share.
There is also a handle limit to be aware of. Each user of FSLogix will create two file handles.
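The handle arithmetic can be sketched in a few lines. The two handles per user come from the text above; the handle limit is deliberately left as a parameter because it depends on the Azure Files limits that apply to your share, and the value in the usage example is a placeholder, not a quoted Azure figure.

```python
HANDLES_PER_USER = 2  # from the text: each FSLogix user creates two file handles


def handles_needed(concurrent_users, handles_per_user=HANDLES_PER_USER):
    """Total open handles expected for a number of concurrent users."""
    return concurrent_users * handles_per_user


def max_users_for_limit(handle_limit, handles_per_user=HANDLES_PER_USER):
    """Largest concurrent user count that fits under `handle_limit`."""
    return handle_limit // handles_per_user


if __name__ == "__main__":
    placeholder_limit = 10_000  # hypothetical value; check the current Azure Files limits
    print(handles_needed(3_000))                   # 6000
    print(max_users_for_limit(placeholder_limit))  # 5000
```

If the expected concurrent user count approaches the result of max_users_for_limit, splitting users across more than one share is worth considering before the limit is reached in production.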
I recommend the following settings to ensure you get the best out of your FSLogix deployment.
- Do not use difference disks for single-session desktops; instead, set “ProfileType” to 0.
- Set “FlipFlopProfileDirectoryName” to 1 to make folders easier to read.
- Avoid the use of Office containers unless you absolutely must split the two VHDX files
- Set “DeleteLocalProfileWhenVHDShouldApply” to 1 to ensure any local profiles left over after image maintenance are overwritten.
- Set the “Locked VHD Retry Count” and “Volume re-attach retry count” to 10 to retry ten times.
- Set “Volume re-attach retry interval” and “Locked VHD retry interval” to 5. This will retry a VHD attach every 5 seconds.
- Set the “milliseconds to wait for volume arrival” to 2000.
Disclaimer – These settings are based on my experience working in environments where storage can become strained, and you want to ensure FSLogix is properly configured to avoid end-user experience issues. Individual mileage may vary, and these settings should be tested rather than blindly implemented.
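For test environments, the list above can be captured as data and rendered into a registry file for review. This is a minimal sketch under several assumptions: the descriptive labels in the list are mapped to the registry value names LockedRetryCount, ReAttachRetryCount, LockedRetryInterval, ReAttachIntervalSeconds, and VolumeWaitTimeMS under HKLM\SOFTWARE\FSLogix\Profiles, and DeleteLocalProfileWhenVHDShouldApply is set to 1 to enable it. Verify each name and value against Microsoft's FSLogix configuration reference before importing anything.

```python
FSLOGIX_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\FSLogix\Profiles"

# Assumed mapping of the recommendations to registry value names (all DWORDs).
RECOMMENDED = {
    "ProfileType": 0,                           # no difference disks
    "FlipFlopProfileDirectoryName": 1,          # easier-to-read folder names
    "DeleteLocalProfileWhenVHDShouldApply": 1,  # assumption: 1 enables the setting
    "LockedRetryCount": 10,                     # "Locked VHD retry count"
    "ReAttachRetryCount": 10,                   # "Volume re-attach retry count"
    "LockedRetryInterval": 5,                   # seconds between locked-VHD retries
    "ReAttachIntervalSeconds": 5,               # seconds between re-attach retries
    "VolumeWaitTimeMS": 2000,                   # milliseconds to wait for volume arrival
}


def render_reg(settings, key=FSLOGIX_KEY):
    """Render integer settings as .reg file text with DWORD values."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in settings.items():
        lines.append(f'"{name}"=dword:{value:08x}')  # DWORDs are 8 hex digits
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_reg(RECOMMENDED))
```

Keeping the settings in one reviewable structure makes it easier to diff what was tested against what is deployed, whether the values end up in a registry file, Group Policy, or an image build script.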
Figure 4. Azure File Storage performance characteristics.
Recommendations and Next Steps
Given the critical impact of storage performance on the user experience in virtualized environments, we recommend the following actions:
- Assess Your Storage Performance: Use objective metrics like the EUX score to evaluate your current storage subsystem’s performance. Identify any bottlenecks that may be affecting user profiles and application responsiveness.
- Review FSLogix Configurations: Ensure your FSLogix settings follow industry best practices. Misconfigurations can exacerbate performance issues even on high-performance storage systems.
- Engage with Experts: Consult with professionals specializing in virtualization and storage optimization. Their expertise can help you navigate complex configurations and implement effective solutions.
- Continually Test and Monitor: Implement ongoing testing using tools like Login Enterprise to proactively identify and address performance issues before they impact end users.
- Consider Cost-Benefit Analysis: When upgrading storage solutions, perform a cost-benefit analysis to understand the per-user costs and justify investments to stakeholders.
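The per-user framing in the last recommendation is simple arithmetic, sketched below. The function names and all figures in the usage example are hypothetical and are not taken from the case study.

```python
def per_user_monthly_cost(total_monthly_cost, users):
    """Spread a monthly storage bill evenly across users."""
    if users <= 0:
        raise ValueError("users must be positive")
    return total_monthly_cost / users


def upgrade_delta_per_user(current_cost, upgraded_cost, users):
    """Extra monthly cost per user of moving to the upgraded storage tier."""
    return (per_user_monthly_cost(upgraded_cost, users)
            - per_user_monthly_cost(current_cost, users))


if __name__ == "__main__":
    # Hypothetical figures: 1,000 vs 2,500 per month, shared by 500 users.
    print(upgrade_delta_per_user(1000, 2500, 500))  # 3.0
```

Expressing the upgrade as a per-user monthly figure makes it easier to weigh against the productivity cost of slow logins and sluggish sessions.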
As virtualization technologies evolve, so do the challenges of delivering an optimal user experience. Storage performance, especially for profile management solutions like FSLogix, has emerged as a critical factor that can significantly impact end-user satisfaction and productivity.
Organizations can mitigate these bottlenecks by objectively measuring performance using tools like the EUX score, addressing configuration issues, and investing in appropriate storage solutions. Proactive management and continual optimization are key to ensuring that your virtualized environments deliver the performance your users expect – just like Login Enterprise is designed to. Connect with our EUC experts to learn more.
Remember, the goal is not just to fix current issues but to establish a resilient infrastructure that can adapt to future demands. By taking these steps, you’ll be better equipped to provide a seamless and efficient user experience in your virtualized environments.