Why Traditional Monitoring Says Little About User Experience
September 17, 2020
Today, software needs to ship faster and faster to keep up with customer demand and stay ahead of your competition.
One might even say that if you haven’t updated your app in the last quarter, it is outdated and lacking. At the same time, it is also well known in software development that you may break an existing feature while working on a new one. So, how do software businesses make sure they consistently deliver high-quality software at a high pace?
Proactive testing is the key. Even without today’s accelerated pace, manual testing is considered poor practice because it is too labor-intensive and error-prone. At a monthly – or in some cases bi-weekly – release cadence, it is nearly impossible to keep up without proper test automation. The sooner you catch problems in your release cycle, the cheaper and faster they are to fix. So, the mantra in software development is: “If you want to accelerate, start automating your tests.”
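As a purely illustrative sketch of what “automating your tests” can mean in a release pipeline – the measure_logon_seconds helper and the 30-second budget below are assumptions, not part of any real product – a gate can be as small as a script that fails the build when a user-facing action regresses:

```python
# Minimal sketch of an automated pre-release gate (illustrative only).
# 'measure_logon_seconds' is a hypothetical helper that performs a synthetic
# logon against a test desktop and returns how long it took.
import sys
import time

LOGON_BUDGET_SECONDS = 30  # assumed acceptance threshold; tune to your own baseline

def measure_logon_seconds() -> float:
    """Placeholder: drive a scripted logon and time it."""
    start = time.monotonic()
    # ... perform the synthetic logon against a non-production desktop here ...
    return time.monotonic() - start

if __name__ == "__main__":
    duration = measure_logon_seconds()
    print(f"Synthetic logon took {duration:.1f}s (budget {LOGON_BUDGET_SECONDS}s)")
    # A non-zero exit code fails the pipeline stage, blocking the release.
    sys.exit(0 if duration <= LOGON_BUDGET_SECONDS else 1)
```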
I look at the digital workspace as ‘our product.’ Even though we are not writing the code ourselves, we still need to manage changes to our product just as a developer manages changes to theirs. And just like the developer is responsible for their product, so are we for ours. We need to deliver a stable, consistent, and well-performing workspace to our customer, the end-user.
But, when talking to EUC or digital workspace engineers, we often see that there is minimal testing going on. Many say that they already have a monitoring system or think testing user experience is not needed. These statements rest on two assumptions that I believe are false. I want to address both briefly.
Monitoring versus testing
Monitoring is about maintaining the status quo, while testing is all about change. You are predicting and validating the impact of change – any change. These are two entirely different dimensions of managing a healthy digital workspace. With monitoring, you keep an eye on well-known metrics to see if they stay within an expected range; with testing, you are preparing for the unexpected. I can already tell you that the list of the unexpected is infinitely longer than what you know today.
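To make the distinction concrete, here is a minimal sketch – the metric names, thresholds, and logon timings are assumptions chosen for illustration. Monitoring asks whether known metrics stay inside an expected range; testing asks whether behaviour is still acceptable after a specific change, regardless of which metric explains the difference:

```python
# Illustrative contrast between monitoring and testing (not a real product API).

# Monitoring: watch well-known metrics against expected ranges.
EXPECTED_RANGES = {"cpu_percent": (0, 80), "memory_percent": (0, 85)}

def monitor(sample: dict) -> list:
    """Return the metrics that fall outside their expected range."""
    return [name for name, (low, high) in EXPECTED_RANGES.items()
            if not low <= sample.get(name, 0) <= high]

# Testing: validate the impact of a change by comparing behaviour
# before and after it.
def test_change(baseline_logon_s: float, after_change_logon_s: float,
                allowed_regression: float = 0.10) -> bool:
    """Pass only if the change did not slow the logon by more than 10%."""
    return after_change_logon_s <= baseline_logon_s * (1 + allowed_regression)

if __name__ == "__main__":
    print(monitor({"cpu_percent": 20, "memory_percent": 40}))              # [] -> all green
    print(test_change(baseline_logon_s=25.0, after_change_logon_s=31.0))   # False -> regression caught
```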
I always like to refer to airplanes. I think it is pretty safe to assume that the design process of each plane accounts for even the smallest detail the (brilliant) engineers can think of. On top of that, planes and ground equipment are fitted with a ridiculous amount of monitoring systems. And still, a plane is rigorously tested before it is allowed to take flight with passengers. Imagine your friendly neighborhood plane builder saying: “Nah, we don’t need to test. We will monitor the plane extra carefully during its maiden voyage.” Would you board?
I know testing can feel like an insurance policy, but don’t you think a crash is far more costly than testing? Don’t you think the time spent investigating an air crash would have been better spent preventing it rather than cleaning up after it?
I admit that loss of productivity is on a slightly different level than a plane crash. But why are those companies testing, and why are so many of us not?
System metrics versus user experience
The second, slightly hidden, assumption is that you can use system metrics like CPU and memory usage to determine user experience. While these metrics indeed say something about the capacity of a system, they say little about performance or user experience. The assumption is especially wrong on shared systems like VDI or Windows Virtual Desktop (WVD), but it does not hold on regular desktops either. Remarks we often hear are “my systems are all green, and still my users say it is slow” or “they’re complaining they can’t log on, but everything works.”
1+1=2, right? This is how many engineers view their monitoring data. “My CPU is at 20%, and my memory usage is at 40%; therefore, my performance is good”. “The desktop broker is responding; therefore, my users can log on.”
The problem is that you cannot make such assumptions. 1 + 1 indeed equals 2, but that conclusion assumes there are only two terms to the left of the equals sign, while there are many more. I wish I could tell you exactly what those factors are, but I can’t; the list of factors on the left of the equation is almost infinite. So, what we are really claiming is this: 1 + 1 + A + B + … = 2. Nobody with any understanding of basic math would accept that equation as valid.
We focus on the right side of the equation because that is what matters to the end-user, and it should be the trigger for investigation. What is causing a less-than-optimal user experience is what matters to the engineer tasked with solving it; that is the territory of monitoring systems and root cause analysis.
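A minimal sketch of that division of labour – the application-launch budget and the collect_system_metrics helper are hypothetical – is to measure the outcome the user actually experiences and only pull in system metrics when that outcome is out of bounds:

```python
# Illustrative sketch: user-facing timings are the trigger,
# system metrics are the material for root cause analysis.
import time

APP_LAUNCH_BUDGET_S = 5.0  # assumed acceptable time to open a key application

def measure_app_launch_seconds() -> float:
    """Placeholder: open a key application as a user would and time it."""
    start = time.monotonic()
    # ... drive the application open via a synthetic user session here ...
    return time.monotonic() - start

def collect_system_metrics() -> dict:
    """Placeholder: gather CPU, memory, disk, and session data for diagnosis."""
    return {"cpu_percent": 20, "memory_percent": 40}  # example values only

if __name__ == "__main__":
    launch_s = measure_app_launch_seconds()
    if launch_s > APP_LAUNCH_BUDGET_S:
        # The right-hand side of the equation failed: now investigate the left.
        print("User experience out of bounds, snapshot for root cause:",
              collect_system_metrics())
    else:
        print(f"Application opened in {launch_s:.1f}s, within budget.")
```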
Better together
Unless you are still managing a Novell NetWare 4.1 and MS-DOS 6.22 Local Area Network, your work involves change. And wherever there is change, there should be tests. With today’s accelerated pace of change, we cannot afford to simply look at system metrics. We need a smarter and faster way of detecting anomalies, preferably before we push out changes to production.
So, while monitoring and testing might share the same space, neither is a luxury if you want to keep up.