From time to time, I get asked this question: Why is my website so slow? How can I make my website faster? At one time, a faster website meant faster networks, faster servers, faster databases and faster web apps. But that’s really just a small part of the big picture.
I have been delivering guest lectures on building scalable websites, and while the content is largely still the same, the emphasis has shifted a little over the years.
This is a TL;DR kind of post, and the short version of it is that your website is slow because you have lousy designers.
I tell many people that there is a great divide among IT people, the ones who actually get involved in development and engineering. Have you heard of the OSI model? Well, I have a 2-layer model. There are people who work at the operating system level and below, and there are other people who work above the operating system. These two layers don’t mix. The people in the two layers don’t often talk to each other, and when they do, they don’t speak the same language.
Infrastructure people, the network administrators, storage administrators and system administrators, operate in the lower layer. The database administrators, application programmers, web developers, and even the end users, are in the upper layer.
[People at the CIO level are sometimes in their own world. They just send an email “I need a website tomorrow,” and how that happens is irrelevant to them.]
When there’s a complaint about website slowness, it often starts with the end user. It may happen when a website first goes live. The complaint is handled top down. That is to say, the complaint falls through the top layer and lands as a problem for the lower layer to solve.
I started my career mostly in network engineering. The network is essentially the plumbing that connects everything together. Sadly, that means I am often the one who has to deal with “my website is slow” complaints. For many years, I struggled to prove to the next-layer IT people that the network was working just fine. At some point, I also got pulled into the system administrator role, and had to prove to the next-layer IT people that the servers were working just fine.
This may seem rather strange to people who don’t work on these problems, but when a website has more or less hung and users can no longer access it, the cause may have nothing to do with the network or the server. Indeed, I have seen a network with very little traffic, a server that is practically 100% idle, and yet the website running on it no longer responding.
At that point, we have the two sides engaged in heated argument. Website administrators are screaming that the server is dead, and system administrators are retorting that their servers are just fine. They both present evidence to support their position. Here is one great example of a communication breakdown between the two layers, the people at or below the operating system level, and those above. You see, the people at the bottom use metrics like megabits-per-second, IO-per-second, page-swaps-per-second, etc, while the other side wants to talk about transactions-per-second, queries-per-second, and the like. If there are business users in there, they might even want to talk about sales-committed-per-second. You know, the sort of metrics that matter to business.
I have been writing web apps myself, right from the time I started work. I’m one of those optimisation-crazy people. (Or at least I was, nowadays I strike a balance to make sure my time is best spent solving the right problems.) If I had to make a second SQL query to serve a webpage, I would actually spend time to weigh the pros and cons. If I could tweak the database schema such that the second query could be merged with the first, I would consider if that made good sense. I know about database normalisation, but sometimes there may be good reasons to break the rules. My point here is that each and every SQL query was properly thought out.
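To make that concrete, here’s a minimal sketch in Python with SQLite, using a made-up orders/customers schema. Keeping a denormalised copy of the customer’s name on the orders table saves the second query, at the cost of having to keep that copy in sync, which is exactly the kind of trade-off worth weighing.

```python
import sqlite3

# Purely illustrative schema; the table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        customer_name TEXT,   -- denormalised copy, deliberately breaking the rules
        total REAL
    );
    INSERT INTO customers VALUES (7, 'Alice');
    INSERT INTO orders VALUES (1001, 7, 'Alice', 42.50);
    """
)

# Normalised approach: two queries to render one order page.
order = conn.execute(
    "SELECT id, customer_id, total FROM orders WHERE id = ?", (1001,)
).fetchone()
customer = conn.execute(
    "SELECT name FROM customers WHERE id = ?", (order[1],)
).fetchone()

# With the denormalised column, the page is served in a single query.
# The price: the copy must be kept in sync if the customer is ever renamed.
order_page = conn.execute(
    "SELECT id, customer_name, total FROM orders WHERE id = ?", (1001,)
).fetchone()
```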
There was this one website performance problem I was tasked to look into, where a single webpage request required hundreds of SQL queries to service it. My gosh, hundreds! Yes, I know, when the developer built his website, those hundreds of queries ran instantly. When the developer tested the website with a few friends, they multiplied to maybe just thousands, and the website naturally became slower, though perhaps not very perceptibly. But once you go live, and you have a hundred or a thousand users hitting your greatest-thing-since-sliced-bread website, you’re now looking at a ridiculous volume of SQL queries every second. Of course the database, and hence the website, will be damn slow! Isn’t that just common sense?
Not to the developer. You see, they just write their program. The database is someone else’s problem.
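If you’ve never seen it, this pattern is often called the N+1 query problem. A rough sketch, with a made-up articles/authors schema and made-up numbers:

```python
import sqlite3

# A hypothetical articles/authors schema, just to show the shape of the problem.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO articles VALUES (1, 'First!', 1), (2, 'Second', 2), (3, 'Third', 1);
    """
)

# The N+1 pattern: one query for the list, then one more query per article.
articles = conn.execute("SELECT id, title, author_id FROM articles").fetchall()
for _id, title, author_id in articles:
    author = conn.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)
    ).fetchone()
    # ... render one row of the page ...

# That is 1 + N queries per page view. With 200 articles and 100 users
# hitting the page at once, you are looking at some 20,000 queries for
# a single screen of content.

# The same page with one JOIN: a single query, no matter how many articles.
rows = conn.execute(
    """
    SELECT articles.title, authors.name
    FROM articles JOIN authors ON authors.id = articles.author_id
    """
).fetchall()
```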
At the database, I often see tables being created without any thought about the underlying table type relative to the application’s requirements. People don’t think about choosing the optimal column indices to support their typical queries. Some programmers, or DBAs, simply believe that the database will do magic. How the underlying database and storage work is not their problem.
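For what it’s worth, here’s a rough sketch of what “choosing indices to support your typical queries” looks like, using SQLite’s in-memory database and a hypothetical orders table. The same thinking applies to picking the table type (the storage engine) in, say, MySQL, though that part isn’t shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, purely for illustration
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT,
        created_at TEXT
    )
    """
)

# A typical query for this hypothetical app: "recent orders for a customer".
typical_query = (
    "SELECT id, status FROM orders "
    "WHERE customer_id = 42 ORDER BY created_at DESC"
)

# Without an index, the plan is a full table scan.
print(conn.execute("EXPLAIN QUERY PLAN " + typical_query).fetchall())

# An index chosen to match the query's WHERE and ORDER BY clauses.
conn.execute(
    "CREATE INDEX idx_orders_customer ON orders (customer_id, created_at)"
)

# The plan now shows the index being used instead of a full scan.
print(conn.execute("EXPLAIN QUERY PLAN " + typical_query).fetchall())
```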
Of course, many application programmers do know about writing SQL statements, normalising database schemas, and various other skills, but they mostly work above the operating system. Similarly, system administrators may be familiar with storage and network technologies, but they are less familiar with the things above the operating system.
The problem here is that there is seldom anyone in IT who truly understands, in depth, and appreciates the technologies across the entire stack. Thanks to layered models and compartmentalised knowledge domains, people don’t need to know more than they are interested in or want to know.
Modern websites, along with their apps and services, can become exceedingly complex. There are many people involved in building them. Everyone may be an expert in their own respective area. But does anyone have an overall appreciation of the entire engineering effort from top to bottom?
A long time ago, we focused a lot on improving the infrastructure: make the network faster, the storage faster, and the servers faster. We could also move up the stack to optimise the database and the application engine. That’s all fine, and certain to make things a little better. The problem is, these fixes made things only a very little better.
You see, on most websites over the last decade, fetching a webpage really means also fetching a few dozen images, stylesheets, fonts, and scripts. All these resources can add up to many hundreds of KBs. Considering the “distance” between the user and the web server, does it really matter that a bit of table optimisation improved the database response to a SQL query by 5 ms?
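Here’s a quick back-of-envelope calculation, with numbers I’m simply assuming for illustration:

```python
# Back-of-envelope numbers, all assumed purely for illustration.
rtt = 0.2                  # ~200 ms round trip to a far-away user
resources = 40             # a few dozen images, stylesheets, fonts, scripts
page_bytes = 800_000       # many hundreds of KBs in total
bandwidth_bps = 2_000_000  # a modest 2 Mbps connection

transfer_time = page_bytes * 8 / bandwidth_bps   # ~3.2 s just moving the bytes
# With, say, 6 parallel connections, every "wave" of requests still
# costs at least one round trip before any bytes arrive.
latency_time = (resources / 6) * rtt             # ~1.3 s of waiting

total = transfer_time + latency_time             # ~4.5 s, ignoring DNS, TLS, rendering
print(f"page load ~{total:.1f} s; a 5 ms query improvement is about 0.1% of that")
```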
That’s not to say those optimisations are not important. If you need to squeeze out the last ounce of performance, you need to do all that. If you’re building the next Facebook, you’d need to do all that.
However, for most websites, the biggest fix to a slow website often isn’t in doing any of those optimisations. Rather, it’s the higher level website design itself that needs fixing. By design, I’m not talking about the choice of fonts and colours, or the layout of page elements. This design, instead, is about how component parts are built and delivered in order to render the final webpage.
Have you wondered how much work your web browser needs to do to render some of these somewhat overweight webpages? The front page of the Straits Times website, for example, forces the browser to make a whopping 343 web requests, totalling 6.88 MB.
News websites are inherently heavy, because they want to cram in as much content as possible, use plenty of visuals, and add a particularly heavy dose of advertising. So perhaps it’s no wonder their webpages are so heavy.
On the other hand, I’ve also seen websites that ought to be far simpler, yet also weigh in at some ridiculous number of megabytes. Their webpage has a couple of 1 MB images, whose size and quality are totally wasted when rendered on a computer screen. Even a retina display today doesn’t need images like that.
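Shrinking such a photo down to something sensible is not hard either. Here’s a rough sketch using the Pillow library, with made-up file names and a width I’ve picked arbitrarily:

```python
from PIL import Image  # Pillow; the file names here are hypothetical

img = Image.open("hero-photo-original.jpg")   # e.g. a 1 MB, 4000x3000 photo

# Scale down to something closer to what screens actually render,
# keeping the aspect ratio. 1600 px wide is already generous for a
# full-width hero image on most displays.
img.thumbnail((1600, 1600))
img.save("hero-photo-web.jpg", quality=80, optimize=True)
# The result is typically a small fraction of the original size,
# with no visible difference at normal viewing sizes.
```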
But the photographer who took the photo didn’t know. He didn’t need to know. He would take the best photos he could take. The fella who does the post-processing on the images, perhaps the same photographer, also doesn’t need to know. He would make the best image. The visual designer who decided to incorporate the photo into the website obviously wanted a great impactful photo to use on the webpage, without realising how silly it is.
Such a nice-looking photo would be alright, even if useless, if users could actually load the webpage. You see, when you test your website on your local LAN, i.e. when the web server is next to you, or perhaps even under your table, of course it’s fast. It loads instantly. But what about the user thousands of kilometres away, who struggles with an Internet connection that trickles along at only 128 kbps, if he is lucky? How long do you think he’s going to wait to download that glorious 1 MB photo?
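The arithmetic is simple enough:

```python
photo_bytes = 1_000_000   # the 1 MB photo
link_bps = 128_000        # the unlucky user's 128 kbps connection

seconds = photo_bytes * 8 / link_bps
print(f"~{seconds:.0f} seconds for that one photo alone")  # roughly a minute
```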
I think he’ll move on to another website.
You’d think that web designers ought to know about these things. In fact, much of what I’m telling you isn’t at all some top-secret knowledge only known by a handful of people. These topics have been revisited time and again. Yet we still see big companies with more than enough resources throw together a professional website that is totally deplorable.
I haven’t even gotten started on two of my other favourite topics, security and load testing. I’ll not talk about them here, lest I get carried away. Just the website design itself can be quite a lot for some people to digest.
We’ll take a break now, but when I return in my follow-up post, I’ll share with you the tools and aids available right now, things that website designers could already be using to fix their slow websites, but somehow aren’t. In fact, some might not even know about them.