Google Maps, Yahoo! Mail, Facebook, MySpace, YouTube, and Amazon are examples of Web sites built to scale. They access petabytes of data sending terabits per second to millions of users worldwide. The magnitude is awe-inspiring.
Users view these large-scale Web sites from a narrower perspective. The typical user has megabytes of data that they download at a few hundred kilobits per second. Users are less interested in the massive number of requests per second being served, caring more about their individual requests. As they use these Web applications they inevitably ask the same question: "Why is this site so slow?"
The answer hinges on where development teams focus their performance improvements. Performance for the sake of scalability is rightly focused on the backend. Database tuning, replicating architectures, customized data caching, and so on, allow Web servers to handle a greater number of requests. This gain in efficiency translates into reductions in hardware costs, data center rack space, and power consumption. But how much does the backend affect the user experience in terms of latency?
The Web applications listed here are some of the most highly tuned in the world, and yet they still take longer to load than we'd like. It almost seems as if the high-speed storage and optimized application code on the backend have little impact on the end user's response time. Therefore, to account for these slowly loading pages we must focus on something other than the backend: we must focus on the frontend.
Figure 1 illustrates the HTTP traffic sent when your browser visits iGoogle with an empty cache. Each HTTP request is represented by a horizontal bar whose position and size are based on when the request began and how long it took. The first HTTP request is for the HTML document (http://www.google.com/ig). As noted in Figure 1, the request for the HTML document took only 9% of the overall page load time. This includes the time for the request to be sent from the browser to the server, for the server to gather all the necessary information on the backend and stitch that together as HTML, and for that HTML to be sent back to the browser.
The primed cache situation for iGoogle is shown in Figure 2. Here there are only two HTTP requests: one for the HTML document and one for a dynamic script. The gap is even larger because it includes the time to read the cached resources from disk. Even in the primed cache situation, the HTML document accounts for only 17% of the overall page load time.
This situation, in which a large percentage of page load time is spent on the frontend, applies to most Web sites. Table 1 shows that eight of the top 10 Web sites in the U.S. (as listed on Alexa.com) spend less than 20% of the end user's response time fetching the HTML document from the backend. The two exceptions are Google Search and Live Search, which are highly tuned. These two sites download four or fewer resources in the empty cache situation, and only one request with a primed cache.
The time spent generating the HTML document affects overall latency, but for most Web sites this backend time is dwarfed by the amount of time spent on the frontend. If the goal is to make the user experience faster, the place to focus is on the frontend. Given this new focus, the next step is to identify best practices for improving frontend performance.
Through research and consulting with development teams, I've developed a set of performance improvements that have been proven to speed up Web pages. A big fan of Harvey Penick's Little Red Book1 with advice like "Take Dead Aim," I set out to capture these best practices in a simple list that is easy to remember. The list has evolved to contain the following 14 prioritized rules:
A detailed explanation of each rule is the basis of my book, High Performance Web Sites2 What follows is a brief summary of each rule.
As the number of resources in the page grows, so does the over all page load time. This is exacerbated by the fact that most browsers only download two resources at a time from a given hostname, as suggested in the HTTP/1.1 specification (http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.4).a Several techniques exist for reducing the number of HTTP requests without reducing page content:
A content delivery network (CDN) is a collection of distributed Web servers used to deliver content to users more efficiently. Examples include Akamai Technologies, Limelight Networks, SAWIS, and Panther Express. The main performance advantage provided by a CDN is delivering static resources from a server that is geographically closer to the end user. Other benefits include backups, caching, and the ability to better absorb traffic spikes.
When a user visits a Web page, the browser downloads and caches the page's resources. The next time the user visits the page, the browser checks to see if any of the resources can be served from its cache, avoiding time-consuming HTTP requests. The browser bases its decision on the resource's expiration date. If there is an expiration date, and that date is in the future, then the resource is read from disk. If there is no expiration date, or that date is in the past, the browser issues a costly HTTP request. Web developers can attain this performance gain by specifying an explicit expiration date in the future. This is done with the Expires HTTP response header, such as the following:
Expires: Thu, 1 Jan 2015 20:00:00 GMT
The amount of data transferred over the network affects response times, especially for users with slow network connections. For decades developers have used compression to reduce the size of files. This same technique can be used for reducing the size of data sent over the Internet. Many Web servers and Web hosting services enable compression of HTML documents by default, but compression shouldn't stop there. Developers should also compress other types of text responses, such as scripts, stylesheets, XML, JSON, among others. Gzip is the most popular compression technique. It typically reduces data sizes by 70%.
Stylesheets inform the browser how to format elements in the page. If stylesheets are included lower in the page, the question arises: What should the browser do with elements that it can render before the stylesheet has been downloaded?
One answer, used by Internet Explorer, is to delay rendering elements in the page until all stylesheets are downloaded. But this causes the page to appear blank for a longer period of time, giving users the impression that the page is slow. Another answer, used by Firefox, is to render page elements and redraw them later if the stylesheet changes the initial formatting. This causes elements in the page to "flash" when they're redrawn, which is disruptive to the user. The best answer is to avoid including stylesheets lower in the page, and instead load them in the HEAD of the document.
External scripts (typically, ".js" files) have a bigger impact on performance than other resources for two reasons. First, once a browser starts downloading a script it won't start any other parallel downloads. Second, the browser won't render any elements below a script until the script has finished downloading. Both of these impacts are felt when scripts are placed near the top of the page, such as in the HEAD section. Other resources in the page (such as images) are delayed from being downloaded and elements in the page that already exist (such as the HTML text in the document itself) aren't displayed until the earlier scripts are done. Moving scripts lower in the page avoids these problems.
or as an external script:
Similarly, CSS is included as either an inline style block or an external stylesheet. Which is better from a performance perspective?
The Domain Name System (DNS) is like a phone book: it maps a hostname to an IP address. Hostnames are easier for humans to understand, but the IP address is what browsers need to establish a connection to the Web server. Every hostname that's used in a Web page must be resolved using DNS. These DNS lookups carry a cost; they can take 20100 milliseconds each. Therefore, it's best to reduce the number of unique hostnames used in a Web page.
Redirects are used to map users from one URL to another. They're easy to implement and useful when the true URL is too long or complicated for users to remember, or if a URL has changed. The downside is that redirects insert an extra HTTP roundtrip between the user and her content. In many cases, redirects can be avoided with some additional work. If a redirect is truly necessary, make sure to issue it with a far future Expires header (see Rule 3), so that on future visits the user can avoid this delay.b
If an external script is included multiple times in a page, the browser has to parse and execute the same code multiple times. In some cases the browser will request the file multiple times. This is inefficient and causes the page to load more slowly. This obvious mistake would seem uncommon, but in a review of U.S. Web sites it could be found in two of the top 10 sites. Web sites that have a large number of scripts and a large number of developers are most likely to suffer from this problem.
Entity tags (ETags) are a mechanism used by Web clients and servers to verify that a cached resource is valid. In other words, does the resource (image, script, stylesheet, among others) in the browser's cache match the one on the server? If so, rather than transmitting the entire file (again), the server simply returns a 304 Not Modified status telling the browser to use its locally cached copy. In HTTP/1.0, validity checks were based on a resource's Last-Modified date: if the date of the cached file matched the file on the server, then the validation succeeded. ETags were introduced in HTTP/1.1 to allow for validation schemes based on other information, such as version number and checksum.
ETags don't come without a cost. They add extra headers to HTTP requests and responses. The default ETag syntax used in Apache and IIS makes it likely that the validation will erroneously fail if the Web site is hosted on multiple servers. These costs impact performance, making pages slower and increasing the load on Web servers. This is an unnecessary loss of performance, because most Web sites don't take advantage of the advanced features of ETags, relying instead on the Last-Modified date as the means of validation. By default, ETags are enabled in popular Web servers (including Apache and IIS). If your Web site doesn't utilize ETags, it's best to turn them off in your Web server. In Apache, this is done by simply adding "FileETag none" to your configuration file.
Many popular Web sites are moving to Web 2.0 and have begun incorporating Ajax. Ajax requests involve fetching data that is often dynamic, personalized, or both. In the Web 1.0 world, this data is served by the user going to a specified URL and getting back an HTML document. Because the HTML document's URL is fixed (bookmarked, linked to, and so on), it's necessary to ensure the response is not cached by the browser.
This is not the case for Ajax responses. The URL of the Ajax request is included inside the HTML document; it's not bookmarked or linked to. Developers have the freedom to change the Ajax request's URL when they generate the page. This allows developers to make Ajax responses cacheable. If an updated version of the Ajax data is available, the cached version is avoided by adding a dynamic variable to the Ajax URL. For example, an Ajax request for the user's address book could include the time it was last edited as a parameter in the URL, "&edit=1218050433." As long as the user hasn't edited their address book, the previously cached Ajax response can continue to be used, making for a faster page.
Evangelizing these performance best practices is a challenge. I was able to share this information through Conferences, training classes, consulting, and documentation. Even with the knowledge in hand, it would still take hours of loading pages in a packet sniffer and reading HTML to identify the appropriate set of performance improvements. A better alternative would be to codify this expertise in a tool that anyone could run, reducing the learning curve and increasing adoption of these performance best practices. This was the inspiration for YSlow.
YSlow (http://developer.yahoo.com/yslow/) is a performance analysis tool that answers the question posed in the introduction: "Why is this site so slow?" I created YSlow so that any Web developer could quickly and easily apply the performance rules to their site, and find out specifically what needed to be improved. It runs inside Firefox as an extension to Firebug (http://getfirebug.com/), the tool of choice for many Web developers.
The Screenshot in Figure 3 shows Firefox with iGoogle loaded. Firebug is open in the lower portion of the window, with tabs for Console, HTML, CSS, Script, DOM, and Net. When YSlow is installed, the YSlow tab is added. Clicking YSlow's Performance button initiates an analysis of the page against the set of rules, resulting in a weighted score for the page.
As shown in Figure 3, YSlow explains each rule's results with details about what to fix. Each rule in the YSlow screen is a link to the companion Web site, where additional information about the rule is available.
As described in "Rule 6: Put Scripts at the Bottom," external scripts block the download and rendering of other content in the page. This is true when the script is loaded in the typical way:
But there are several techniques for downloading scripts that avoid this blocking behavior:
You can see these techniques illustrated in Cuzillion (http://stevesouders.com/cuzillion/), but as an example let's look at the script DOM element approach: