Mission

Successful societies and institutions recognize the need to record their history - this provides a way to review the past, find explanations for current behavior, and spot emerging trends. In 1996 Brewster Kahle realized the cultural significance of the Internet and the need to record its history. As a result he founded the Internet Archive which collects and permanently stores the Web's digitized content.

In addition to the content of web pages, it's important to record how this digitized content is constructed and served. The HTTP Archive provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized. This performance information allows us to see trends in how the Web is built and provides a common data set from which to conduct web performance research.

FAQ

How is the list of URLs generated?

Starting in November 2011, the list of URLs is based solely on the Alexa Top 1,000,000 Sites (zip). Use the HTTP Archive URLs page to see the list of the top 10,000 URLs used in the most recent crawl.

Prior to November 2011 there were 18K URLs analyzed based on the union of the following lists: Alexa 500 (source), Alexa US 500 (source), Alexa 10,000 (source, zip), Fortune 500 (source), Global 500 (source), and Quantcast10K (source).

How is the data gathered?

The list of URLs is fed to our private instance of WebPagetest. (Huge thanks to Pat Meenan!)

The WebPagetest settings are:

Each URL is loaded 3 times. The data from the median run (based on load time) is collected via a HAR file. The HTTP Archive collects these HAR files, parses them, and populates our database with the relevant information.
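The median-run selection described above can be sketched in a few lines of Python. This is an illustration only; the actual pipeline extracts the load time from each run's HAR file, and the run data below is made up.

```python
# Sketch: pick the median of three runs by load time, as described above.
# The run data is illustrative; real load times come from WebPagetest HAR files.

def median_run(runs):
    """Return the run whose load time is the median of the list."""
    ordered = sorted(runs, key=lambda r: r["loadTime"])
    return ordered[len(ordered) // 2]

runs = [
    {"run": 1, "loadTime": 3100},
    {"run": 2, "loadTime": 2800},
    {"run": 3, "loadTime": 3400},
]
print(median_run(runs)["run"])  # run 1 has the median load time (3100 ms)
```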

How accurate is the data, in particular the time measurements?

The "static" measurements (# of bytes, HTTP headers, etc. - everything but time) are accurate at the time the test was performed. It's entirely possible that the web page has changed since it was tested. The tests were performed using a single browser. If the page's content varies by browser this could be a source of differences.

The time measurements are gathered in a test environment, and thus carry all the potential biases that come with that: a single browser, a fixed geographic location, and a fixed connection speed.

Given these conditions, it's virtually impossible to compare these time measurements with those gathered in other browsers, at other locations, or over other connection speeds. They are best used for comparisons within the HTTP Archive dataset itself, where the test conditions are consistent.

Why are transfer sizes prior to Oct 1 2012 smaller?

The web10 parameter in the WebPagetest API determines whether the test should stop at document complete (window.onload) as opposed to later once network activity has subsided. Prior to Oct 1 2012 the tests were configured to stop at document complete. However, lazy loading resources (loading them dynamically after window.onload) has grown in popularity. Therefore, this setting was changed so that these post-onload requests would be captured. This resulted in more HTTP requests being recorded with a subsequent bump in transfer size.
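A test request to a WebPagetest instance's runtest.php endpoint might be constructed as below. This is a sketch: the host and API key are placeholders, and only a few of the API's parameters are shown. Setting web10=0 lets recording continue past window.onload, which is the post-October-2012 behavior described above.

```python
from urllib.parse import urlencode

# Sketch of a WebPagetest API test request. The host and API key are
# placeholders. web10=1 stops the test at document complete (onload);
# web10=0 continues until network activity subsides.
params = {
    "url": "http://www.example.com/",
    "runs": 3,        # load each URL 3 times
    "web10": 0,       # capture post-onload (lazy-loaded) requests
    "f": "json",      # response format
    "k": "API_KEY",   # placeholder API key
}
test_url = "http://wpt.example.com/runtest.php?" + urlencode(params)
print(test_url)
```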

What changes have been made to the test environment that might affect the data?

Several test configuration changes over the years could affect results; the change to capture post-onload activity (described above) is the most significant example.

What are the limitations of this testing methodology (using lists)?

The HTTP Archive examines each URL in the list, but does not crawl the website's other pages. Although the websites in the list are well known, an entire website doesn't necessarily map well to a single URL.

Because of these issues and more, it's possible that the actual HTML document analyzed is not representative of the website.

What's a "HAR file"?

HAR files are based on the HTTP Archive specification. They capture web page loading information in a JSON format. See the list of tools that support the HAR format.
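Since a HAR file is just JSON, extracting information from one is straightforward. The sketch below sums response sizes from a minimal inlined HAR; the `log.entries[].response` structure comes from the HAR specification, and the URLs and sizes are made up.

```python
import json

# Sketch: extract per-request sizes from a HAR file. The structure
# (log.entries[].response) is defined by the HAR specification; the
# data here is a minimal made-up example.
har = json.loads("""
{
  "log": {
    "entries": [
      {"request": {"url": "http://example.com/"},
       "response": {"status": 200, "bodySize": 5120}},
      {"request": {"url": "http://example.com/app.js"},
       "response": {"status": 200, "bodySize": 20480}}
    ]
  }
}
""")

total_bytes = sum(e["response"]["bodySize"] for e in har["log"]["entries"])
print(total_bytes)  # 25600
```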

How is the HTTP waterfall chart generated?

The HTTP waterfall chart is generated from the HAR file via JavaScript. The code is from Jan Odvarko's HAR Viewer. Jan is also one of the creators of the HAR specification. Thanks Jan!

What are the definitions for the various charts?

The charts from the Trends and Stats pages are explained here.

URLs Analyzed
This chart shows the total number of URLs archived during each crawl. This chart is important because the number of URLs (sample size) can affect the metrics being gathered.
Load Time
This chart plots the average window.onload time in milliseconds.
Start Render Time
Start render is the time at which something was first displayed to the screen.
Total Transfer Size
This is the average transfer size of all responses for a single website. Note that if the response is compressed, the transfer size is smaller than the original uncompressed content.
Total Requests
This chart shows the average number of requests for a single website.
HTML Transfer Size
This is the average transfer size of all HTML responses for a single website. Note that if the response is compressed, the transfer size is smaller than the original uncompressed content.
HTML Requests
This chart shows the average number of HTML requests for a single website.
JS Transfer Size
This is the average transfer size of all JavaScript responses for a single website. Note that if the response is compressed, the transfer size is smaller than the original uncompressed content.
JS Requests
This chart shows the average number of JavaScript requests for a single website.
CSS Transfer Size
This is the average transfer size of all stylesheet responses for a single website. Note that if the response is compressed, the transfer size is smaller than the original uncompressed content.
CSS Requests
This chart shows the average number of stylesheet requests for a single website.
Image Transfer Size
This is the average transfer size of all image responses for a single website.
Image Requests
This chart shows the average number of image requests for a single website.
Flash Transfer Size
This is the average transfer size of all Flash responses for a single website.
Flash Requests
This chart shows the average number of Flash requests for a single website.
TCP Connections
This chart shows the average number of TCP connections that were opened during page load. Crawls before May 15 2014 will show zero for this stat.
Speed Index
This chart shows the average Speed Index value across all websites. Speed Index measures how quickly the page is rendered. Lower values are better.
PageSpeed Score
PageSpeed is a performance analysis tool that grades websites on a scale of 1-100. Higher scores are better. This chart shows the average PageSpeed score across all websites.
Doc Size
This chart shows the average size in kB of the main HTML document for the website. NOTE: Right now this is the compressed size. We hope to change that to the uncompressed size in the future.
HTML Document Transfer Size
The transfer size of the main HTML document.
DOM Elements
This chart shows the average number of DOM elements across all websites.
# Domains
A single web page typically loads resources from a variety of web servers across many domains. This chart shows the average number of domains that are accessed across all websites.
Max Reqs on 1 Domain
This long-named chart shows an interesting performance statistic. A single web page typically loads resources from various domains. For each page, the number of requests on the most-used domain is calculated, and this chart shows the average of that value across all websites.
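The per-page calculation described above amounts to grouping a page's requests by hostname and taking the largest count. A minimal sketch, with made-up request URLs:

```python
from collections import Counter
from urllib.parse import urlparse

# Sketch: for one page, count requests per domain and take the maximum,
# as this chart does before averaging across all pages.
request_urls = [
    "http://www.example.com/",
    "http://www.example.com/style.css",
    "http://cdn.example.net/app.js",
    "http://www.example.com/logo.png",
]
per_domain = Counter(urlparse(u).hostname for u in request_urls)
max_reqs = max(per_domain.values())
print(max_reqs)  # 3 requests on www.example.com
```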
Uncacheable Resources
A response can be read from the cache without any HTTP requests as long as it is still fresh. This freshness lifetime is determined by the Cache-Control and Expires headers. This chart shows the percentage of responses that were NOT cacheable, i.e., they had a freshness lifetime of zero seconds. The full calculation of freshness lifetime (stored as expAge in our database) is complex; it is derived primarily from the Cache-Control max-age directive and the Expires header.
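A simplified sketch of the freshness-lifetime (expAge) calculation follows. Real HTTP caching rules (RFC 7234) have more cases; this follows only the order described above, with Cache-Control max-age taking precedence over Expires, and treats missing caching headers as uncacheable.

```python
import re
from email.utils import parsedate_to_datetime

# Simplified sketch of the freshness-lifetime ("expAge") calculation.
# Real caching rules (RFC 7234) have more cases; this covers the
# common ones: no-store/no-cache, max-age, then Expires - Date.
def freshness_lifetime(headers):
    cc = headers.get("Cache-Control", "").lower()
    if "no-store" in cc or "no-cache" in cc:
        return 0
    m = re.search(r"max-age=(\d+)", cc)
    if m:
        return int(m.group(1))
    if "Expires" in headers and "Date" in headers:
        delta = (parsedate_to_datetime(headers["Expires"])
                 - parsedate_to_datetime(headers["Date"]))
        return max(0, int(delta.total_seconds()))
    return 0  # no caching headers: treated as uncacheable in this sketch

print(freshness_lifetime({"Cache-Control": "public, max-age=86400"}))  # 86400
print(freshness_lifetime({"Cache-Control": "no-cache"}))               # 0
```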
Sites using Google Libraries API
This is the percentage of sites that have at least one request containing "googleapis.com" in the hostname.
Sites with Flash
This chart shows the percentage of sites that make at least one Flash request. Note that this Flash request could be from an ad or some other third party content on the page, and may not be from the website's main content.
Sites with Custom Fonts
This chart shows the percentage of sites that make at least one request for a custom font. The determination of a custom font request is based on the Content-Type response header. However, since many fonts today do not have the proper Content-Type value, we also include requests that end in ".eot", ".ttf", ".woff", or ".otf", or contain ".eot?", ".ttf?", ".woff?", or ".otf?" (i.e., a querystring).
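The heuristic described above can be sketched as follows. The Content-Type substrings are illustrative (font MIME types vary in practice); the extension and querystring matching mirrors the rule stated above.

```python
# Sketch of the custom-font heuristic described above: match on the
# Content-Type header, or fall back to the URL ending in a font
# extension, optionally followed by a querystring. The Content-Type
# substrings are illustrative; real font MIME types vary.
FONT_TYPES = ("font/", "application/font", "application/x-font")
FONT_EXTS = (".eot", ".ttf", ".woff", ".otf")

def is_font_request(url, content_type=""):
    ct = content_type.lower()
    if any(t in ct for t in FONT_TYPES):
        return True
    path = url.lower()
    return path.endswith(FONT_EXTS) or any(e + "?" in path for e in FONT_EXTS)

print(is_font_request("http://example.com/f/opensans.woff"))  # True
print(is_font_request("http://example.com/f/icons.ttf?v=3"))  # True
print(is_font_request("http://example.com/app.js"))           # False
```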
Compressed Responses
This chart shows the ratio of compressed responses to the number of HTML, CSS, and JavaScript requests. There's a flaw in this calculation, because 10-20% of compressed responses are images, fonts, or Flash. We'll be updating this chart soon.
HTTPS Requests
This chart shows the percentage of requests done over https.
Pages with Errors
This chart shows the percentage of pages that have at least one error, i.e., a response with a 4xx or 5xx status code.
Pages with Redirects
This chart shows the percentage of websites that have at least one redirect. A response is classified as a redirect if it has a 3xx status code other than 304. Note that the redirect may be from an ad or other third party content.
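The status-code classification used by the two charts above can be sketched directly from their definitions: 4xx/5xx responses count as errors, and 3xx responses other than 304 (Not Modified) count as redirects.

```python
# Sketch of the status-code classification described above:
# 4xx/5xx responses are errors; 3xx other than 304 are redirects.
def classify(status):
    if 400 <= status < 600:
        return "error"
    if 300 <= status < 400 and status != 304:
        return "redirect"
    return "ok"

print(classify(500))  # error
print(classify(301))  # redirect
print(classify(304))  # ok (a cache validation, not a redirect)
```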
Sites hosting HTML on CDN
This measures the percentage of sites that have their main HTML document served from a CDN.
Average Bytes per Page by Content Type
This chart shows the breakdown of website size by content type. Note that the sizes are the transfer sizes. Therefore, compressed responses are counted as smaller than the original uncompressed content.
Average Individual Response Size
This chart shows the average transfer size for specific content types.
Pages Using Google Libraries API
This chart shows the percentage of sites that have at least one request containing "googleapis.com" in the hostname.
Pages Using Flash
This chart shows the percentage of sites with at least one Flash request.
Pages Using Custom Fonts
This chart shows the percentage of sites that make at least one request for a custom font. See Sites with Custom Fonts for more information.
Image Requests by Format
This chart breaks down all image requests based on format type.
Cache Lifetime
This shows a histogram of the cache lifetime (AKA, freshness lifetime) of requests across all websites. See Uncacheable Resources for more information.
HTTPS Requests
This chart shows the percentage of requests done over https.
Pages with Errors
This chart shows the percentage of pages that have at least one error, i.e., a response with a 4xx or 5xx status code.
Pages with Redirects
This chart shows the percentage of websites that have at least one redirect. A response is classified as a redirect if it has a 3xx status code other than 304. Note that the redirect may be from an ad or other third party content.
Connections per Page
The number of TCP connections opened per page.
Average DOM Depth
The average DOM depth in the page.
Document Height
The document height in pixels.
Size of localStorage
The size of localStorage.
Size of sessionStorage
The size of sessionStorage.
Iframes per Page
The number of iframes in the page.
Script Tags per Page
The number of SCRIPT tags in the page. This includes both external and inline scripts.
Highest Correlation to Load Time
This chart shows the five variables that have the highest correlation to page load time.
Highest Correlation to Render Time
This chart shows the five variables that have the highest correlation to start render time.

What are the definitions for the table columns for a website's requests?

The View Site page contains a table with information about each HTTP request in an individual page, for example http://www.w3.org/. The less obvious columns are defined below.

Definitions for each of the HTTP headers can be found in the HTTP/1.1: Header Field Definitions.

How do I add a website to the HTTP Archive?

You can add a website to the HTTP Archive via the Add a Site page. We automatically crawl the world's top URLs but we'll crawl one URL per domain for any website, even if it's not in the list of top URLs.

How do I get my website removed from the HTTP Archive?

You can have your site removed from the HTTP Archive via the Remove Your Site page.

How do I report inappropriate (adult only) content?

Please report any inappropriate content by creating a new issue. You may come across inappropriate content when viewing a website's filmstrip screenshots. You can help us flag these websites. Screenshots are not shown for websites flagged as adult only.

Who created the HTTP Archive?

Steve Souders created the HTTP Archive. It's built on the shoulders of Pat Meenan's WebPagetest system. Several folks on Google's Make the Web Faster team chipped in. Patches have come from several individuals, including Jonathan Klein, Yusuke Tsutsumi, Carson McDonald, James Byers, Ido Green, Charlie Clark, Jared Hirsch, and Mike Pfirrmann. Guy Leech helped early on with the design. More recently, Stephen Hay created the new logo.

The HTTP Archive Mobile test framework uses Mobitest from Blaze.io & Akamai with much help from Guy (Guypo) Podjarny.

Who sponsors the HTTP Archive?

The HTTP Archive is possible through the support of these sponsors: Google, Mozilla, New Relic, O’Reilly Media, Etsy, Radware, dynaTrace Software, Torbit, Instart Logic, Catchpoint Systems, and Fastly.

The HTTP Archive is part of the Internet Archive, a 501(c)(3) non-profit. Donations in support of the HTTP Archive can be made through the Internet Archive's donation page. Make sure to send a follow-up email to donations@archive.org designating your donation to the "HTTP Archive".

Who do I contact for more information?

Please go to the HTTP Archive discussion list and submit a post.