B2B SEO - Marketing, Search Engine Optimization for Business

Web analytics technologies

February 25th, 2007 · No Comments

There are two main technological approaches to collecting web analytics data. The first method, logfile analysis, reads the logfiles in which the web server records all its transactions. The second method, page tagging, uses JavaScript on each page to notify a third-party server when a page is rendered by a web browser.

Web server logfile analysis

Web servers have always recorded all their transactions in a logfile. It was soon realised that these logfiles could be read by a program to provide data on the popularity of the website. Thus arose web log analysis software.

In the early 1990s, web site statistics consisted primarily of counting the number of client requests made to the web server. This was a reasonable method initially, since each web site often consisted of a single HTML file. However, with the introduction of images in HTML, and web sites that spanned multiple HTML files, this count became less useful. The first true commercial Log Analyzer was released by IPRO in 1994

Two units of measure were introduced in the mid 1990s to gauge more accurately the amount of human activity on web servers. These were page views and visits (or sessions). A page view was defined as a request made to the web server for a page, as opposed to a graphic, while a visit was defined as a sequence of requests from a uniquely identified client that expired after a certain amount of inactivity, usually 30 minutes. The page views and visits are still commonly displayed metrics, but are now considered rather unsophisticated measurements.

The emergence of search engine spiders and robots in the late 1990s, along with web proxies and dynamically assigned IP addresses for large companies and ISPs, made it more difficult to identify unique human visitors to a website. Log analyzers responded by tracking visits by cookies, and by ignoring requests from known spiders.

The extensive use of web caches also presented a problem for logfile analysis. If a person revisits a page, the second request will often be retrieved from the browser’s cache, and so no request will be received by the web server. This means that the person’s path through the site is lost. Caching can be defeated by configuring the web server, but this can result in degraded performance for the visitor to the website.

Page taggingConcerns about the accuracy of logfile analysis in the presence of caching, and the desire to be able to perform web analytics as an outsourced service, led to the second data collection method, page tagging or ‘Web bugs’.

In the mid 1990s, Web counters were commonly seen — these were images included in a web page that showed the number of times the image had been requested, which was an estimate of the number of visits to that page. In the late 1990s this concept evolved to include a small invisible image instead of a visible one, and, by using JavaScript, to pass along with the image request certain information about the page and the visitor. This information can then be processed remotely by a web analytics company, and extensive statistics generated.

The web analytics service also manages the process of assigning a cookie to the user, which can uniquely identify them during their visit and in subsequent visits.

With the increasingly popularity of Ajax-based solutions, an alternative to the use of an invisible image, is to implement a call back to the server from the rendered page. In this case, when the page is rendered on the web browser, a piece of Ajax code would call back to the server and pass information about the client that can then be aggregated by a web analytics company. This is in some ways flawed by browser restrictions on the servers which can be contacted with XmlHttpRequest objects.

Logfile analysis vs page tagging

Both logfile analysis programs and page tagging solutions are readily available to companies that wish to perform web analytics. In many cases, the same web analytics company will offer both approaches. The question then arises of which method a company should choose. There are advantages and disadvantages to each approach.

Advantages of logfile analysis

The main advantages of logfile analysis over page tagging are as follows.

  • The web server normally already produces logfiles, so the raw data is already available. To collect data via page tagging requires changes to the website.
  • The web server reliably records every transaction it makes. Page tagging relies on the visitors’ browsers co-operating, which a certain proportion may not do (for example, if JavaScript is disabled).
  • The data is on the company’s own servers, and is in a standard, rather than a proprietary, format. This makes it easy for a company to switch programs later, use several different programs, and analyze historical data with a new program. Page tagging solutions involve vendor lock-in.
  • Logfiles contain information on visits from search engine spiders. Although these should not be reported as part of the human activity, it is important data for performing search engine optimization.
  • Logfiles contain information on failed requests; page tagging only records an event if the page is successfully viewed.

Advantages of page tagging

The main advantages of page tagging over logfile analysis are as follows.

  • The JavaScript is automatically run every time the page is loaded. Thus there are fewer worries about caching.
  • It is easier to add additional information to the JavaScript, which can then be collected by the remote server. For example, information about the visitors’ screen sizes, or the price of the goods they purchased, can be added in this way. With logfile analysis, information not normally collected by the web server can only be recorded by modifying the URL.
  • Page tagging can report on events which do not involve a request to the web server, such as interactions within Flash movies.
  • The page tagging service manages the process of assigning cookies to visitors; with logfile analysis, the server has to be configured to do this.
  • Page tagging is available to companies who do not run their own web servers.

Economic factors

Logfile analysis is almost always performed in-house. Page tagging can be performed in-house, but it is more often provided as a third-party service. The economic difference between these two models can also be a consideration for a company deciding which to purchase.

  • Logfile analysis typically involves a one-off software purchase; however, some vendors are introducing maximum annual page views with additional costs to process additional information.
  • Page tagging most often involves a monthly fee, although some vendors offer installable page tagging solutions with no additional page view costs.

Which solution is cheaper often depends on the amount of technical expertise within the company, the vendor chosen, the amount of activity seen on the web sites, the depth and type of information sought, and the number of distinct web sites needing statistics.

Hybrid methods

Some companies are now producing programs which collect data through both logfiles and page tagging. By using a hybrid method, they aim to produce more accurate statistics than either method on its own. The first Hybrid solution was produced in 1998 by Rufus Evison who then spun the product out to create a company based upon the increased accuracy of hybrid methods

Other methods

Other methods of data collection have been used, but are not currently widely deployed. These include integrating the web analytics program into the web server, and collecting data by sniffing the network traffic passing between the web server and the outside world.

There is also another method of the page tagging analysis. Instead of getting the information from the user side, when he / she opens the page, it’s also possible to let the script work on the server side. Right before a page is sent to a user it then sends the data

Tags: Web Analytics