B2B SEO - Marketing, Search Engine Optimization for Business

Internet Marketing: Definition, Scope, & Business models

February 4th, 2007 · No Comments

Internet marketing, also referred to as online marketing, is marketing that uses the Internet. The Internet has brought many unique benefits to marketing, including very low costs for distributing information and media to a global audience. The interactive nature of the medium, both in providing instant responses and in eliciting responses at all, is another distinctive quality of internet marketing.

Internet marketing ties together both the creative and technical aspects of the internet, including design, development, advertising and sales. Internet marketing methods include search engine marketing, display advertising, e-mail marketing, affiliate marketing, interactive advertising and viral marketing.

Definition and scope

Internet marketing is most commonly a component of electronic commerce, but Internet marketing campaigns are also used to drive a marketing message for services that cannot even be ordered online.

Internet marketing can sometimes include information management, public relations, customer service, market research, and sales. Electronic commerce and Internet marketing have become popular as Internet access is becoming more widely available and used. Well over one third of consumers who have Internet access in their homes report using the Internet to make purchases.(Devang, 2007)

Business models

Internet marketing is associated with several business models. The main models include business-to-business (B2B) and business-to-consumer (B2C). B2B consists of companies doing business with each other, whereas B2C involves selling directly to the end consumer (see Malala, 2003)[1]. When Internet marketing first began, the B2C model was the first to emerge. B2B transactions were more complex and came about later. A third, less common business model is peer-to-peer (P2P), where individuals exchange goods between themselves. An example of P2P is Kazaa, which is built upon individuals sharing files.

Internet marketing also appears in various formats. One version is name-your-price (e.g. Priceline.com), where customers state the price they are willing to pay and then select from items offered in that range. With find-the-best-price websites (e.g. Hotwire.com), Internet users can search for the lowest prices on items. A final format is online auctions (e.g. Ebay.com), where buyers bid on listed items.

It should be noted, however, that current use of the term internet marketing commonly refers to direct response marketing strategies, traditionally used in direct mail, radio, and TV infomercials, applied to the internet business space. When professionals and entrepreneurs refer to "internet marketing", it is usually this model they have in mind.

Tags: Internet Marketing

nofollow

February 3rd, 2007 · No Comments

nofollow is an HTML attribute value used to instruct search engines that a hyperlink should not influence the link target’s ranking in the search engine’s index. It is intended to reduce the effectiveness of certain types of spamdexing, thereby improving the quality of search engine results and preventing spamdexing from occurring in the first place.

rel="nofollow" has come to be regarded as a microformat. Microformats reuse existing attributes but extend the standard values for the attribute; "nofollow" is such a custom attribute value.
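For reference, a link carrying the attribute looks like this (the URL is just a placeholder):

<a href="http://www.example.com/" rel="nofollow">example link</a>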

Concept and specification

The concept and specification of the attribute value nofollow were designed in 2005 by Matt Cutts, head of Google's webspam team, and Jason Shellen of Blogger.com.

The specification for nofollow is (C) 2005-2007 by the authors and subject to a royalty free patent policy, e.g. per the W3C Patent Policy 20040205, and IETF RFC3667 & RFC3668. The authors intend to submit this specification to a standards body with a liberal copyright/licensing policy such as the GMPG, IETF, and/or W3C.

What nofollow is not for

The nofollow attribute value is not meant for blocking access to content or for preventing content from being indexed by search engines. The proper methods for blocking search engine spiders from accessing content on a website, or for preventing them from including the content of a page in their index, are the Robots Exclusion Standard (robots.txt) for blocking access and on-page meta elements, which specify at the individual page level what a search engine spider should or should not do with the content of the crawled page.
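As a rough illustration (the directory name is a placeholder), blocking access is handled in robots.txt, while per-page indexing is controlled with a robots meta element, both of which are covered elsewhere in this glossary:

User-agent: *
Disallow: /private/

<META NAME="ROBOTS" CONTENT="NOINDEX">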

Introduction and support

Google announced in early 2005 that hyperlinks with the rel="nofollow" attribute would not influence the link target's PageRank. The Yahoo! and MSN search engines also respect this attribute.

How the attribute is interpreted differs between the search engines. While some take it literally and do not follow the link to the page being linked to, others still "follow" the link to find new web pages for indexing. In the latter case rel="nofollow" actually tells a search engine "Don't score this link" rather than "Don't follow this link." This differs from the meaning of nofollow as used within a robots meta tag, which does tell a search engine: "Do not follow any of the hyperlinks in the body of this document."
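To make the distinction concrete, here are both uses of the word nofollow side by side (the URL is a placeholder); only the second form applies to every link in the document:

<a href="http://www.example.com/" rel="nofollow">this single link is not scored</a>

<META NAME="ROBOTS" CONTENT="NOFOLLOW">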

Interpretation by the individual search engines

While all engines that support the attribute exclude links that use the attribute from their ranking calculation, the details about the exact interpretation of the attribute vary from search engine to search engine.

  • Google takes "nofollow" literally and does not "follow" the link at all. That is supposedly its official position, but experiments conducted by SEOs show conflicting results. They suggest instead that Google does follow the link but does not index the linked-to page, unless it was already in Google's index for other reasons (such as other, non-nofollow links that point to the page).
  • Yahoo! "follows" the link, but excludes it from its ranking calculation.
  • MSN Search respects "nofollow" by not counting the link in its ranking, but it has not been proven whether or not MSN follows the link.
  • Ask.com does not use the attribute for anything.

Usage by weblog software

Most weblog software marks reader-submitted links this way by default (with no option to disable it without code modification). More sophisticated server software could omit the nofollow attribute for links submitted by trusted users, such as those registered for a long time, on a whitelist, or with high karma. Some server software adds rel="nofollow" to pages that have been recently edited but omits it from stable pages, under the theory that stable pages will have had offending links removed by human editors.

The widely used blogging platform WordPress, version 1.5 and above, automatically assigns the nofollow attribute to all user-submitted links (comment data, commenter URI, etc.).
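In other words, a plain link left in a comment ends up rewritten roughly like this (illustrative markup, not the exact WordPress output):

<a href="http://www.example.com/">commenter link</a>
becomes
<a href="http://www.example.com/" rel="nofollow">commenter link</a>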

Usage on other websites

MediaWiki, the software that powers Wikipedia, was equipped with nofollow support soon after the attribute's initial announcement in 2005. The option was enabled on most international Wikipedias. One of the prominent exceptions was the English-language Wikipedia. Initially, after a discussion, it was decided not to use rel="nofollow" in articles and to use a URL blacklist instead. In this way, English Wikipedia contributed to the scores of the pages it linked to and expected editors to link to relevant pages.

In May 2006, a patch to the MediaWiki software made it possible to enable nofollow selectively by namespace. This functionality was used on pages that are not considered part of the actual encyclopedia, such as discussion pages and resources for editors. Following increasing spam problems and an order from Jimmy Wales within the Wikimedia Foundation, rel="nofollow" was added to article-space links in January 2007. However, the various interwiki templates and shortcuts that link to other Wikimedia Foundation projects, and many external wikis such as Wikia, are not affected by this policy.

Other websites like Slashdot, with high user participation, use improvised nofollow implementations like adding rel=”nofollow” only for potentially misbehaving users. Potential spammers posing as users can be determined through various heuristics like age of registered account and other factors. Slashdot also uses the poster’s karma as a determinant in attaching a nofollow tag to user submitted links.

Repurpose for paid links

While the effectiveness of the nofollow attribute at preventing comment spam is in doubt, and its use raises other issues, search engines have moved ahead and attempted to repurpose the attribute for something different. Google began suggesting the use of nofollow also as a machine-readable disclosure for paid links, so that these links do not get credit in search engine results.

The growth of the link-buying economy, where a company's entire business model is based on paid links that affect search engine rankings, moved the debate about the use of nofollow in combination with paid links into the center of the search engines' attention, and they started to take active steps against link buyers and sellers. This in turn triggered a very strong response from the webmaster community and also raised new questions that still need to be answered.

Criticism

Some weblog authors object to the use of rel=”nofollow”, arguing, for example, that

  • Link spammers will continue to spam everyone to reach the sites that do not use rel=”nofollow”
  • Link spammers will continue to place links for clicking (by surfers), even if those links are ignored by search engines.
  • Google is advocating the use of rel=”nofollow” in order to reduce the effect of heavy inter-blog linking on page ranking.

Tags: SEO Glossary · SEO Introduction

Meta Tags – More information

February 2nd, 2007 · No Comments

Meta Tags – Additional attributes for search engines

NOODP

The search engines Google, Yahoo! and MSN in some cases use the title and abstract of a web site's Open Directory Project (ODP) listing at Dmoz.org for the title and/or description (also called snippet or abstract) in the search engine results pages (SERPs). To give webmasters the option to specify that the ODP content should not be used for listings of their website, Microsoft introduced the new "NOODP" value for the "robots" meta element in May 2006. Google followed in July 2006 and Yahoo! in October 2006.

The syntax is the same for all search engines that support the value.

<META NAME="ROBOTS" CONTENT="NOODP">

Webmasters can also disallow the use of their ODP listing on a per-search-engine basis:

Google: <META NAME="GOOGLEBOT" CONTENT="NOODP">

Yahoo!: <META NAME="Slurp" CONTENT="NOODP">

MSN and Live Search: <META NAME="msnbot" CONTENT="NOODP">

NOYDIR

In addition to the ODP listing, Yahoo! also used content from its own Yahoo! Directory, but in February 2007 it introduced a meta tag that gives webmasters the option to opt out of this.

Yahoo! Directory titles and abstracts will not be used in search results for pages to which the NOYDIR tag has been added.

<META NAME="ROBOTS" CONTENT="NOYDIR">

<META NAME="Slurp" CONTENT="NOYDIR">


Robots-NoContent

In May 2007, Yahoo! also introduced the "class=robots-nocontent" attribute. This is not a meta tag but a class value that can be applied to elements throughout a web page where needed. Content marked with this class will be ignored by the Yahoo! crawler and not included in the search engine's index.

Examples for the use of the robots-nocontent tag:

<div class="robots-nocontent">excluded content</div>

<span class="robots-nocontent">excluded content</span>

<p class="robots-nocontent">excluded content</p>

Academic studies

Google does not use HTML keyword or meta tag elements for indexing. The Director of Research at Google, Monika Henzinger, was quoted (in 2002) as saying, "Currently we don't trust metadata." Other search engines developed techniques to penalize web sites considered to be "cheating the system". For example, a web site repeating the same meta keyword several times may have its ranking decreased by a search engine trying to eliminate this practice, though that is unlikely. It is more likely that a search engine will ignore the meta keyword element completely, and most do, regardless of how many words are used in the element.

Tags: Meta Tags

Other types of spamdexing

February 1st, 2007 · No Comments

Mirror websites 

Hosting of multiple websites all with conceptually similar content but using different URLs. Some search engines give a higher rank to results where the keyword searched for appears in the URL.

URL redirection 

Taking the user to another page without his or her intervention, e.g. using META refresh tags, Java, JavaScript or server-side redirects.
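A META refresh used this way looks roughly like the following (the destination URL is a placeholder); it forwards the visitor immediately to another address:

<META HTTP-EQUIV="refresh" CONTENT="0; url=http://www.example.com/other-page.html">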

Cloaking 

Cloaking refers to any of several means to serve up a different page to the search-engine spider than will be seen by human users. It can be an attempt to mislead search engines regarding the content on a particular web site. However, cloaking can also be used to ethically increase accessibility of a site to users with disabilities, or to provide human users with content that search engines aren’t able to process or parse. It is also used to deliver content based on a user’s location; Google itself uses IP delivery, a form of cloaking, to deliver results.
A form of this is code swapping: optimizing a page for top ranking, then swapping in another page once a top ranking is achieved.

Tags: Black Hat SEO · SEO Basic · SEO Glossary · SEO Spam

Meta element use in search engine optimisation

February 1st, 2007 · No Comments

Meta elements (Meta Tags) are HTML elements used to provide structured metadata about a web page. Such elements must be placed as tags in the head section of an HTML document.

Meta elements (Meta Tags) provide information about a given webpage, most often to help search engines categorize them correctly. They are inserted into the HTML document, but are often not directly visible to a user visiting the site.

They have been the focus of a field of marketing research known as search engine optimisation (SEO), where different methods are explored to provide a user’s site with a higher ranking on search engines. In the mid to late 1990s, search engines were reliant on meta data to correctly classify a web page and webmasters quickly learned the commercial significance of having the right meta element, as it frequently led to a high ranking in the search engines — and thus, high traffic to the web site.

As search engine traffic achieved greater significance in online marketing plans, consultants were brought in who were well versed in how search engines perceive a web site. These consultants used a variety of techniques (legitimate and otherwise) to improve ranking for their clients.

Meta elements (Meta Tags) have significantly less effect on search engine results pages today than they did in the 1990s, and their utility has decreased dramatically as search engine robots have become more sophisticated. This is due in part to the near-endless repetition of keywords in meta elements (keyword stuffing) and to attempts by unscrupulous website placement consultants to manipulate (spamdexing) or otherwise circumvent search engine ranking algorithms. While search engine optimization can improve search engine ranking, consumers of such services should be careful to employ only reputable providers.

Major search engine robots are more likely to quantify such factors as the volume of incoming links from related websites, quantity and quality of content, technical precision of source code, spelling, functional v. broken hyperlinks, volume and consistency of searches and/or viewer traffic, time within website, page views, revisits, click-throughs, technical user-features, uniqueness, redundancy, relevance, advertising revenue yield, freshness, geography, language and other intrinsic characteristics.

The keywords attribute

The keywords attribute was popularized by search engines such as Infoseek and AltaVista in 1995, and its popularity quickly grew until it became one of the most commonly used meta elements. By late 1997, however, search engine providers realized that information stored in meta elements, especially the keyword attribute, was often unreliable and misleading, and at worst, used to draw users into spam sites. (Unscrupulous webmasters could easily place false keywords into their meta elements in order to draw people to their site.)

Search engines began dropping support for metadata provided by the meta element in 1998, and by the early 2000s most search engines had veered completely away from reliance on meta elements. In July 2002 AltaVista, one of the last major search engines to still offer support, finally stopped considering them. The Director of Research at Google, Monika Henzinger, was quoted (in 2002) as saying, "Currently we don't trust metadata".

No consensus exists on whether the keywords attribute has any impact on ranking at any of the major search engines today. It is speculated that it does if the keywords used in the meta element can also be found in the page copy itself. In April 2007, 37 leaders in search engine optimization concluded that the relevance of having your keywords in the meta keywords attribute is little to none.

The description attribute

Unlike the keywords attribute, the description attribute is supported by most major search engines, like Yahoo! and Live Search, while Google will fall back on this tag when information about the page itself is requested (e.g. using the related: query). The description attribute provides a concise explanation of a web page's content. This allows webpage authors to give a more meaningful description for listings than might be displayed if the search engine were to automatically create its own description based on the page content. The description is often, but not always, displayed on search engine results pages, so it can affect click-through rates. Industry commentators have suggested that major search engines also consider keywords located in the description attribute when ranking pages. The W3C does not specify the size of this description meta tag, but almost all search engines recommend keeping it shorter than about 200 characters of plain text.
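A typical description element, placed in the HEAD section of the document, might look like this (the wording is only illustrative):

<META NAME="description" CONTENT="A concise, human-readable summary of the page, kept under roughly 200 characters.">
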
The robots attribute

The robots attribute is used to control whether search engine spiders are allowed to index a page and whether they should follow links from it. The noindex value prevents a page from being indexed, and nofollow prevents links from being crawled. Other values are available that can influence how a search engine indexes pages and how those pages appear in the search results. The robots attribute is supported by several major search engines. There are several additional values for the robots meta attribute that are relevant to search engines, such as NOARCHIVE and NOSNIPPET, which tell search engines what not to do with a web page's content. Meta tags are not the best option for preventing search engines from indexing content of your website; a more reliable and efficient method is the robots.txt file (Robots Exclusion Standard).

The NOINDEX value tells Google not to index a specific page. NOFOLLOW tells Google not to follow the links on a specific page. NOARCHIVE tells Google not to store a cached copy of the page. NOSNIPPET tells Google not to show a snippet (description) under the Google listing and also not to show a cached link in the search results.
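These values can also be combined in a single robots meta element, separated by commas, for example:

<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

<META NAME="ROBOTS" CONTENT="NOARCHIVE,NOSNIPPET">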

Tags: Meta Tags

Link spam – Black Hat SEO

January 31st, 2007 · No Comments

Link spam takes advantage of link-based ranking algorithms, such as Google’s PageRank algorithm, which gives a higher ranking to a website the more other highly ranked websites link to it. These techniques also aim at influencing other link-based ranking techniques such as the HITS algorithm.

Link farms 

Involves creating tightly-knit communities of pages referencing each other, also known humorously as mutual admiration societies.

Hidden links 

Putting links where visitors will not see them in order to increase link popularity.

“Sybil attack” 

This is the forging of multiple identities for malicious intent, named after the famous multiple personality disorder patient Shirley Ardell Mason. A spammer may create multiple web sites at different domain names that all link to each other, such as fake blogs known as spam blogs.

Wiki spam 

Using the open editability of wiki systems to place links from the wiki site to the spam site. Often, the subject of the spam site is totally unrelated to the page on the wiki where the link is added. In early 2005, Wikipedia implemented a ‘nofollow’ value for the ‘rel’ HTML attribute. Links with this attribute are ignored by Google’s PageRank algorithm. Forum and Wiki admins can use these to end or discourage Wiki spam.

Spam in blogs 

This is the placing or solicitation of links randomly on other sites, placing a desired keyword into the hyperlinked text of the inbound link. Guest books, forums, blogs and any site that accepts visitors' comments are particular targets and are often victims of drive-by spamming, where automated software creates nonsense posts with links that are usually irrelevant and unwanted.

Spam blogs 

Also known as splogs, spam blogs are, by contrast, fake blogs created exclusively with the intent of spamming. They are similar in nature to link farms.

Page hijacking 

This is achieved by creating a rogue copy of a popular website which shows contents similar to the original to a web crawler, but redirects web surfers to unrelated or malicious websites.

Referer log spamming 

When someone accesses a web page (the referee) by following a link from another web page (the referer), the person's internet browser passes the address of the referer to the referee. Some websites have a referer log which shows which pages link to that site. By having a robot randomly access many sites enough times, with a message or specific address given as the referer, that message or internet address then appears in the referer log of those sites that keep referer logs. Since some search engines base the importance of sites on the number of different sites linking to them, referer-log spam may be used to increase the search engine rankings of the spammer's sites by getting the referer logs of many sites to link to them.

Buying expired domains 

Some link spammers monitor DNS records for domains that will expire soon, then buy them when they expire and replace the pages with links to their pages.

Some of these techniques may be applied to create a Google bomb, that is, to cooperate with other users to boost the ranking of a particular page for a particular query.

Tags: Black Hat SEO · SEO Glossary · SEO Spam

Content spam – SEO Black Hat Technique

January 30th, 2007 · No Comments

These techniques involve altering the logical view that a search engine has over the page’s contents. They all aim at variants of the vector space model for information retrieval on text collections.

Hidden or invisible text 

Disguising keywords and phrases by making them the same (or almost the same) color as the background, using a tiny font size or hiding them within the HTML code such as “no frame” sections, ALT attributes and “no script” sections. This is useful to make a page appear to be relevant for a web crawler in a way that makes it more likely to be found. Example: A promoter of a Ponzi scheme wants to attract web surfers to a site where he advertises his scam. He places hidden text appropriate for a fan page of a popular music group on his page, hoping that the page will be listed as a fan site and receive many visits from music lovers. However, hidden text is not always spamdexing: it can also be used to enhance accessibility.

Keyword stuffing 

This involves the calculated placement of keywords within a page to raise the keyword count, variety, and density of the page. Older versions of indexing programs simply counted how often a keyword appeared, and used that to determine relevance levels. Most modern search engines have the ability to analyze a page for keyword stuffing and determine whether the frequency is consistent with other sites created specifically to attract search engine traffic.

Meta tag stuffing 

Repeating keywords in the Meta tags, and using keywords that are unrelated to the site’s content. This tactic has been ineffective since 2005.

“Gateway” or doorway pages 

Creating low-quality web pages that contain very little content but are instead stuffed with very similar key words and phrases. They are designed to rank highly within the search results, but serve no purpose to visitors looking for information. A doorway page will generally have “click here to enter” in the middle of it.

Scraper sites 

Scraper sites, also known as Made for AdSense sites, are created using various programs designed to ‘scrape’ search engine results pages or other sources of content and create ‘content’ for a website. The specific presentation of content on these sites is unique, but is merely an amalgamation of content taken from other sources, often without permission. These types of websites are generally full of advertising, or redirect the user to other sites.

Tags: Uncategorized

Spamdexing

January 29th, 2007 · No Comments

Spamdexing is any of various methods to manipulate the relevancy or prominence of resources indexed by a search engine, usually in a manner inconsistent with the purpose of the indexing system. It is a form of search engine optimization. Search engines use a variety of algorithms to determine relevancy ranking. Some of these include determining whether the search term appears in the META keywords tag, others whether the search term appears in the body text or URL of a web page. Many search engines check for instances of spamdexing and will remove suspect pages from their indexes.

The rise of spamdexing in the mid-1990s made the leading search engines of the time less useful, and the success of Google at both producing better search results and combating keyword spamming, through its reputation-based PageRank link analysis system, helped it become the dominant search site late in the decade, where it remains. Although it has not been rendered useless by spamdexing, Google has not been immune to more sophisticated methods either. Google bombing is another form of search engine result manipulation, which involves placing hyperlinks that directly affect the rank of other sites. Google first algorithmically combated Google bombing on January 25, 2007.

The earliest known reference to the term spamdexing is by Eric Convey in his article “Porn sneaks way back on Web,” The Boston Herald, May 22, 1996, where he said:

The problem arises when site operators load their Web pages with hundreds of extraneous terms so search engines will list them among legitimate addresses. The process is called “spamdexing,” a combination of spamming — the Internet term for sending users unsolicited information — and “indexing.”

Common spamdexing techniques can be classified into two broad classes: content spam and link spam.

Tags: SEO Basic · SEO Glossary

Link doping

January 28th, 2007 · No Comments

Link doping refers to the practice and effects of embedding a large number of gratuitous hyperlinks on a website in exchange for return links. Mainly used when describing weblogs (or blogs), link doping usually implies that a person hyperlinks to sites he or she has never visited in return for a place on the website’s blogroll for the sole purpose of inflating the apparent popularity of his or her website. Since the PageRank algorithms of many web directories and search engines rely on the number of hyperlinks to a website to determine its importance or influence, link doping can result in a high placement or ranking for the offending website (see also Google bomb or Google wash).

Originally used in an essay published in Sobriquet Magazine and on Blogcritics.org, link doping has been confused with the related practice of excessive hyperlinking, also known as “link whoring”. While the two phrases may be used interchangeably to describe gratuitous linking, link doping carries the additional connotation of deliberately striving to attain a certain level of success for one’s website without having earned it through hard work (as an average athlete on steroids might perform better than a naturally gifted athlete not on performance-enhancing drugs).

Tags: SEO Glossary

Google’s response to Google bomb

January 26th, 2007 · No Comments

Google defends its search algorithm as generally effective and an accurate reflection of opinion on the Internet. It further states that, though some may be offended by the links which appear as the result of Google bombs, Google has little or no control over the practice and will not individually edit search results simply because a bomb may have occurred.

Marissa Mayer, Director of Consumer Web Products for Google, wrote on the official Google Blog in September 2005:

We don’t condone the practice of Google bombing, or any other action that seeks to affect the integrity of our search results, but we’re also reluctant to alter our results by hand in order to prevent such items from showing up. Pranks like this may be distracting to some, but they don’t affect the overall quality of our search service, whose objectivity, as always, remains the core of our mission.

On January 25th, 2007, Google announced on its official Google Webmaster Central blog that it now has "an algorithm that minimizes the impact of many Googlebombs [sic]." The algorithm change had an immediate effect, dropping the well-known "miserable failure" link to the White House off the front page. Instead, the results contained mainly pages that discuss the miserable failure bomb. A related Google bomb was the No. 1 ranking held by Tony Blair's website for the term "liar". As of May 2, 2007, that bomb had disappeared from both Google and Yahoo!, but not from the UK version of MSN. Google bombs in which the target page actually contains the search word(s) were not affected.

Tags: Black Hat SEO · Google bomb · SEO Basic

Effects of Google bomb

January 25th, 2007 · No Comments

In some cases, the phenomenon has produced competing attempts to use the same search term as a Google bomb. As a result, the first result at any given time varies, but the targeted sites will occupy all the top slots in a normal search, as opposed to "I'm Feeling Lucky", a special button on Google's interface that sends the user straight to the top site in the search results.

Other search engines use similar techniques to rank results, so Yahoo!, AltaVista, and HotBot are also affected by Google bombs. A search for "miserable failure" or "failure" on September 29, 2006 brought up the official George W. Bush biography as the number one result on Google, Yahoo! and MSN, and number two on Ask.com. On June 2, 2005, Yooter reported that George Bush was ranked first for the keywords 'miserable', 'failure' and 'miserable failure' in both Google and Yahoo!. And on September 16, 2005, Marissa Mayer wrote on the Google Blog about the practice of Google bombing and the word "failure" (see Google's response below). Other large political figures have been targeted by Google bombs: on January 6, 2006, Yooter reported that Tony Blair was indexed in the U.S. and UK versions of Google for the keyword 'liar'. Only a few search engines, such as Ask.com, MetaCrawler and ProFusion, do not produce the same first links as the rest of the search engines. MetaCrawler and ProFusion are metasearch engines which use multiple search engines.

The growth of Wikipedia has alleviated these one-word Google bombs somewhat, as for a definable concept (for example “liar”) there is usually a popular article with that title, which often appears first in results.

The BBC, reporting on Google bombs in 2002, actually used the headline “Google Hit By Link Bombers”[9], acknowledging to some degree the idea of “link bombing.” In 2004, the Search Engine Watch site suggested that the term should be “link bombing” because of its application beyond Google, and continues to use that term as it is considered more accurate.

Tags: Black Hat SEO · SEO Basic

History of Google bomb (link bomb)

January 24th, 2007 · No Comments

The first Google bombs were probably accidental. Users would discover that a particular search term would bring up an interesting result, leading many to believe that Google’s results could be manipulated intentionally. The first Google bomb known about by a significant number of people was the one that caused the search term “more evil than Satan himself” to bring up the Microsoft homepage as the top result. Numerous people have made claims to having been responsible for the Microsoft Google bomb, though none have been verified.

In September of 2000 the first Google bomb with a verifiable creator was created by Hugedisk Men’s Magazine, a now-defunct online humor magazine, when it linked the text “dumb motherfucker” to a site selling George W. Bush-related merchandise. A Google search for this term would return the pro-Bush online store as its top result. Hugedisk had also unsuccessfully attempted to Google bomb an equally derogatory term to bring up an Al Gore-related site. After a fair amount of publicity the George W. Bush-related merchandise site retained lawyers who sent a cease and desist letter to Hugedisk, thereby ending the Google bomb.

On April 6, 2001, in an article in the online magazine uber.nu, Adam Mathes is credited with coining the term "Google Bombing." In the article Mathes details his connection of the search term "talentless hack" to the website of his friend Andy Pressman by recruiting fellow webloggers to link to his friend's page with the desired term. However, Archimedes Plutonium is known to have used the phrase "search engine bombing" (and variants, including "searchengine bombing" and "searchenginebombed") on Usenet as early as 1997.

Tags: Google bomb · SEO Basic · SEO History · SEO Spam

Keyword stuffing

January 23rd, 2007 · No Comments

Keyword stuffing is considered to be an unethical search engine optimization (SEO) technique. Keyword stuffing occurs when a web page is loaded with keywords in the meta tags or in content. The repetition of words in meta tags may explain why many search engines no longer use these tags.

Keyword stuffing is used to obtain maximum search engine ranking and visibility for particular phrases. A word that is repeated too often may raise a red flag to search engines. In particular, Google has been known to delist sites employing this technique, and their indexing algorithm specifically lowers the ranking of sites that do this.

Hiding text out of view of the visitor is done in many different ways. Text colored to blend with the background, CSS “Z” positioning to place text “behind” an image – and therefore out of view of the visitor – and CSS absolute positioning to have the text positioned far from the page center, are all common techniques. As of 2005, some of these invisible text techniques can be detected by major search engines.
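For illustration only (and with the caveat that engines detect and may penalize it), hidden text of this kind typically relies on simple styling such as:

<div style="position: absolute; left: -5000px;">hidden keyword text</div>

<span style="color: #ffffff; background-color: #ffffff;">hidden keyword text</span>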

“Noscript” tags are another way to place hidden content within a page. While they are a valid optimization method for displaying an alternative representation of scripted content, they may be abused, since search engines may index content that is invisible to most visitors.

Inserted text sometimes includes words that are frequently searched (such as “sex”), even if those terms bear little connection to the content of a page, in order to attract traffic to advert-driven pages.

Keyword stuffing can be considered either a white hat or a black hat tactic, depending on the context of the technique and the opinion of the person judging it. While a great deal of keyword stuffing is employed to aid in spamdexing, which is of little benefit to the user, keyword stuffing in certain circumstances is designed to benefit the user and not skew results in a deceptive manner. Whether the term carries a pejorative or neutral connotation depends on whether the practice is used to pollute the results with pages of little relevance, or to direct traffic to a page of relevance that would otherwise have been de-emphasized due to the search engine's inability to interpret and understand related terms.

Tags: Black Hat SEO · Keyword stuffing · SEO Spam

Page hijacking

January 21st, 2007 · No Comments

Page hijacking is a form of spamming the index of a search engine (spamdexing). It is achieved by creating a rogue copy of a popular website which shows contents similar to the original to a web crawler, but redirects web surfers to unrelated or malicious websites. Spammers can use this technique to achieve high rankings in result pages for certain key words.

Page hijacking is a form of cloaking, made possible because some web crawlers detect duplicates while indexing web pages. If two pages have the same content, only one of the URLs will be kept. A spammer will try to ensure that the rogue website is the one shown on the result pages.

Case Study: Google Jacking

One form of this activity involves 302 server-side redirects on Google. Hundreds of 302 Google Jacking pages were said to have been reported to Google. While Google has not officially acknowledged that page hijacking is a real problem, several people have found themselves to be victims of this phenomenon when checking the search engine rankings for their website. Because it is difficult to quantify how many pages have been hijacked, GoogleJacking.org was founded in May 2006 to help make Google aware of the significance of Google Jacking. Visitors can add themselves to a map, providing a visual indicator of how widespread the problem is.

Example of Page Hijacking

Suppose that a website offers hard-to-find sizes of clothes. A common search entered to reach this website is really big t-shirts, which, when entered on popular search engines, makes the website show up as the first result:

SpecialClothes
Offering clothes in sizes you cannot find elsewhere.
www.example.com/
A spammer working for a competing company then creates a website that looks extremely similar to the one listed and includes a special redirection script that redirects web surfers to the competitor's site while showing the copied page to web crawlers. After several weeks, a web search for really big t-shirts then shows the following result:

SpecialClothes
Offering clothes in sizes you cannot find elsewhere… at better prices!
www.example.net/
—Show Similar Pages—
Notice how .com changed to .net and the new “Show Similar Pages” link.
When web surfers click on this result, they are redirected to the competing website. The original result was hidden in the “Show Similar Pages” section.

Tags: Black Hat SEO · SEO Spam

Link farm

January 20th, 2007 · No Comments

On the World Wide Web, a link farm is any group of web sites that all hyperlink to every other page in the group. Although some link farms can be created by hand, most are created through automated programs and services. A link farm is a form of spamming the index of a search engine (spamdexing). Other link exchange systems are designed to allow individual websites to selectively exchange links with other relevant websites and are not considered a form of spamdexing.

History of Link farm

Link farms were developed by search engine optimizers in 1999 to take advantage of the Inktomi search engine’s dependence upon link popularity. Although link popularity is used by some search engines to help establish a ranking order for search results, the Inktomi engine at the time maintained two indexes. Search results were produced from the primary index which was limited to approximately 100,000,000 listings. Pages with few inbound links continually fell out of the Inktomi index on a monthly basis.

Inktomi was targeted for manipulation through link farms because it was then used by several independent but popular search engines, such as HotBot. Yahoo!, then the most popular search service, also used Inktomi results to supplement its directory search feature. The link farms helped stabilize listings primarily for online business Web sites that had few natural links from larger, more stable sites in the Inktomi index.

Link farm exchanges were at first handled on an informal basis, but several service companies were founded to provide automated registration, categorization, and link page updates to member Web sites.

When the Google search engine became popular, search engine optimizers learned that Google’s ranking algorithm depended in part on a link weighting scheme called PageRank. Rather than simply count all inbound links equally, the PageRank algorithm determines that some links may be more valuable than others, and therefore assigns them more weight than others. Link farming was adapted to help increase the PageRank of member pages.

However, even the link farms became susceptible to manipulation by unscrupulous Webmasters who joined the services, received inbound linkage, and then found ways to hide their outbound links or to avoid posting any links on their sites at all. Link farm managers had to implement quality controls and monitor member compliance with their rules to ensure fairness.

Alternative link farm products emerged, particularly link-finding software that identified potential reciprocal link partners, sent them template-based emails offering to exchange links, and created directory-like link pages for Web sites hoping to build their link popularity and PageRank.

Search engines countered the link farm movement by identifying specific attributes associated with link farm pages and filtering those pages from indexing and search results. In some cases, entire domains were removed from the search engine indexes in order to prevent them from influencing search results.

Justification of Link farm

The justification for link farm-influenced crawling diminished proportionately as the search engines expanded their capacities to index more sites. Once the 500,000,000 listing threshold was crossed, link farms became unnecessary for helping sites stay in primary indexes. Inktomi’s technology, now a part of Yahoo!, now indexes billions of Web pages and uses them to offer its search results.

Where link weighting is still believed by some Webmasters to influence search engine results with Google, Yahoo!, MSN, and Ask (among others), link farms remain a popular tool for increasing PageRank or perceived equivalent values. PageRank-like measurements apply only to the individual pages being linked to (typically the reciprocal linking pages on member sites), so these pages must in turn link out to other pages (such as the main index pages of the member sites) in order for the link weighting to help.

The expression “link farm” is now considered to be pejorative and derogatory. Many reciprocal link management service operators tout the value of their resource management and direct networking relationship building. The reciprocal link management services promote their industry as an alternative to search engines for finding and attracting visitors to Web sites. Their acceptance is by no means universal but the link management services seem to have established a stable customer base.

Tags: Link farm

Robots Exclusion Standard

January 8th, 2007 · No Comments

The robots exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website which is, otherwise, publicly viewable. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. The standard complements Sitemaps, a robot inclusion standard for websites.

A robots.txt file on a website will function as a request that specified robots ignore specified files or directories in their search. This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or out of a desire that an application only operate on certain data.

The protocol, however, is purely advisory. It relies on the cooperation of the web robot, so that marking an area of a site out of bounds with robots.txt does not guarantee privacy. Some web site administrators have tried to use the robots file to make private parts of a website invisible to the rest of the world, but the file is necessarily publicly available and its content is easily checked by anyone with a web browser.

There is no official standards body or RFC for the robots.txt protocol. It was created by consensus in June 1994 by members of the robots mailing list (robots-request@nexor.co.uk). The parts of the site that should not be accessed are specified in a file called robots.txt in the top-level directory of the website. The robots.txt patterns are matched by simple substring comparisons, so care should be taken to make sure that patterns matching directories have the final '/' character appended; otherwise all files with names starting with that substring will match, rather than just those in the intended directory.

Examples

This example allows all robots to visit all files because the wildcard “*” specifies all robots:

User-agent: *
Disallow:
This example keeps all robots out:

User-agent: *
Disallow: /
The next is an example that tells all crawlers not to enter into four directories of a website:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
Example that tells a specific crawler not to enter one specific directory:

User-agent: BadBot
Disallow: /private/
Example that tells all crawlers not to enter one specific file:

User-agent: *
Disallow: /directory/file.html
Note that all other files in the specified directory will be processed.
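As noted above, the trailing "/" matters. The first rule below matches only URLs inside the directory, while the second also matches any file or directory whose name merely begins with that substring (the names are placeholders):

User-agent: *
Disallow: /private/   # blocks /private/page.html

User-agent: *
Disallow: /private    # also blocks /private-notes.html and /private.txt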

Example demonstrating how comments can be used:

# Comments appear after the "#" symbol at the start of a line, or after a directive
User-agent: * # match all bots
Disallow: / # keep them out

Compatibility

In order to prevent access to all pages by robots, do not use

Disallow: *
as this is not a stable standard extension.

Instead:

Disallow: /
should be used.

Sitemaps auto-discovery

The Sitemap parameter is supported by the major crawlers (including Google, Yahoo, MSN and Ask). It specifies the location of the site's list of URLs. This parameter is independent of the User-agent parameter, so it can be placed anywhere in the file.

Sitemap: http://www.example.com/sitemap.xml.gz

Nonstandard extensions

Several crawlers support a Crawl-delay parameter, set to the number of seconds to wait between successive requests to the same server:

User-agent: *
Crawl-delay: 10

Extended Standard

An Extended Standard for Robot Exclusion has been proposed, which adds several new directives, such as Visit-time and Request-rate. For example:

User-agent: *
Disallow: /downloads/
Request-rate: 1/5         # maximum rate is one page every 5 seconds
Visit-time: 0600-0845     # only visit between 6:00 AM and 8:45 AM UT (GMT)
The first version of the Robot Exclusion standard does not mention anything about the "*" character in the Disallow: statement. Modern crawlers like Googlebot and Slurp recognize strings containing "*", while MSNbot and Teoma interpret it in different ways.

Alternatives

While robots.txt is the older and more widely accepted method, there are other methods (which can be used together with robots.txt) that allow greater control, such as disabling indexing of images only or disabling archiving of page contents.

HTML meta tags for robots

HTML meta tags can be used to exclude robots according to the contents of web pages. Again, this is purely advisory, and also relies on the cooperation of the robot programs. For example,

<meta name="robots" content="noindex,nofollow" />
within the HEAD section of an HTML document tells search engines such as Google, Yahoo!, or MSN to exclude the page from their indexes and not to follow any links on the page for further possible indexing.

Tags: Search Engine Robots

Search Engine International market share

January 7th, 2007 · No Comments

The search engines’ market shares vary from market to market, as does competition. In 2003, Danny Sullivan stated that Google represented about 75% of all searches. In markets outside the United States, Google’s share is often larger, and Google remains the dominant search engine worldwide as of 2007. As of 2006, Google held about 40% of the market in the United States, but Google had an 85-90% market share in Germany. While there were hundreds of SEO firms in the US at that time, there were only about five in Germany.

In Russia the situation is reversed. Local search engine Yandex controls 50% of the paid advertising revenue, while Google has less than 9%. In China, Baidu continues to lead in market share, although Google has been gaining share as of 2007.

Successful search optimization for international markets may require professional translation of web pages, registration of a domain name with a top level domain in the target market, and web hosting that provides a local IP address. Otherwise, the fundamental elements of search optimization are essentially the same, regardless of language.

Tags: SEO Introduction

White hat versus black hat SEO

January 5th, 2007 · No Comments

SEO techniques are classified by some into two broad categories: techniques that search engines recommend as part of good design, and those techniques that search engines do not approve of and attempt to minimize the effect of, referred to as spamdexing. Some industry commentators classify these methods, and the practitioners who employ them, as either white hat SEO, or black hat SEO. White hats tend to produce results that last a long time, whereas black hats anticipate that their sites will eventually be banned once the search engines discover what they are doing.

An SEO tactic, technique or method is considered white hat if it conforms to the search engines' guidelines and involves no deception. As the search engine guidelines are not written as a series of rules or commandments, this is an important distinction to note. White hat SEO is not just about following guidelines, but about ensuring that the content a search engine indexes and subsequently ranks is the same content a user will see.

White hat advice is generally summed up as creating content for users, not for search engines, and then making that content easily accessible to the spiders, rather than attempting to game the algorithm. White hat SEO is in many ways similar to web development that promotes accessibility, although the two are not identical.

Black hat SEO attempts to improve rankings in ways that are disapproved of by the search engines, or involve deception. One black hat technique uses text that is hidden, either as text colored similar to the background, in an invisible div, or positioned off screen. Another method gives a different page depending on whether the page is being requested by a human visitor or a search engine, a technique known as cloaking.

Search engines may penalize sites they discover using black hat methods, either by reducing their rankings or eliminating their listings from their databases altogether. Such penalties can be applied either automatically by the search engines’ algorithms, or by a manual site review.

One infamous example was the February 2006 Google removal of both BMW Germany and Ricoh Germany for the use of deceptive practices. Both companies, however, quickly apologized, fixed the offending pages, and were restored to Google's listings.

Tags: SEO Introduction

Getting listings, Preventing listings

January 4th, 2007 · No Comments

Getting listings

The leading search engines, Google, Yahoo! and Microsoft, use crawlers to find pages for their algorithmic search results. Pages that are linked from other search-engine-indexed pages do not need to be submitted because they are found automatically. Some search engines, notably Yahoo!, operate a paid submission service that guarantees crawling for either a set fee or a cost per click. Such programs usually guarantee inclusion in the database but do not guarantee specific ranking within the search results. Yahoo!'s paid inclusion program has drawn criticism from advertisers and competitors. Two major directories, the Yahoo! Directory and the Open Directory Project, both require manual submission and human editorial review. Google offers Google Sitemaps, for which an XML feed can be created and submitted for free to ensure that all pages are found, especially pages that are not discoverable by automatically following links.
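For reference, a minimal Sitemaps XML feed of the kind submitted through such programs looks roughly like this (the URL and date are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2007-01-01</lastmod>
  </url>
</urlset>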

Search engine crawlers may look at a number of different factors when crawling a site. Not every page is indexed by the search engines. Distance of pages from the root directory of a site may also be a factor in whether or not pages get crawled.

Preventing listings

To avoid undesirable search listings, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine’s database by using a meta tag specific to robots. When a search engine visits a site, the robots.txt located in the root directory is the first file crawled. The robots.txt file is then parsed, and will instruct the robot as to which pages are not to be crawled. As a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages a webmaster does not wish crawled. Pages typically prevented from being crawled include login specific pages such as shopping carts and user-specific content such as search results from internal searches. In March 2007, Google warned webmasters that they should prevent indexing of internal search results because those pages are considered search spam.
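As a rough sketch (the paths are placeholders), a site might keep its internal search results and shopping cart out of the listings with robots.txt rules, supplemented by a robots meta element on any such page that remains reachable through links:

User-agent: *
Disallow: /search/
Disallow: /cart/

<META NAME="ROBOTS" CONTENT="NOINDEX">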

Tags: SEO Basic · SEO Introduction

Webmasters and search engines

January 3rd, 2007 · No Comments

By 1997 search engines recognized that some webmasters were making efforts to rank well in their search engines, and even manipulating the page rankings in search results. Early search engines, such as Infoseek, adjusted their algorithms to prevent webmasters from manipulating rankings by stuffing pages with excessive or irrelevant keywords.

Due to the high marketing value of targeted search results, there is potential for an adversarial relationship between search engines and SEOs. In 2005, an annual conference, AIRWeb, Adversarial Information Retrieval on the Web, was created to discuss and minimize the damaging effects of aggressive web content providers.

SEO companies that employ overly aggressive techniques can get their client websites banned from the search results. In 2005, the Wall Street Journal profiled a company, Traffic Power, that allegedly used high-risk techniques and failed to disclose those risks to its clients. Wired reported the same company sued blogger Aaron Wall for writing about the ban. Google’s Matt Cutts later confirmed that Google did in fact ban Traffic Power and some of its clients.

Some search engines have also reached out to the SEO industry, and are frequent sponsors and guests at SEO conferences and seminars. In fact, with the advent of paid inclusion, some search engines now have a vested interest in the health of the optimization community. Major search engines provide information and guidelines to help with site optimization. Google has a Sitemaps program to help webmasters learn if Google is having any problems indexing their website and also provides data on Google traffic to the website. Yahoo! Site Explorer provides a way for webmasters to submit URLs, determine how many pages are in the Yahoo! index and view link information.

Tags: SEO History