Issues with Inbound Links as an Indexing Measure

At the time, I was one of the posters who suggested that inbound links can be an indicator of the overall quality of a website, since a large number of inbound links from varying websites in different locations could indicate that other webmasters view the site as a useful resource. I also did not, at the time, see any methodology other than inbound links that could effectively determine whether a site should be indexed. However, there are methods of generating links that are not purely organic, such as reciprocal links (which Google appears to be discounting, as mentioned earlier) and forum posts.

“Could I Get Some Help, Please?”

In many cases, inbound links are generated by webmasters for content that is under construction or subject to review by other webmasters. It is common in webmaster forums to see a discussion thread started by a webmaster containing a hyperlink to a specific web page and a question about how to solve a specific problem. The link points to a page that is clearly incomplete and therefore of marginal use, at best, to the end user.

“This Site Is Useful”

Inbound links may also be generated by webmasters who link to websites for the purpose of providing a useful resource to their customers while being completely unaware of the search engine ramifications. For example, a site pertaining to environmental concerns may provide a hyperlink to an environmental conference. However, the conference site's owners may not necessarily be interested in search engine traffic, because they cannot handle more attendees, presenters, or sessions, and may merely be using the website as a means of communicating news and information about the conference.

Determining Webmasters’ Intent at a Glance

For the next scenario, I will use two websites that I have built for two clients of mine:

Site 1: Kastle Fireplace

Site 2: The Innovolve Group

It should be fairly obvious that Kastle Fireplace has an interest in being listed in search engines. The site has numerous inbound links, a large number of pages, and various signs of optimization. At a glance, one could reasonably assume that its owners are legitimately interested in being listed.

However, it is much more difficult to discern the owners' intent for the second site. For the purpose of this article, I have intentionally removed the meta keyword and description tags so that the site resembles one built by a webmaster who may not be well-versed in search engine optimization but may still have constructed a useful website.

A surface glance reveals a six-page website with two PDFs and a Flash introduction. It has content that a specific customer base could find useful, and it has some inbound links pointing to it.

But does that mean it should be in the index? There are no meta tags. There is no robots.txt file. The site does not have a lot of copy. There is nothing conclusive to indicate whether the site owners want their content indexed, and therefore nothing an algorithm could use to effectively determine their intent.
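
By way of illustration, the kind of explicit signal an algorithm could act on, were one present, is the robots meta tag placed in a page's head section. This is the standard page-level opt-out; the line below is a minimal sketch, not anything taken from the actual site:

    <meta name="robots" content="noindex, nofollow">

No such tag appears on the site in either direction, which is precisely why a crawler is left to guess.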

This puts a search engine in a rather precarious position: should content be indexed until someone associated with the site indicates that it should not be, or should content remain unindexed until someone associated with the site indicates that it should be?

Kastle Fireplace does wish to be as fully indexed as possible by search engines, whereas The Innovolve Group is not concerned about search engine traffic at present. Google has guessed incorrectly on both counts: it has only partially indexed the Kastle Fireplace website (with legitimate content pages relegated to supplemental results) and has fully indexed The Innovolve Group website.

A Note on the Robots Exclusion Protocol

While it is true that the Robots Exclusion Protocol could be used in each of the cases above to avoid accidental indexing of content that a site owner does not wish to have indexed, a large percentage of webmasters and site owners remain unaware of its existence and, as a result, may end up with content indexed against their will. This is a disservice both to site owners and to end users.
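
For those who are aware of it, the mechanism itself is trivial. A minimal robots.txt file placed at the root of a site, using the two directives defined by the original 1994 protocol, is enough to ask all compliant crawlers to stay out entirely:

    User-agent: *
    Disallow: /

The barrier, in other words, is not the difficulty of the mechanism but simple awareness that it exists.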

It should also be noted that the Robots Exclusion Protocol is an informal standard that has never been ratified by an official standards body. The original protocol was established by consensus in 1994, and there is no guarantee that it will continue to be honored in the future.

Because of these factors, the Robots Exclusion Protocol by itself is not the best solution.
