Wednesday, November 26, 2008

A Thought Experiment on Do Follow Comments

Einstein was a great scientist, and physics students new to his work are, without exception, introduced to his use of the Gedankenexperiment, or thought experiment. Gren Ireson, a lecturer at Loughborough University, UK, whose research interests include quantum philosophy, the physics of sport, and learning and teaching in the physical sciences, contends that a thought experiment has three requirements [PDF]:
  • It is carried out in the mind (however one cares to define 'mind').
  • It draws on experience.
  • It allows the experimenter to see what is happening (perhaps a better term to use than 'see' is 'imagine' or 'form a mental image').
Lately I have seen a lot of blogs announcing that they allow do follow comments, so today I will humbly use Einstein's technique to shed some light on do follow commenting. Let us start with a definition. Here is how Google's Matt Cutts puts it:

The nofollow attribute is just a mechanism that gives web masters the ability to modify PageRank flow at link-level granularity. Plenty of other mechanisms would also work (e.g. a link through a page that is robot.txt'ed out), but nofollow on individual links is simpler for some folks to use. There's no stigma to using nofollow, even on your own internal links; for Google, nofollow'ed links are dropped out of our link graph; we don't even use such links for discovery. By the way, the nofollow meta tag does that same thing, but at a page level.

The rel='nofollow' attribute is an easy way for a website to tell search engines that the website can't or doesn't want to vouch for a link. The best-known use for nofollow is blog comment spam, but the mechanism is completely general. Nofollow is recommended anywhere that links can't be vouched for. If your logs analysis program shows referrers as hyper links, I'd recommend using nofollow on those links. If you have a wiki that anyone on the web can edit, I'd recommend nofollow on those links until you can find a way to trust those links. In general, if you have an application that allows others to add links, web spammers will eventually find your pages and start annoying you.

In layman's terms, if a link on your page has the nofollow attribute, search engines do not follow that link and do not count it as a vote for the linked site. Without the attribute, search robots follow the link and count it as a vote for the target page from the owner of the linking page.
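To make the distinction concrete, here is a minimal Python sketch of how a crawler might sort the links on a page into counted votes and dropped nofollow links. It models only the two mechanisms quoted above (the rel='nofollow' link attribute and the robots meta tag at page level); the class and function names and the example URLs are hypothetical, and real search engines certainly apply many more rules than this.

```python
from html.parser import HTMLParser


class LinkVoteParser(HTMLParser):
    """Collect <a href> targets and note which ones carry rel="nofollow".

    Simplified model: rel="nofollow" drops a single link, and
    <meta name="robots" content="nofollow"> drops every link on the page.
    """

    def __init__(self):
        super().__init__()
        self.page_nofollow = False
        self.links = []  # (href, has_rel_nofollow) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None, hence the "or ''"
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            directives = [d.strip() for d in content.split(",")]
            if "nofollow" in directives:
                self.page_nofollow = True
        elif tag == "a" and attrs.get("href"):
            rel_values = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_values))


def classify_links(html):
    """Return (votes, dropped): links that would count vs. nofollow'ed ones."""
    parser = LinkVoteParser()
    parser.feed(html)
    parser.close()
    if parser.page_nofollow:
        # The page-level meta tag overrides every individual link.
        return [], [href for href, _ in parser.links]
    votes = [href for href, nofollow in parser.links if not nofollow]
    dropped = [href for href, nofollow in parser.links if nofollow]
    return votes, dropped


# Hypothetical page: one editorial link and one nofollow'ed comment link.
comment_html = (
    '<p>Great post! <a href="http://post.example/">related article</a> '
    '<a rel="external nofollow" href="http://commenter.example/">Ivs</a></p>'
)
print(classify_links(comment_html))
# (['http://post.example/'], ['http://commenter.example/'])
```

A do follow comment, in these terms, is simply a comment link that ends up in the first list instead of the second.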

And now the experiment:

You head the team in charge of designing and calibrating the web crawler, also called the search robot or, more popularly, the bot. Your algorithm ranks search results based on the bot's hard work. It is smart enough to weigh links, too: a link from an authority site on a subject carries more weight than others.
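The link weighting this experiment assumes is, in spirit, what PageRank does. Below is a minimal power-iteration sketch in Python, assuming the commonly cited damping factor of 0.85 and a hypothetical six-page web; it illustrates the idea and is not any search engine's actual ranking code.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Each page splits its current score evenly across its outgoing links,
    so a vote from a high-scoring page is worth more than one from a
    low-scoring page. Pages with no outgoing links spread their score
    evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its score uniformly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: "authority" is linked to by three fans, "lonely" by nobody.
web = {
    "fan1": ["authority"],
    "fan2": ["authority"],
    "fan3": ["authority"],
    "authority": ["blog_a"],
    "lonely": ["blog_b"],
}
ranks = pagerank(web)
print(ranks["blog_a"] > ranks["blog_b"])  # True
```

Both blogs receive exactly one link, but the one from the well-linked page is worth more.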

A new trend appears: people start letting others comment on their pages without the nofollow attribute. So far, so good. The bot would eventually discover those linked sites anyway; this way, indexing might even be faster. There is a catch, though. Your ranking algorithm counts each of those links as a vote, but in this case the candidates vote for themselves, rather than the other way around, and they vote many times.

What would you do?
  1. Nothing! The commenters will eventually get bored and everything will return to normal.
  2. Modify the program so that it follows those links but does not count them as votes.
  3. Weight those links negatively, so that each one actually depreciates the value of the target site.
  4. Not follow those links at all.
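To compare the four options, here is a toy Python model. It is only a sketch under stated assumptions: each option is reduced to two hypothetical knobs (whether a comment link is followed for discovery, and how much its vote weighs: +1, 0 or -1), editorial links always count as one vote, and scoring is a plain weighted vote tally rather than a full ranking algorithm.

```python
from enum import Enum


class CommentLinkPolicy(Enum):
    COUNT_AS_VOTE = 1   # option 1: do nothing
    FOLLOW_NO_VOTE = 2  # option 2: follow, but do not count
    PENALIZE = 3        # option 3: count negatively
    IGNORE = 4          # option 4: do not follow at all


# Hypothetical rules per option: (follow for discovery?, vote weight)
POLICY_RULES = {
    CommentLinkPolicy.COUNT_AS_VOTE: (True, 1.0),
    CommentLinkPolicy.FOLLOW_NO_VOTE: (True, 0.0),
    CommentLinkPolicy.PENALIZE: (True, -1.0),
    CommentLinkPolicy.IGNORE: (False, 0.0),
}


def score_links(editorial_links, comment_links, policy):
    """Return (discovered targets, vote tally) for a set of pages.

    Both arguments are lists of (source, target) pairs. Editorial links
    always count as one vote; comment links follow the chosen policy.
    """
    follow, weight = POLICY_RULES[policy]
    discovered = set()
    votes = {}
    for _, target in editorial_links:
        discovered.add(target)
        votes[target] = votes.get(target, 0.0) + 1.0
    for _, target in comment_links:
        if not follow:
            continue  # option 4: the link is invisible to the bot
        discovered.add(target)
        votes[target] = votes.get(target, 0.0) + weight
    return discovered, votes


# Hypothetical scenario: one commenter links to their own site from four blogs.
editorial = [("blog_a", "good.example")]
comments = [(blog, "spam.example") for blog in ["blog_a", "blog_b", "blog_c", "blog_d"]]

for policy in CommentLinkPolicy:
    _, votes = score_links(editorial, comments, policy)
    print(policy.name, votes.get("spam.example", 0.0))
# COUNT_AS_VOTE 4.0
# FOLLOW_NO_VOTE 0.0
# PENALIZE -4.0
# IGNORE 0.0
```

Under option 1 the commenter's four comments become four votes, which is exactly the self-voting problem; options 2 and 4 both neutralize those votes and differ only in whether the target site still gets discovered.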
Let us go back to the requirements of a thought experiment. I fail to meet the second one, experience: I have never designed a bot or a search engine algorithm. I am pretty sure about what I would do if I had that know-how, but I do not. So I am asking you.

8 comments:

CJ said...

I have designed spiders, search engines and algorithms, so here is what I would do:

1 - Run topic detection on all the blogs.
2 - Run topic detection on all the sites they link to with do follow links.
3 - Check whether they match topic-wise. If not, I would depreciate the link; if they do, I would count it.
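CJ's three steps can be sketched in Python. Real topic detection is far more sophisticated; this sketch swaps in a crude stand-in, cosine similarity between word-count vectors, and the threshold (0.2) and the reduced weight for off-topic links (0.1) are illustrative assumptions, not values anyone in this thread proposed.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Crude topic signature: lowercase word counts (stand-in for topic detection)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def comment_link_weight(host_text, target_text, threshold=0.2, mismatch_weight=0.1):
    """Steps 1-3: count a do follow comment link fully only if the topics match.

    threshold and mismatch_weight are hypothetical tuning values.
    """
    similarity = cosine_similarity(term_vector(host_text), term_vector(target_text))
    return 1.0 if similarity >= threshold else mismatch_weight


host = "search engine crawler pagerank nofollow links seo"
on_topic = "seo tips on links and pagerank for search engine ranking"
off_topic = "cheap watches replica handbags discount"
print(comment_link_weight(host, on_topic))   # 1.0
print(comment_link_weight(host, off_topic))  # 0.1
```

Depreciating rather than zeroing an off-topic link is one reading of "depreciate"; setting mismatch_weight to 0.0 or a negative number would model harsher choices.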

but, I would also:

Check the other variables in my digital fingerprint of each blog and assess authority. To me, authority links would be links left in comments on the host blog by authority bloggers. That also gives me a nice way of assessing the authority of the host blog.

The issue - assessing authority isn't easy, so that would be an area of research to consider. PageRank does not assess page authority or site authority based on content, but only on links, which is ultimately flawed from a holistic point of view.

Also, was traffic much lower before the blog switched to do follow?

This is just an off-the-top-of-my-head list; of course it would be a whole lot more complex if you actually had to do this.

Nice post!


Ivs said...

I love this thought experiment. Do follow seems to follow Newton's laws very well.


Archiver said...

Topic detection and digital fingerprint checking seem like good ideas, but don't you think they would be taxing on the crawler? I have almost always had the impression that search engines lag behind the content being created. I honestly believe that at least half of the web is not indexed at any given time.

CJ said...

Actually, 16% of the web is indexed. That isn't very much when you think about it, but it does include all the deep web that rarely gets touched.

Not all of the things I listed are done by the crawler. The crawler picks up information and sends it on, where it is dealt with by other applications and systems. The digital fingerprint is an extension of the index, if you like.
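The division of labor described here, where the crawler only fetches and hands pages off to separate systems, can be sketched in Python with a queue between a crawler and a processing worker. The in-memory FAKE_WEB dictionary stands in for real HTTP fetches, and the word-set "fingerprint" is a hypothetical placeholder for whatever a real engine stores.

```python
import queue
import threading

# Hypothetical in-memory "web" standing in for real HTTP fetches.
FAKE_WEB = {
    "http://blog.example/post": "seo crawler nofollow",
    "http://other.example/": "recipes baking bread",
}


def crawler(urls, handoff):
    """Fetch pages and hand raw content off; no analysis happens here."""
    for url in urls:
        content = FAKE_WEB.get(url)
        if content is not None:
            handoff.put((url, content))
    handoff.put(None)  # sentinel: crawl finished


def fingerprint_worker(handoff, index):
    """Separate system that turns raw pages into index entries."""
    while True:
        item = handoff.get()
        if item is None:
            break
        url, content = item
        # Placeholder "digital fingerprint": the sorted set of words on the page.
        index[url] = sorted(set(content.split()))


def run_pipeline(urls):
    handoff = queue.Queue()
    index = {}
    worker = threading.Thread(target=fingerprint_worker, args=(handoff, index))
    worker.start()
    crawler(urls, handoff)
    worker.join()  # index is only read after the worker has finished
    return index


print(run_pipeline(list(FAKE_WEB)))
# {'http://blog.example/post': ['crawler', 'nofollow', 'seo'], 'http://other.example/': ['baking', 'bread', 'recipes']}
```

Because the crawler never waits on the analysis, expensive steps such as topic detection slow down the worker, not the fetching, which is the point of splitting the jobs.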

It is hard to keep up with all new content, but a single fast, aggressive spider is not necessarily right for every job. There are typically several spiders, each doing a different job.


Andy said...

I'm not too sure that it is that much of a worry for Google at the moment. In many ways their introduction of nofollow has been a triumph for them and has had the intended effect: it made their job easier. There is no evidence, though, that spam has decreased since 2005, but there is lots of evidence that it has increased: http://akismet.com/stats/ Google does seem keen on relevant links, though, and I guess the subject association between the linking and linked pages will grow more prominent as time goes on.

Archiver said...

Google's very introduction of nofollow is a sign that they are worried, or will be, I think. And Akismet's stats can be taken as evidence that they were right to use a mechanism that would reduce spam to some extent.

I have never been shy about linking to sites with quality content. With dofollow comments, however, site owners risk a penalty if they slip and spam accidentally gets indexed by search engines.

John Papers said...

Thanks for this post.
Keep writing; your blog will become more attractive. To your success!

Archiver said...

I had to take a short(!) hiatus due to sickness. I will try to make up for it this year. Thanks for dropping by.
