Monday, December 29, 2008

Options for Celebrating the New Year

Have you ever spent the last day of the year on-line? Instead of partying, hitting the streets or freezing your privates out in the snow (this obviously rules out the people of the southern hemisphere), have you tried celebrating the New Year in front of your computer in peace and tranquility?

Well, before you call the men in white coats, let me explain. It happened to me once, about 5 years ago if my memory serves me right. Surprisingly, I was not alone: quite a number of souls were apparently in need of psychiatric help. I will go on and say - at the risk of being called anti-social, cynical, narcissistic or a similarly appropriate term for the occasion - I enjoyed it very much. For reasons I will not go into, I had to stay home, and rather than watching TV and testing the alcohol barrier, I punched the keys until the early hours of the morning while the fan hummed gently in the background, trying to cool the system.

This is for those who will spend the evening of December 31 in front of their computers for whatever reason. You are not alone! Been there, done that.

Tuesday, December 23, 2008

A Christmas Post

I sometimes come across an add-on in blogs which shows the owner's mood, the song she is listening to and the book she is reading at that particular moment. Now that Christmas is due and the New Year is imminent (apparently the economic crisis has affected my vocabulary), if I had a similar plug-in, mine might well read:
  • Mood: Gloomy
  • Listening: Sound of Silence
  • Reading: For Whom the Bell Tolls
Still, in spite of ominous signs, tightened budgets, shrinking profits and accelerating unemployment, this is the time of year to hope and to reflect. Just as capitalism does not distribute wealth equally, it does not distribute peace, happiness and joy with a conscience either.

Well, let me stop before this turns out to be anything but a Christmas post. I wish you all a merry Christmas and a happier New Year.

Friday, December 19, 2008

Most Popular Searches of 2008

It is customary to compile a list of events when the New Year is near. As most of you know, Google makes a similar list of popular searches every year on a country-by-country basis. I have taken the liberty of picking the most interesting queries among them. Here they are:
  1. "my" from Australia - This is definitely my number one. When you don't have a mirror, what do you do? Yes, you ask Google.
  2. "qq" from China - It probably means something in Chinese; then again, it might not.
  3. "you" from Chile - Another intelligent search item. Were they talking to Googlebot? It was also number eight in Colombia and number nine in Spain. I am beginning to suspect "you" is a nickname for a new drug.
  4. "danmark" from Denmark - Apparently the Danish like to see how many times their country is indexed by the search engines.
  5. "google" from Germany - I have nothing to say.
  6. "hong kong" from Hong Kong - The Danish were not alone.
  7. "uomini e donne" (Italian for "men and women") from Italy - Long gone are the days when godfathers gave children their names. There is now a new godfather in town: Google.
  8. "네이버" from South Korea - The most popular search. I have no idea what it means; your guess is as good as mine.
  9. "trademe" from New Zealand - Now I am curious. I will go have a look.
  10. "104" from Taiwan - At last, we are getting smarter. What could it be? A number, a class, a flight, a room, a superman?
This concludes my top 10 searches of the year. Happy searches for 2009!

Wednesday, December 17, 2008

The Boy Who Cried Wolf

You surely know Aesop's fable "The Boy Who Cried Wolf." The tragic ending of the shepherd boy and the flock is used to teach us not to lie. However, this short story has always irritated me: there is something wrong with the moral of it.

I do not know about your experience, but here, in this lovely corner of the world, shepherds hardly ever own flocks. They are usually hired hands, and the protagonist of the fable is also depicted as one. OK, the boy lied and got eaten, but so did the flock. Now, whose flock was it? When the villagers did not believe him, whose property perished?

Whenever I hear cries for help from industries in trouble after the recent economic downturn, I cannot help but remember Aesop's tale. Some scream and shout; others refuse to believe because they have been lied to and ripped off so many times. Before refusing help to those who are presumably in need, it is worth thinking about what happens to the flock.

Sunday, December 14, 2008

How Not to Make Money On-line

With no intention of upsetting the numerous bloggers who advise on how to make money on-line - some of whom I follow closely and benefit from - I think Nassim Nicholas Taleb's The Fourth Quadrant: A Map of the Limits of Statistics, recently published in Edge, can be an eye-opener. In his analysis of the latest crisis of the banking system, Taleb says,

When I was a quant-trader in complex derivatives, people mistaking my profession used to ask me for "stock tips" which put me in a state of rage: a charlatan is someone likely (statistically) to give you positive advice, of the "how to" variety.

Go to a bookstore, and look at the business shelves: you will find plenty of books telling you how to make your first million, or your first quarter-billion, etc. You will not be likely to find a book on "how I failed in business and in life"—though the second type of advice is vastly more informational, and typically less charlatanic. Indeed, the only popular such finance book I found that was not quacky in nature—on how someone lost his fortune—was both self-published and out of print. Even in academia, there is little room for promotion by publishing negative results—though these, are vastly informational and less marred with statistical biases of the kind we call data snooping. So all I am saying is "what is it that we don't know", and my advice is what to avoid, no more.

You can live longer if you avoid death, get better if you avoid bankruptcy, and become prosperous if you avoid blowups in the fourth quadrant.

I used to give the same mathematical finance lectures for both graduate students and practitioners before giving up on academic students and grade-seekers. Students cannot understand the value of "this is what we don't know"—they think it is not information, that they are learning nothing. Practitioners on the other hand value it immensely. Likewise with statisticians: I never had a disagreement with statisticians (who build the field)—only with users of statistical methods.

I would like to draw your attention especially to the unpopularity of "publishing negative results" and to "the value of what we do not know". Writing about how we failed is as important as telling our success stories, and it is information too.

To give you a taste of the article, let me quote another part:

There are two classes of probability domains—very distinct qualitatively and quantitatively. The first, thin-tailed: Mediocristan, the second, thick tailed Extremistan. Before I get into the details, take the literary distinction as follows:

In Mediocristan, exceptions occur but don't carry large consequences. Add the heaviest person on the planet to a sample of 1000. The total weight would barely change. In Extremistan, exceptions can be everything (they will eventually, in time, represent everything). Add Bill Gates to your sample: the wealth will jump by a factor of >100,000. So, in Mediocristan, large deviations occur but they are not consequential—unlike Extremistan.
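Taleb's comparison is plain arithmetic, so it is easy to check for yourself. The sketch below uses made-up, purely illustrative figures (1000 people averaging 70 kg, a 600 kg heaviest person, 1000 people worth $500 each, and a $60 billion fortune standing in for Bill Gates); none of these numbers come from the essay.

```python
def growth_factor(sample_total: float, outlier: float) -> float:
    """How many times larger the sample total becomes after adding one outlier."""
    return (sample_total + outlier) / sample_total

# Mediocristan (weight): hypothetical 1000 people averaging 70 kg,
# plus a hypothetical 600 kg heaviest person.
weight_factor = growth_factor(1000 * 70.0, 600.0)

# Extremistan (wealth): hypothetical 1000 people worth $500 each,
# plus a hypothetical $60 billion fortune.
wealth_factor = growth_factor(1000 * 500.0, 60e9)

print(f"weight total grows by a factor of {weight_factor:.4f}")
print(f"wealth total grows by a factor of {wealth_factor:,.0f}")
```

Under these assumptions the heaviest person moves the total by less than one percent, while the single fortune multiplies it more than a hundred thousand times.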

Taleb's essay is a good read for bloggers and economists alike.

Thursday, December 11, 2008

Ranking Systems and Vote Spam

As the number of social networks increases, so does their importance to bloggers in particular and businesses in general as channels for promoting content. To use social media effectively, it is critical to understand the metrics behind them. Search engines have particularly high stakes here, because social media offer an effective alternative to traditional web search by directly connecting users who have information needs with users willing to share information. For example, users can post questions or news items and rely on other users to comment on or rank the content (e.g., sites such as Slashdot or Digg), or rank the popularity of users (as on Twitter). While the responses can be excellent, their quality varies greatly. Hence user feedback, such as voting on or rating content, has become crucial to the effectiveness of the community, as the paper A Few Bad Votes Too Many? Towards Robust Ranking in Social Media by Jiang Bian[1], Yandong Liu[2], Eugene Agichtein[2] and Hongyuan Zha[1] demonstrates. From the abstract:

On line social media draws heavily on active reader participation, such as voting or rating of news stories, articles, or responses to a question. This user feedback is invaluable for ranking, filtering and retrieving high quality content - tasks that are crucial with the explosive amount of social content on the web. Unfortunately, as social media moves into the mainstream and gains in popularity, the quality of the user feedback degrades. Some of this is due to noise, but, increasingly, a small fraction of malicious users are trying to "game the system" by selectively promoting or demoting content for profit, or fun. Hence, an effective ranking of social media content must be robust to noise in the user interactions, and in particular to vote spam.

According to the authors, there are two main types of vote spam in social media: incorrect votes and malicious votes. A voter may not be an expert on the topic of a thread and its responses, so their votes are likely to be incorrect. In the other case, malicious users intend to promote specific responses within the community, and they attack the social media service by casting thumbs-up votes for specific posts or responses.

The objective of the research is to introduce a machine learning-based ranking framework for social media that integrates user interactions and content relevance, and that is significantly more robust to vote spam compared to a state-of-the-art baseline as well as the ranker not explicitly trained to handle malicious interactions (emphasis mine).

Current research and experiments strongly suggest that what once worked for manipulating social network results will not work anymore.
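The paper's framework is a trained machine-learning ranker, and I will not try to reproduce it here. As a much simpler illustration of why raw vote counts are fragile, the sketch below compares a plain net-vote count with a hypothetical scheme that weights each vote by the voter's reputation (a number between 0 and 1 that a site might derive from past behaviour). The reputation values, user names and votes are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Vote:
    voter: str
    answer: str
    up: bool  # True for thumbs up, False for thumbs down

def naive_scores(votes):
    """Net votes per answer: every vote counts the same."""
    scores = {}
    for v in votes:
        scores[v.answer] = scores.get(v.answer, 0) + (1 if v.up else -1)
    return scores

def weighted_scores(votes, reputation, default=0.1):
    """Net votes per answer, each vote scaled by the voter's reputation.

    Unknown voters (e.g. freshly created accounts) get a small default weight.
    """
    scores = {}
    for v in votes:
        w = reputation.get(v.voter, default)
        scores[v.answer] = scores.get(v.answer, 0.0) + (w if v.up else -w)
    return scores

def ranking(scores):
    """Answers sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical data: three established users like answer "A",
# while ten throwaway accounts push answer "B".
reputation = {"alice": 0.9, "bob": 0.8, "carol": 0.7}
votes = [Vote(user, "A", True) for user in reputation]
votes += [Vote(f"spam{i}", "B", True) for i in range(10)]

print(ranking(naive_scores(votes)))
print(ranking(weighted_scores(votes, reputation)))
```

The naive count puts "B" first, the weighted one puts "A" first. The obvious weakness is that enough throwaway accounts would still win; the paper's approach instead combines user interactions with content relevance in a trained ranker.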

[1] College of Computing, Georgia Institute of Technology
[2] Math and Computer Science, Emory University

Wednesday, December 10, 2008

On the Metrics of Social Networks

Fewer and fewer things in life surprise me anymore. But this new craze of "let's all follow each other on [insert your favorite network here]" has amazed me. Apparently there is still room for surprises, and I have gladly taken it as "I am not that old after all." Fine! Let us dissect and analyze this 'following phenomenon'.

What makes the number of followers valuable as a metric? For instance, Matt Bacak of Twitter fame claims he has so many followers that he is the third most tweeted (?) man in Tweetland. He thinks this is valuable, so he markets it:

First Facebook, now Twitter. The Powerful Promoter, Matt Bacak, has taken himself to the top of the social media networks yet again, this time beating out 99.9% of the fastest growing site's members.
Turn your income-generating ideas into handfuls of cold hard cash.
By Matt Bacak, the Powerful Promoter and author of Powerful Promoting Tips newsletter. "If I could show you a proven, but little-known system to tap into your niche market, bring in more leads, sell more product and explode your Internet sales, would you be interested?" I'm not talking about some old ideas you already heard before. I'm talking about closely guarded secrets that I've only shared with a few select people.

First off, I must confess that reaching such numbers is remarkable in its own right, although some may wonder how empty his life must be for him to find the time for all that tweeting and facebooking. But what does that figure say without the followers' data? Suppose 1000 of his followers each also follow 500 others, another 1000 follow 200, a third batch of 1000 follow 100, and so on. How do you follow 100 people, let alone 1000? Does the raw figure have any significance?

Unlike computers, we humans have tragically low thresholds. We can read a limited number of books in a day, watch three or four movies in a row, etc. Likewise, we can visit 10-15 blogs and have 20 or maybe 30 friends with whom we communicate regularly. It is not that we do not want to do more; it is just that this is all we can achieve with our limited abilities in a shell of flesh and bone. So the significant segment of those followers consists of the ones following fewer than, say, 30 people. These are probably Mr Bacak's real followers.
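The filtering argument fits in a few lines of code. The sketch below assumes we know, for each follower, how many accounts that follower follows (the follower names and counts are made up), and keeps only those below the rough threshold of 30 suggested above.

```python
def likely_real_followers(following_counts, threshold=30):
    """Return the followers who follow fewer than `threshold` accounts.

    following_counts maps a follower's name to the number of accounts that
    follower follows. The default of 30 reflects the rough limit on how many
    people one person can realistically keep up with.
    """
    return [name for name, n in following_counts.items() if n < threshold]

# Hypothetical followers of one account.
following_counts = {
    "reader1": 12,
    "reader2": 25,
    "bulk_follower1": 500,
    "bulk_follower2": 200,
    "bulk_follower3": 100,
}

real = likely_real_followers(following_counts)
print(f"{len(following_counts)} followers in total, {len(real)} likely real: {real}")
```

The raw figure here is 5, but the filtered one is 2; a network holding the full follow graph could compute the second number for every account.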

I hope you have found my argument plausible. If you have, you can bet the management of those social networks agrees with you. They will filter out those inflated numbers and track who really follows whom, and how many. After all, advertisers and marketing companies are not in the business of handing out cash for nothing.

Hence, we have two metrics: the one you see, a raw but inflated number that does not say much, and a hidden but real one. So stop wasting your time and do something. Write a post or... I don't know!

Tuesday, December 9, 2008

Uploading Favicon to Blogspot Blogs

After visiting hundreds of sites, all with shiny favicons displayed in your browser's address bar, you have decided to use your own favicon on your Blogger/Blogspot blog. You have designed and polished it, and now what? Here are the steps to follow:

1. Upload your newly created favicon to a free picture host and take note of the URL.

2. Visit your site and view its HTML source code by pressing CTRL+U (Firefox users).

3. Copy all the code after the opening head [head] tag up to the opening title [title] tag:

[some script]...[/script]
[meta content=...]

4. Back up your template.

5. Go to your dashboard, choose "layout" and then "edit html".

6. Delete this line from your template:

[b:include data='blog' name='all-head-content'/]

7. Paste the code you copied in step 3 in place of the line you deleted in step 6.

8. Change the favicon URL to the one you got in step 1 like this:

[link rel='icon' href='yournewurl/favicon.ico' type='image/x-icon' /]

9. Escape the ampersand pairs "&&" in the JavaScript by adding "amp;" after each ampersand (without the quotes), so that "&&" becomes "&amp;&amp;".

10. Save your template. If all goes well, you will see your favicon in the address bar of your browser.

All <> tags have been replaced with [] in this example in order not to confuse Blogger. If you have found this how-to useful, design a favicon for me.
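Steps 8 and 9 are plain text substitutions, so they can also be done with a short script before pasting. The sketch below is one way to do it in Python; the sample head code, the placeholder favicon URL and the function name are hypothetical, and it uses real angle brackets because its output is meant for your template rather than for a post.

```python
import re

def prepare_head_code(head_code: str, favicon_url: str) -> str:
    """Apply steps 8 and 9 to head code copied from the page source (hypothetical helper)."""
    new_link = f"<link rel='icon' href='{favicon_url}' type='image/x-icon' />"
    # Step 8: replace any existing icon link with one pointing at the new favicon.
    # A function is used as the replacement so the URL is inserted literally,
    # not interpreted as a regex replacement pattern.
    head_code = re.sub(r"<link rel='icon'[^>]*/>", lambda _match: new_link, head_code)
    # Step 9: Blogger templates are XML, so a bare "&&" inside JavaScript
    # has to be written as "&amp;&amp;".
    return head_code.replace("&&", "&amp;&amp;")

# Hypothetical head code, heavily shortened for the example.
copied = (
    "<script type='text/javascript'>if (a && b) { init(); }</script>\n"
    "<link rel='icon' href='http://www.blogger.com/favicon.ico' type='image/x-icon' />"
)
print(prepare_head_code(copied, "http://example.com/favicon.ico"))
```

If the code you copied has no icon link at all, the substitution does nothing and you would add the line from step 8 by hand.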

Controlling CSS Images in Blogger

Every now and then we feel the urge to change the templates of our blogs. This can be out of necessity (we might need a bigger area to upload images), because we get bored with the previous template, or because we discover a new one that better complements our style or the topics we write about.

Regardless of the platform you use, be it Blogger, WordPress, Evolution etc., switching to a new template is trivial. But unlike WordPress, for instance, where a template comes with its images in their own folder, Blogger template images are usually stored on free picture hosts. Pictures hosted at such places can cause you trouble in the future:
  • The uploader's account can be deleted for any reason.
  • Such hosts often impose bandwidth restrictions, and you can suddenly see warning messages in your blog, or no images at all.
  • The template's author can accidentally delete those images.
A neat trick is to write a post like this one, upload all those CSS images to your Picasa album, and change the image addresses in your stylesheet later at your convenience. This way, not only will you have a backup of them, you can even improve the speed of your blog.
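Changing the image addresses in a long stylesheet by hand is tedious. A minimal sketch, assuming you keep a mapping from each old image URL to its new address (the stylesheet fragment and all URLs below are placeholders), rewrites every url(...) reference in one pass:

```python
import re

# Matches url(...), url('...') and url("...") and captures the quote and the address.
URL_PATTERN = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")

def rewrite_css_images(css: str, new_urls: dict) -> str:
    """Replace image URLs in url(...) references using the given mapping.

    URLs that are not in the mapping are left unchanged.
    """
    def swap(match):
        quote, old = match.group(1), match.group(2)
        return f"url({quote}{new_urls.get(old, old)}{quote})"
    return URL_PATTERN.sub(swap, css)

# Hypothetical stylesheet fragment and mapping.
css = """
#header { background: url("http://oldhost.example/header.png") no-repeat; }
.post   { background: url(http://oldhost.example/post-bg.gif); }
"""
new_urls = {
    "http://oldhost.example/header.png": "http://picasa.example/header.png",
    "http://oldhost.example/post-bg.gif": "http://picasa.example/post-bg.gif",
}
print(rewrite_css_images(css, new_urls))
```

Keeping the mapping in one place also gives you a record of where every template image now lives.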


Sunday, December 7, 2008

What a Blogger Can Do

Analyzing the crisis of journalism, and whether blogging or independent on-line journalism can take the place of media reporting as we know it, is no easy task. The difficulty stems from the fact that
  1. the public's right of access to information,
  2. ensuring the public's safety,
  3. personal and privacy rights,
  4. safeguarding bloggers and their anonymity,
  5. upholding copyright and patent laws
are all intertwined in the beautiful soup we call the Internet. Since the topics are huge and will provide enough material to write about for many years, I am going to start at a random point. Unlike many, though, I do have a proposal to solve most of today's problems, but presenting it gracefully requires time, so it will have to wait.

Today, I would like to present three seemingly unrelated news items and ask you to focus not on who is right or wrong but on their mechanics, i.e. how things operate or unfold.

First, we have the report of the Committee to Protect Journalists (CPJ), telling us that 45 percent of all media workers jailed worldwide are bloggers, Web-based reporters or on-line editors. On-line journalists represent the largest professional category for the first time in CPJ's prison census. CPJ's survey found 125 journalists in all behind bars on December 1, two fewer than the 2007 tally, and the figure does not include the missing and the abducted. Here we see that 125 people have been picked up from their homes and taken somewhere (some locations are unknown), no questions asked.

Second, De Beers vs. The New York Times, or rather, De Beers vs. the domain name registrar. De Beers, the South African diamond conglomerate, upon seeing a fake ad in a brilliant spoof of the New York Times, attempted to shut down the site by putting pressure on what is often the weakest link in the on-line speech chain: the domain name registrar. The accusation? Trademark infringement.

Third, we see Blogger deleting entire posts from music bloggers' websites without warning or adequate explanation. Rather than expose itself to unnecessary risks while providing a free medium for people to express themselves, a financially strong company wipes blogs away.

So that concludes our tour: the bloggers, the domain name registrar and the hosting company. Now, ask yourselves this question: Can a blogger or independent on-line journalist do much?

Saturday, December 6, 2008

Getting the Most from Social Networks

I have partially covered some of the social networks you can use. Now, let us focus on making the most of them. I will recommend a slightly different strategy for you to follow. After you have run your own experiments and decided which networks to concentrate your efforts on, here are some tips:
  • Do not rush submitting your own posts
Do not try to game the system in vain. Instead, give your readers the opportunity to bookmark and/or submit them. This will be a new experiment with which you will measure what percentage of your subscribers take the time to bookmark and share your posts.
  • Think how you can improve the submission rates
Most probably, the initial results will be discouraging. That's good! Now, reread your posts and note how you could have written them better, especially the titles. Check that your bookmark links function properly and that they are clearly visible. Do you encourage your readers to share?
  • Use comments to your advantage
When answering a reader's comment, use your notes (see above) to add the things you forgot. Write a follow-up post if necessary. This has nothing to do with social networks, but it is solid advice.
  • Traffic is NOT conversion
Although it is gratifying to see your article receive so many diggs, reddits, sphinns or whatever, it is a poor indicator of how many people actually read your posts. Do not let glamor and fame blind you. Check your logs!
  • Do not let networks steal your traffic
Search engines, especially Google, place great emphasis on freshness. If your article is relevant to the query, you can rest assured that you will be in the top 30 of the search engine results pages (SERPs)... for a day or two, I am afraid. Do not spoil it by submitting your articles early. You do not have the authority and PageRank to compete against social media sites.
  • You can submit your article after 2 or 3 days
If nobody has done it yet, of course. By now your article is no longer fresh and has probably lost its ranking in the SERPs. It is time to use the network's power to your advantage. For a relevant query, they will have the muscle to compete for the top 30.
  • Rewrite the summary part
Most networks have a summary or comment area after the URL and title box. Use your notes and reword the summary. You can even include something you had completely forgotten, like "... while it provides insight into the mechanics of social networks in general, it fails to address the issue of ..."
  • Try to write a new article before submission
It is good policy to have a new article on your front page. When people come to read your article from the networks, they may click on your home page to see if there is anything new, but hardly anyone will check previous posts or your archives. Try not to disappoint them.
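As for "Traffic is NOT conversion... Check your logs!", here is one rough way to do that. The sketch below assumes an Apache-style combined log format, treats each IP address as one visitor, and counts how many visitors arriving from social bookmarking sites went on to request a second page, as a crude stand-in for actually reading rather than bouncing. The site list and the sample log lines are invented for the example.

```python
import re
from collections import defaultdict

# Combined log format: ip - - [time] "METHOD path PROTOCOL" status size "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)
SOCIAL_SITES = ("digg.com", "reddit.com", "sphinn.com")  # assumed list of referrers

def social_visitor_stats(lines):
    """Return (visitors referred by social sites, how many of them viewed more than one page)."""
    pages_by_ip = defaultdict(set)
    social_ips = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        pages_by_ip[m["ip"]].add(m["path"])
        if any(site in m["referer"] for site in SOCIAL_SITES):
            social_ips.add(m["ip"])
    stayed = sum(1 for ip in social_ips if len(pages_by_ip[ip]) > 1)
    return len(social_ips), stayed

# Invented sample lines.
sample = [
    '1.1.1.1 - - [06/Dec/2008:10:00:00 +0000] "GET /2008/12/post.html HTTP/1.1" 200 5120 "http://digg.com/story" "Mozilla"',
    '1.1.1.1 - - [06/Dec/2008:10:02:00 +0000] "GET /2008/12/other.html HTTP/1.1" 200 4096 "http://myblog.example/2008/12/post.html" "Mozilla"',
    '2.2.2.2 - - [06/Dec/2008:10:05:00 +0000] "GET /2008/12/post.html HTTP/1.1" 200 5120 "http://www.reddit.com/r/blogging" "Mozilla"',
    '3.3.3.3 - - [06/Dec/2008:10:07:00 +0000] "GET /2008/12/post.html HTTP/1.1" 200 5120 "-" "Mozilla"',
]
visitors, stayed = social_visitor_stats(sample)
print(f"{visitors} visitors from social sites, {stayed} viewed more than one page")
```

IP addresses are only a rough proxy for people, but even this crude count says more than the number of diggs.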

Marketing yourself is easy, right?

Thursday, December 4, 2008

Gate Peepin' and Misspelling Generator

Linda Hilfling, with her project Gate Peepin' and the Misspelling Generator, will be among the speakers at the Speaking out Loud symposium of the Netherlands Media Art Institute, to be held on December 18, 2008.

Linda works with the premises of participation and public spaces within media structures, with a focus on means of control (codes, organization and law) and their cultural impact. Her artistic practice takes the form of interventions reflecting upon or revealing hidden gaps in these structures.

Initially designed and coded by her in Python and bash, and also available as a Firefox extension thanks to Erik Borra, the Misspelling Generator intervenes directly within the Google search engine, allowing users to take advantage of the informational gray zone of misspellings. And it does exactly what it claims:

Each query typed into the normal Google search-box will generate misspellings inserted above the normal Google results – similar to Google’s 'Did you mean', but now with 'Have you tried' instead. When hovering the mouse over the links, you can see the number of search results for each misspelling. Clicking the link will redirect you to the Google page with the results for that specific misspelling. It is a useful tool for creating simple cryptography, circumventing specific cases of censorship, or in general as a means of accessing the 'gray' side of the Internet, which otherwise is isolated by the rigid structures of 'corrective' info-culture regimes on search engines like Google.

Intrigued? Then, a new adventure awaits you, casual user and search engine optimizer alike. Take a look at the 'uncorrected' side and hidden layers of the Internet.
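I do not know how the Misspelling Generator actually produces its variants. Purely to illustrate the idea, the sketch below uses a simple scheme of my own choosing (single-letter deletions, doubled letters and swapped neighbouring letters); it is not Hilfling's code.

```python
def misspellings(word: str) -> list[str]:
    """Generate simple one-edit misspellings of a word (illustrative only)."""
    variants = set()
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])        # drop a letter
        variants.add(word[:i] + word[i] + word[i:])  # double a letter
        if i + 1 < len(word):
            # swap a letter with its right-hand neighbour
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    variants.discard(word)  # swapping two identical letters reproduces the original
    return sorted(variants)

print(misspellings("censor"))
```

Each variant would then be submitted as a separate query, which is the part the real extension handles inside Google's results page.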

Tuesday, December 2, 2008

Tug of War Between Virtual and Real

Devil's Advocate reporting:
Time and again I read news about a cryptically named organization introducing a new device, technique or mechanism having an equally cryptic name that will limit or restrict unwanted use of a product throughout the Internet, and soon after, time and again, I am informed that the same mechanism has been rendered useless by a crack. For most, it looks like a game of hide and seek, played by thieves and the police, good guys and bad guys, or you name it. Is it really so?

We often forget that this game is played in cyberspace, between parties whose views are wide apart. Back in 1964, Marshall McLuhan wrote in Understanding Media,

The telephone: speech without walls. The phonograph: music hall without walls. The photograph: museum without walls. The electric light: space without walls. The movie, radio and TV: classroom without walls. Man the food-gatherer reappears incongruously as information-gatherer. In this role, electronic man is no less a nomad than his Paleolithic ancestors.

He seems to stop just short of writing "Cyberspace: reality without boundaries,"[1] and detailing, 20 years before William Gibson, the role of the cybercowboy. As virtual reality theorist Marianne Trench[2] notes,

When William Gibson's visions were published, they struck sparks in the real world. Scientists and hackers [...] future they couldn't wait to build... Never before had science fiction literature determined the way people thought and talked.

And she hits the bull's eye. On one side we have the scientists and hackers who tried to build the Internet to realize a vision: William Gibson's vision of a new universe, a parallel universe created and sustained by the world's computers and communication lines, where the tablet becomes a page becomes a screen becomes a world, a virtual world, a common mental geography, built, in turn, by consensus and revolution, canon and experiment, the realm of pure information.[3] A place where our subconscious will emerge, our alter ego will transcend its physical boundaries, an extension of our selves. A place where information wants to be free.

On the other side we see businesses, big and small, that belatedly discovered its potential and want to make it a utility, a huge on-line shop, an extension of their distribution channels, with rules and regulations imported from the real world.

I am the Advocate, and I speak the truth! True, shopping on-line, accessing all the products and services of the world in front of your screen, has its merits, especially if you are in the middle of nowhere, in a small town, like me. But it is not the utility that makes me buy a high-speed connection. My mail was fully functional when I had that 14.4 kbps modem.

Hence, consider yourselves warned, whoever you are. Otherwise many will quit the Internet to join the hyper Internet.

[1] Mick Doherty, Rensselaer Polytechnic Institute, Troy, NY.
[2] Marianne Trench and Peter von Brandenburg (producers), Cyberpunk, 1992, Mystic Fire Video: Intercon Productions.
[3] William Gibson, Neuromancer, 1984, Ace Books.

Monday, December 1, 2008

Optimizing Blogger for Speed

No, I am not obsessed with speed, in case you got that impression from this post and the one on how to design an efficient blog. I live in an unfortunate area with an Internet connection averaging around 8 KB per second, which is 1/13 of the speed enjoyed by those who live 10 kilometers to the east and west of me. The telco will supposedly make some infrastructure improvements, but only after January 2009. Well, at least that is what they claim. I can handle a slow connection, but it is really annoying to wait 40 seconds for my own blog to load. So I decided to make it leaner for my own sake.

Before writing this post, I pulled down various statistics of the site (home page only) to help me improve a bit:

Total HTML: 19,899 bytes, compressed;
Total images: 53,785 bytes;
JavaScript: 230,649 bytes;
CSS: 8,417 bytes;
Total CSS imports: 4.

Looking at the above figures, it is apparent that there are only two areas where I can make some improvements: JavaScript and CSS imports. I focused on cutting back the scripts without losing too much functionality, leaving the CSS imports for some other time.

There were nine scripts before I wrote this post, the heaviest being:

blogger-widgets: 131,819 bytes;
jquery: 55,774 bytes.
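The figures above also explain the 40-second wait and show what dropping one of the heavy scripts can save. A quick calculation, assuming the connection delivers about 8,000 bytes per second and ignoring latency, caching and the separate requests for each file:

```python
# Home page statistics listed above, in bytes.
page = {
    "HTML (compressed)": 19_899,
    "images": 53_785,
    "JavaScript": 230_649,
    "CSS": 8_417,
}
BYTES_PER_SECOND = 8_000  # assumption: roughly 8 KB per second

total = sum(page.values())
js_share = page["JavaScript"] / total
print(f"total: {total:,} bytes, JavaScript share: {js_share:.0%}")
print(f"estimated load time: {total / BYTES_PER_SECOND:.0f} s")

# Effect of dropping jquery (55,774 bytes).
saved = 55_774
print(f"without jquery: {(total - saved) / BYTES_PER_SECOND:.0f} s")
```

The page weighs 312,750 bytes, about three quarters of it JavaScript, which at this speed comes to roughly 39 seconds; removing jquery alone cuts that to about 32.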

It would be absurd to get rid of the widgets script, as it provides core functionality to all Blogger sites, so jquery had to go, despite the fact that CommentLuv was one of my favorite plug-ins (I am planning to reinstall it if the telco proves worthy of its promises, though). For some reason, the syncing of the comments did not work and the comments made so far disappeared. I will work on it or, in the worst case, re-enter them myself. Meanwhile, I encourage you to enter your last post manually, as this is a great way to discover new and interesting content.

One site where you can check the speed of your blog is Web Page Analyzer. It shows you potential problem areas and thereby guides you toward giving your readers a better experience. I strongly suggest that all Blogger users take a look and make a few adjustments.

While going through all this trouble, I changed my template to a more stylish (hopefully) one. I hope you will like it.