Thursday, December 11, 2008

Ranking Systems and Vote Spam

As the number of social networks increases, so does their importance to bloggers in particular, and businesses in general, as a channel for promoting content. Using social media effectively requires understanding the metrics behind it. One industry with high stakes here is search, because social media sources provide an effective alternative to traditional web search by directly connecting users who have information needs with users willing to share that information. For example, users can post questions or new items and rely on other users to comment on or rank the content (e.g., on sites such as Slashdot or Digg) or to rank the popularity of users (as on Twitter). While some responses can be excellent, quality varies greatly. Hence user feedback, such as voting on or rating the content, has become crucial to the effectiveness of the community, as demonstrated by the paper A Few Bad Votes Too Many? Towards Robust Ranking in Social Media[pdf] by Jiang Bian[1], Yandong Liu[2], Eugene Agichtein[2] and Hongyuan Zha[1]. From the abstract:

Online social media draws heavily on active reader participation, such as voting or rating of news stories, articles, or responses to a question. This user feedback is invaluable for ranking, filtering and retrieving high quality content - tasks that are crucial with the explosive amount of social content on the web. Unfortunately, as social media moves into the mainstream and gains in popularity, the quality of the user feedback degrades. Some of this is due to noise, but, increasingly, a small fraction of malicious users are trying to "game the system" by selectively promoting or demoting content for profit, or fun. Hence, an effective ranking of social media content must be robust to noise in the user interactions, and in particular to vote spam.

According to the authors, there are two main types of vote spam in social media: incorrect votes and malicious votes. In the first case, the user who votes may not be an expert on the topic of the thread and its responses, so the votes are likely to be incorrect. In the second case, malicious users set out to promote specific responses within the community, and they attack the social media service by casting thumbs-up votes for particular posts or responses.
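To make the malicious case concrete, here is a minimal Python sketch of how a handful of coordinated thumbs-up votes can flip a ranking that relies on vote counts alone. It is a toy illustration, not the method or data from the paper: the answer names, the vote counts, and the rank_by_votes helper are all made up for this example.

```python
from collections import Counter

def rank_by_votes(votes):
    """Rank answers by net vote count, highest first (hypothetical helper)."""
    tally = Counter()
    for answer_id, value in votes:
        tally[answer_id] += value  # +1 for thumbs up, -1 for thumbs down
    # Sort by descending net score; break ties by id so the order is deterministic.
    return sorted(tally, key=lambda a: (-tally[a], a))

# Made-up honest feedback: the good answer collects the most thumbs up.
honest_votes = [("good", 1)] * 5 + [("mediocre", 1)] * 2 + [("spam", -1)]

# Made-up attack: eight thumbs-up votes cast to promote the "spam" answer.
malicious_votes = [("spam", 1)] * 8

before = rank_by_votes(honest_votes)
after = rank_by_votes(honest_votes + malicious_votes)

print(before)  # ['good', 'mediocre', 'spam']
print(after)   # ['spam', 'good', 'mediocre']
```

Eight injected votes are enough to push the worst answer to the top, which is why a ranker that trusts raw vote counts is so easy to game.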

The objective of the research is to introduce a machine learning-based ranking framework for social media that integrates user interactions and content relevance, and that is significantly more robust to vote spam than either a state-of-the-art baseline or a ranker not explicitly trained to handle malicious interactions (emphasis mine).

Research and experiments like these strongly suggest that the vote manipulation that once worked on social network rankings will not work much longer.

[1] College of Computing, Georgia Institute of Technology
[2] Math and Computer Science, Emory University

