Addressing Twitter Spam Through Statistical Analysis
A brief update - top 3 things that can be done to help users weed out spam:
- Make the block functionality more accessible - did you find it underneath the “Following” legend?
- Provide basic stats about a user in the notification email - location, bio and some ratio information
- Use backend monitoring/analysis to `killall -9` spammer accounts (block ratio, usage trends indicative of automation, etc)
As with any social network, spammers take advantage of the masses of people gathered and interacting with each other. Twitter is no different: numerous people have complained recently about massive follows from spam accounts. These accounts typically show a high following:follower ratio and a low number of updates. There is even a site devoted to Twitter spam, twitterspam.com. There’s quite a bit of other information we can examine, but let’s tackle this in order of the two main types of spam I’ve come across.
The first is embodied in the @castlebaths account. Statistics that indicate this as a possible spam account:
- 20% of links in the first 20 updates are the same as the bio link
- There are zero replies in the account (note: not unlike a new Twitter user)
- There’s an average of 1.15 updates/follower
- The user’s “Friends” account for 95% of the aggregate friends and followers
Now this account may very well be legitimate, but I doubt many people want to follow somebody on Twitter who is simply hawking a product and not contributing much beyond that. Combining these values into an aggregate score would probably rank this account pretty high on the spam scorecard.
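As a rough illustration of the aggregate score idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `AccountStats` field names, the reading of the first statistic as "share of the first 20 updates that link to the bio URL", and the equal weighting of the signals. It is not how Twitter or TweetStats computes anything.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    """Raw counts for one account. All field names are hypothetical."""
    updates: int           # total number of updates
    followers: int         # accounts following this user
    friends: int           # accounts this user follows
    replies: int           # updates that are @replies
    recent_updates: int    # how many early updates were inspected (up to 20)
    recent_bio_links: int  # of those, updates linking to the bio URL


def indicators(s: AccountStats) -> dict:
    """Compute the four ratios listed above, guarding against division by zero."""
    total = s.friends + s.followers
    return {
        "bio_link_share": s.recent_bio_links / s.recent_updates if s.recent_updates else 0.0,
        "reply_share": s.replies / s.updates if s.updates else 0.0,
        "updates_per_follower": s.updates / s.followers if s.followers else float(s.updates),
        "friend_share": s.friends / total if total else 0.0,
    }


def spam_score(s: AccountStats) -> float:
    """Average three signals scaled to 0..1. Equal weights are an arbitrary choice."""
    ind = indicators(s)
    signals = [
        ind["bio_link_share"],     # more self-promotion links is more suspicious
        1.0 - ind["reply_share"],  # fewer replies is more suspicious
        ind["friend_share"],       # follows far more accounts than follow it back
    ]
    return sum(signals) / len(signals)


# Invented counts for illustration only, not the real numbers of any account.
example = AccountStats(updates=20, followers=5, friends=95,
                       replies=0, recent_updates=20, recent_bio_links=4)
print(indicators(example))
print(round(spam_score(example), 3))
```

Updates per follower is reported but left out of the score here, since it is not obvious in which direction it should count. Any real threshold or weighting would need to be tuned against accounts already known to be spam or legitimate.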
Let’s take a look at another account, @kendra2. This account is a little bit more difficult to identify as spam through the numbers:
- 5% of the urls in the first 20 updates are the same as the bio link (that’s one url for those not counting)
- This account has actually replied to people
- There are only 14 updates, but
- The user’s “Friends” account for 95% of the aggregate friends and followers
This is an interesting account since it seems to be an actual person trying to interact, but the bio link is actually the telltale sign here: videochatonline is a webcam site and @kendra2 is obviously trying to bring traffic to that site. The numbers do not clearly mark this as spam, but the last two statistics seem to indicate this account has been created solely for the purpose of driving traffic outside of Twitter. Other signs are the “pretty girl” avatar, the bio link to a commercial site and potentially similar profiles.
As a Twitter user, what other statistics can I use to identify spam that Twitter (or somebody else…) might be able to provide?
- # of my friends that _also_ follow the account
- # of accounts without autofollow that are following the account
- # of inactive accounts being followed by the new user
- Are consecutive accounts being followed?
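Two of the statistics in this list are simple enough to sketch in Python. This assumes the follower lists are available as plain collections of numeric account IDs, and it reads "consecutive accounts" as "accounts with consecutive numeric IDs, in the order they were followed". Both are assumptions for illustration, and the function names are hypothetical.

```python
def mutual_followers(my_friends: set, account_followers: set) -> int:
    """Count how many of my friends also follow the account in question."""
    return len(my_friends & account_followers)


def longest_consecutive_run(followed_ids: list) -> int:
    """Length of the longest run of consecutive IDs, in the order they were followed."""
    best = run = 1 if followed_ids else 0
    for prev, cur in zip(followed_ids, followed_ids[1:]):
        # Extend the run when the next ID is exactly one higher, else restart it.
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best


# Invented IDs for illustration only.
print(mutual_followers({1, 2, 3, 4}, {3, 4, 5}))
print(longest_consecutive_run([10, 11, 12, 40, 41]))
```

A long run of consecutive IDs suggests a script walking through the ID space rather than a person choosing whom to follow, while a high mutual-follower count suggests people I already trust found the account worth following.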
There are also a number of back end statistics that Twitter can utilize, such as unique IP addresses in use across large numbers of accounts, clickstream rates and patterns, and other similarities across multiple accounts. Reporting spam isn’t always useful, but observing the (generally predictable) behavior of spammers and the interaction of users with those accounts is a step forward.
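The shared IP address check, for example, could look something like the following Python sketch. It assumes the back end can produce `(account, ip)` pairs from its logs; the function name, the input shape and the `min_accounts` threshold are all hypothetical.

```python
from collections import defaultdict


def shared_ips(logins, min_accounts=3):
    """Return IPs used by at least min_accounts distinct accounts, with those accounts."""
    accounts_by_ip = defaultdict(set)
    for account, ip in logins:
        accounts_by_ip[ip].add(account)
    return {ip: accts for ip, accts in accounts_by_ip.items()
            if len(accts) >= min_accounts}


# Invented log entries; the addresses come from the reserved documentation range.
logins = [
    ("acct_a", "192.0.2.10"),
    ("acct_b", "192.0.2.10"),
    ("acct_c", "192.0.2.10"),
    ("acct_d", "192.0.2.77"),
]
print(shared_ips(logins))
```

A shared address alone would produce false positives (offices, schools and mobile carriers put many real users behind one IP), so a result like this is better treated as one signal among several than as grounds for suspension.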
Is spam an easy problem? Obviously not, or we wouldn’t have blog, email, trackback, comment and postal spam. Will there be false positives? Sure. However, the numbers above can help both in automatically identifying spam accounts and in giving users enough information at a glance to make smart decisions and alleviate their frustration. Furnishing an easy means by which to report/block spam is also a necessary evil. Twitter has hummed along relatively under the spam radar until now, but it seems it has to accept that spammers will try to take advantage of its users. Giving users the power to identify and avoid spam through the use of statistics will hopefully make Twitter a fruitless venue for spammers.
Nicely done. Helps all Twitterers be aware of the problem and sort the spammers from the real accounts. This is becoming a serious (yet sadly inevitable) problem and each person needs to move to change it!
April 16th, 2008 at 12:49 am
Thanks for the analysis, that was great. It seems like everything does come down to statistics and probability theory. It’s going to be interesting to see where you end up with this. Keep us posted.
April 16th, 2008 at 1:27 am
[...] Damon Cortesi (@dacort, creator of TweetStats) has a great post today about statistical analysis of a Twitter Spammer. This is a must-read for anyone who is concerned about the rise of spam on Twitter. We’ve been looking at a lot of spammer profiles over the past few weeks, and Mr. Cortesi is absolutely correct - these profiles often have a very similar statistical “signature” in terms of their Following:Follower ratio, whether or not their updates include links back to their profile URL, etc. And he’s even looking at the percentage of Followers who followed via an auto-follow, which is a subject that we just posted about. [...]
April 16th, 2008 at 6:44 am
spam accounts often post a handful of tweets at almost the exact same time every day–prob’ly because the spamming scripts are run by cron jobs. they usually only use one interface–web, of course. unlike a real person, they will jump right in and start tweeting, and will continue at the same tweet volume. a real person starts out slow, and builds tweet volume as they build followers and have more people to interact with. so most spam tweeters will have very uneven tweetstats graphs, in almost every category…
April 16th, 2008 at 7:46 am
Great technique to weed out these Twitter spammers–I hate getting 10-20 new friend requests a day only to find out most of them are garbage spammers. I have tried to start discussions on “Get Satisfaction” and I suggest you all do the same–Twitter has a company page set up for complaints with employees reading and writing back. My recent topic I started is: http://getsatisfaction.com/twitter/topics/please_stop_the_twitterbot_army_twitter_spam_must_stop
April 16th, 2008 at 8:23 am
[...] of situation and they may help prod the devs with some other ideas as to how to help prevent spam. http://dcortesi.com/2008/04/16/addressing-twitter-spam-through-statistical-analysis/ [...]
August 2nd, 2008 at 1:26 pm
[...] about who you follow on social networks and you will reduce your intake of low-quality messages. In this great post by Damon Cortesi, he shares a strategy for identifying spammers and heavy self-promoters in order to avoid them. Are [...]
April 27th, 2009 at 11:53 am