May 2, 2008

Delete Twitter Direct Messages

*****

This is a hack.
This is not guaranteed to work.
Twitter may change their website at any time.
I am not responsible if something breaks or you decide to delete all your DM’s.
Nor am I responsible if your friends get mad at you for deleting your sent DM’s from their Inbox.

*****

That being said, I’d like to introduce my DM Whacker, DM Deleter, DM Sniper, whatever you want to call it: I created a tool to delete your direct messages en masse on Twitter. After the recent debacle regarding direct messages being exposed, I definitely saw a need amongst some Tweeters to delete their direct messages.

I need to thank @chris4403 who posted his awesome Twitter Translate bookmarklet recently. Were it not for that, I would not have had the motivation nor the codebase off which to build my first bookmarklet.

With that introduction, I’d like to point you in the direction of the new DM Deleter. Simply drag the link that’s in that page up to your Firefox or Safari bookmark bar, navigate to your direct messages, click the bookmark and select your options to delete your DM’s once and for all. The tool allows you to delete all of your messages, or just messages from certain friends.

My apologies for the additional link, but I just wanted to reinforce the point that this tool will delete your direct messages forever. So, use it with care.

Feedback welcome. The ability to delete sent messages has been added in version 0.2.4.

April 27, 2008

Twitter Reputation Statistics

OK, I figure it’s time to throw my hat into the ring.

I’ve posted in the past about Twitter spam and I run what I think to be a pretty fun website about Twitter Stats, but there seems to be a lot of conversation recently about Twitter and the noise ratio.

Obviously, people are trying to figure out how best to use Twitter given its recent surge in popularity and accompanying spaminess. Louis Gray made a blog post about his noise ratio and Stowe Boyd followed up with a post about the noise ratio and conversational index, but there’s one thing that seems to be common across both these posts:

There is a super-fantastic problem in that both posts discuss one, one ratio!

That’s right - one ratio to describe the entire activity of Twitterites. One ratio to rule them all, one ratio to find them, one ratio to bring them all and in the darkness bind them.

OK, perhaps these posts were intended to be their own personal way of determining a proper reputation structure on Twitter, but there is so much more data available to play with. Shall we? Yes, let’s take a look at all the numbers we have to play with:

  • Friends
  • Followers
  • Favorites
  • Updates
  • Date joined Twitter
  • Number of updates over time
  • Number of updates in the past month vs. when they first joined Twitter
  • % of updates that contain links
  • % of updates that are replies
  • Number of mentions of the word “awesome”

These are just a few of the numbers that Twitter provides and while the noise ratio is a nice statistic, it is most definitely not a holistic means of rating the reputation of a Twitter user. And there never will be such a means. @wardspan and I had a conversation this evening where we discussed the top three things we use to determine if we’re going to follow somebody. I think we only shared one of our top three, and we tend to be pretty similar-minded. But we use Twitter for different reasons.

And it is with this post that I call out for a reasonable reputation system across our many services. Twitter is one such example, but there have been others in the past (yes, those other social networks) that have dealt with the same reputational issue, not to mention spam.

And it’s not getting better. I signed up for FriendFeed today and created a profile of my real self’s online activity. The scary thing is…I could have created the same profile for anybody else, and the question to ask yourself is: would anybody have known any better? In addition, in their case - does it even matter? Or are they redirecting their trust to the other systems they are using to generate their content?

Just imagine, if we could create a reliable reputation system across the services that we use to provide us with better and more interesting, targeted content on a daily basis. If only…

April 16, 2008

Addressing Twitter Spam Through Statistical Analysis

A brief update - top 3 things that can be done to help users weed out spam:

  1. Make the block functionality more accessible - did you find it underneath the “Following” legend?
  2. Provide basic stats about a user in the notification email - location, bio and some ratio information
  3. Use backend monitoring/analysis to `killall -9` spammer accounts (block ratio, usage trends indicative of automation, etc)

As with any social network, spammers appear to take advantage of the collective masses that are gathered and interacting with each other. This is no different on Twitter, where numerous people have complained recently about massive follows from spam accounts. These accounts typically have a high friend:follower ratio (they follow far more people than follow them back) and a low number of updates. There is even a site devoted to Twitter spam, twitterspam.com. There’s quite a bit of other information we can examine, but let’s tackle this in order of the two main types of spam I’ve come across.

The first is embodied in the @castlebaths account. Statistics that indicate this as a possible spam account:

  • 20% of links in the first 20 updates are the same as the bio link
  • There are zero replies in the account (note: not unlike a new Twitter user)
  • There’s an average of 1.15 updates/follower
  • The user’s “Friends” account for 95% of the aggregate friends and followers

Now this account may very well be legitimate, but I doubt many people want to follow somebody on Twitter that is simply hawking a product and not contributing much beyond that. Taking these values and creating an aggregate score would probably score pretty high on the spam card.
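The idea of rolling these values into an aggregate score can be sketched roughly as follows. This is only an illustration: the AccountStats fields, the equal weighting, and the example numbers are all assumptions invented here to mirror the percentages above. They are not real data from either account and not a scoring method Twitter uses.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    """Hypothetical container for the per-account numbers discussed above."""
    friends: int             # accounts this user follows
    followers: int           # accounts following this user
    replies: int             # number of updates that are replies
    links_checked: int       # links found in the first 20 updates
    links_matching_bio: int  # how many of those equal the bio link


def friend_share(s: AccountStats) -> float:
    """Share of (friends + followers) made up by friends."""
    total = s.friends + s.followers
    return s.friends / total if total else 0.0


def bio_link_share(s: AccountStats) -> float:
    """Share of checked links that point at the bio link."""
    return s.links_matching_bio / s.links_checked if s.links_checked else 0.0


def spam_score(s: AccountStats) -> float:
    """Assumed scoring rule: equal-weight average of three 0..1 signals."""
    signals = [
        friend_share(s),
        bio_link_share(s),
        1.0 if s.replies == 0 else 0.0,  # never replying counts against you
    ]
    return sum(signals) / len(signals)


# Made-up account: friends are 95% of the aggregate, 20% of links match the bio
example = AccountStats(friends=190, followers=10, replies=0,
                       links_checked=20, links_matching_bio=4)
print(round(spam_score(example), 2))  # prints 0.72
```

A real system would need weights tuned against known spam and legitimate accounts, since a brand-new user can also have zero replies.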

Let’s take a look at another account, @kendra2. This account is a little bit more difficult to identify as spam through the numbers:

  • 5% of the urls in the first 20 updates are the same as the bio link (that’s one url for those not counting)
  • This account has actually replied to people
  • There are only 14 updates, but
  • The user’s “Friends” account for 95% of the aggregate friends and followers

This is an interesting account since it seems to be an actual person trying to interact, but the bio link is actually the telltale sign here - videochatonline is a webcam site and @kendra2 is obviously trying to bring traffic to that site. The numbers do not clearly mark this as spam, but the last two statistics seem to indicate this account has been created solely for the purpose of driving traffic outside of Twitter. Other signs are the “pretty girl” avatar, bio link to a commercial site and potentially similar profiles.

As a Twitter user, what other statistics can I use to identify spam that Twitter (or somebody else…) might be able to provide?

  • # of my friends that _also_ follow the account
  • # of accounts without autofollow that are following the account
  • # of inactive accounts being followed by the new user
  • Are consecutive accounts being followed?

There’s also a number of back end statistics that can be utilized by Twitter such as unique IP addresses in use across large numbers of accounts, clickstream rates and patterns and other similarities across multiple accounts. Reporting spam isn’t always useful, but observing the (generally predictable) behavior of spammers and the interaction of the users with those accounts is a step forward.

Is spam an easy problem? Obviously not or we wouldn’t have blog, email, trackback, comment and postal spam. Will there be false positives? Sure. However, the numbers above can help in both the automatic identification of spam accounts and providing users with enough topical information to make smart decisions to help alleviate their frustration as well. Furnishing an easy means by which to report/block spam is also a necessary evil. Twitter has hummed along relatively under the spam radar until now, but it seems it has to accept that spammers will try to take advantage of its users. Giving users the power to identify and avoid spam through the use of statistics will hopefully make Twitter a fruitless place for spammers.

February 12, 2008

Quick Argus3 Commands

This is going to be a quick post, mostly because I’m tired from working on that other site and I really need to get some sleep.

I’ve been doing some serious pcap analysis lately. You know the type…where you’ve dumped numerous pcap’s with tcpdump and the wonderful -C parameter. Being the type of guy that I am, I wanted to visualize the traffic I’d captured to identify what was going on. Here’s a few argus commands I used to get the job done. Note that I’ve used backslashes (\) to split the commands across multiple lines.

# Extract specific src mac addresses I'm interested in
for i in ~/captures/pcap*; do
  /usr/local/sbin/argus -mAJZRU 256 -r "$i" -w src_macs.argus - \
  ether src 00:00:00:11:22:33 or ether src 00:00:00:33:22:11;
done

Fantastic - now I’ve got an argus data stream that contains traffic solely from a mac or two I was interested in.

# Now let's take a look at top usage for each IP address
racluster -r src_macs.argus -m proto saddr dport -w - | \
  rasort -m saddr pkts -s saddr dport pkts | more

Now that we’ve manually looked through that data and found the top ports (argus used to have a -topN option, but I couldn’t seem to find it) let’s draw some nice-looking graphs. This splits the graph out into directories by date and generates graphs in each directory representing traffic for each particular mac address.

# For each mac address, generate daily usage for the "interesting" ports we saw above
macs="00:00:00:11:22:33 00:00:00:33:22:11"
ports="23 53 80 139 389 443 445 3389 1521"
# Require at least one digit so sed can never match an empty string
filter_string=`echo "$ports" | sed 's/[[:digit:]][[:digit:]]*/dst port & or/g' | sed 's/ or$//'`

for mac in ${macs}; do
  rasplit -r src_macs.argus -M time 1d -w "archive/%Y_%m_%d/${mac}.arg" - \
    "(${filter_string}) and (ether src ${mac})";
done

# Quote the glob so find (not the shell) expands it, and derive the .png name per file
find archive -name '*.arg' | while read -r f; do
  ragraph pkts dport -M 1m -r "$f" -fill -stack -w "${f%.arg}.png"
done

It’s not perfect and it took me quite a while to understand the intricacies of argus (-w - is different from just not specifying an output file, for example), but it’s definitely a start down the road.
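For anyone who finds the sed pipeline above hard to read, here is the same filter-building step as a small Python sketch. The function names are made up for illustration; the only thing taken from the commands above is the shape of the filter expression ("dst port N or ..." wrapped with an "ether src" clause).

```python
def build_port_filter(ports):
    """Join ports into 'dst port A or dst port B ...' (no trailing 'or')."""
    return " or ".join(f"dst port {p}" for p in ports)


def build_mac_filter(ports, mac):
    """Combine the port filter with a source MAC, as passed to rasplit above."""
    return f"({build_port_filter(ports)}) and (ether src {mac})"


ports = [23, 53, 80, 139, 389, 443, 445, 3389, 1521]
print(build_mac_filter(ports, "00:00:00:11:22:33"))
```

Joining on " or " avoids the trailing "or" that the second sed command has to strip off.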

January 27, 2008

Twitter Stats/Tweet Stats/Man am I Tired!

So nearly a month to the day after releasing my Twitter Stats perl script, I finally made a webified version. You can check it out over at TweetStats.com.

This was really more of an engine for me to get up to speed on Ruby on Rails (RoR) than anything. I’ve been wanting to play with Ruby for a while now, but really just didn’t have the motivation. I’d like to keep making regular updates to the site as there are several features I’m hoping to add such as dynamic graphs that allow you to zoom in on your timeline, an auto-follow bot that will keep your stats constantly up-to-date, and a widget you can put on your site if you so desire. Although I’ve enjoyed working on the site, I would like to relax for a little bit.

The experience with RoR has been fairly pleasant. It’s a well thought-out framework that lends itself to quick and efficient development once you wrap your head around the model. The only downside, and really what kept me from being able to put the site up faster, was the usage of some gems like BackgrounDRb. While a great idea, a fractured development community and somewhat buggy coding gave me many a headache in the past week, ultimately leading to two redesigns of the backend code and consuming two weekends of my life. I won’t say the code is perfect, it’s far from it. But it’s been fun and hopefully people enjoy the site.

The Internet is an invaluable resource and you can find many links I used via my del.icio.us rails tag. The most useful by far was Dominiek.com’s Building a .Com in 24 hours and you will see similar design patterns on my site. I had this page open nearly the entire time I was developing the site. It was a great aid throughout the entire process.

Finally, a special thanks to somebody that sat up with me throughout the night as I muttered away coding to myself.

January 12, 2008

Adium + Quicksilver Script

For whatever reason (perhaps it’s the slowness of Twitterific, or the lack of any other application to satisfy my Twitter-craving) I wanted to be able to Tweet from Quicksilver. A quick Google led me to an AppleScript to Tweet from Quicksilver, but alas…it was a year old and not functional.

So, two hours later - allow me to present an updated version of the script:

using terms from application "Quicksilver"
	on process text im_text
		repeat with im_delimiter_position from 1 to (length of im_text)
			if character im_delimiter_position of im_text = ":" then exit repeat
		end repeat
		set im_contact_name to characters 1 thru (im_delimiter_position - 1) of im_text as string
		set im_message to characters (im_delimiter_position + 2) thru (length of im_text) of im_text as string
		tell application "Adium"

			set user to first contact whose (status type is available and (display name starts with im_contact_name or title starts with im_contact_name))

			if not (exists (chats whose contacts contains user)) then
				tell account of user to (make new chat with contacts {user} with new chat window)
			end if

			send (first chat whose contacts contains user) message im_message
		end tell
		return nothing
	end process text
end using terms from

The procedures for installation and usage are the same.

  • Paste the script into Script Editor and save it in ~/Library/Application Support/Quicksilver/Actions as Send As IM.scpt.
  • Cmd+Space, period, “Contact name: Message”, tab, S, enter.

I now return you to your regular Twedule.

December 27, 2007

Twitter Stats

Final Update to This Post: For those not following along at home, I finally took Twitter Stats to the next level and released a webified version over at TweetStats.com. I, somewhat unfortunately, had to go with TweetStats as twitterstats.com was already taken. :( I made a post about it here and you can see an example of my stats on the site on the graphs page.

So I’ve been a user on Twitter for a little over a year, but it wasn’t until recently when I hit 2000 tweets that I wanted to see what my Twitter history looked like over that period. Ever being the statistics nerd, I pulled down all of my tweets and using a combination of curl, sed, grep, Excel, and Numbers, managed to generate some nice graphs.

Being the automation weenie that I am, I eventually hacked together a perl script that did everything except paste the data into Numbers. Although I won’t post it here (because I think the Twitterocracy would have a cow with how it’s implemented), you can DM me your email and I’ll send you the code and instructions. See below - bugs be damned, I’ve made it publicly available.

Basically, the script pulls down all your tweets and stores them in a csv file. It then runs some statistics on the csv file and then copies the resulting stats to the OS X clipboard to paste into each table within Numbers. If run with a pre-existing tweets csv file, the script will calculate the difference between your current status count and the number of tweets already stored, and only download the pages necessary, thus saving the Twitter servers some bandwidth. ;)
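The "only download the pages necessary" step boils down to a bit of arithmetic. Here is a minimal Python sketch of it, not the perl script itself. The function name and the page size of 20 tweets are assumptions for illustration only.

```python
import math


def pages_to_fetch(current_status_count, stored_count, per_page=20):
    """Return how many pages of tweets are still missing from the csv file.

    per_page=20 is an assumed page size, not something the script guarantees.
    """
    missing = max(current_status_count - stored_count, 0)
    return math.ceil(missing / per_page)


print(pages_to_fetch(2000, 0))     # first run: fetch everything -> 100
print(pages_to_fetch(2000, 1975))  # 25 new tweets span two pages -> 2
print(pages_to_fetch(2000, 2000))  # already up to date -> 0
```

Rounding up matters here: a partially filled page still has to be downloaded to pick up the newest tweets.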

For the record, here are mine. :-D
@dacort's Twitter Stats

Update: Thanks to kosmar for pointing out that I can actually do this entire thing without your password. Head => Wall. I’ll be updating it accordingly and hopefully even making a webservice out of it soon. :)

In addition to not needing your password, the script should also adjust the times for your tweets to whatever the system time is where you run the script.

Another update: I’ve also posted the script on my site and you can download it here: twitter_stats.zip. Feel free to contact me with any questions via twitter or web.dpc at dcortesi . com.

Many people have noticed a large after-lunch spike around 2pm. At least for me, this was due to Twitter being down most of the morning one day and then tweeting like crazy when it came back online.

BUG FOUND AND SQUASHED

There was a small bug that cropped up after I switched the script to not require your password. It accounts for the odd “[Tuesday|January|2pm] Peak” that people were seeing. This bug has now been fixed and an updated script is available. My apologies.

Unfortunately, if you want the most accurate tweets, you will have to rm your csv file and run the script again.

Date::Calc aka failure on line 13
Some of you (on Tiger?) may be missing the Date::Calc module that I use to figure out weekdays. Although I tried to use as few perl modules as possible, this one was essential. Use the following command (thanks to a couple twitter peeps for the reference) to install:
sudo perl -MCPAN -e 'install Date::Calc' and keep hitting ‘y’. ;)

Final (hopefully) Update on this page as it’s getting messy.

For those of you not on OS X with Numbers, there are a few options:
@bck webified my code (w00t): Twitter Stats
@mmc decided to use gnuplot: Twitter Stats in SVG Using GNUPlot
@cbarrett modified it to utilize the Google Chart API: Twitter Stats with GChart
@kejadlen reverse engineered my original script to Ruby: Twitter Stats in Ruby

I still want to write my own webified version (Google Chart aesthetics leave a little bit to be desired…), but I have yet to settle on an option that I like.

September 6, 2007

Quick Comparison of Numbers vs. Excel/Google Spreadsheet

Being the Apple neophyte that I am, I’m always looking to compare my ability to achieve things in the Mac world as opposed to other worlds. My challenge this evening - creating a spreadsheet in iWork ’08’s Numbers application from an HTML table.

Source Data: Music sources from episodes of This American Life. I love the background music on TAL and was in search of some of the songs. I found this metafilter post, which ultimately led me to the Wayback Machine as the TAL site has been updated since that post. I then wanted to get the list into a format I could use for some scripting goodness in the future.

This is a stunningly simple task in Microsoft Office.

  1. Open the web page in your browser of choice.
  2. Highlight the table in question.
  3. Open Microsoft Excel.
  4. Paste - the table is automatically formatted into two columns

It doesn’t get much easier than that.

I tried the same thing in Numbers tonight, figuring it would be no more difficult. Well, it was. Instead of pasting a table, it just pasted the two columns as one, each column one under the other. I looked around for a way to convert data to columns. No luck. I tried to paste it into Pages, do some find/replace magic and export to a csv file. Too much trouble. I tried copying the HTML source. Definitely the wrong thing to try. I tried searching the help. Yea right!

I finally got so frustrated (and I don’t have Parallels reinstalled yet as I reinstalled my MacBook today), that I opened Google Spreadsheets and decided to see if I would have any luck there. Using the same process as in Excel, I instantly had exactly what I needed in Google Docs. Color me impressed! Not only that, I was able to extract the data I wanted using the provided formulas. This really makes me want to implement that Firefox Google Docs encryption plugin I’ve been thinking about…

In any case, for the curious - here’s the finished product. Now I just need to work on that script. ;-)

I’m somewhat disappointed in Numbers. There was another simple task I was recently trying to accomplish where Numbers just wasn’t up to the task. It’s a great app, but it definitely needs a few more features before I can use nothing else.

June 20, 2007

Postfix and Spam Blacklists

I came across an article yesterday about blocking spam with Postfix using blackhole lists. This is something that I haven’t previously set up and I get a decent amount of spam, so I figured it couldn’t hurt. I followed the directions and in the past day and a half that it’s been in place, over 700 spam emails have gotten blocked…not too shabby! :) 173 still got through, so that’s a pretty darn good ratio in my opinion.
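To put a number on that ratio, here is a quick back-of-the-envelope calculation in Python. It treats "over 700" as exactly 700, so the result is a rough lower bound, and the function name is just for illustration.

```python
def block_rate(blocked, got_through):
    """Fraction of spam stopped out of everything that arrived."""
    return blocked / (blocked + got_through)


rate = block_rate(700, 173)
print(f"{rate:.1%}")  # prints 80.2%
```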

June 19, 2007

iPhone Capabilities - Potential for Eavesdropping?

I just saw a post about some of the browser capabilities of the new iPhone, and there was one feature that caught my eye:

- new telephone links allows you to integrate phone calls directly from your webpage. remember this is only on safari.

The first thing I thought of was, “Wow, I hope that you can’t somehow execute those links automatically via JavaScript…”. Can you imagine if you browse to a page and your iPhone automatically dials the number of an attacker and listens in on a conversation you might be having? Combine an XSS vulnerability on a high-profile website and a couple of high-profile CEO’s that we _know_ have an iPhone and you could get some pretty interesting dirt!

That would be kind of bad…

Update: Hehe, see.