Tooting my own horn: Bayesian Spam Filtering

I was looking through the UTCS server logs and noticed that my tech report, Naive Bayes vs Rule Learning in Classification of Email (UTCS AI-TR-99-284), is being downloaded 8-9 times per day. Right on. It’s been linked in some significant places, including Paul Graham’s Bayesian filtering research page and the spam filtering FAQ.

It’s funny: the paper predates Graham’s A Plan For Spam by almost 3 years. The problem is that mine is a very simple paper and explains little, except that Bayesian learning kicks ass on spam filtering. It’s not the earliest Bayesian spam-filtering work, but I think if I had decided to pursue that as my dissertation, I would (a) be the man when it comes to automated spam filtering, and (b) probably be finished or nearly finished with school. So why did I spend several more years screwing around looking for a topic in robotics? One reason is that the spam filtering stuff was basically an engineering project, and I was (and am) interested in cognition. Another, hidden but very significant, reason is that the UTCS computing systems support folks have some killer spam filtering mojo on the mail server, so I get hardly any spam. Necessity is the mother of invention: I didn’t get much spam, so I didn’t see the filtering of spam as a significant enough problem to warrant a dissertation.

