I was looking through the UTCS server logs and noticed that my tech report, Naive Bayes vs Rule Learning in Classification of Email (UTCS AI-TR-99-284), is being downloaded 8-9 times per day. Right on. It’s been linked in some significant places, including Paul Graham’s Bayesian filtering research page and the spam filtering FAQ.
It’s funny: the paper predates Graham’s A Plan For Spam by almost 3 years. The problem is that mine is a very simple paper, and explains little except that Bayesian learning kicks ass on spam filtering. It’s not the earliest Bayesian spam-filtering work, but I think if I had decided to pursue spam filtering as my dissertation, I would (a) be the man when it comes to automated spam filtering, and (b) probably be finished or nearly finished with school. So why did I spend several more years screwing around looking for a topic in robotics? One reason is that the spam filtering stuff was basically an engineering project, and I was (and am) interested in cognition. Another, hidden but very significant, reason is that the UTCS computing systems support folks have some killer spam filtering mojo on the mail server, so I get hardly any spam. Necessity is the mother of invention: I didn’t get much spam, so I didn’t see the filtering of spam as a significant enough problem to warrant a dissertation.