Jon Udell of InfoWorld and the O’Reilly Network is playing around with Bayesian classifiers for blog-post categorization. He didn’t have much luck in his first tries, but he also didn’t have many training examples. He’s concluded, quite rightly, that he needs a nice interface for sorting examples and training classifiers.
I have a different idea. Why not let the algorithm come up with the categories, not just the categorizations? I would start with a hierarchical, incremental clustering algorithm like COBWEB, and build an interface with two parts: (1) a means for easily handing a document to the system for categorization, and (2) a category tree browser for browsing the categories and viewing the documents in them. This could be a part of someone’s own personal Google.
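To make the idea concrete, here is a rough sketch of the two parts: handing documents to the system one at a time, and browsing the resulting tree. It is not COBWEB. COBWEB decides where to place each item using category utility and can also merge and split categories. This sketch swaps in something much cruder: cosine similarity on word counts, with a fixed similarity threshold per tree level. All the names here (`Node`, `build_tree`, `render`) and the threshold values are my own assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words as a Counter."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class Node:
    """A category: summed word counts, child categories, and stored documents."""

    def __init__(self):
        self.counts = Counter()  # word counts of every document below this node
        self.children = []
        self.docs = []  # only filled at the deepest level

    def insert(self, doc, words, thresholds):
        """Place one document, creating new categories on the way down if needed."""
        self.counts.update(words)
        if not thresholds:  # deepest level reached: store the document here
            self.docs.append(doc)
            return
        best = max(self.children, key=lambda c: cosine(words, c.counts), default=None)
        if best is None or cosine(words, best.counts) < thresholds[0]:
            best = Node()  # nothing similar enough: start a new category
            self.children.append(best)
        best.insert(doc, words, thresholds[1:])

    def label(self, n=3):
        """Name the category after its most frequent words."""
        return ", ".join(w for w, _ in self.counts.most_common(n))


def build_tree(docs, thresholds=(0.2, 0.6)):
    """Feed documents in one at a time; one threshold per tree level (assumed values)."""
    root = Node()
    for doc in docs:
        root.insert(doc, tokenize(doc), thresholds)
    return root


def render(node, depth=0):
    """Return the tree as indented text lines, a stand-in for the tree browser."""
    lines = ["  " * depth + "[" + node.label() + "]"]
    lines += ["  " * (depth + 1) + "- " + d for d in node.docs]
    for child in node.children:
        lines += render(child, depth + 1)
    return lines
```

Calling `print("\n".join(render(build_tree(posts))))` on a list of post texts prints the category tree with each post under its category. Because the tree is built incrementally, the result depends on the order in which the posts arrive, which is also true of COBWEB itself.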