AIIA 2007 START Conference Manager    

Text Categorization in Non-linear Semantic Space

Claudio Biancalana and Alessandro Micarelli

The 10th Congress of the Italian Association for Artificial Intelligence (AIIA 2007)
Roma, Italy, September 10-13, 2007


Abstract

Automatic Text Categorization (TC) is a complex task that is useful in many natural language applications. It is usually performed by using a set of manually classified documents, i.e., a training collection. Term-based representations of documents have found widespread use in TC. However, one of the main shortcomings of such methods is that they largely disregard lexical semantics and, as a consequence, are not sufficiently robust to variations in word usage. In this paper we design, implement, and evaluate a new text classification technique. Our main idea consists of three steps: finding a series of projections of the training data by using a new, modified LSI (Latent Semantic Indexing) algorithm; projecting all training instances onto the low-dimensional subspace found in the previous step; and finally inducing a binary search on the projected low-dimensional data. Our conclusion is that our approach, while simple and efficient, achieves classification accuracy comparable to that of Support Vector Machines (SVM).
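The general shape of such a pipeline (build a term-document matrix, project it into a low-dimensional latent space, then classify in that space) can be illustrated with a minimal sketch. It assumes plain, unmodified LSI, i.e. a rank-k truncated SVD of a raw term-frequency matrix, and not the authors' modified LSI algorithm. Because the abstract does not describe the binary-search step, the sketch swaps in a simple nearest-centroid rule under cosine similarity instead. All function and class names (`build_term_doc_matrix`, `lsi_fit`, `fold_in`, `LSICentroidClassifier`) are hypothetical, and the toy corpus is invented for illustration.

```python
import numpy as np


def build_term_doc_matrix(docs, vocab=None):
    """Raw term-frequency matrix A with A[i, j] = count of term i in document j.

    Tokenization is a naive lowercase whitespace split (an assumption made
    for illustration). If `vocab` is given, terms outside it are ignored.
    """
    tokenized = [doc.lower().split() for doc in docs]
    if vocab is None:
        vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            if tok in index:
                A[index[tok], j] += 1.0
    return A, vocab


def lsi_fit(A, k):
    """Plain rank-k truncated SVD of A: keep the k largest singular triplets."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    if k > np.count_nonzero(s > 1e-10):
        raise ValueError("k exceeds the numerical rank of the term-document matrix")
    return U[:, :k], s[:k]


def fold_in(A, U_k, s_k):
    """Standard LSI fold-in of each column d of A: d_hat = S_k^-1 U_k^T d."""
    return (U_k.T @ A) / s_k[:, None]


def normalize_columns(X):
    """Scale each column to unit L2 norm; all-zero columns stay zero."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    return X / norms


class LSICentroidClassifier:
    """Hypothetical illustration: plain LSI projection + nearest centroid."""

    def __init__(self, k):
        self.k = k

    def fit(self, docs, labels):
        A, self.vocab = build_term_doc_matrix(docs)
        self.U_k, self.s_k = lsi_fit(A, self.k)
        Z = normalize_columns(fold_in(A, self.U_k, self.s_k))  # k x n_docs
        y = np.array(labels)
        self.classes = sorted(set(labels))
        # One unit-length centroid per class in the latent space (k x n_classes).
        self.centroids = normalize_columns(
            np.stack([Z[:, y == c].mean(axis=1) for c in self.classes], axis=1)
        )
        return self

    def predict(self, docs):
        A, _ = build_term_doc_matrix(docs, self.vocab)
        Z = normalize_columns(fold_in(A, self.U_k, self.s_k))
        # Cosine similarity to each centroid (n_classes x n_docs). A document
        # with no known terms scores 0 everywhere, so argmax picks classes[0].
        scores = self.centroids.T @ Z
        return [self.classes[i] for i in np.argmax(scores, axis=0)]


# Invented toy corpus, for illustration only.
train_docs = [
    "stock market shares trading profit",
    "market shares investors stock prices",
    "football match goal team player",
    "team player score goal league season",
]
train_labels = ["finance", "finance", "sport", "sport"]
test_docs = ["investors trading stock", "goal team league"]

clf = LSICentroidClassifier(k=2).fit(train_docs, train_labels)
print(clf.predict(test_docs))  # → ['finance', 'sport']
```

Dividing by the singular values in `fold_in` follows the usual LSI fold-in convention, so new documents are mapped with the training-time SVD and no recomputation. Because training and test documents pass through the same `U_k`, the arbitrary signs that `np.linalg.svd` assigns to singular vectors do not change the cosine scores.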


  