Feature discretization and selection techniques for high-dimensional data

17 April 2012

Artur Ferreira, IT/ISEL

High-dimensional datasets are increasingly common in learning problems across many domains, such as text categorization, genomics, econometrics, and computer vision. The excessive number of features makes these datasets costly to represent and process in memory, which clearly shows the need for adequate methods for feature representation, reduction, and selection, both to improve classification accuracy and to reduce the memory required to store these datasets.

On high-dimensional datasets, filter approaches are often the only applicable option, since wrappers and embedded methods can be too expensive. Moreover, even some filter approaches are computationally prohibitive for such datasets. This talk addresses efficient techniques, both supervised and unsupervised, for feature discretization and feature selection that are suitable for high-dimensional datasets. These techniques attain competitive results and can also act as pre-processors for more sophisticated methods (e.g., wrappers). A set of experimental results on microarray and computer vision datasets is discussed.



Artur Ferreira is an adjunct professor at ISEL (Instituto Superior de Engenharia de Lisboa) and a PhD student in Electrical and Computer Engineering at IST-IT (Instituto Superior Técnico – Instituto de Telecomunicações), under the supervision of Prof. Mário Figueiredo. He holds an MSc in Electrical and Computer Engineering from IST. His main research interests are data compression, pattern recognition, and machine learning.