University of Konstanz
Graduiertenkolleg / PhD Program
Computer and Information Science

Guest Talks

title

Efficient pairwise multilabel classification

speaker

Ph. D. Loza Mencia, Technische Universität Darmstadt, Germany

date & place

Friday, 25.05.2012, 12:00 h
Room H 306

abstract

In this talk, I will give an overview of methods developed in our group for efficient learning from multilabel data, with a special emphasis on large-scale label output spaces. A prototypical application scenario for multilabel classification is the assignment of a set of keywords to a document, a frequently encountered problem in text classification. With the rise of Web 2.0 technologies, this domain has been extended by a wide range of tag suggestion tasks, and the trend is clearly moving towards more data points and more labels. In contrast to the common approach of training one classifier per class to predict its relevance independently, we focus on the pairwise decomposition of the original problem, in which a decision function is trained for each pair of possible classes. The main advantage of this approach, improved predictive quality, must be weighed against its main disadvantage: the number of classifiers needed grows quadratically with the number of labels. This talk will present a framework of efficient and scalable solutions for handling thousands of labels despite the quadratic dependency. The challenging EUR-Lex text collection, with almost 4000 labels and 20000 documents, serves as a testbed for the proposed approaches.
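
To make the quadratic dependency concrete, the following is a minimal sketch, not the method presented in the talk. It assumes the usual pairwise formulation in which a document contributes to the training set for a label pair only when exactly one of the two labels is relevant to it. The function names, the toy documents, and the round figure of 4000 labels (the abstract says "almost 4000") are illustrative assumptions.

```python
from itertools import combinations


def num_binary_relevance_models(n_labels):
    # One independent classifier per label.
    return n_labels


def num_pairwise_models(n_labels):
    # One classifier per unordered pair of labels: n * (n - 1) / 2.
    return n_labels * (n_labels - 1) // 2


def pairwise_training_sets(examples, labels):
    """Build one binary training set per label pair.

    examples: list of (document, set_of_relevant_labels) tuples.
    Assumption: a document is used for pair (a, b) only if exactly one
    of a and b is relevant; its target is 1 if a is relevant, else 0.
    """
    tasks = {}
    for a, b in combinations(sorted(labels), 2):
        rows = []
        for doc, relevant in examples:
            in_a, in_b = a in relevant, b in relevant
            if in_a != in_b:
                rows.append((doc, 1 if in_a else 0))
        tasks[(a, b)] = rows
    return tasks


# Hypothetical toy data: three documents, three labels.
toy_examples = [("d1", {"A", "B"}), ("d2", {"A"}), ("d3", {"C"})]
toy_tasks = pairwise_training_sets(toy_examples, {"A", "B", "C"})

# Round illustrative label count, not the exact EUR-Lex figure.
print(num_binary_relevance_models(4000), num_pairwise_models(4000))
```

With 4000 labels, the one-classifier-per-label approach needs 4000 models, whereas the pairwise decomposition needs 4000 * 3999 / 2 = 7,998,000, which is why efficient training, storage, and prediction schemes become necessary.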