University of Konstanz
Graduiertenkolleg / PhD Program
Computer and Information Science

PhD Program Spring School 2006


WordSpace - Visual Summary of Text Corpora

speaker Martin Hoefer
 
date March 09, 2006
 
abstract In recent years, several well-known approaches to visualizing the topical structure of a document collection have been proposed. Most of them feature spectral analysis of a term-document matrix with influence values and dimensionality reduction. In this talk we present a generalized approach, arguing that there are many reasonable ways to project the term-document matrix into a low-dimensional space, each emphasizing different features of the corpus. Our main tool is a continuous generalization of adjacency-respecting partitions called structural similarity. In this way we obtain a generic framework in which the influence weights in the term-document matrix, the dimensionality-reducing projection, and the display of the target subspace may be varied according to the nature of the text corpus.
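
As a rough illustration of the kind of pipeline the abstract refers to (weight the term-document matrix, then project it spectrally into a low-dimensional space), here is a minimal sketch in Python with NumPy. It is not the structural-similarity method of the talk; it uses a plain truncated SVD as a stand-in projection. The toy corpus, the smoothed tf-idf weighting, and the function names `tfidf_weights` and `project` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents (raw counts).
terms = ["graph", "layout", "matrix", "corpus"]
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 2],
    [0, 1, 2, 3],
], dtype=float)


def tfidf_weights(counts):
    """One possible choice of influence weights: smoothed tf-idf (an assumption)."""
    n_docs = counts.shape[1]
    doc_freq = np.count_nonzero(counts, axis=1)
    idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1.0
    return counts * idf[:, None]


def project(weighted, k=2):
    """Project terms and documents onto the top-k singular directions."""
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    term_coords = u[:, :k] * s[:k]      # one k-dimensional point per term
    doc_coords = vt[:k, :].T * s[:k]    # one k-dimensional point per document
    return term_coords, doc_coords


weighted = tfidf_weights(counts)
term_coords, doc_coords = project(weighted, k=2)

for term, xy in zip(terms, term_coords):
    print(term, np.round(xy, 2))
```

Swapping `tfidf_weights` for another weighting, or `project` for another projection, changes which features of the corpus the resulting 2D layout emphasizes; that freedom of choice is what the abstract describes as the motivation for a generic framework.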