abstract |
In recent years several well-known approaches to visualize the topical structure of a document collection have been proposed. Most of them feature spectral analysis of a term-document matrix with influence values and dimensionality reduction. In this talk we present a generalized approach by arguing that there are many reasonable ways to project the term-document matrix into low-dimensional space in which different features of the corpus are emphasized. Our main tool is a continuous generalization of adjacency-respecting partitions called structural similarity. In this way we obtain a generic framework in which influence weights in the term-document matrix, dimensionality-reducing projections, and the display of a target subspace may be varied according to nature of the text corpus.
|