University of Konstanz
Graduiertenkolleg / PhD Program
Computer and Information Science

Miloš Krstajić

Doctoral Student in the PhD program since 01.09.2008.

advisors

  1. Prof. Dr. Daniel Keim
  2. Prof. Dr. Oliver Deussen
  3. Prof. Dr. Harald Reiterer

organisational data

Room: D214
Tel.: 3583
E-mail: milos.krstajic "at" uni-konstanz.de
Other Resources: Workgroup Page

project description

A vast number of news articles is published every day all over the world, covering important global and local events. Global media agencies and thousands of news portals continuously produce new content, and in recent years traditional readers have also become content creators rather than just passive consumers of information. These large streams of textual data are fast and complex, and they contain semi-structured and unstructured information about real-world events. Although huge progress has been made in collecting and storing these data, analyzing and understanding them is still very challenging. Each time-stamped text record (news article, blog post, tweet, etc.) is an event arriving in the stream within an event sequence, and a series of related events forms an event episode. Such events, event sequences, and event episodes also exist in many other domains, such as network logs, financial transactions, and medical records. The size and complexity of these temporal events and their relationships make purely automatic analysis infeasible. In my thesis, I have used the visual analytics approach, which combines automated computational methods with interactive visualization techniques to facilitate the processing, analysis, and understanding of the data.

Two research directions have guided my work on the visual analysis of text streams. First, I have reviewed and classified existing visual analytics approaches for data streams according to streaming criteria related to the data, visualization, and user spaces. To this end, I have analyzed which methods are used for data processing, transformation, and mining, which are used for visualization, and how they are combined to allow incremental visual analysis. Based on this review, I proposed a set of guidelines and open issues for future research. I have also examined several well-known information visualization methods, observed which of their visual variables can change and how, and described how these changes relate to the attribute and structural changes that can occur in the data stream.

Second, I have developed two approaches to analyze events in news streams and shown how they fit into the proposed classification. I designed CloudLines, a compact visualization of events in multiple event sequences in limited space, which uses kernel density estimation to identify short intervals containing many events. I examined the sensitivity of the visualization to the parameters of the estimator, as well as existing statistical methods for calculating "optimal" parameters. I developed lens and timeline distortion as interaction techniques for CloudLines, as well as decay and cut-off functions that remove irrelevant events and improve performance.

Story Tracker is a visual analytics approach for the incremental analysis of the development of news stories, which can split and merge over time. It allows the user to steer the text clustering algorithms and refine the results at every stage of the data transformation and visualization processes. Text clustering algorithms extract stories from online news streams in consecutive time windows and identify similar stories from the past. The stories are displayed in a visualization that (1) sorts the stories to minimize clutter and overlap from edge crossings, (2) shows their temporal characteristics in different time frames with different levels of detail, and (3) allows incremental updates of the display without recalculating the past data. Stories can be interactively filtered by their duration and connectivity so that they can be explored in full detail. Two use cases with real news data about the Arab Uprising in 2011 demonstrate the capabilities of the system for detailed exploration of dynamic text streams.
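The density-based part of CloudLines described above can be illustrated with a short sketch. This is not the CloudLines implementation; it is a minimal Python example under several assumptions: event timestamps are plain numbers, the kernel is Gaussian, the bandwidth comes from the common normal-reference rule of thumb (1.06 σ n^(-1/5)) as one example of a statistical bandwidth selector, the decay is exponential with a hypothetical `half_life` parameter, and a hypothetical `cutoff` weight drops events that have decayed too far. All function names are illustrative.

```python
import math
from typing import List, Tuple

# Illustrative sketch only: names and parameters are hypothetical,
# not taken from the CloudLines implementation.

def rule_of_thumb_bandwidth(times: List[float]) -> float:
    """Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5)."""
    n = len(times)
    if n == 0:
        raise ValueError("need at least one event")
    mean = sum(times) / n
    var = sum((t - mean) ** 2 for t in times) / (n - 1) if n > 1 else 0.0
    sigma = math.sqrt(var)
    # Fall back to an arbitrary bandwidth of 1.0 when all events coincide (assumption).
    return 1.06 * sigma * n ** (-1 / 5) if sigma > 0 else 1.0


def decay_weight(age: float, half_life: float) -> float:
    """Exponential decay: an event loses half of its weight every half_life time units."""
    return 0.5 ** (age / half_life)


def active_events(times: List[float], now: float, half_life: float,
                  cutoff: float) -> List[Tuple[float, float]]:
    """Attach a decayed weight to each event and drop events below the cut-off weight."""
    kept = []
    for t in times:
        w = decay_weight(now - t, half_life)
        if w >= cutoff:
            kept.append((t, w))
    return kept


def density(x: float, events: List[Tuple[float, float]], bandwidth: float) -> float:
    """Weighted Gaussian kernel density at x (not normalized by the total weight)."""
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    return sum(w * norm * math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t, w in events)


def dense_intervals(events: List[Tuple[float, float]], bandwidth: float, start: float,
                    end: float, step: float, threshold: float) -> List[Tuple[float, float]]:
    """Scan [start, end] on a regular grid and return intervals where density >= threshold."""
    intervals = []
    begin = None
    n_steps = int(round((end - start) / step))
    for i in range(n_steps + 1):
        x = start + i * step  # integer stepping avoids accumulated float drift
        if density(x, events, bandwidth) >= threshold:
            if begin is None:
                begin = x
        elif begin is not None:
            intervals.append((begin, x - step))  # last grid point still above threshold
            begin = None
    if begin is not None:
        intervals.append((begin, start + n_steps * step))
    return intervals
```

In this sketch, a larger `bandwidth` merges nearby bursts into one interval while a smaller one splits them, which is the kind of parameter sensitivity mentioned above; a higher `cutoff` trades completeness for speed, because fewer events enter each density evaluation.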
My thesis contributions are:
  - CloudLines, a visualization method for temporal events in multiple sequences in limited space, coupled with interaction techniques for detailed exploration of the events;
  - Story Tracker, a visual analytics system for the analysis of text streams, which combines text clustering algorithms with incremental visualization to create a coherent overview of the development of news stories;
  - a model of visual analytics for streaming data, based on a systematic review of existing approaches, together with design guidelines and recommendations for future research in this domain;
  - a set of extensions related to real-time visual analytics for text streams, which show how existing methods can be adapted to work in real time.
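The story-linking step of Story Tracker, in which stories extracted in one time window are matched to similar stories from the past, can be sketched in the same hedged way. This is not the Story Tracker implementation: it assumes the stories have already been clustered, represents each story by a summed term-frequency vector built with naive whitespace tokenization, compares stories with cosine similarity, and uses a hypothetical `threshold` parameter. Linking every pair above the threshold, rather than only the best match, lets one past story continue as several current stories (a split) and several past stories feed one current story (a merge).

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Illustrative sketch only: the representation, similarity measure and
# threshold are assumptions, not the Story Tracker implementation.

def story_vector(articles: List[str]) -> Counter:
    """Summed term frequencies of a story's articles (naive whitespace tokenization)."""
    vec: Counter = Counter()
    for text in articles:
        vec.update(text.lower().split())
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_stories(past: Dict[str, Counter], current: Dict[str, Counter],
                 threshold: float) -> List[Tuple[str, str, float]]:
    """Return (past_id, current_id, similarity) for every pair at or above threshold.

    A past story linked to several current stories indicates a split;
    a current story linked to several past stories indicates a merge.
    """
    links = []
    for pid, pvec in past.items():
        for cid, cvec in current.items():
            sim = cosine(pvec, cvec)
            if sim >= threshold:
                links.append((pid, cid, sim))
    return links
```

Because each new window is only compared against stored story vectors, links computed for earlier windows are never recalculated, which is one simple way to support incremental updates.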

publications

The following list covers only publications that are or were published during participation in the Graduiertenkolleg / PhD program.

Book Chapters

2010

Articles in Journals

2013
2012
2011

Conference Papers

2013
2012
2011
2010
2009

PhD Theses

2014