University of Konstanz
Graduiertenkolleg / PhD Program
Computer and Information Science

Hansi Senaratne

Doctoral Student in the PhD program since 01.07.2012.

advisors

Jun.-Prof. Dr. Tobias Schreck

organisational data

Room:  D334
Tel.:  (+49) 07531 88-2507
E-mail:  Hansi.Senaratne(at)uni-konstanz.de
Other Resources: Workgroup Profile
picture

project description

With the increased availability of spatially referenced data, assessing the quality and credibility of such data becomes vital. With a special focus on volunteered geographic information (VGI) such as geotagged Flickr or Twitter data, this thesis deals on the one hand with means to quantify quality components such as the positional, temporal and thematic uncertainties of these user-generated data, and on the other hand with mechanisms to evaluate the credibility of the data contributor. Humans perceive and express geographic regions and spatial relations imprecisely, and in terms of vague concepts. This vagueness in the human conceptualization of location is due not only to the fact that geographic entities are continuous in nature, but also to the quality and limitations of spatial knowledge. Credibility can be expressed as the believability of a source or message, and comprises primarily two dimensions: trustworthiness and expertise. Thus, in assessing the credibility of users, this thesis considers factors that contribute to this perception of trustworthiness and believability, other than data accuracy itself. Metadata about the origin of volunteered geographic information provides a foundation for judging the credibility of the source.
To assess the credibility of visual VGI, we proposed to assess the location correctness of visually generated VGI as a quality reference measure. The location correctness is determined by checking the visibility of the point of interest from the position where the visually generated VGI originates (the observer point); as an example we used geotagged Flickr photographs. To this end, we first implemented a Flickr metadata crawler that relies on the open Flickr API to fetch metadata of Flickr photographs for a specified set of tags.
Using a quadtree algorithm to facilitate access to all photographs relating to a particular tag query, we were able to download metadata of photographs textually tagged with a particular point of interest. In a second step, to derive the reference quality measure for location correctness, a reverse viewshed for the point of interest was calculated (Figure 1). This determines the line of sight (LOS) to our point of interest from the observer points, based primarily on surface elevation data such as that of buildings and vegetation. To determine the visibility of the target pixel, the intermediate pixels were analyzed along the LOS. If the LOS is unobstructed, the target pixel is included in the viewshed. If it is obscured, the target pixel is not included in the viewshed¹. If the point of interest is not visible from a given observer point, the respective image is considered to be incorrectly geotagged. In this way, we analyzed sample datasets of photographs and made observations regarding the dependency between certain user/photo metadata and (in)correct geotags and labels (Figure 2). This dependency between the location correctness and user/photo metadata is then used to automatically infer user credibility. For example, attributes such as profile completeness, together with the location correctness, serve as a weighted score to assess user credibility. These findings have been published in the journal Transactions in GIS.
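The quadtree-based crawl mentioned above can be illustrated with a minimal sketch. It is not the crawler used in the thesis: it assumes that a search area is split into four quadrants whenever a single query would match more photographs than can be retrieved at once. The names BBox, quadtree_cells, count_in and max_per_cell are hypothetical, and the Flickr API is not called here; an in-memory list of coordinates stands in for the remote photo index.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in degrees, half-open on its max edges."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon < self.max_lon
                and self.min_lat <= lat < self.max_lat)

    def split(self) -> List["BBox"]:
        """Return the four quadrants (SW, SE, NW, NE) of this box."""
        mid_lon = (self.min_lon + self.max_lon) / 2
        mid_lat = (self.min_lat + self.max_lat) / 2
        return [
            BBox(self.min_lon, self.min_lat, mid_lon, mid_lat),
            BBox(mid_lon, self.min_lat, self.max_lon, mid_lat),
            BBox(self.min_lon, mid_lat, mid_lon, self.max_lat),
            BBox(mid_lon, mid_lat, self.max_lon, self.max_lat),
        ]


def quadtree_cells(bbox: BBox,
                   count_in: Callable[[BBox], int],
                   max_per_cell: int,
                   max_depth: int = 12) -> List[BBox]:
    """Subdivide bbox until every cell holds at most max_per_cell items.

    count_in is a hypothetical callback; in a real crawler it would ask
    the photo service how many tagged photographs fall inside the box.
    """
    if count_in(bbox) <= max_per_cell or max_depth == 0:
        return [bbox]
    cells: List[BBox] = []
    for child in bbox.split():
        cells.extend(quadtree_cells(child, count_in, max_per_cell, max_depth - 1))
    return cells


# Illustrative coordinates only (lon, lat); not data from the study.
points = [
    (13.100, 52.350), (13.376, 52.518), (13.377, 52.519),
    (13.378, 52.517), (13.375, 52.520), (13.700, 52.650),
]


def count_in(bbox: BBox) -> int:
    return sum(bbox.contains(lon, lat) for lon, lat in points)


root = BBox(13.0, 52.3, 13.8, 52.7)
cells = quadtree_cells(root, count_in, max_per_cell=2)
```

Each resulting cell can then be queried separately, so dense areas around a popular point of interest are covered by many small queries and sparse areas by a few large ones.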
Fig 1. The nine characteristics of a viewshed: surface elevation (Spot), vertical distances (Offset A and B), horizontal angles (Azimuth 1 and 2), top and bottom vertical angles (Vert 1 and 2), inner and outer radius (Radius 1 and 2). The observation point is at OF1 and the target point is at OF2. These characteristics can be controlled and employed to realize the line of sight.
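The line-of-sight test behind the viewshed can be sketched as follows. This is a simplified illustration under stated assumptions, not the GIS routine used in the study: the surface is a small raster of elevations, the sight line is sampled once per grid step, and the azimuth, vertical-angle and radius limits from Fig 1 are ignored. The function name line_of_sight and the parameters observer_offset and target_offset are hypothetical; the two offsets loosely mirror Offset A and Offset B.

```python
from typing import Sequence, Tuple

Cell = Tuple[int, int]  # (row, column) index into the elevation raster


def line_of_sight(surface: Sequence[Sequence[float]],
                  observer: Cell,
                  target: Cell,
                  observer_offset: float = 1.7,
                  target_offset: float = 0.0) -> bool:
    """Return True if no intermediate cell rises above the sight line."""
    (r0, c0), (r1, c1) = observer, target
    z0 = surface[r0][c0] + observer_offset  # eye height above the surface
    z1 = surface[r1][c1] + target_offset    # height of the point looked at
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(1, steps):               # intermediate cells only
        t = i / steps
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        sight_z = z0 + t * (z1 - z0)        # sight-line height at this cell
        if surface[r][c] > sight_z:
            return False                    # an obstacle blocks the view
    return True


# Illustrative elevations in metres; not data from the study.
flat = [[10.0, 10.0, 10.0, 10.0, 10.0]]
blocked = [[10.0, 10.0, 30.0, 10.0, 10.0]]  # a 30 m obstacle in between

visible_on_flat = line_of_sight(flat, (0, 0), (0, 4))
visible_when_blocked = line_of_sight(blocked, (0, 0), (0, 4))
```

Running this test from the point of interest to every cell of the raster yields the set of cells that can see it; a photograph whose geotag falls outside that set would be flagged as incorrectly geotagged.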
Focusing on other factors relating to spatio-temporal data quality, we further looked into the thematic and positional uncertainty of such data. Uncertainty, understood as the state of not knowing, is caused, among other things, by measurement errors and by data processing and transformations. The increasing complexity of data and their inherent uncertainties requires effective means to explore these data, considering not only the data content but also the associated certainty (or quality) properties. Many different visualization methods, ranging from static and dynamic to interactive, have been developed to communicate these data and their underlying uncertainties, and many of these methods have been assessed for their usability in various settings. In our research we selected five representative uncertainty visualization methods to reflect a broad spectrum of geospatial uncertainties, and assessed their usability in terms of users' ability to interpret and perform data analysis with these methods (learnability) and user preference (likability). The participants in this study were invited from different user domains such as statistics and climate change research. To investigate the interdependency between the learnability and likability criteria of the visualization methods, we performed a contingency analysis, and thereby examined the variability of usability within each user domain. The results indicate that color gradient visualization methods with a side-by-side comparison technique and color-coded symbol visualization methods have higher contingency scores for all selected user domains. The results further imply that the intrinsic visualization method Contouring is difficult to interpret because it uses two different visual variables to represent the data (color transparency) and its underlying uncertainty (contour thickness).
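As a rough illustration of a contingency analysis between two categorical usability criteria, the sketch below computes Pearson's contingency coefficient from a cross-tabulation of learnability against likability. The text does not state which statistic produced the reported contingency scores, so the choice of this coefficient is an assumption, and the function name and the counts are made up for the example.

```python
import math
from typing import Sequence


def contingency_coefficient(table: Sequence[Sequence[float]]) -> float:
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)).

    table[i][j] is the number of participants in learnability category i
    and likability category j. Assumes every row and column total is > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical counts: rows = interpreted correctly / incorrectly,
# columns = method liked / not liked.
independent = [[10, 10], [10, 10]]
associated = [[10, 0], [0, 10]]

c_independent = contingency_coefficient(independent)
c_associated = contingency_coefficient(associated)
```

A coefficient near zero means the two criteria vary independently, while larger values mean that participants who interpret a method correctly also tend to prefer it.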
Furthermore, the interactive visualization methods Statistical dimension in a GIS and Errorbars and intervals, which had the lowest contingency scores, call for a more intuitive design.
My future research will focus more on techniques to assess the quality (such as credibility) of such user-generated data, and on means to visually analyze them. I am currently co-advising two master's theses on these topics. Using Twitter as a use case, we will look at how various user, message, topic and propagation features (e.g., number of followers) can be combined to derive accurate and user-defined events within a visual analytics framework. Furthermore, again using these microblogs, we will explore methods to detect spatio-temporal and topic drifts for given Twitter events. This will enable us to track their chronological evolution in terms of sentiment and credibility. This work is under way for future submissions.
Fig 2. Panels a, b, c and d show the areas of visibility (green) from four different observer points to the Reichstag in Berlin (highlighted with a red rectangle). The arrow points to the observer point and the image taken from there.
¹Fisher, P. F. (1996). Extending the applicability of viewsheds in landscape planning. Photogrammetric Engineering and Remote Sensing, 62(11), 1297-1302.

publications

The following list of publications covers only those that were published during participation in the Graduiertenkolleg / PhD program.
