Towards Generic Methods for Document Understanding of Arbitrary, Digital-Born Textual Documents


Dr. Tamir Hassan, Vienna University of Technology, Austria

Friday, 09.11.2012, 10:00 h
Room C 202


This talk will introduce the research areas of document analysis and document understanding, which aim to rediscover the logical structure of a document (e.g. headings, tables, lists, headers/footers and image captions). This logical structure is of great importance for applications such as search, repurposing, data visualization and indexing/cataloguing. Whereas humans can recognize the logical structure of an arbitrary document very accurately, most existing approaches have so far been restricted to input documents of a particular class. After a summary of these approaches, we will discuss the research steps needed to develop methods that accept any predominantly textual document as input and recognize structures common to a wide range of documents with a high level of accuracy.