University of Konstanz
Graduiertenkolleg / PhD Program
Computer and Information Science

Graduation Talks


Cardinality-aware and purely relational implementation of an XQuery processor


Sherif Sakr, University Konstanz
Konstanz, Germany

date & place

Wednesday, 23.05.2007, 17:00 h
Room C252


This thesis presents Pathfinder as a purely relational implementation of an XQuery processor. In Pathfinder, firstly the source XML documents are encoded using the XPath accelerator relational encoding scheme. Secondly, using the loop-lifting technique, XQuery expressions are translated into intermediate algebraic plans. Finally, Pathfinder uses a certain group of well defined SQL translation templates to translate the intermediate algebraic plans into their equivalent SQL evaluation scripts.
In addition, this thesis presents an XQuery cardinality estimation framework. In this framework, a tree summary structure is built for the source XML documents named as Statistical Guide. The distribution of the values for the XML nodes storing atomic values is captured using statistical histograms. The XQuery estimation module receives the intermediate Pathfinder algebraic plan, the Statistical Guide, the statistical histograms and uses a defined set of size estimation inference rules for the algebraic operator to estimate the cardinality of the input XQuery expression as well as its sub-expressions. The XQuery cardinality estimation framework introduced a novel concept that of Guide Node annotations for the Pathfinder algebraic operators. The novelty of this approach lies in its ability to improve the flexibility of the framework that allows it to integrate different XML structural summary techniques as well as different statistical histograms techniques.
Finally, this thesis describes the Pathfinder integrated framework between the Pathfinder SQL code generator module and the Pathfinder XQuery cardinality estimation module. In this framework, the SQL code generator receives the cardinality information inferred by the XQuery cardinality estimation module to generate enhanced cardinality aware SQL scripts and to influence the RDBMS query optimizers for a better selection for the SQL execution plans.
The experiments of this thesis demonstrate the efficiency and scalability of our purely relational approach in comparison to the native XML/XQuery functionality supported by conventional RDBMSs and have shown that our purely relational approach deserves to be pursued further.