University of Konstanz
Graduiertenkolleg / PhD Program
Computer and Information Science

Colloquium of the Department and the PhD Program


A new algorithm for database join and aggregation operations


Dr. Goetz Graefe, HP Labs
Palo Alto, USA

date & place

Wednesday, 20.04.2011, 17:15 h
Room C 252


Database query processing traditionally relies on three types of algorithms for join and grouping operations. For joins, index nested loops join exploits an index on its inner input, merge join exploits sorted inputs, and hash join exploits differences in the sizes of the join inputs. For grouping, an index-based algorithm has been used in the past whereas today sort- and hash-based algorithms prevail. Cost-based query optimization chooses the most appropriate algorithm for each query and for each operation. Unfortunately, mistaken algorithm choices during compile-time query optimization are common yet expensive to investigate and to resolve.

Our goal is to end mistaken choices among join algorithms and among grouping algorithms by replacing the three traditional types of algorithms with a single one. Like merge join, this new join algorithm exploits sorted inputs. Like hash join, it exploits different input sizes for unsorted inputs. In fact, for unsorted inputs, the cost functions for recursive hash join and for hybrid hash join have guided our search for the new join algorithm. In consequence, the new join algorithm can replace both merge join and hash join in a database management system.

The in-memory components of the new join algorithm employ indexes. If the database contains indexes for one (or both) of the inputs, the new join can exploit persistent indexes instead of temporary in-memory indexes. Using database indexes to match input records, the new join algorithm can also replace index nested loops join.


Goetz Graefe is an HP Fellow working in the Intelligent Information Management Lab within Hewlett-Packard Laboratories. Prior to joining HP in 2006, Goetz spent 12 years as software architect in product development at Microsoft, mostly in database management. Both query optimization and query execution of Microsoft's re-implementation of SQL Server are based on his designs. Goetz's research credentials include numerous publications as well as surveys published by ACM Computing Surveys. His original publications cover query optimization, query execution, and indexing. His work has been honored by the ACM SIGMOD 2000 "test of time" award for work on parallel query execution, by the IEEE ICDE 2005 "influential paper" award for work on extensible query execution, and by the 2009 ACM "software systems" award for participation in the Gamma database machine research project.

Goetz studied Computer Science at T U Braunschweig from 1980 to 1983. In 1983, he was admitted to the University of Wisconsin - Madison, where he was granted a M.S. degree in 1984 and a Ph.D. in 1987.