Title: Mining from Data Streams: Issues and Challenges
The Machine Learning community faces new challenges with the advent of sources that produce continuous flows of data. Examples of streaming data include sensor networks, customer click streams, telephone records, web logs, multimedia data, and sets of retail chain transactions. These data sources are characterized by a high-speed flow of huge amounts of data generated from non-stationary distributions. Consequently, new learning techniques are needed to process streaming data in reasonable time and space. The goal of this tutorial is to present and discuss the research problems, issues and challenges in learning from data streams. We will present state-of-the-art techniques for change detection, clustering, classification, frequent pattern mining, and time series analysis on data streams. We will also discuss current trends, open issues and future directions in learning from data streams.
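As a rough illustration of what processing a stream "in reasonable time and space" can look like for change detection, the sketch below implements the Page-Hinkley test, one well-known one-pass detector. It is not taken from the tutorial material; the class name, the `first_alarm` helper and the default values of `delta` and `threshold` are assumptions chosen for this example.

```python
class PageHinkley:
    """One-pass detector for an upward shift in the mean of a stream.

    Keeps a constant amount of state, regardless of stream length.
    Parameter defaults are illustrative assumptions, not recommendations.
    """

    def __init__(self, delta=0.05, threshold=20.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm level
        self.n = 0                  # number of observations seen so far
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation from the mean
        self.min_cum = 0.0          # smallest cumulative deviation seen

    def update(self, x):
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_alarm(stream):
    """Return the 0-based position of the first alarm, or None."""
    detector = PageHinkley()
    for position, value in enumerate(stream):
        if detector.update(value):
            return position
    return None


if __name__ == "__main__":
    # Hypothetical stream: the mean jumps from 0 to 5 at position 200.
    stream = [0.0] * 200 + [5.0] * 200
    print(first_alarm(stream))
```

Each observation is examined once and then discarded, so memory use does not grow with the stream. This variant only reacts to increases in the mean; detecting decreases would need a mirrored statistic.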
Specific goals and objectives
Joao Gama is a researcher at LIAAD-INESC Porto LA, the Laboratory of Artificial Intelligence and Decision Support of the University of Porto.
His main research interest is Learning from Data Streams. He has published several articles on change detection, learning decision trees from data streams, and hierarchical clustering from streams, among other topics. He has edited special issues on Data Streams in Intelligent Data Analysis, J. Universal Computer Science, and New Generation Computing.
Title: Constraint-Based Data Mining and Inductive Queries
In its most general formulation, the task of data mining is to find patterns in data. As such, it is vastly underspecified. To make the task more precise, we first have to specify the type of patterns considered (where the word pattern is taken in a broad sense to include frequent patterns, predictive models and other regularities in the data, e.g., clusters). We then have to specify what conditions the patterns have to satisfy in order to be considered solutions to the data mining task at hand. In constraint-based data mining, these conditions are called constraints; they are stated explicitly and are under the direct control of the user/data miner.
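To make the idea of explicitly stated constraints concrete, here is a minimal sketch of constraint-based frequent itemset mining using a level-wise search. It is an illustration only, not one of the algorithms covered in the tutorial; the function names, the three constraints (`min_support`, `max_size`, `must_contain`) and the toy transactions are all assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine(transactions, min_support, max_size, must_contain=frozenset()):
    """Level-wise search for itemsets that satisfy user-stated constraints.

    min_support and max_size are anti-monotone: once an itemset violates
    them, so does every superset, so they can prune the search.
    must_contain is not anti-monotone, so it is applied as a final filter.
    """
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    frequent = []
    size = 1
    while level and size <= max_size:
        kept = [s for s in level if support(s, transactions) >= min_support]
        frequent.extend(kept)
        # Candidates one item larger, built by joining pairs of kept itemsets.
        candidates = {a | b for a, b in combinations(kept, 2)
                      if len(a | b) == size + 1}
        level = sorted(candidates, key=sorted)
        size += 1
    return [s for s in frequent if must_contain <= s]


if __name__ == "__main__":
    # Hypothetical retail transactions.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for pattern in mine(baskets, min_support=0.5, max_size=2,
                        must_contain=frozenset({"butter"})):
        print(sorted(pattern))
```

Changing any of the three arguments changes which patterns count as solutions, which is the sense in which the constraints are under the user's direct control.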
Constraints play an important role in the area of inductive databases and inductive queries, which takes a database perspective on knowledge discovery: knowledge discovery processes become query sessions. Inductive queries can be used to mine patterns from data as well as to apply patterns to data, so that KDD becomes an extended querying process. Inductive queries consist of constraints that the patterns of interest have to satisfy, and are hence closely related to constraint-based data mining.
The tutorial will introduce the research areas of inductive databases/queries and constraint-based data mining. It will give an overview of the different types of constraints commonly considered as well as selected constraint-based data mining algorithms for different data mining tasks. In particular, constraint-based mining of frequent patterns, predictive models and clustering will be considered. We will also discuss current and future research directions and challenges in these areas.
Specific goals and objectives
Saso Dzeroski is a scientific councillor at the Jozef Stefan Institute, Department of Knowledge Technologies, and an associate professor at the Jozef Stefan International Postgraduate School, both in Ljubljana, Slovenia.
His research interests are in the areas of Data Mining, Machine Learning, and Knowledge Discovery in Databases, and their applications. More specifically, on the methodology side they focus on Computational Scientific Discovery / Equation Discovery and Constraint-Based Data Mining / Inductive Queries. On the application side, his focus is on applications in Environmental Sciences (ecological modelling) and Life Sciences (bioinformatics and systems biology).
Besides his research and publications related to the topics of this tutorial, he is the coordinator of the EU-funded project IQ (Inductive Queries for Mining Patterns and Models). He has co-organized two editions of the international workshop Knowledge Discovery in Inductive Databases, held at ECML/PKDD (KDID-03 and KDID-06). With Jan Struyf, he has edited a book on this topic, based on the KDID-06 workshop and published by Springer.