Big data, data mining, machine learning, predictive analytics: all of these have become buzzwords in today’s information-driven world, which is overwhelmed by a tsunami of data. In many applications, the sheer size of the database is a real challenge, with Google as the most prominent example.
But size is not all that matters. In many applications, the data are ‘technical’ and ‘complicated’, and the objectives of the mining exercise are also technical and/or economic, with a clear return on investment. In this presentation, we will talk about ‘serious data’, by which we mean data for which in-depth know-how and understanding of the field and application are mandatory. Examples are biomedical and health data (genomics, decision support tools), industrial data (process industry monitoring and control), environmental data (micro-climate simulations), financial data (fraud detection, bank customer modeling), smart city data (energy grid monitoring), etc.
We will also talk about ‘serious mining’, by which we mean that we use a full toolbox of advanced machine learning algorithms, including system identification methodologies for dynamical systems and time series, clustering, classification, ranking algorithms, etc.
In our lecture, we will first give a broad overview of the general trends that explain the tsunami of data in technical applications. Then we will briefly elaborate on the necessary ingredients for data mining (compute infrastructure, storage, analytics, visualization, security). Of utmost importance, before the mining exercise can even start, is a clear statement of the objectives. We will show examples in ICT, finance, education, smart cities and health, and then enumerate the mining tasks that one can formulate.
We will briefly delve into the typical work package partitioning of a data mining project, elaborate on the advanced algorithms we use, and finish with use cases from load forecasting on the national electricity grid in Belgium, industrial process monitoring, social network clustering, financial fraud detection and, finally, several health- and genomics-related projects.
Bart De Moor obtained his Master’s degree in Electrical Engineering in 1983 and a PhD in Engineering in 1988, both at the KU Leuven. For two years (1988-1990), he was a Visiting Research Associate at Stanford University, in the departments of EE (ISL, Prof. Kailath) and CS (Prof. Golub). Currently, he is a full professor at the Department of Electrical Engineering in the research group STADIUS and the Scientific Director of the iMinds Future Health Department. His research interests are in numerical linear algebra, algebraic geometry and optimization, system theory and system identification, quantum information theory, control theory, data mining, information retrieval and bioinformatics (see publications on http://www.bartdemoor.be).