Academic Blog

ICDE 2009 Update #2

by Dominik Slezak     Tue, Mar 31, 2009

I didn’t sleep well. I kept thinking about examples of using SQL in data mining algorithms. It’s becoming one of my hobbies. If it’s your hobby too, please visit this thread. I know it’s nothing fancy, not even close to real-life data problems. However, if appropriately adjusted to practical applications, maybe it could lead to something useful? Anyway, half awake and half dreaming about SQL, I attended the sessions.

The keynote today was a perfect example of the difference between dreams and reality. David Carlson spoke about one of those scientific megaprojects where nothing is easy, where everything needs to be worked out from scratch, starting with funding and finishing with science. With regard to the project’s goals, I especially liked the aspects of building predictive models. Yesterday we learnt about compound, unstructured and not-in-place data. Today we heard about data that doesn’t even exist yet. (I know that people usually talk about prediction models in a different style, but isn’t it tempting to describe it like this?) By the way, just to digress a bit, I remember a very nice paper about processing forecasting queries. A simple and bright idea… Nevertheless, for Dr Carlson and other International Polar Year participants, simple and bright ideas are not enough. They need to adapt them to solve highly complex tasks over highly complex data.

The next item in today’s schedule was the awards ceremony. The best ICDE 2009 paper is: “Histograms and Wavelets on Probabilistic Data” by Graham Cormode and Minos Garofalakis. It was presented in the Uncertainties session. Well, I decided to attend the Transactions session, which was held in parallel. However, I’ll surely get back to this paper. It’s perhaps too early to talk about it, but there may be some analogies between processing uncertain data and the way we do rough computing on Infobright’s Knowledge Grid. In this respect, please have a look at, e.g., the February posts on our academic blog.

The best student paper is: “Double Index Nested-loop Reactive Join for Result Rate Optimization” by Mihaela Bornea, Vasilis Vassalos, Yannis Kotidis and Antonios Deligiannakis. I attended the Query Optimization session where the paper was presented. (The Data Mining 1 session was held in parallel, though. What a pity.) The speaker referred to several good papers on non-blocking join algorithms. It’s wonderful that further improvements are still possible. The session also included other interesting presentations about adaptive query processing and optimizations based on sampling and statistics. Again, I should probably refer to our Knowledge Grid. But that would be too much about Infobright in a single post…

Instead, let me go back to the awards ceremony. The most influential paper award goes to Kin-Pong Chan and Wai-Chee Fu, for the paper titled “Efficient time series matching by wavelets”. The original idea of conducting similarity search by transforming highly complex time series data into a space spanned by a relatively small number of wavelet-based dimensions has been extended in many ways. It was a great pleasure to listen to this presentation. Let me sincerely follow the speaker in her last sentence: I wish every one of you a very good time series!
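For readers who would like to see the wavelet idea in a few lines, here is a minimal sketch. It is not the code from the awarded paper; it only assumes an orthonormal Haar transform and series whose length is a power of two, and all function names are made up for this illustration. Because the transform is orthonormal, Euclidean distance is preserved, so the distance computed over just the first k coefficients can never exceed the true distance. A search can therefore filter candidates in the small space without discarding a true match.

```python
import math


def haar_transform(series):
    """Orthonormal Haar transform; the length must be a power of two."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in series]
    details = []
    while len(coeffs) > 1:
        pairs = list(zip(coeffs[0::2], coeffs[1::2]))
        averages = [(a + b) / math.sqrt(2) for a, b in pairs]
        diffs = [(a - b) / math.sqrt(2) for a, b in pairs]
        details = diffs + details  # coarser details end up in front
        coeffs = averages
    return coeffs + details  # [overall average term, coarse ..., fine ...]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def reduced_distance(x, y, k):
    """Distance over the first k Haar coefficients (a lower bound)."""
    return euclidean(haar_transform(x)[:k], haar_transform(y)[:k])


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2, 2, 4, 4, 5, 7, 7, 9]
    print("true distance:", euclidean(x, y))
    for k in (1, 2, 4, 8):
        print("first", k, "coefficients:", reduced_distance(x, y, k))
```

Keeping the leading coefficients is only one possible choice; it favours the coarse shape of the series, which is often what similarity search cares about.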

Best greetings,

Dominik

Infobright     Tags: icde2009

ICDE 2009 Update #1

by Dominik Slezak     Mon, Mar 30, 2009

Let me start with the keynote talk by Stefano Ceri, who introduced the concept of search computing, related to multi-domain queries on the Web. Put simply, imagine that you want to join two tables that are… the outcomes from two Web services. For example, let’s buy an inexpensive (according to e.g. Amazon) but popular (iTunes) CD. Surely, we might do some “manual” search and combine the results ourselves. However, that’s not how database joins work these days. So how to establish an analogous, sound and efficient framework for data being dynamically retrieved from “somewhere”? The project headed by Professor Ceri heads in this direction.
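To make the CD example a bit more concrete, here is a toy sketch of the "manual" combination written as a join. It is not the search computing framework from the keynote. The two fetch functions are hypothetical stand-ins that return hard-coded rows instead of calling any real Web service, and the titles, prices and ranks are invented for illustration.

```python
def fetch_prices():
    # Hypothetical stand-in for a price service; the data is made up.
    return [
        {"title": "Album A", "price": 7.99},
        {"title": "Album B", "price": 18.50},
        {"title": "Album C", "price": 6.49},
        {"title": "Album D", "price": 5.00},
    ]


def fetch_popularity():
    # Hypothetical stand-in for a popularity service; the data is made up.
    return [
        {"title": "Album A", "rank": 3},
        {"title": "Album B", "rank": 1},
        {"title": "Album C", "rank": 40},
    ]


def hash_join(left, right, key):
    """Equi-join two lists of dicts on `key` using an in-memory hash index."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**match, **row}


def cheap_and_popular(max_price, max_rank):
    joined = hash_join(fetch_prices(), fetch_popularity(), "title")
    return [r for r in joined if r["price"] <= max_price and r["rank"] <= max_rank]


if __name__ == "__main__":
    for row in cheap_and_popular(max_price=10.0, max_rank=10):
        print(row)
```

The sketch assumes both services can hand over their complete results at once. With real services that return ranked results page by page, that assumption fails, and this is exactly where an efficient framework is needed.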

Then I listened to two XML-related presentations at the beginning of Industrial Session 1. It’s a shame that I didn’t know SQL/XML before. This SQL extension shows how to think about XML as yet another relational data type. Certainly, by putting XML into a relational framework we give up its navigational features. On the other hand, we are then able to analyze it consistently with other data. How about functions that annotate rows with, e.g., strings extracted from XML structures? Well, it sounds simple but - as in the case of the keynote talk - the point is how to run extended SQL efficiently. In this respect, I refer especially to the first talk in the session.
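The idea of a function that annotates rows with strings extracted from XML can be imitated with a small sketch. This is not SQL/XML itself: it uses SQLite from Python with a user-defined scalar function, and the function name xml_text, the orders table and its contents are all assumptions made up for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_text(document, path):
    """Return the text of the first element matching `path`, or None."""
    try:
        root = ET.fromstring(document)
    except (ET.ParseError, TypeError):
        return None  # malformed XML or NULL input
    return root.findtext(path)


conn = sqlite3.connect(":memory:")
# Register the Python function so that it can be called from SQL.
conn.create_function("xml_text", 2, xml_text)

conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, details TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 19.5, "<order><customer><city>Warsaw</city></customer></order>"),
        (2, 5.0, "<order><customer><city>Toronto</city></customer></order>"),
        (3, 12.0, "<order><customer><city>Warsaw</city></customer></order>"),
    ],
)

# The extracted string behaves like any other column: it can be grouped on.
rows = conn.execute(
    "SELECT xml_text(details, 'customer/city') AS city, SUM(amount) "
    "FROM orders GROUP BY city ORDER BY city"
).fetchall()

for city, total in rows:
    print(city, total)
```

Note that the sketch parses every document again for every row, which is simple but slow. That is the efficiency question in miniature.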

Finally, I attended (the last part of) the tutorial on graph mining. All the materials are actually available here. A lot of inspiration! For example, cross-associations in Part IV may be quite interesting in the context of our Pack-To-Packs. I was also intrigued by the usage of MapReduce/Hadoop to scale the graph mining computations. Well, MapReduce is certainly a topic for a separate post. Let me rather conclude this day in a different way. I realize that all those graphs, XML documents, and Web queries may look quite odd on this forum. However, isn’t it just a matter of time before we start dealing with less structured, more dynamic data in data warehousing applications?

Best greetings,

Dominik

Infobright     Tags: icde2009