Wednesday:
Keynote #3: Data Management in the Cloud by Raghu Ramakrishnan. The speaker could not attend ICDE 2009 in person, so he connected and spoke from the USA. It was a kind of lecture from the cloud, and it wasn’t bad at all! In fact, together with some of my academic colleagues, we’re organizing a couple of small rough sets / soft computing events in India this year. I won’t be able to attend all of them, so I’ll do my best to speak from the cloud as well.
Data in the Cloud is a very hot topic. We’re doing something with it at Infobright too (see, e.g., our community blog). However, compared to Yahoo!, it’s just kindergarten. In his talk, Dr. Ramakrishnan introduced, step by step, the main features, applications, and components of cloud solutions. It was extremely useful to hear about data serving vs. data analysis, clouds being functional (check out Search Monkey and BOSS!) and horizontal, details related to data consistency, et cetera, all in one lecture!
After the keynote, I attended the Query Processing 1 session. I particularly liked the presentation of the paper titled “Space-Constrained Gram-Based Indexing for Efficient Approximate String Search”. Grams are substrings of a fixed length q. For example, 2-grams are pairs of consecutive characters, so the 2-grams of “data” are “da”, “at”, and “ta”. The idea is to use grams both to express similarity between strings and to build inverted indexes over the strings in the database. Since such indexes would be too large, the authors compress them. More precisely, they decrease the index size by (roughly speaking) removing or combining some of the grams. On the one hand, this can be regarded as lossy index compression. On the other hand, the authors show that it does not reduce the exactness of query answers and even increases query speed.
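For those who want to play with the basic mechanics, here is a minimal Python sketch. It is not the authors’ algorithm, just a textbook baseline under my own assumptions: plain overlapping q-grams, a length filter, and the classic count filter for edit distance (if ed(s, t) ≤ k, then s and t share at least max(|G(s)|, |G(t)|) - k·q grams, because each edit destroys at most q grams). A hypothetical `discarded` set stands in, very crudely, for dropping some gram lists from the index: the bound is relaxed by ignoring those grams, and every candidate is verified with the real edit distance, so the answers remain exact. All names (`grams`, `build_index`, `search`, `discarded`) are illustrative, and shared-gram counts are computed directly rather than by merging inverted lists.

```python
from collections import Counter, defaultdict
from typing import AbstractSet, Dict, List, Set

def grams(s: str, q: int) -> List[str]:
    """All overlapping substrings of length q (no padding)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def build_index(strings: List[str], q: int,
                discarded: AbstractSet[str] = frozenset()) -> Dict[str, Set[int]]:
    """Inverted index gram -> ids of strings containing it.
    Grams in `discarded` get no list at all (hypothetical lossy compression)."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in grams(s, q):
            if g not in discarded:
                index[g].add(sid)
    return index

def kept(counter: Counter, discarded: AbstractSet[str]) -> int:
    """Number of gram occurrences whose gram still has an index list."""
    return sum(n for g, n in counter.items() if g not in discarded)

def search(query: str, strings: List[str], index: Dict[str, Set[int]], q: int, k: int,
           discarded: AbstractSet[str] = frozenset()) -> List[str]:
    qg = Counter(grams(query, q))
    q_kept = kept(qg, discarded)
    # Each edit destroys at most q gram occurrences, so a true match still
    # shares at least q_kept - k*q indexed grams with the query.
    if q_kept - k * q >= 1:
        candidates: Set[int] = set()
        for g in qg:
            candidates |= index.get(g, set())
    else:
        candidates = set(range(len(strings)))  # bound too weak to prune: scan all
    hits = []
    for sid in sorted(candidates):
        s = strings[sid]
        if abs(len(s) - len(query)) > k:  # length filter
            continue
        sg = Counter(grams(s, q))
        shared = kept(qg & sg, discarded)  # multiset intersection on indexed grams
        bound = max(q_kept, kept(sg, discarded)) - k * q
        if shared < bound:  # count filter, relaxed for discarded grams
            continue
        if edit_distance(query, s) <= k:  # verification keeps answers exact
            hits.append(s)
    return hits

strings = ["kaushik", "kaushuk", "kasturi", "caushik", "venkat"]
full = build_index(strings, q=2)
lossy = build_index(strings, q=2, discarded={"ka"})
print(search("kaushik", strings, full, q=2, k=1))  # ['kaushik', 'kaushuk', 'caushik']
print(search("kaushik", strings, lossy, q=2, k=1, discarded={"ka"}))  # same answer
```

Dropping the “ka” list shrinks the index and, in this toy case, also shrinks the candidate set, while verification guarantees that the answer is unchanged. Whether real speedups follow depends on which grams are dropped, which is exactly what the paper studies.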
I also like this paper because its line of reasoning is somewhat analogous to what we’re doing in ICE. Recently, we even considered adding some simple gram-based structures to our Knowledge Grid, but there are no satisfactory results yet. (For the existing structures, we refer, as usual, to VLDB 2008.) Certainly, our scenario is different: we “index” at the level of larger packs of values, and we’re not (yet!) into similarity searches. Nevertheless, I’ll present this paper in detail to my colleagues at Infobright.
Finally, I attended the last part of the tutorial on Preference Queries from OLAP and Data Mining Perspective. Again, a lot of valuable material! (The authors intend to make it available online soon.) I was especially interested in the problem of searching for minimal Satisfying Preference Sets. The idea is to ask users to identify some most preferred (superior) and least preferred (inferior) examples in the available data, and then to heuristically find the smallest possible subset of the available preference criteria that justifies the users’ choice. This partially resembles the preference reduction methodology developed within the framework of rough sets, which I referred to in the rough set thread (#9). However, the case of such a partially defined target preference attribute was not considered there.
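To make the idea a bit more tangible, here is a minimal Python sketch built on assumptions of my own, not on the tutorial’s definitions: every criterion is numeric with larger values preferred, and a subset of criteria “justifies” the users’ choice when every inferior example is Pareto-dominated by at least one superior example on those criteria. Instead of a heuristic, it simply tries subsets in order of increasing size, which finds a smallest one but is only practical for a handful of criteria. The function names and the hotel-like data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Example = Dict[str, float]  # criterion name -> value; larger is assumed better

def dominates(a: Example, b: Example, criteria: Sequence[str]) -> bool:
    """Pareto dominance of a over b, restricted to the given criteria."""
    return (all(a[c] >= b[c] for c in criteria)
            and any(a[c] > b[c] for c in criteria))

def justifies(criteria: Sequence[str], superior: List[Example],
              inferior: List[Example]) -> bool:
    """Assumed notion: each inferior example is dominated by some superior one."""
    return all(any(dominates(s, i, criteria) for s in superior) for i in inferior)

def minimal_satisfying_set(all_criteria: Sequence[str], superior: List[Example],
                           inferior: List[Example]) -> Optional[Tuple[str, ...]]:
    """Exhaustive search by increasing size; the first hit is a smallest subset."""
    for size in range(1, len(all_criteria) + 1):
        for subset in combinations(all_criteria, size):
            if justifies(subset, superior, inferior):
                return subset
    return None  # no subset of criteria explains the choice

# Made-up data: one example marked superior, two marked inferior.
superior = [{"cheapness": 2, "closeness": 3, "rating": 5}]
inferior = [
    {"cheapness": 1, "closeness": 3, "rating": 5},
    {"cheapness": 2, "closeness": 4, "rating": 4},
]
print(minimal_satisfying_set(["cheapness", "closeness", "rating"], superior, inferior))
# ('cheapness', 'rating')
```

Here no single criterion suffices, because each one leaves some inferior example tied with or better than the superior one, so the search has to move on to pairs.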
Best greetings,
Dominik