infobright.org

Academic Blog

Roughly on Rough Sets

by Dominik Slezak     Wed, Nov 19, 2008

When digging deeper into the bios of Infobright’s founders, you’ll find out that most of us worked on rough sets. Hence, it’s no surprise that rough sets are present in Infobright’s technology. On the other hand, when you read about rough sets on the web and try to match them with ICE/IEE, the connection is perhaps not as straightforward as expected. Moreover, whenever I have the pleasure of speaking with rough setters, they have some questions too. So let’s put it all together. Let’s refer to our rough set forum:

Rough sets is an approach introduced more than 25 years ago, used so far mainly in data mining and knowledge discovery. As an example, consider automatic detection of some important events in data (like fraud detection or credit risk assessment). In such an application, the rough set approach will attempt to create a decision model that automatically identifies: (1) the cases that support the given event; (2) the cases that do not support the given event; (3) the cases that remain undecided.
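
To make the three outcomes concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function name `rough_regions`, the attribute names, and the toy fraud data are all hypothetical and not taken from any Infobright or rough set library. Cases that look identical on the chosen attributes are grouped together; a group lands in the undecided region when its members disagree about the event.

```python
from collections import defaultdict


def rough_regions(cases, attributes, is_event):
    """Split cases into (supporting, not supporting, undecided) lists."""
    # Group cases that are indiscernible on the chosen attributes.
    classes = defaultdict(list)
    for case in cases:
        key = tuple(case[a] for a in attributes)
        classes[key].append(case)

    positive, negative, boundary = [], [], []
    for group in classes.values():
        labels = {is_event(c) for c in group}
        if labels == {True}:
            positive.extend(group)   # every look-alike supports the event
        elif labels == {False}:
            negative.extend(group)   # no look-alike supports the event
        else:
            boundary.extend(group)   # look-alikes disagree: undecided
    return positive, negative, boundary


# Hypothetical toy data: five transactions described by two attributes.
cases = [
    {"id": 1, "amount": "high", "country": "foreign", "fraud": True},
    {"id": 2, "amount": "high", "country": "foreign", "fraud": True},
    {"id": 3, "amount": "low", "country": "domestic", "fraud": False},
    {"id": 4, "amount": "high", "country": "domestic", "fraud": True},
    {"id": 5, "amount": "high", "country": "domestic", "fraud": False},
]

positive, negative, boundary = rough_regions(
    cases, ["amount", "country"], lambda c: c["fraud"]
)
print([c["id"] for c in positive])  # cases supporting the event
print([c["id"] for c in negative])  # cases not supporting the event
print([c["id"] for c in boundary])  # undecided cases
```

Cases 4 and 5 share the same attribute values but differ in outcome, so the available attributes cannot decide between them and both end up undecided.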

However, methodologies based on rough sets are applicable to many other domains too.

At Infobright, for example, we follow the rough set approach to identify: (1) the data portions that are fully relevant to the given query execution; (2) the data portions that are fully irrelevant to the given query execution; (3) the data portions that remain undecided. The corresponding data identification mechanism is based on Infobright’s knowledge grid, where compact information about the particular data portions is gathered. The data portions are highly compressed and - for analytical queries - only the undecided data portions need to be accessed.
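
The same idea can be sketched for query execution. The Python below is a simplified illustration, not Infobright's actual code: it assumes that the compact information kept per data portion is just a minimum, a maximum, and a row count, and the names `PackStats`, `classify_pack`, and `count_greater_than` are hypothetical. It answers a query of the form `COUNT(*) WHERE value > threshold` and records which portions had to be opened.

```python
from dataclasses import dataclass


@dataclass
class PackStats:
    """Assumed compact summary of one data portion (hypothetical layout)."""
    pack_id: int
    min_value: int
    max_value: int
    row_count: int


def classify_pack(stats, threshold):
    """Classify a portion for the condition value > threshold."""
    if stats.min_value > threshold:
        return "relevant"      # every row satisfies the condition
    if stats.max_value <= threshold:
        return "irrelevant"    # no row can satisfy the condition
    return "undecided"         # some rows may satisfy it: must look inside


def count_greater_than(stats_list, load_pack, threshold):
    """Count matching rows, opening only the undecided portions."""
    total = 0
    accessed = []
    for stats in stats_list:
        status = classify_pack(stats, threshold)
        if status == "relevant":
            total += stats.row_count  # answered from the summary alone
        elif status == "undecided":
            accessed.append(stats.pack_id)
            total += sum(1 for v in load_pack(stats.pack_id) if v > threshold)
    return total, accessed


# Hypothetical stored portions; a dict stands in for compressed storage.
storage = {0: [1, 2, 3], 1: [10, 20, 30], 2: [4, 15, 6]}
stats_list = [
    PackStats(pid, min(vals), max(vals), len(vals))
    for pid, vals in storage.items()
]

total, accessed = count_greater_than(stats_list, storage.__getitem__, 5)
print(total, accessed)
```

With a threshold of 5, portion 0 is irrelevant, portion 1 is fully relevant and is counted from its summary, and only portion 2 is actually read.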

Progress in rough sets has occurred in two major areas: (a) the usage of variously interpreted lower (full relevance) and upper (possible relevance) approximations in various types of applications (with ICE/IEE as a data warehouse industry example); (b) the support for feature selection and learning classifiers from data (with ICE/IEE hoping to become an example soon with regard to our knowledge grid tuning).

The rough set community is one of those truly dynamic, growing international groups, and we at Infobright are happy to help expand it into previously uncharted territories of applications. For more information, let me refer everyone to the already-mentioned rough set forum, as well as to the homepage of the International Rough Set Society and to the online database of rough set publications. It’s surely not the last post on rough sets here!!!

Best greetings,

Dominik

PS: I’ve just been told that one of my friends, a professor at a university in Poland, is developing Rough ICE: Rough Set Interactive Classification Engine. Isn’t it a nice coincidence?

PS2: I attached a short paper about Rough Sets and Data Warehousing to post #1 on our academic papers forum. As usual, I invite everyone to read it and comment on it!

VLDB 2008 & our first publication

by Dominik Slezak     Thu, Nov 13, 2008

It’s not that I haven’t published at any conferences before. I have!!! However, those were conferences on intelligent systems, reasoning under uncertainty, rough sets and soft computing, et cetera. Before 2008, I had never submitted a paper to a database event.

I have studied database literature. But listening to live talks is surely far better. This year I went to SIGMOD and VLDB. I was amazed. I was inspired. I’ll do everything to participate in the major database conferences from now on. I’ll try to go to ICDE too. If anyone has comments on other database events, I’ll be happy to learn something!

Recently, the VLDB 2008 materials became available online (both papers and presentations). In particular, I invite everyone to have a look at Infobright’s first conference publication:

Brighthouse: an analytic data warehouse for ad-hoc queries (Brighthouse is the former name of the Infobright data warehouse software)

There is quite a story behind this one. We wanted to submit it to SIGMOD but couldn’t meet the deadline. Then we tried PODS, but it was rejected. It was really unwise to send this kind of paper there but, on the other hand, we received a lot of valuable comments from the reviewers (thanks a lot!). Those comments helped us to put together the final version accepted at VLDB.

But it’s not the end of the story yet…

I had to change the date of my presentation. (I had to run away to another conference.) The VLDB 2008 organizers were kind enough to move me to the Data Mining Session on the very first day. I like data mining (with all its faces). I enjoyed all the talks in the session. On the other hand, I heard later that some people did not notice the changes in the program. I’d like to apologize for this. I remember that all the talks were recorded. I hope those recordings are still available. Actually, I did such a bad job of answering the questions that I’d really like to listen to myself one more time!

Anyway, although the paper was written prior to the ICE Era, the core ideas based on our knowledge grid, columnar storage and data compression remain the same. Actually, the paper focuses mostly on optimization / execution of the select statements, which is the part of the code (and invention!) identical in the Infobright Community and Enterprise Editions. Therefore, I hope one may find it interesting and useful. Certainly, I’ll be glad to answer any questions related to the paper!

Have a good reading!

Dominik

By the way, I created a new thread in the Infobright forums related to the current and future conference publications (http://www.infobright.org/Forums/viewthread/297/).
