
Academic Blog

Rough Data Contest (2)

by Dominik Slezak     Thu, Feb 26, 2009

Following my earlier post, it’s my great pleasure to confirm that we are going to proceed!

The contest will be part of the previously mentioned rough set conference (Delhi, India, Dec 16-18, 2009). The conference’s name is Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC). This is the 12th event in a prestigious academic conference series that has been held every year or two for more than 15 years now. It is also the first international rough set event ever held in India. We believe that it will attract considerable attention from both academia and industry. It will be the perfect forum for investigating future applications of data related to the Infobright Knowledge Grid.

For details, let me refer you to the rough data contest thread, where I have attached the contest announcement and the RSFDGrC 2009 conference poster. I would also like to encourage anyone interested in getting a sample of rough data to contact me on the forums or via email.

Let me also recall one more time the earlier-announced Infobright contest to be held at the DTA 2009 conference (Jeju, Korea, Dec 10-12). We treat these two initiatives as complementary: one is focused on applications of Infobright’s current technology (DTA 2009), and the other is oriented more towards the future of the Infobright Knowledge Grid (RSFDGrC 2009).

Best greetings,

Dominik


Rough Data Contest

by Dominik Slezak     Fri, Feb 20, 2009

One may say that the Infobright Knowledge Grid is a new kind of data.

Imagine a data table with 1 billion rows. In ICE, it corresponds to 15259 rough rows. Each rough row groups together 65536 rows (the last group is smaller). Physically, each rough row is split into data packs corresponding to particular attributes, which gives both horizontal and vertical data decomposition. Logically, each rough row can be treated as a row in a new rough table, where the attributes’ values correspond to the data pack statistics. Compared to the original data, the rough table has about 65536 times fewer rows and the same number of attributes, but more complex (compound) values. For example, given 10 numeric attributes in the original data, we’ll now have 10 rough attributes, wherein every rough attribute labels every rough row with an interval value (the min/max values within the corresponding data pack), which can be further extended by, e.g., a binary histogram (encoding the holes in the min/max ranges).
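To make the numbers above concrete, here is a minimal Python sketch of the arithmetic and of a rough table restricted to min/max intervals. It is only an illustration under stated assumptions: the function names (`rough_row_count`, `build_rough_column`, `build_rough_table`) and the in-memory list layout are hypothetical and do not reflect how ICE stores data packs or Knowledge Grid statistics internally.

```python
PACK_ROWS = 65536  # rows grouped into one rough row, as stated in the post


def rough_row_count(n_rows, pack_rows=PACK_ROWS):
    """Number of rough rows covering n_rows (integer ceiling division)."""
    return -(-n_rows // pack_rows)


def last_pack_size(n_rows, pack_rows=PACK_ROWS):
    """Number of original rows in the last, possibly smaller, rough row."""
    remainder = n_rows % pack_rows
    return remainder if remainder else pack_rows


def build_rough_column(values, pack_rows=PACK_ROWS):
    """Summarize one numeric column as (min, max) intervals, one per data pack."""
    intervals = []
    for start in range(0, len(values), pack_rows):
        pack = values[start:start + pack_rows]
        intervals.append((min(pack), max(pack)))
    return intervals


def build_rough_table(table, pack_rows=PACK_ROWS):
    """Hypothetical layout: dict of column name -> list of numbers.

    Returns a dict of column name -> list of (min, max) intervals, i.e. a
    rough table with the same attributes but far fewer rows.
    """
    return {name: build_rough_column(col, pack_rows) for name, col in table.items()}


# 1 billion rows -> 15259 rough rows; the last one holds 51712 rows.
print(rough_row_count(1_000_000_000), last_pack_size(1_000_000_000))
```

With a toy pack size of 2, `build_rough_table({"a": [3, 1, 2, 9, 7]}, pack_rows=2)` yields three rough rows for attribute `a`, the last of which covers a single original row. A histogram of holes inside each interval could be added as a further field per rough row, but is left out of this sketch.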

One may say that Infobright analyzes such rough tables while optimizing and executing queries. (Here we talk about standard, precise queries. Approximate queries are still a matter for the future.) One may say that “the third face of data mining” described in one of the previous posts is about adding more types of rough attributes to rough tables. Last but not least, one may say that rough tables can be useful not only in database scenarios. How about, e.g., rewriting some data mining or visualization algorithms to work on rough tables instead of the original data? How much speed would be gained? What about the quality of the data mining results and the precision of the data visualization? Would it eventually be possible to integrate the rough and exact computation levels, like we did in Infobright?
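As one possible illustration of combining the rough and exact levels, the sketch below answers a precise counting query (how many values exceed a threshold) by consulting min/max intervals first and touching the original rows only where the interval cannot decide. This is an assumption-laden toy, not Infobright's actual optimizer: the function names, the three pack labels, and the list-based storage are all hypothetical.

```python
def classify_pack(interval, threshold):
    """Label a data pack for the predicate `value > threshold` using only min/max."""
    lo, hi = interval
    if hi <= threshold:
        return "irrelevant"  # no row in the pack can match
    if lo > threshold:
        return "relevant"    # every row in the pack matches
    return "suspect"         # the interval alone cannot decide


def count_greater_than(values, rough_column, threshold, pack_rows):
    """Exact count of values > threshold, scanning only the suspect packs.

    Returns (count, number_of_packs_scanned).
    """
    total = 0
    scanned = 0
    for k, interval in enumerate(rough_column):
        start = k * pack_rows
        label = classify_pack(interval, threshold)
        if label == "relevant":
            # Rough level: the pack's row count is enough, no data access needed.
            total += min(pack_rows, len(values) - start)
        elif label == "suspect":
            # Exact level: fall back to the original rows of this pack only.
            scanned += 1
            total += sum(1 for v in values[start:start + pack_rows] if v > threshold)
    return total, scanned


# Toy data: three packs of four rows each, with their (min, max) intervals.
values = [1, 2, 3, 4, 10, 11, 12, 13, 5, 9, 2, 20]
rough_column = [(1, 4), (10, 13), (2, 20)]
print(count_greater_than(values, rough_column, threshold=8, pack_rows=4))
```

In this toy run only the third pack needs to be read. The same pattern (decide at the rough level where possible, refine at the exact level where not) is what a data mining or visualization algorithm rewritten for rough tables would have to reproduce, and how often the rough level suffices depends on how the data are distributed across packs.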

Certainly, these questions are too difficult to be answered by a single person or a single research group. Hence, we have started considering one more academic contest this year. I have a feeling that the best venue for a rough data contest would be a rough set event. I’m in touch with the organizers of the international rough set conference in Delhi, December 16-18, and I’ll know more details pretty soon. Actually, we’ve already created a nicely formatted rough table for one of our favorite 1-billion-row data sets. I’m sure that discussing the results of such rough data analysis together will be greatly inspiring for everyone!

Best greetings,

Dominik

