Rough Data Contest & RSFDGrC 2009 Conference
Posted: 25 February 2009 10:35 AM
Sr. Member
Total Posts:  453
Joined  2008-08-18

Hello,

I am opening this thread in relation to the recent blog posts about the Knowledge Grid-related data (aka Rough Data):

http://www.infobright.org/index.php/Blog/Entry/rough_data_contest/

http://www.infobright.org/index.php/Blog/Entry/rsfdgrc_2009_conference_rough_data_contest_2/

We’re going to proceed with organizing the Rough Data Contest at the following international conference:

http://web.iitd.ac.in/~rsfdgrc09/

I attach the official contest announcement and the conference poster.

Best greetings,

Dominik

[ Edited: 10 June 2009 12:36 AM by Dominik Slezak]
File Attachments 
Rough Data Contest.pdf  (File Size: 28KB - Downloads: 400)
RSFDGrC'09 CfP.pdf  (File Size: 234KB - Downloads: 488)

Posted: 01 April 2009 09:04 PM   [ # 1 ]

Here is a sample of the above-mentioned data with interval values. Every attribute is described by two columns (min and max). The objective of the contest is very simple (although quite roughly defined): “Find meaningful information and represent it nicely”. All kinds of methodologies are welcome. For example, one can try to adapt association rules or decision trees to deal with interval values. Another example would be to extend some visualization techniques originally defined for exact data. Please feel free to propose your own ideas, ask questions and participate in the discussion.
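
As one possible starting point for adapting rule-based methods to interval values, a condition such as "value >= threshold" can be checked against a (min, max) pair with three possible outcomes. This is only a minimal sketch: the function name `check_at_least` and the sample intervals are hypothetical and are not taken from the contest data.

```python
def check_at_least(lo, hi, threshold):
    """Classify a rough row with bounds [lo, hi] against `value >= threshold`.

    "all"   : every underlying value satisfies the condition
    "none"  : no underlying value satisfies the condition
    "mixed" : the minimum fails the condition but the maximum satisfies it
    """
    if lo >= threshold:
        return "all"
    if hi < threshold:
        return "none"
    return "mixed"


# Hypothetical (min, max) pairs for one attribute, one pair per rough row.
rows = [(10, 20), (1, 4), (3, 12)]
results = [check_at_least(lo, hi, 5) for lo, hi in rows]
print(results)
```

Rows classified as "all" or "none" can be handled exactly from the rough data alone, while "mixed" rows are where an interval-aware rule or tree would have to express uncertainty.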

Best greetings,

Dominik

File Attachments 
data for rough data contest.zip  (File Size: 1305KB - Downloads: 325)

Posted: 01 May 2009 10:38 AM   [ # 2 ]

I attach a better version of the rough data. It is computed from the same original dataset. However, prior to calculating the min/max values, we changed the ordering of rows using the algorithm for better data organization on load.
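
To illustrate why the ordering of rows matters for min/max values, here is a small sketch. The actual reordering algorithm is not described in this thread, so plain sorting is used as a stand-in; the values, the pack size and the helper names `rough_rows` and `total_width` are all hypothetical.

```python
def rough_rows(values, pack_size):
    """Split values into consecutive packs and return (min, max) per pack."""
    packs = [values[i:i + pack_size] for i in range(0, len(values), pack_size)]
    return [(min(p), max(p)) for p in packs]


def total_width(intervals):
    """Sum of (max - min) over all packs; smaller means tighter intervals."""
    return sum(hi - lo for lo, hi in intervals)


values = [7, 1, 9, 2, 8, 3]  # hypothetical column, in load order

original = rough_rows(values, pack_size=3)
reordered = rough_rows(sorted(values), pack_size=3)  # sorting is a stand-in

print(original, total_width(original))
print(reordered, total_width(reordered))
```

With the rows grouped so that similar values share a pack, the intervals become narrower and therefore more informative.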

More comments can be found on the blog:

http://www.infobright.org/Blog/Entry/organizing_data_and_more_about_rough_data_contest/

Please use the attachment as the final version of the data for the contest.

Best greetings,

Dominik

[ Edited: 03 May 2009 03:03 AM by Dominik Slezak]
File Attachments 
new data for rough data contest.txt  (File Size: 5825KB - Downloads: 346)

Posted: 10 June 2009 12:51 AM   [ # 3 ]

A short update on the December academic events in India:

As mentioned on the blog (http://www.infobright.org/Blog/Entry/december_conferences_in_india_-_experience_workshop/), there are new paper submission deadlines for the rough data contest, the conferences (there are two of them, organized together but with different deadlines) and the experience workshop, a new initiative meant to bridge academia and industry.

For everyone’s convenience, I attach below four PDFs with the most up-to-date information about all these activities. (Well, some of them were attached to my previous posts but I guess it’s good to have everything in the same place. Also, let me clarify one more time that the final data set recommended for the rough data contest is the one posted on May 01.)

Best greetings,

Dominik

File Attachments 
PReMI Conference.pdf  (File Size: 235KB - Downloads: 1044)
RSFDGrC Conference.pdf  (File Size: 234KB - Downloads: 442)
Experience Workshop.pdf  (File Size: 54KB - Downloads: 318)
Rough Data Contest.pdf  (File Size: 28KB - Downloads: 290)

Posted: 01 July 2009 05:43 AM   [ # 4 ]

Hello,

Somehow I didn’t notice the comment at:

http://www.infobright.org/Blog/Entry/organizing_data_and_more_about_rough_data_contest/

My fault!

I replied today. I also paste the reply below.

Best greetings,

Dominik

==============

Hello,

Sorry for the delayed reply!

I’ll double-check the case of the missing minimum values.

Regarding the attributes that don’t make any sense, please note that for alphanumeric data types we compute min and max based on lexicographic ordering.

For example, the original column DLR_TRANS_TYPE has only two values: PURCHASE and SALE. So, whenever both of them occur in a given data pack, the min and max values will be equal to PURCHASE and SALE, respectively. However, if a data pack contains only one of those values, the min and max values will be the same.

As another example, consider column TRANS_MONTH. In this case, the values in data packs are often fully homogeneous, which results in the same min and max values. Unfortunately, in the cases of data packs with more than one distinct value, we compute min and max using lexicographic ordering over the names of months, which may be quite counterintuitive.
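
The effect of lexicographic ordering can be reproduced with a few lines of Python, whose built-in string comparison is also lexicographic. This is only a sketch: the helper name `rough_minmax` and the contents of the sample data packs are hypothetical, and the month names are assumed to be stored as full upper-case English names.

```python
def rough_minmax(pack):
    """Return (min, max) of a data pack using lexicographic string ordering."""
    return min(pack), max(pack)


# DLR_TRANS_TYPE: both values present, then only one value present.
both = rough_minmax(["SALE", "PURCHASE", "SALE"])
only_sale = rough_minmax(["SALE", "SALE"])

# TRANS_MONTH: the lexicographic order differs from the calendar order.
months = rough_minmax(["JANUARY", "FEBRUARY"])

print(both)       # min is PURCHASE, max is SALE
print(only_sale)  # min and max are the same
print(months)     # FEBRUARY sorts before JANUARY
```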

We can still use such min/max values efficiently inside the Infobright engine, for internal optimizations. However, I agree that they make no sense externally, including for this particular task of rough data analysis.

I need to think about it…

Best greetings,

Dominik

==============


Posted: 14 July 2009 05:22 AM   [ # 5 ]

I attach an updated version of the next rough set conference call for papers.

You still have one month to prepare the papers!

Best greetings,

Dominik

File Attachments 
RSFDGrC Conference.pdf  (File Size: 234KB - Downloads: 393)

Posted: 21 July 2009 06:41 AM   [ # 6 ]

Hello All,

Regarding my previous post (pasted below), we fixed the problems with the rough data.

Please see the attachment.

1. There are no missing values.

2. We added a new column TRANS_MONTH_NUM to the original data. It corresponds to the new rough columns MIN_TRANS_MONTH_NUM and MAX_TRANS_MONTH_NUM. One can see that it makes more sense than MIN_TRANS_MONTH and MAX_TRANS_MONTH. For example, there was a rough row with MIN_TRANS_MONTH equal to APRIL and MAX_TRANS_MONTH equal to MARCH, which clearly makes no sense in practice. In this case, MIN_TRANS_MONTH_NUM equals 3 and MAX_TRANS_MONTH_NUM equals 4. So, please use MIN_TRANS_MONTH_NUM and MAX_TRANS_MONTH_NUM instead of MIN_TRANS_MONTH and MAX_TRANS_MONTH.
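
The difference between the two pairs of columns can be sketched as follows. The mapping `MONTH_NUM` is an assumption (January = 1, full upper-case English names), chosen to be consistent with the values 3 and 4 mentioned above; the sample data pack is hypothetical.

```python
MONTH_NAMES = [
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
    "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER",
]
# Assumed numbering: JANUARY -> 1, ..., DECEMBER -> 12.
MONTH_NUM = {name: i for i, name in enumerate(MONTH_NAMES, start=1)}

pack = ["MARCH", "APRIL", "MARCH"]  # hypothetical TRANS_MONTH values

# Lexicographic min/max over the names (MIN_TRANS_MONTH, MAX_TRANS_MONTH).
by_name = (min(pack), max(pack))

# Numeric min/max (MIN_TRANS_MONTH_NUM, MAX_TRANS_MONTH_NUM).
nums = [MONTH_NUM[m] for m in pack]
by_num = (min(nums), max(nums))

print(by_name)
print(by_num)
```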

Best greetings,

Dominik

Dominik Slezak - 01 July 2009 05:43 AM

Hello,

Somehow I didn’t notice the comment at:

http://www.infobright.org/Blog/Entry/organizing_data_and_more_about_rough_data_contest/

My fault!

I replied today. I also paste the reply below.

Best greetings,

Dominik

(...)

[ Edited: 21 July 2009 06:59 AM by Dominik Slezak]
File Attachments 
even_newer_data_for_rough_data_contest.txt  (File Size: 7449KB - Downloads: 337)