Hello,
Somehow I didn’t notice the comment at:
http://www.infobright.org/Blog/Entry/organizing_data_and_more_about_rough_data_contest/
My fault!
I replied today, and I am also pasting the reply below.
Best greetings,
Dominik
==============
Hello,
Sorry for the delayed reply!
I’ll double-check the case of the missing minimum values.
Regarding the attributes that don’t make any sense, please note that for alphanumeric data types we compute min and max based on lexicographic ordering.
For example, the original column DLR_TRANS_TYPE has only two values: PURCHASE and SALE. So, whenever both of them occur in a given data pack, the min and max values will be equal to PURCHASE and SALE, respectively. However, if a data pack contains only one of those values, the min and max values will be the same.
As another example, consider column TRANS_MONTH. In this case, the values in data packs are often fully homogeneous, which results in the same min and max values. Unfortunately, for data packs with more than one distinct value, we compute min and max using lexicographic ordering over the names of the months, which may be quite counterintuitive.
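To make both examples concrete, here is a minimal Python sketch of min/max under lexicographic ordering. It is only an illustration, not the engine's code: the function name pack_min_max, the sample packs, and the assumption that months are stored as full English names are all hypothetical, and Python's default string comparison stands in for whatever collation the engine actually applies.

```python
def pack_min_max(values):
    """Return (min, max) of a pack's string values under lexicographic ordering.

    Hypothetical helper for illustration; Python compares strings by code
    point, which matches plain lexicographic order for these ASCII samples.
    """
    return min(values), max(values)


# DLR_TRANS_TYPE: both values present in the pack
print(pack_min_max(["SALE", "PURCHASE", "SALE"]))   # ('PURCHASE', 'SALE')

# DLR_TRANS_TYPE: only one distinct value, so min and max coincide
print(pack_min_max(["SALE", "SALE"]))               # ('SALE', 'SALE')

# TRANS_MONTH (assumed to hold full English month names):
# the result follows alphabetical order, not calendar order
print(pack_min_max(["January", "February", "March", "April"]))  # ('April', 'March')
```

In the last case the pack covers January through April, yet the reported min is April and the max is March, which is exactly the counterintuitive effect described above.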
We can still use such min/max values efficiently inside the Infobright engine for internal optimizations. However, I agree that they make no sense externally, including for this particular task of rough data analysis.
I need to think about it…
Best greetings,
Dominik
==============