Joinutility seperatorLogin utility separator Infobright.com
   
 
Calculating with columns not present in the group by
Posted: 27 April 2009 05:22 AM   Ignore ]  
Newbie
Rank
Total Posts:  4
Joined  2009-04-23

Hi,

we are currently evaluating ICE and a trial IEE and none of them can calculate in the select clause. We are doing this in many cases.

Assume the following case:

I have a dimension “How do you like product x?” with ten categories (1=very much, 10=not at all) and a region variable.

This is what we are doing in many cases:

SELECT COUNT(*),region,AVG(product),SUM(IF(product IN (1,2),1,0)),SUM(IF(product IN (9,10),1,0)) FROM table GROUP BY region

After this i know how many people really love my product in which region and how many hate it wink

This is currently not possible in infobright and we are using clauses like this very very often. I have tried to rewrite the query with a left join but still no luck. This is even a simple use case.

Is there any chance that this will be available in the near future? :S

Furthermore we are desperatly missing the STDDEV_*() functions which i think should be relativly easy to implement.

[ Edited: 27 April 2009 05:30 AM by andreasstreichardt]
Profile
 
Posted: 27 April 2009 06:41 AM   Ignore ]   [ # 1 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  648
Joined  2008-08-18

Hi !

The current ICE/IEE release can calculate aggregation only on columns, not on expressions. A common workaround here is to use a subselect to do the calculations on expressions and produce columns to be used in aggregations.

This limitation will be removed soon - in the summer I guess.

Profile
 
Posted: 27 April 2009 07:46 AM   Ignore ]   [ # 2 ]  
Member
Avatar
RankRankRank
Total Posts:  107
Joined  2008-08-18

As Janusz commented, we are currently working on resolving the issue with GROUP BY’s with complex expressions. Target release date is mid-June. The STDDEV() function is also on the product road map. We should start working on it after the mid-June release.

Cheers

Signature 
Profile
 
Posted: 27 April 2009 08:22 AM   Ignore ]   [ # 3 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  428
Joined  2008-08-18

Hi,

One other suggestion is to add additional columns to your data. For example, you are using the expressions

SUM(IF(product IN (1,2),1,0)),SUM(IF(product IN (9,10),1,0)) 

So create 2 new columns for product category, PRODUCT_IS_1_2 and PRODUCT_IS_9_10
which have the values of 0 or 1. Then just sum on these columns.

It is relatively cheap to add new columns in Infobright, because of the compression.

Geoffrey

Signature 
Profile
 
Posted: 05 May 2009 11:28 AM   Ignore ]   [ # 4 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  428
Joined  2008-08-18

By the way, here is one way to rewrite your query so it will work in Infobright:

SELECT count(*), regionavg(product), sum(if_12), sum(if_910from (
  
SELECT regionproduct, IF(product IN (1,2),1,0) as if_12, IF(product IN (9,10),1,0) as if_910
  from t1
x group by region
Signature 
Profile
 
Posted: 05 May 2009 01:10 PM   Ignore ]   [ # 5 ]  
Jr. Member
RankRank
Total Posts:  75
Joined  2008-10-22

Geoffrey is right, the query can not only be rewritten but they will also be amazingly fast.

Signature 

Chris (cvh@LE),
Leipzig, Germany

Profile
 
Posted: 08 May 2009 10:59 AM   Ignore ]   [ # 6 ]  
Newbie
Rank
Total Posts:  5
Joined  2009-05-08

Not sure it’s related but I experienced a frustrating bug:

create table test (id intvalue intengine myisam;

insert into test values (11);
insert into test values (12);
insert into test values (21);
insert into test values (22);

select idsum(value), id from test group by id


And boom !

IB thinks I didnot specify the id columns in the group by clause.

If you manage to work on the group by you would eventually be interested in this one.

Anyway congrats for this great piece of software !

Thanks

Eriam

Profile
 
Posted: 08 May 2009 11:06 AM   Ignore ]   [ # 7 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  648
Joined  2008-08-18

Hi !

The good news is that in the current development version it works fine. So in any case you will get it corrected. You may fill a bug report on bugs.infobright.com ...

Profile
 
Posted: 08 May 2009 11:10 AM   Ignore ]   [ # 8 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  428
Joined  2008-08-18

Hi,

Your query

select idsum(value), id from test group by id

can be rewritten as

select idsum_valueid from (select idsum(value) as sum_value from test group by idx

Geoffrey

Signature 
Profile
 
Posted: 08 May 2009 02:08 PM   Ignore ]   [ # 9 ]  
Jr. Member
Avatar
RankRank
Total Posts:  96
Joined  2008-08-18

eriam,

Please run these commands and tell me what you get.  I ran them with IB_3.1-1_r3718_3855(iee) and get the same result in both myisam and infobright.
For other people who don’t have IEE, you would have to load the data instead of insert it.  (Also, eriam, in the future, please post things regarding insert in the IEE forums.)

Here are the commands:

warnings;

drop database if exists eriam;
create database eriam;
use 
eriam;

set autocommit=1;


drop table if exists test_mysql;
create table test_mysql (id intvalue intengine=myisam CHARACTER SET ascii COLLATE ascii_bin;

drop table if exists test;
create table test (id intvalue intengine=brighthouse CHARACTER SET ascii COLLATE ascii_bin;


insert into test_mysql values (11);
insert into test_mysql values (12);
insert into test_mysql values (21);
insert into test_mysql values (22);

insert into test values (11);
insert into test values (12);
insert into test values (21);
insert into test values (22);


select idsum(value), id from test_mysql group by id;  
select idsum(value), id from test       group by id
Signature 

Brian Beharry, QA
Infobright

Profile
 
   
 
 
‹‹ Loader - rejected records      IEE sources? ››