Joinutility seperatorLogin utility separator Infobright.com
   
 
left join
Posted: 23 July 2010 07:50 AM   Ignore ]  
Newbie
Rank
Total Posts:  7
Joined  2009-05-06

Hi,

I have a question to a query:

table structure for the dimension:

dim_min with dim_start(timestamp), dim_end(timestamp), dim_year(int), dim_monthid(int), dim_month(int),...., dim_minid, dim_min

there is the full data for a few years, a record for 2010-07-23 looks like:

2010-07-23 00:00:00, 2010-07-23 00:00:59, 2010, 201007, 7, ... 201007230000, 0

the second table, the logtable, has the columns datanodeid, dim_min and val(decimal 15,5) where the datanodeid is an identifier, dim_min the reference to the dimension table and val the logged value.

my query looks like this:

select d.dim_dayidcoalesce(min(l.val),-1), coalesce(max(l.val),-1)
  
from dim_min as d left join logtable as l
       ON d
.dim_minid l.dim_minid and l.datanodeid 168
       where d
.dim_start >= '2009-04-26 00:00:00' and
             
d.dim_end <= '2009-06-17 23:59:59' group by d.dim_dayid order by d.dim_dayid asc

the result is what I want, but the performance isn’t what I expected (7s on a logtable with ~40.000.000 records and dim_min with ~4.000.000 - on my notebook)

select d.dim_dayidcoalesce(min(l.val),-1), coalesce(max(l.val),-1)
  
from dim_min as d left join logtable as l
       ON d
.dim_minid l.dim_minid
       where l
.datanodeid 168 and d.dim_start >= '2009-04-26 00:00:00' and
             
d.dim_end <= '2009-06-17 23:59:59' group by d.dim_dayid order by d.dim_dayid asc

this code works very performant (0,01sec), but I am missing the records, where there exists something in the dim_min, but not in the logtable…

I just tried a lot of combinations, but I could not find the solution, maybe someone can explain it to me?

Best Regards,
Wilfried

Profile
 
Posted: 23 July 2010 08:07 AM   Ignore ]   [ # 1 ]  
Member
Avatar
RankRankRank
Total Posts:  191
Joined  2008-08-18

Hi,

Both queries should execute in the same way. As I can see the only difference is that one condition is moved from ON to WHERE part. Internally, there is no difference between the two cases. What version of IB are you using? Is it ICE or IEE? It seems a kind of defect. (You are welcome wink to submit a bug to http://bugs.infobright.org)

Can you show logs for both queries (with ControlMessages=2 in brighthouse.ini)? The best if you could isolate the issue and give us script with sample data to recreate it.

Thanks,
Piotr

Profile
 
Posted: 23 July 2010 08:54 AM   Ignore ]   [ # 2 ]  
Newbie
Rank
Total Posts:  7
Joined  2009-05-06

Hi,

I’m using Infobright Community Edition 3.3.2.

In the Attachments, there are the control message for query1 and query2 and the whole bh.err (I renamed it to bh.txt) file.

In the queries, the tablename of the logtable is “logtable003”...

I will try to isolate the problem…

Regards,
Wilfried

File Attachments 
query1.txt  (File Size: 4KB - Downloads: 222)
query2.txt  (File Size: 4KB - Downloads: 275)
bh.txt  (File Size: 62KB - Downloads: 302)
Profile
 
Posted: 23 July 2010 09:30 AM   Ignore ]   [ # 3 ]  
Member
Avatar
RankRankRank
Total Posts:  191
Joined  2008-08-18

Thanks,

I will look into it. I just realized it is LEFT join so semantics of the two cases is different.

Piotr

Profile
 
Posted: 26 July 2010 05:50 AM   Ignore ]   [ # 4 ]  
Member
Avatar
RankRankRank
Total Posts:  191
Joined  2008-08-18

Hi Wilfried,

I guess all is fine. As I already wrote my first impression was that it is a bug as I thought it is an inner join. There are two things here, in case we move single (right) table condition from ON to WHERE part: (i) different number of records and (i) execution time.

(i) in the first case, we get more records since if the condition does not hold the record is generated with NULLs on the columns related to the right table. If the condition is in WHERE then it eliminates such a record with NULLs.

(ii) we evaluate single table conditions before joins. Thus, in the first case the condition does not limit ‘logtable’ and it is fully used for evaluation of join. This makes the whole query costly! We can see this in logs you sent (lines 43-44). After evaluation of WHERE there are 576 packrows of ‘logtable’ to open to calculate join. Each packrow is 65K records. A lot of records need to be joined! In the second case, only 4 packrows of data need to be investigated.

Thanks,
Piotr

Profile