Infobright.com

Brighthouse out of resources error: Out of resources on JOIN.
Posted: 29 May 2009 10:35 AM   [ # 16 ]
Sr. Member
Total Posts:  764
Joined  2008-08-18
bob_the_web - 29 May 2009 10:27 AM

I guess a second reason NOT to pre-calculate is if the country code mapping changes - as it could easily do.

But it is a good reason to precalculate it!
Suppose you have a fact table, and a fact about some click (or anything else). You determine the country code for that fact as of the moment it occurred.
If the country code mapping changes over the next two months, that should not change the past: the country code of existing facts should stay the same, and only future facts should be affected. A JOIN-based solution works the opposite way…
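A minimal Python sketch of the point above, assuming a toy per-IP mapping (a real country table would hold IP ranges); `Fact`, `load_facts`, `join_at_query_time` and the sample data are all hypothetical. Stamping the code at load time freezes history, while resolving it at query time rewrites past facts whenever the mapping changes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Fact:
    ip: int
    country_code: Optional[str]  # stamped once, at load time

def load_facts(ips: List[int], mapping: Dict[int, str]) -> List[Fact]:
    """Precalculated approach: record the country code in effect when the fact is loaded."""
    return [Fact(ip, mapping.get(ip)) for ip in ips]

def join_at_query_time(facts: List[Fact], mapping: Dict[int, str]) -> List[Optional[str]]:
    """JOIN approach: resolve each fact against whatever mapping exists when the query runs."""
    return [mapping.get(f.ip) for f in facts]

# Hypothetical mapping in effect when the facts were loaded (IPs as integers).
mapping_may = {167772161: "US", 167772162: "DE"}
facts = load_facts([167772161, 167772162], mapping_may)

# Hypothetical later mapping: 167772162 has been reassigned.
mapping_july = {167772161: "US", 167772162: "FR"}

print([f.country_code for f in facts])          # ['US', 'DE']: history preserved
print(join_at_query_time(facts, mapping_july))  # ['US', 'FR']: past facts now report the new code
```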

Unfortunately, I have no more ideas for optimizations, other than waiting for an optimized version of ICE.

Regards,

Posted: 11 June 2009 03:50 AM   [ # 17 ]
Sr. Member
Total Posts:  487
Joined  2008-08-18

Hello bob_the_web,

We had quite a long discussion about your queries at Infobright recently.

Could you please let us know whether the solution suggested by Jakub helped?

Thanks and best greetings,

Dominik

Posted: 23 June 2009 05:53 AM   [ # 18 ]
Newbie
Total Posts:  4
Joined  2008-11-13

I’ve had success “exploding” ranged dimensions similar to the country_code table here into repeated rows, one per value. This enables equality joins in the queries (e.g. “table_1 a inner join country_code c on a.ip1 = c.ip”), which are significantly more robust and perform WAY better. Obviously, if that procedure were implemented without any regard to the data in question (IP addresses and traffic data, by the look of the tables), it would produce a rather large (roughly 4 billion row) country_code dimension. However, by taking into account aggregate traffic pattern information and knowledge of the unused IP address ranges, you can cut the number of rows significantly. Worth a shot, I’d say.
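A minimal Python sketch of the “explode then equality-join” idea, assuming ranges are stored as (start, end, country) integer tuples. Trimming to addresses actually seen in the traffic is one hypothetical way of “taking traffic patterns into account”; the function names and sample ranges are illustrative only.

```python
import ipaddress

def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 address to its 32-bit integer form."""
    return int(ipaddress.IPv4Address(ip))

def explode_ranges(ranges, observed_ips=None):
    """Expand (start, end, country) ranges into one row per IP address.

    With observed_ips=None every address in every range is emitted (a full
    IPv4 expansion approaches 2**32, about 4.3 billion rows). Passing the set
    of addresses actually seen in the traffic data keeps only those rows.
    """
    dim = {}
    for start, end, cc in ranges:
        if observed_ips is None:
            for ip in range(start, end + 1):
                dim[ip] = cc
        else:
            for ip in observed_ips:
                if start <= ip <= end:
                    dim[ip] = cc
    return dim

def equality_join(fact_ips, dim):
    """Python analogue of: table_1 a INNER JOIN country_code c ON a.ip1 = c.ip"""
    return [(ip, dim[ip]) for ip in fact_ips if ip in dim]

# Hypothetical sample ranges.
ranges = [
    (ip_to_int("10.0.0.0"), ip_to_int("10.0.0.3"), "US"),
    (ip_to_int("10.0.1.0"), ip_to_int("10.0.1.255"), "DE"),
]
full = explode_ranges(ranges)                    # 4 + 256 = 260 rows
observed = [ip_to_int("10.0.0.2"), ip_to_int("10.0.1.7"), ip_to_int("192.168.0.1")]
trimmed = explode_ranges(ranges, set(observed))  # only 2 rows needed
print(len(full), len(trimmed))                   # 260 2
print(equality_join(observed, trimmed))          # [(167772162, 'US'), (167772423, 'DE')]
```

The unmapped address (192.168.0.1) simply drops out, as it would in an inner join.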

Posted: 17 July 2009 08:29 AM   [ # 19 ]
Newbie
Total Posts:  49
Joined  2009-05-19

Hi,
Sorry for the delay, I've been busy with other things for a while.

I solved this by adding the CC (country code) to the raw data before loading it into ICE. I have to parse raw log files to produce the data set anyway, so I simply did the lookup (via a simple dictionary) at that time, i.e. precalculated.

Not as elegant but it works just fine.
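A minimal Python sketch of enriching log lines with a country code before loading. The post used a simple dictionary; because the country table is range-based, this sketch swaps in a sorted list searched with Python's `bisect` module, which handles ranges without exploding them. The log layout, the `??` marker for unmapped addresses, and all names are hypothetical.

```python
import bisect
import csv
import io
import ipaddress

def ip_range(first: str, last: str, cc: str):
    """Build a (start_int, end_int, country_code) tuple from dotted-quad bounds."""
    return (int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last)), cc)

class CountryLookup:
    """In-memory lookup over sorted, non-overlapping IPv4 ranges."""

    def __init__(self, ranges):
        self._ranges = sorted(ranges)                 # sorted by range start
        self._starts = [r[0] for r in self._ranges]

    def country(self, ip: str) -> str:
        n = int(ipaddress.IPv4Address(ip))
        i = bisect.bisect_right(self._starts, n) - 1  # last range starting at or before n
        if i >= 0 and n <= self._ranges[i][1]:
            return self._ranges[i][2]
        return "??"  # hypothetical marker for unmapped addresses

def enrich_log(lines, lookup, out):
    """Write each parsed log line plus its country code as a CSV row, ready for loading."""
    writer = csv.writer(out)
    for line in lines:
        ip, url = line.split()[:2]  # assumed log layout: "<ip> <url> ..."
        writer.writerow([ip, url, lookup.country(ip)])

lookup = CountryLookup([
    ip_range("10.0.0.0", "10.0.0.255", "US"),
    ip_range("10.0.1.0", "10.0.1.255", "DE"),
])
buf = io.StringIO()
enrich_log(["10.0.0.5 /index.html", "10.0.1.9 /about.html", "8.8.8.8 /"], lookup, buf)
print(buf.getvalue())
```

Because the country code is fixed when the file is written, later changes to the mapping leave already-loaded rows untouched.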
Thanks

Dominik Slezak - 11 June 2009 03:50 AM

Could you please let us know whether the solution suggested by Jakub helped?

Posted: 17 July 2009 08:44 AM   [ # 20 ]
Sr. Member
Total Posts:  487
Joined  2008-08-18

Hello,

Thanks for the update! In the end, it looks like the simplest solution. I’m glad it works fine now.

Best greetings,

Dominik
