Joinutility seperatorLogin utility separator Infobright.com
   
 
UNION performance
Posted: 06 May 2010 11:54 AM   Ignore ]  
Newbie
Rank
Total Posts:  2
Joined  2010-05-05

Hello Infobright community,

I’m at a point where I don’t know if I should partition my tables into months, weeks or days. Doing by day would require many unions when querying a large date range, but would reduce the rows on a single table. Is there any performance degradation for doing multiple unions in one query?

Thank you for your input

-Eric

Profile
 
Posted: 06 May 2010 02:03 PM   Ignore ]   [ # 1 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  916
Joined  2008-08-18

Hi !

UNION is costly. UNION ALL much less.

What is the expected size of the UNIONed result?
What is the expected gain from having small number of rows in a single table?

Profile
 
Posted: 06 May 2010 02:16 PM   Ignore ]   [ # 2 ]  
Newbie
Rank
Total Posts:  2
Joined  2010-05-05

Sorry, I meant UNION ALL

The goal is to speed up the queries. Right now we have a system that partitions the data per month. So January 2010 has it’s own table, February 2010 has it’s own table, and so forth. When we query over a 2 month span, we use UNION ALL to combine the 2 tables. The problem is that even the monthly tables get quite large (the April 2010 table is 90 million rows). When we do a query for 1 or 2 days only, we have to look at this 90 million row table. If we were to Split it into day tables, we would only look at a fraction of the rows (2 tables of about 3 million rows).

The question is, if I pick a large date range (from April 1st to April 30th, for example), would the unions cause any slow downs, or would the performance be around the same as querying one big monthly table?

Let me know if my explanation is clear…

Profile
 
Posted: 07 May 2010 02:20 AM   Ignore ]   [ # 3 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  505
Joined  2008-08-18

Hi,

Ensure that you have a date range condition on each SELECT statement. For example, if your parameters are substituted for “?1” and “?2”

SELECT ... FROM january WHERE date BETWEEN ?AND ?2
UNION ALL
SELECT 
... FROM february WHERE date BETWEEN ?AND ?2
UNION ALL
....
UNION ALL
SELECT 
... FROM december WHERE date BETWEEN ?AND ?

If ?1 lies in Febrary and ?2 lies in March, for example, then the SELECT from April…December will return 0 rows and you don’t need to worry about them.

Geoffrey

Signature 
Profile
 
Posted: 07 May 2010 04:19 AM   Ignore ]   [ # 4 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  916
Joined  2008-08-18

Hi!

Have you tried the monthly table in ICE in terms of performance? I assume that the rows in a monthly table are more or less sorted according to date, Then, when querying for 2 days, the query should be very fast. While UNION ALL of 30 days can be quite slow, depends on query type if it must first build the whole unioned result. E.g. SELECT .... FROM ( ... UNION ALL ... ) GROUP BY ... can be really slow.

Another story is that the partitioning scheme can be very useful for dropping old data.

Profile