Joinutility seperatorLogin utility separator Infobright.com
   
 
General Design
Posted: 21 January 2012 01:11 AM   Ignore ]  
Newbie
Rank
Total Posts:  37
Joined  2009-05-08

Hi

Just testing things really but would appreciate any feedback/thoughts.

In a table with say 10 million rows and 800 columns, is it best to try and reduce the number of columns if possible or is it fine to leave with that many.

To give an example, say three of the columns use tinyid data type and could be combined into one smallint column with no real hassles.

Obviously with the existing set-up, these columns can be queried at any stage individually or collectively, however it would also be easy to merge them and query with a longer IN statement.

I guess what Im asking is do you believe it is better to reduce the number of columns where easily done in favour of having longer IN statements in the query? Or is it best to have the data as collapsed as possible?

As an example, which (if any) is the recommended design

(collapsed qry) WHERE ID1=16 AND ID2=33 and ID3=7
(columns merged qry) WHERE ID IN (8,19,26,67,89,90,92,98,99,108,189 etc etc etc)

Appreciate any thoughts you may have

Thanks

Chris

Profile
 
Posted: 21 January 2012 11:05 AM   Ignore ]   [ # 1 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  743
Joined  2008-08-18

Hi,

Quering on three columns should be considerably faster than a complicated IN statement, especially if the partial columns are (at least partially) well organized, i.e. rows tend to have values similar to their neighbours. For purely random values the query times in both cases are less pedictible - I would suggest just to try both variants.

Definitely there is no problem with 800 columns or more, and if it is possible that in some queries we’ll use only a part of the ID columns, it’s better to split them. Actually we often suggest to split columns with compound information to obtain better peformance.

Regards,

Signature 
Profile