Joinutility seperatorLogin utility separator Infobright.com
   
 
Surrogate Keys
Posted: 09 December 2008 10:56 AM   Ignore ]  
Jr. Member
RankRank
Total Posts:  85
Joined  2008-08-18

Users are asking whether to use surrogate keys in ICE.  And if so, when should they be used.

Some useful links describe surrogate keys and slowly changing dimensions.  These links provide good context before discussing the benefits of surrogate keys.

http://en.wikipedia.org/wiki/Slowly_changing_dimension

http://en.wikipedia.org/wiki/Surrogate_key

To level set on terminology, the use of surrogate keys is a database design technique; it does not necessarily imply the use of indices.  ICE does not need nor does it support indices.  But there are times when sing surrogate keys would be very useful.
Surrogate keys are good for:

1.  Replacing compound PKs & FKs with a single column surrogate key.  It is more efficient to join on 1 column than multiple columns.

2.  Replacing an inefficient data type (e.g. VARCHAR (128) with a BIGINT).  It is more efficient to join on INTEGER than VARCHAR.

Both techniques #1 and #2, above, may speed up Infobright queries, so we recommend them for these cases. are both , To level set on terminology,

And surrogate keys serve more purposes than facilitating efficient joins.  They are also a necessity for a production data warehouse that maintains Type II SCDs.  That is reason enough to use them.  Keep in mind:  Type II SCD = surrogate keys.  If your end users wish to keep historical data (i.e. Type II SCD), surrogate keys are very useful.

On a related note, someone asked, “Would something like MySQL’s AUTO_INCREMENT be enough or should we consider a wider range of possibilities?”

Assuming a client wants to use a surrogate key with sequentially ascending numbers, there are two main ways of incrementing numbers prior to loading the surrogate key into the target database:

1.  Use procedural logic in the ETL layer
2.  Use AUTO_INCREMENT in a MyISAM staging database

Signature 
Profile