infobright.org
Joinutility seperatorLogin utility separator Infobright.com

Data Warehousing

04
Feb

Much Ado About NULLS

Victoria Eastwood's photo
by Victoria Eastwood     Wed, Feb 04, 2009

One of the cool things about column orient data stores such as ICE is that they can handle some “unusual” data sets which cause other data stores to be painfully slow. I use the word “unusual” but perhaps I should say uncommon. On the other hand, I suspect if you had one of the types of data sets, you would have been forced to rejig your data model or your data to make it work at all. However, with ICE you might be able to reconsider. Here’s an example.

If you have tables that are extremely wide with lots and lots of columns, you are going to experience more pain with a row oriented database than you would with a column oriented database. But if you have the double whammy, of lots of columns with lots of NULLS you have even more problems.

If you are in this boat, I highly recommend that you try ICE. Here’s why. ICE is column oriented, so queries only access the columns they need. So that’s going to speed things up a bit.

Secondly, you don’t need indices with ICE (cause we have the Knowledge Grid). That not only saves you work effort of defining them but also has huge implications to load and insert times (significantly less) not to mention disk space.

You may not know this, but ICE has special handling for NULLS. NULLS are stripped from the data before compression. This technique improves compression but it also has the side benefit of making the Knowledge Grid even more accurate (particularly when there are lots of NULLS and only a few data values). The better the Knowledge Grid can represent the data, the faster the queries.

The end result, faster load and faster query for fat tables with lots of NULLS.

I know of one client that has this type of data. Seems the fact table carries lots of demographic type indicators for a marketing application. Would be interested to find out if there are other applications that have this type of data set.