
Infobright Blog

Compression Characteristics In Infobright

by Bob Zurek     Tue, May 04, 2010
One of the key features of Infobright is its ability to highly compress data as it is ingested into the engine. We frequently hear from our customers and community members about the depth of compression they are getting with Infobright compared to other database solutions they have used in the past. Recently we co-published an article in e-Week magazine entitled "How To Achieve Greener Data Storage and Analysis" that also discusses the importance of compression. So what are we seeing these days? Here are a few examples of the compression ratios at several customers:
 
A large media company: 16:1 
One of the top search engine/portals: 150:1 on one table, another 50:1
A large retailer: 30:1
A performance-based digital marketing company: 40:1
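
To put these ratios in storage terms: a ratio of N:1 means the compressed data occupies 1/N of the original size, so the space saved is 1 - 1/N. The short sketch below only does that arithmetic for the ratios listed above; the labels are shorthand for the customers quoted, and no sizes or measurements beyond the quoted ratios are assumed.

```python
# Compression ratios quoted above (N in "N:1"); labels are shorthand only.
ratios = {
    "media company": 16,
    "search engine/portal, table 1": 150,
    "search engine/portal, table 2": 50,
    "retailer": 30,
    "digital marketing company": 40,
}

def space_saved(ratio):
    """Fraction of the original size that is no longer needed at ratio:1."""
    return 1 - 1 / ratio

for label, ratio in ratios.items():
    print(f"{label}: {ratio}:1 saves {space_saved(ratio):.2%} of the original size")
```

Note how quickly the savings flatten out: going from 16:1 to 150:1 moves the saving from just under 94% to just over 99%.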
 
When it comes to compression, a wide variety of factors affect the compression rates achieved. One is data type: we generally find that numeric columns are more easily compressed than alpha-numerics, particularly alpha-numerics containing longer strings. Columns declared as lookups will compress really well, and according to our Chief Scientist, Dominik Slezak, "it's especially good for alpha-numeric columns with low number of distinct values." In addition, sparse data (with lots of NULLs) will also compress much better.
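
A rough way to get a feel for these effects is to compress a few synthetic columns with a general-purpose compressor. The sketch below is only an illustration: it uses Python's zlib on a newline-joined text dump of each column, which is an assumption made for the demo and not Infobright's storage format or compression algorithms. The column names, row count, and value distributions are all made up.

```python
import random
import zlib

def raw_and_compressed_size(values):
    """Return (raw bytes, zlib-compressed bytes) for a text dump of the column.

    NULLs (None) are written as empty lines. This is a stand-in encoding,
    not how Infobright stores data.
    """
    raw = "\n".join("" if v is None else str(v) for v in values).encode("utf-8")
    return len(raw), len(zlib.compress(raw))

rng = random.Random(0)  # fixed seed so runs are repeatable
ROWS = 20_000
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

# Hypothetical columns, one per factor mentioned in the text.
columns = {
    "numeric": [rng.randrange(1000) for _ in range(ROWS)],
    "long_strings": ["".join(rng.choice(ALPHABET) for _ in range(24)) for _ in range(ROWS)],
    "low_cardinality": [rng.choice(["US", "CA", "UK", "DE"]) for _ in range(ROWS)],
    "mostly_null": [None if rng.random() < 0.9 else rng.randrange(1000) for _ in range(ROWS)],
}

sizes = {name: raw_and_compressed_size(vals) for name, vals in columns.items()}
for name, (raw, packed) in sizes.items():
    print(f"{name:16s} raw={raw:8d} compressed={packed:8d} ratio={raw / packed:5.1f}:1")
```

Even with a generic compressor, the column with only four distinct values should compress far better than the column of long random strings, and the mostly-NULL column should end up much smaller than the dense numeric one.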
 
Factors that are crucial for our high compression ratios include:
- Compression algorithms automatically detect data types and adjust to them.
- Whenever there are some regularities in data, our compression algorithms will attempt to detect and use them.
- Such regularities are more likely to occur locally, for shorter collections of data items. This is why better compression ratios can be obtained by compressing packs of 64K items separately instead of the whole series of items for a given column. On the other hand, collections of data items compressed together cannot be too short, because then the advantage of detecting and using regularities fades away and the overhead of creating too many packs becomes significant.
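
The pack-size trade-off in the last point can be sketched with a toy encoding. The example below assumes a simple scheme in which each pack stores a small header plus every value as a fixed-width offset from the pack's minimum. This scheme, the 128-bit header, and the sample column are assumptions chosen for illustration; they are not Infobright's actual algorithms or pack layout.

```python
def encoded_bits(values, pack_size, header_bits=128):
    """Estimated size in bits when values are split into packs of pack_size.

    Each pack costs a fixed header (hypothetical: 128 bits for min, width, etc.)
    plus len(pack) offsets, each just wide enough to cover (max - min) in that pack.
    """
    total = 0
    for start in range(0, len(values), pack_size):
        pack = values[start:start + pack_size]
        width = (max(pack) - min(pack)).bit_length()
        total += header_bits + width * len(pack)
    return total

# Made-up column with a small local range but a large global range,
# e.g. steadily increasing identifiers or timestamps.
column = [1_000_000 + 3 * i for i in range(1_000_000)]

whole_column = encoded_bits(column, len(column))  # one giant pack
packs_64k = encoded_bits(column, 64 * 1024)       # packs of 64K items
tiny_packs = encoded_bits(column, 4)              # deliberately too small

print("whole column:", whole_column, "bits")
print("64K packs:   ", packs_64k, "bits")
print("4-item packs:", tiny_packs, "bits")
```

With this data the 64K packs need fewer bits than the single whole-column pack, because each pack only has to cover its own local range, while the 4-item packs come out largest because the per-pack header overhead dominates.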
 
Our data packs hold 64K items each. The number of items in each pack is important for the compression ratio, for operating efficiently with our Knowledge Grid, and for our ability to access separate pieces of data instead of larger amounts. This is a key advantage of our model.
 
Whether you use ICE or IEE, you get the same compression algorithms, and we hope they give you very satisfactory compression results.