Joinutility seperatorLogin utility separator Infobright.com

Infobright Blog

21
Sep

My take on aggregate tables in Infobright

David Lutz's photo
by David Lutz     Mon, Sep 21, 2009

Aggregate tables are one of many performance structures that have been devised over the years to facilitate higher levels of performance from traditional, row-oriented databases. Basic and advanced indices, partitioning schemes and OLAP cubes are examples of other performance structures and technologies that have been introduced to the data warehousing and analytical BI market.

The general approach of data warehousing designers is to begin with a simple and basic data model design and only add performance structures as needed. This is a reasonable and wise approach as all of these techniques involve additional design work, maintenance, administration and storage. It also increases the complexity of the environment. And that is just the database - it does not include the complexity added for ETL and BI tools. They also have the unintended side effect of increasing end-to-end data load times by reducing load speeds with additional inline or post-processing or rebuilding of aggregate tables and cubes.

This approach is also recommended for column-oriented database environments. The difference is that for queries that are inherently column-oriented - queries that involve aggregates or ask questions "about" the data - instead of row-oriented - queries that ask "for" the actual data elements, often the entire record, such as SELECT * - one often discovers that the eventual need for additional performance structures like indices or aggregate tables is completely alleviated or greatly delayed. This, of course, depends on the nature of the typical use cases of the database.

For example, at a recent customer evaluation, a BI report was written against an atomic base table of ~280 million records and ran in ~5,000 seconds. In a column-oriented database, Infobright, the same report ran in < 150 seconds. This allowed them to replace their ~12 million record aggregate table (of the ~280 million record table) and still receive similar or better performance. This is not only a reduction in the number of tables in a BI environment, it is also a reduction in complexity, administration, storage, etc.

This is not to say that aggregate tables do not have their place, even in a column-oriented database environment. However, it does reduce the number of cases where it will be needed or delay the need for their implementation. One example where retention of aggregate tables could be advisable is where a column-oriented database replaces a row-oriented database and there are pre-existing BI reports, analyses and queries that are written to query these structures. It's possible that the switching complexity outweighs the benefits of eliminating them altogether.

I agree. and in these complicated times the old KISS principle is still valid. keep it simple if you can. in many cases users can not follow our complexity anyway. keep it simple, stupid! (there might be other translations for KISS). get the data, do a little bit of work, dump it into infobright and that’s it.
in cases where users previously used access, excel or other similar tools, the speed of infobright will be a leapyear forward and the users will be delighted. of course there are cases where it is not so easy and I am a strong fighter for proper designs, but again in many cases I believe being quick and fast is a good starting point.

uwe

Author: uwegeercken
Date: 10/13/09

Please login or register to post a comment.