Joinutility seperatorLogin utility separator Infobright.com

Infobright Blog

21
Sep

My take on aggregate tables in Infobright

David Lutz's photo
by David Lutz     Mon, Sep 21, 2009

Aggregate tables are one of many performance structures that have been devised over the years to facilitate higher levels of performance from traditional, row-oriented databases. Basic and advanced indices, partitioning schemes and OLAP cubes are examples of other performance structures and technologies that have been introduced to the data warehousing and analytical BI market.

The general approach of data warehousing designers is to begin with a simple and basic data model design and only add performance structures as needed. This is a reasonable and wise approach as all of these techniques involve additional design work, maintenance, administration and storage. It also increases the complexity of the environment. And that is just the database - it does not include the complexity added for ETL and BI tools. They also have the unintended side effect of increasing end-to-end data load times by reducing load speeds with additional inline or post-processing or rebuilding of aggregate tables and cubes.

This approach is also recommended for column-oriented database environments. The difference is that for queries that are inherently column-oriented - queries that involve aggregates or ask questions "about" the data - instead of row-oriented - queries that ask "for" the actual data elements, often the entire record, such as SELECT * - one often discovers that the eventual need for additional performance structures like indices or aggregate tables is completely alleviated or greatly delayed. This, of course, depends on the nature of the typical use cases of the database.

For example, at a recent customer evaluation, a BI report was written against an atomic base table of ~280 million records and ran in ~5,000 seconds. In a column-oriented database, Infobright, the same report ran in < 150 seconds. This allowed them to replace their ~12 million record aggregate table (of the ~280 million record table) and still receive similar or better performance. This is not only a reduction in the number of tables in a BI environment, it is also a reduction in complexity, administration, storage, etc.

This is not to say that aggregate tables do not have their place, even in a column-oriented database environment. However, it does reduce the number of cases where it will be needed or delay the need for their implementation. One example where retention of aggregate tables could be advisable is where a column-oriented database replaces a row-oriented database and there are pre-existing BI reports, analyses and queries that are written to query these structures. It's possible that the switching complexity outweighs the benefits of eliminating them altogether.


16
Sep

VLDB 2009 (update)

Dominik Slezak's photo
by Dominik Slezak     Wed, Sep 16, 2009

The pdfs of VLDB 2009 papers are already available in DBLP. Let me immediately draw your attention to the section on Experiences and Lessons, wherein the last paper is about DBLP, by Michael Ley. It’s really great to read about something so popular among my academic colleagues. I hope you’ll take a look at it.

Another paper worth reading is, surely, the 10-year Award Keynote. Put it together with the tutorial on column-oriented database systems, as well as with some recent announcements about hybrid columnar stores. It confirms that we did the right thing to focus on columnar architectures a couple of years ago. It also shows that each columnar solution needs to have something beyond basic mechanisms (such as, e.g., columnar scans) to make it really unique and successful.

Best greetings,

Dominik

 

Infobright     Tags:

Previous Page   Next Page