The vendors who build their database technology on the columnar stores
usually claim that denormalization in such stores leads to better
outcome than in case of the row stores. Our ICE is not an exception
here. This is true especially for the decision support query workloads.
Firstly, adding more columns to a big, denormalized table is less
harmful for the query performance in the columnar stores than in the row
stores, especially for the queries which use only a fraction of all the
columns. Secondly, the columns created during denormalization are
usually expected to have highly repeatable values and, therefore, they
are expected to be easily compressible is the columnar stores.
All this does not mean that denormalization should be always the way to
go in ICE and in other columnar stores. As mentioned in the previous
post, it depends a lot on the business users’ requirements, the expected
ETL overheads, etc. Moreover, it is not always the case that the
above-mentioned benefits are the benefits indeed. Firstly, it is not
always the case that the columns resulting from denormalization can be
really compressed better enough than the original ones, kept in the
nicely normalized model. Secondly, the higher cost of the access to the
bigger, denormalized data may overpower the benefits coming from the
replacement of the multi-table joins with simpler, single-table queries.
To summarize, denormalization is always worth considering and, in
particular situations, it may provide very good results. However,
denormalization is not going to solve all the problems of the world, no
matter whether the columnar stores or row stores are applied. Everything
depends on the data, the queries, and the database used as the choice
between the row- and column-oriented storage is only one of many
database technology aspects.