Should MySQL Users Do Data Warehousing Differently?
by
John Kemp
Wed, Nov 12, 2008
Victoria Eastwood recently blogged about how MySQL users do Data Warehousing. In her blog, she spoke about how MySQL users typically eschew things like ETL tools in because 1. source systems have become simpler and cleaner and 2. time to develop precludes learning a new tool.
While I agree that, for a simple DW project with few data sources, you can bang out the transformation and load routines using the language of choice, I don't recommend hand coding anything more complex than that. Why?
1. Error handling - It becomes much more difficult to manage and build routines by hand to handle all of the error conditions that you can encounter. ETL tools provide a lot of this stuff out of the box.
2. Documentation - An ETL tool generates documentation as you build transformation routines - this makes future maintenance easier.
3. Portability - a good ETL tool supports multiple platforms. You can move your transformations from one OS to another.
If you use your Data Warehouse for any external reporting, you must be able to clearly articulate how the report figures were generated - this includes being able to provide clearly documented transformation routines. Sorbanes-Oxley provides for jail sentences for executives who cannot document the source of their corporate reporting - this might be an extreme example, but it's real and it's something to consider in anything you build.
People knock the speed of ETL tools. I'm not sure that's true anymore - we've built an ETL process that transforms 140 million source records in a few hours. And this is not a trivial transformation - data must be grouped together based on criteria and then transformed in a many to one transformation. We use one of the open source solutions (I won't say which).
I guess all that I suggest is that you consider who your end audience is for whatever transformation that you build. If it's just yourself or your group, then by all means write your own code. If the application has a broader audience and use, do consider an ETL tool.
This is very good advice--
When we fist started moving our ETL from hundreds of Perl scripts to a ETL tool (Talend in this case) I was frustrated by the learning curve and the associated loss of productivity. After getting to know the tools however, I am a huge fan of using them for the same reasons you lay out here: consistancy, documentation and support.
I must say that I have seen a few instances where ETL tools either force you into an awkward implementation or lead to a inefficient design. In these cases I used a combination of custom code and the tool.
Post Comment