I was attending a meeting that included some MySQL people and it raised some interesting questions.
The general comment was that users of MySQL mostly use PHP, Perl, Ruby on Rails and other scripting tools as their programming language. And they are really not interested in learning new languages for specialized purposes.
So this got me thinking: how would the average MySQL guy prefer to implement a data warehouse? Generally speaking, at least in my experience, when faced with a data warehouse project, most organizations buy a data warehouse ecosystem -- an ETL tool, a database and one or more BI tools. And then you need to learn a new ETL language and, in a sense, a BI language (configuration etc.). So I’m not sure how well this will fit with most MySQL users.
These organizations were large corporations, with lots of wonderful old systems (that no one understands any more) as their source systems. In addition, the implementation team was not conversant in PHP or Perl. The companies had deep pockets and their project costs were well into the millions.
In fact, the costs of a traditional warehouse project were quite prohibitive for small to medium sized companies with any volume of data.
Now we have ICE (along with open source ETL and BI tools), so this barrier to entry no longer exists. So how will MySQL users approach the warehouse problem? Well, based on the comment above, I would have to assume that they will not be interested in trying out ETL tools and will likely implement their ETL requirements via their favorite scripting language.
Don’t get me wrong, I’m not saying that ETL tools don’t offer value. That’s simply not the case. But it’s just not the nature of the MySQL community.
Regardless of how you implement the ETL process, the rules don’t change. Check out John’s blog on the subject (http://www.infobright.org/Blog/Entry/extract_transform_load_what_not_to_dofrom_someone_who_has_done_them_all/). And if you ignore everything else -- do not ignore the recommendation to profile your data. In my opinion, skipping this step will burn you every time! No matter how new the source system may be -- you will find data quality issues. Expect it and you will have much greater success.
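To make the profiling advice a bit more concrete, here is a minimal sketch of what a first profiling pass might look like in a scripting language. I've used Python as a stand-in; the same idea carries over to PHP, Perl or Ruby. The function name `profile_rows`, the column names and the sample rows are all made up for illustration, and a real profile would look at much more (ranges, patterns, referential integrity and so on).

```python
from collections import Counter


def profile_rows(rows):
    """Summarize each column: row count, blank count, distinct values.

    `rows` is a list of dicts, one per source record (hypothetical shape).
    """
    stats = {}
    for row in rows:
        for column, value in row.items():
            col = stats.setdefault(
                column, {"rows": 0, "blank": 0, "values": Counter()}
            )
            col["rows"] += 1
            # Treat NULLs and empty/whitespace-only strings as blank.
            if value is None or str(value).strip() == "":
                col["blank"] += 1
            else:
                col["values"][str(value).strip()] += 1
    return {
        column: {
            "rows": col["rows"],
            "blank": col["blank"],
            "distinct": len(col["values"]),
            "most_common": col["values"].most_common(3),
        }
        for column, col in stats.items()
    }


# Made-up sample standing in for rows pulled from a source system.
sample_rows = [
    {"customer_id": "1", "country": "CA", "signup_date": "2008-01-15"},
    {"customer_id": "2", "country": "ca", "signup_date": ""},
    {"customer_id": "2", "country": None, "signup_date": "15/01/2008"},
]

report = profile_rows(sample_rows)
for column, summary in report.items():
    print(column, summary)
```

Even on three rows this kind of summary surfaces the usual suspects: a repeated key, inconsistent casing in a code column, blanks, and mixed date formats.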
I thought I would take a break from working on the ETL Do's blog and spend a little time talking about some DW good reads.
I spent some time asking around the office here what books people liked (I won't mention the don't likes). There was a strong, uniform consensus that any of the Ralph Kimball books is a good read, but there were a lot of passionate 'read this' responses for Kimball's 'The Data Warehouse ETL Toolkit' (available on Amazon).
If you are trying to educate an executive on Data Warehousing, a decent starting point is Bill Inmon's 'Building the Data Warehouse' - it's a pretty easy read and will help someone get up to speed on the basics quite quickly (of course, you could also attend one of the Infobright Introduction to Data Warehousing webinars - they're held every two weeks and are free to all).
For those of us who are getting into the details, one of the best books you can purchase is Jonathan Gennick's 'SQL Pocket Guide'.
Please respond with any recommendations of good DW reads...
Cheers for now