Hi Amber,
Thank you for providing your thoughts on our solution. I’ll try to address your comments here, and would be more than happy to have a further discussion with you if you would like.
Infobright took the approach of focusing on delivering a high performance, high compression data store that delivers cost effective analytics on high volumes of data.
Now that I’ve got the marketing speak out of the way, let me give you my thoughts on our approach and why I believe that we do support the functionality that you have requested.
First off, full DML support (insert, update, delete) is provided in IEE, the Enterprise Edition of the product. With ICE, the Community Edition, you can use ETL processes to eliminate the need for insert/update/delete (IUD) by doing your processing outside the database and then loading the transformed data.
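To make that concrete, here is a minimal sketch of the idea in Python. It is only an illustration, not our loader or any particular ETL tool: the function names (`apply_changes`, `to_load_file`), the `id` key and the sample rows are all made up for the example. The inserts, updates and deletes are applied in memory, outside the database, and the result is written out as a delimited file ready for a bulk load.

```python
import csv
import io


def apply_changes(base_rows, changes):
    """Apply insert/update/delete changes in memory, keyed on 'id' (hypothetical key)."""
    current = {row["id"]: dict(row) for row in base_rows}
    for change in changes:
        op, row = change["op"], change["row"]
        if op == "delete":
            current.pop(row["id"], None)
        else:
            # "insert" and "update" both overwrite the row stored under that key
            current[row["id"]] = dict(row)
    return [current[key] for key in sorted(current)]


def to_load_file(rows, fieldnames):
    """Render the final rows as comma-delimited text suitable for a bulk load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Made-up sample data for illustration only.
base = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
changes = [
    {"op": "update", "row": {"id": 2, "amount": 25}},
    {"op": "delete", "row": {"id": 1}},
    {"op": "insert", "row": {"id": 3, "amount": 30}},
]

final_rows = apply_changes(base, changes)
load_text = to_load_file(final_rows, ["id", "amount"])
print(load_text)
```

The resulting file would then be handed to the data loader in a single pass. The exact load statement and its options depend on your setup, so I have left them out here.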
We intentionally built our solution to work within the MySQL environment (we are sort of an engine within MySQL; however, we do have a few differences such as the data loader). This gives us the ability to leverage many of the tools available in the MySQL environment, both from other MySQL partners and MySQL themselves.
And there’s a certain advantage to this: I’ve often seen ‘built in’ functions like backups and monitoring included in offerings and then never used, because the organization already has another tool in place. Case in point: most ETL tools now have some pretty good monitoring features, but they are rarely used in practice, since you are already using tool ‘x’ and don’t need them.
So, for things like backup and restore, you can utilize system tools like some of the Linux backup tools. Or you can use a product like Zmanda (open source).
And for things like data replication and master/slave failover, there are tools like the MySQL proxy server that can support replication, manage heartbeats and failover, and can even ‘police’ queries such that blatantly slow queries (select * against a billion row table, for example) can be blocked. All of these tools are open source and are readily available, with strong community support.
The biggest thing to consider when rolling out an ‘enterprise’ data warehouse is this: what is really required?
Backups have to be done. Restoring data due to a hardware failure is something that I have rarely had to do. However, restoring data due to human error is far more common (the “Oops, I shouldn’t have loaded that” scenario). And none of the replication software or redundant hardware you can put in place will prevent that.
One of the first things I do is really nail down what the service requirements are. For a DW that’s being used by many people on a 24x7 basis, you do need failover, since loss of the DW likely leads to an immediate cost to the business. However, for a solution being used by a small department, you may not need 24x7 availability. In fact, downtime of even a business day may be acceptable! In this case, I would eliminate the redundancy, since Infobright runs on ‘off the shelf’ Intel hardware that can be easily acquired, and restoration of the DW can be undertaken quickly (for example, Infobright’s typical compression of a 1 TB database to 100 GB makes the backups and restores quick!).
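As a rough back-of-the-envelope illustration of why the compression matters for backup and restore, here is a small Python sketch. The 10:1 ratio is the typical figure mentioned above; the 100 MB/s throughput is purely an assumption for the example, and your own backup hardware will differ.

```python
def compressed_gb(raw_gb, ratio=10.0):
    """Size on disk after compression, given a compression ratio (10:1 is the typical figure quoted)."""
    return raw_gb / ratio


def transfer_minutes(size_gb, throughput_mb_per_s):
    """Minutes needed to copy size_gb at a sustained throughput, using 1 GB = 1000 MB."""
    return size_gb * 1000 / throughput_mb_per_s / 60


raw_gb = 1000.0            # a 1 TB database before compression
assumed_throughput = 100.0  # MB/s, an assumed figure for illustration only

stored_gb = compressed_gb(raw_gb)
uncompressed_minutes = transfer_minutes(raw_gb, assumed_throughput)
compressed_minutes = transfer_minutes(stored_gb, assumed_throughput)

print(f"Stored size: {stored_gb:.0f} GB")
print(f"Copy time uncompressed: {uncompressed_minutes:.0f} min")
print(f"Copy time compressed:   {compressed_minutes:.0f} min")
```

Under those assumptions the copy drops from roughly 167 minutes to roughly 17 minutes, which is the kind of difference that can make a simple restore an acceptable alternative to redundant hardware.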
I’ll come back to a couple of points you raised that I didn’t address earlier (sorry for jumping around):
We don’t provide a custom storage layout function because, quite frankly, we don’t think there is a need for it, and it runs counter to our philosophy of a simple data warehouse that can be easily managed. And since, through our compression and knowledge grid, we typically store data in about 1/10 of the space that other solutions require, the need to decide where to physically put tables is negated.
Regarding scale out support, our solution is designed to ‘ride the hardware curve’, so to speak. We scale concurrent queries with the number of cores in a server. So, if you have a single dual core CPU, you will be limited to two concurrent queries. However, increasing the core count to eight will allow you to run eight concurrent queries without any impact on performance. We are continuing to work on our concurrent query performance so that we can scale beyond eight concurrent queries without material performance degradation.
I trust that this answers your questions. I would be happy to engage in a further discussion with you at your convenience.
Regards,