AIXELSYD is DYSLEXIA backwards.
This is not, of course, a joke at the expense of those afflicted with this condition. Rather, it is one way to present the same "thing" from a different perspective. Or a different set of rules, if you will.
And that is my segue into the topic of database replication.
Everyone has an opinion about replication, but that opinion is shaded by perspective: usually a point in time, just as often previous experience. It also comes with a set of expectations based on a fixed set of rules. My intent in this post is to take a historical perspective, overlay it on modern-day analytic databases (the database software, not necessarily the database itself) and, perhaps, reconsider some of the old rules.
The history of database replication comes from the write-oriented OLTP application world. Its initial intent was to support read-oriented uses by directing read operations to a copy (a replica) of the database, separate from the write operations of the application that supported the business. You can call it a form of manual load balancing. It eventually expanded into an approach to high availability, failover and disaster recovery because it basically worked for those uses.
This was facilitated by “forwarding” (publishing) the transaction log of the original, write-oriented database (the publisher) to a separate server (the subscriber), where the transactions were replayed asynchronously to keep the two systems synchronized in near real time. Additional features such as “catch up” were necessarily added to ensure that the two (or more) databases were always, eventually, identical. But Infobright is already a read-optimized database technology by design, i.e. the “target” read database.
(Different flavors – such as two-phase commit (2PC), SQL shipping, etc. – and approaches – enterprise application integration (EAI) and information buses by the likes of IBM (WebSphere) and TIBCO – were also developed and deployed.)
Bear in mind that this is a transaction replication approach.
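To make the publisher/subscriber idea concrete, here is a minimal sketch of asynchronous log shipping with "catch up". It is a toy model under my own assumptions, not how any particular database implements replication: the `Publisher` and `Subscriber` classes, the in-memory list standing in for the transaction log, and the key/value "transactions" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    """Write-oriented primary: applies each write and appends it to a log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered (key, value) entries

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # every write is also logged


@dataclass
class Subscriber:
    """Read-oriented copy: replays log entries it has not yet applied."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # how far into the publisher's log we have replayed

    def catch_up(self, log):
        # Replay only the entries after the last applied position, in order.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


publisher = Publisher()
subscriber = Subscriber()

publisher.write("order:1", "new")
publisher.write("order:1", "shipped")

# Asynchronous: the subscriber lags until it replays the forwarded log.
lagging = subscriber.data != publisher.data
subscriber.catch_up(publisher.log)
in_sync = subscriber.data == publisher.data
```

The point to notice is that every write lands twice on the publisher side: once in the data and once in the log. That is cheap for small OLTP transactions, and it is exactly the cost that matters for bulk loads.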
But imagine what would happen if Infobright enabled transaction log replication with unlimited log buffers and actually logged bulk load operations. Every value written to a Data Pack would also be written to the log buffer for forwarding – in uncompressed form.
Now let’s imagine a conservative, generic 10:1 compression ratio across all data. That means that all data loaded would be written in compressed form at 10% of its raw size, and then the uncompressed values would be written to the transaction log at 100% of its raw size. That’s 110% of the size of the raw data! Instead of loading compressed data faster than writing the raw data, it would take 11 times as long (110% vs 10%) to write all the necessary bytes to disk as it does without writing to the transaction log. That means dramatically slower LOAD speeds than Infobright users currently experience.
It also means going through the entire database server's software stack, at a much slower rate, instead of bypassing it as the native bulk loader does now. And that doesn’t even count the overhead on the server of maintaining and shipping the transaction logs. It also assumes the unrealistic premise of log buffers of unlimited size.
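The arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch, assuming write time is simply proportional to bytes written; the `bytes_written` helper and the 1 GB load size are hypothetical, chosen only for illustration.

```python
def bytes_written(raw_bytes, compression_ratio, log_uncompressed):
    """Bytes hitting disk for one load, with or without an uncompressed log."""
    compressed = raw_bytes / compression_ratio      # the compressed data itself
    log = raw_bytes if log_uncompressed else 0      # log holds raw values
    return compressed + log


raw = 1_000_000_000  # hypothetical 1 GB of raw data

without_log = bytes_written(raw, 10, log_uncompressed=False)  # 10% of raw
with_log = bytes_written(raw, 10, log_uncompressed=True)      # 110% of raw

# Ratio of bytes written with the log versus without it.
slowdown = with_log / without_log
print(f"{with_log / raw:.0%} of raw vs {without_log / raw:.0%}: {slowdown:.0f}x")
```

Note that the penalty grows with the compression ratio: the better the data compresses, the more the uncompressed log dominates the total bytes written.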
For all of these reasons I believe hardware-level, block-device, synchronous replication is a much better approach. Check out the postings in our Forums around DRBD and other lower-level file synchronization techniques, including eonarts' "poor man's replication" technique.
It might not be 20/20 vision but that's how I see it.