Prometheus,
Thanks for joining the Infobright community and posting the details of your opportunity. Welcome! I’m happy to respond to your questions and will do so inline with your original message.
*Fast data loading. How fast can we load data into Infobright? Is multithreaded data loading possible?
Infobright has the highest load speed per data server and it scales very well with multiple, parallel load processes. The following values are for a single SMP server, therefore my reference to “per data server”. For example, we have seen load speeds of 100GB/hour for a single table load process. When we increased the number of tables being loaded to two, we saw two individual throughput speeds of 90GB/hour/table, or 180GB/hour. With three tables and load processes, 80GB/hour/table, or 240GB/hour. Likewise at 4 concurrent load processes, 70GB/hour/table or 280GB/hour at the aggregate level.
This test was performed with the Infobright Enterprise Edition which is multi-threaded. The Community Edition, or ICE, is single-threaded so performance levels vary depending on table structures and the actual data, but are somewhere in the neighborhood of 50% to 60% of the multi-threaded Infobright Loader.
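To make the arithmetic behind those Enterprise Edition figures explicit, here is a minimal Python sketch. The per-table rates are the ones quoted above; the dictionary and function names are just illustrative and not part of any Infobright tool.

```python
# Per-table load rates (GB/hour) quoted above, keyed by the number of
# concurrent load processes on a single SMP server.
PER_TABLE_GB_PER_HOUR = {1: 100, 2: 90, 3: 80, 4: 70}


def aggregate_throughput(processes: int, per_table_rate: float) -> float:
    """Aggregate rate = number of parallel loads x per-table rate."""
    return processes * per_table_rate


for processes, rate in PER_TABLE_GB_PER_HOUR.items():
    total = aggregate_throughput(processes, rate)
    print(f"{processes} load(s): {rate} GB/hour/table, {total} GB/hour aggregate")
```

Each extra load process lowers the per-table rate but still raises the aggregate, which is why the number of parallel loads is the main lever for load time.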
*Is MPP possible? Can we add more servers to share CPU resources and decrease the load time?
Infobright is built on a single SMP architecture. Think of this as a “cluster of one” with a shared disk MPP architecture. Resources can be added to the single server - such as CPUs and memory - to increase all aspects of performance. We are currently in design and development of a multi-server version of this architecture that will remain a shared disk MPP architecture. This will allow the incremental expansion of “front-end” resources for user connections and compute power as you suggest. It also facilitates scaling the computing capacity separately from the storage capacity, as opposed to a shared-nothing MPP architecture. The timeline for delivery of this architecture is still being evaluated.
What is the best HW solution for the first customer above? Do we still need fat enterprise servers, or what?
Infobright is agnostic on specific hardware platforms. Infobright supports both Intel and AMD 32- and 64-bit chipsets and multiple varieties of Linux.
For your first customer, let me make sure I understand the basics: They will load 1TB once daily and then run analytical queries against it. On subsequent dates, the previous data will be dropped (or otherwise unloaded from Infobright) and the next day’s data will be loaded. The storage and query requirements are all very well-aligned with ICE’s capabilities. I’m sure you will see this in your experience.
The critical component here would seem to be loading 1TB in no more than 2 hours, or 500GB/hour. As you have probably seen in the previous response on loading, the “scale factor” is the number of parallel load processes running at any time, which determines the aggregate throughput. (I would first want to make sure the server could “deliver”, or read, the source data at that rate.)
We haven’t tested beyond 4 parallel loads on a dual CPU/dual core processor (or 1 core/load process), but if I extrapolate a drop of roughly 10% in single-table throughput for each additional load process, I would expect to see aggregate throughput approaching 400GB/hour with 8 parallel load processes. Is the database design such that 8 parallel loads could be executed? If so, I would recommend a minimum of 8 CPU cores and no less than 32GB of RAM. More of both would be beneficial.
As always, your results will depend on the data model, the data, achieved compression ratios and external factors such as how fast the source data can be “fed” to the Infobright Loader and what other loads are on the server at that time.
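For anyone who wants to rerun the sizing arithmetic with their own numbers, here is a rough Python sketch. It assumes 1TB = 1000GB (as the 500GB/hour figure implies) and reads the extrapolation as a compounding 10% drop in per-table throughput for each added load process, starting from 100GB/hour. That reading is an assumption: it lands near the 400GB/hour estimate, but it is not a measured result, and the function names are illustrative only.

```python
def required_rate(total_gb: float, window_hours: float) -> float:
    """Load rate (GB/hour) needed to finish total_gb within the window."""
    return total_gb / window_hours


def extrapolated_aggregate(processes: int,
                           base_rate: float = 100.0,
                           drop: float = 0.10) -> float:
    """Aggregate GB/hour, ASSUMING per-table throughput falls by `drop`
    (compounding) for each load process beyond the first."""
    per_table = base_rate * (1.0 - drop) ** (processes - 1)
    return processes * per_table


target = required_rate(total_gb=1000, window_hours=2)   # 1TB in 2 hours
estimate = extrapolated_aggregate(processes=8)

print(f"required rate: {target:.0f} GB/hour")
print(f"extrapolated rate, 8 loads: {estimate:.0f} GB/hour")
print(f"estimated time for 1TB: {1000 / estimate:.2f} hours")
```

Changing `processes`, `base_rate`, or `drop` shows how sensitive the load window is to the scaling assumption, so it is worth replacing these values with rates measured on your own hardware and data.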
So can we do this with Infobright Community Edition?
From the information you’ve provided, I think Infobright would be a very good fit for your proposed projects. Please give it a test and let us know how it works for you!
Also is there a Mac OS X source for compiling?
Infobright has not yet developed a Mac OS X port, but the 64-bit source code is available for testing compilation on Intel-based Macs. Again, if you attempt this, please let us know your results.
Also I couldn’t find 32-bit Linux source code on the site
We are looking into this right now. Please check back for an authoritative response from our Community VP.
Best wishes!