Hi folks.
I’m totally new to Infobright and looking forward to start using it. Meanwhile I was wondering about some load speed estimates to know what to expect in production. This sometimes is really crucial.
So, is there anybody out there willing to share some estimates? I think mere GBs, # rows and server used would be enough to avoid total “apples to oranges”.
And to pepper this discussion a bit, I’d post some of my current projects stats (those are “apples”):
1. Loading in Oracle Essbase
GB—~100
# rows—~ 0,5 bln
server—4 CPU, 20 Gb Ram, win64
Load Time: ~40 min (+ aggregation to total 1,5 hour for whole OLAP cube)
I’m using 7 cores at loading in parallel.
2. Loading into Oracle 10g
GB—~180
# rows—~ 0,4 bln
server—4 CPU, 8 Gb Ram, win32 Pae
I’m using 4 cores at loading in parallel, no way to work it up more with 32bit Oracle.
Load Time: ~30 min
I’m not sure, but ICE release may use only one core on load (IEE is multithreaded). The load speed obviously depends on disk speed etc., and additionally on data contents (numerics are faster, texts slower). On average, I would expect something like 50 GB/h on one-threaded ICE.
Thanks a lot.
So I guess 30-40 Gb\h is a good estimate.
IMHO, single-threaded load usually just can’t stress I\O enough, so disk speed becomes irrelevant. CPU speed becomes really important at this point.
Any noted difference between Itaniums or Xeons? Anyone tried ICE on zlinux?
I’m wondering whether we could make a finers comparison. I assume that you might not be able to give us some samples of your data. However, if there are more details you could share with us, it would be appreciated. Also, if you decide to test ICE using your data with respect to the load speed (and not only), we would be of course very happy.
I have also another question:
In the projects you’re referring to, can you observe a constant load speed or are there any slow-downs or speed-ups while adding new portions of the already existing data?
Finally, regarding your question:
ykud - 19 December 2008 05:36 AM
Any plans to start parallel datapumping?
We are aware that such an extension may turn out very useful. We’ll get back to you with an updated information on this topic.
One thing we also have to keep in mind that ICE compresses data during load. So it is apple to orange comparison (assuming you did not enable any type of compression in Oracle during load).
Good point, Mahib! E.g. one of our databases has 1 TB of source data and takes up ~210 GB on disk in ICE. Some guys on the forum report even ~30 times compression if index sizes (no existing in ICE) are counted also.
I’m not sure but I guess that Oracle Essbase compresses data too. Sorry for so many questions but, ykud, could you please let us know whether you used data compression during load? It may be actually interesting to compare compression ratios as well.
I’m doing due diligence on Infobright. I prepared a modified schema (to meet Infobright’s datatype requirements), prepared a load file using current data. Then I loaded up the data first in Infobright (ICE v. 3.1.1). The load took a really long time w/ICE. I know its doing compression but this was so long that it caused us to pause. Here are the details:
ICE v. 3.1.1
MySQL v 5.0.51a (other server for the mysql test)
# of rows: 69838468
avg row lenght: 2
data_length: 192411829
ICE LOAD TIME: Query OK, 69838468 rows affected (37 min 7.82 sec)
MySQL Load time: Query OK, 69838468 rows affected, 65535 warnings (6 min 8.96 sec)
These were run on different servers.
It was suggested I may have hit a bug? should I upgrade my version of ICE?
It looks like a lot of warnings are being generated (65535 warnings). I am guessing that the data doesn’t match your table definition. This could definitely slow down the load.
Can you do a “show warnings;” from the MySQL load, and it may show what’s wrong. The same problem may be affecting the Infobright load.
Normally ICE loads faster that mysql. Apart from that your data does not match table definition, there can be other problems. Not seeing what you are loading, with what command and into what table - I cannot tell anything more…
What is the table definition that you are loading data to? I can see there are in average 2 bytes in row. Does it mean that there is only one column in the table? Is it char(2) or varchar(2)? I am asking because such columns are automatically converted to lookup columns and if there are many distinct values in such column then load may become slow.
Here’s the command to load (same on MySQL as Infobright)
LOAD DATA INFILE ‘/tmp/pub-stats_test.sql’ INTO TABLE publisher_stats FIELDS TERMINATED BY ‘,’;
There is one date field, 12 ints, one each: varchar(2), varchar(13), varchar(12), 4 decimal (18,7) and one datetime. No texts, No large varchar (and the 13, 12 one is originally an enum that I converted to varchar (the data was a string)). Both the MySQL & Infobright have identical schemas. I used the same create table for both.
There is one date field, 12 ints, one each: varchar(2), varchar(13), varchar(12), 4 decimal (18,7) and one datetime. No texts, No large varchar (and the 13, 12 one is originally an enum that I converted to varchar (the data was a string)). Both the MySQL & Infobright have identical schemas. I used the same create table for both.
Are you using lookup fields? If using lookup, make sure there aren’t more than 10000 distinct values in the column. This can slow down the load a lot.