Joinutility seperatorLogin utility separator Infobright.com
   
1 of 2
1
Load Speed
Posted: 18 December 2008 04:46 PM   Ignore ]  
Newbie
Rank
Total Posts:  2
Joined  2008-12-18

Hi folks.
I’m totally new to Infobright and looking forward to start using it. Meanwhile I was wondering about some load speed estimates to know what to expect in production. This sometimes is really crucial.

So, is there anybody out there willing to share some estimates? I think mere GBs, # rows and server used would be enough to avoid total “apples to oranges”.

And to pepper this discussion a bit, I’d post some of my current projects stats (those are “apples”):

1. Loading in Oracle Essbase
GB—~100
# rows—~ 0,5 bln
server—4 CPU, 20 Gb Ram, win64
Load Time: ~40 min (+ aggregation to total 1,5 hour for whole OLAP cube)
I’m using 7 cores at loading in parallel.

2. Loading into Oracle 10g
GB—~180
# rows—~ 0,4 bln
server—4 CPU, 8 Gb Ram, win32 Pae
I’m using 4 cores at loading in parallel, no way to work it up more with 32bit Oracle.
Load Time: ~30 min

“oranges” anyone?

Profile
 
Posted: 18 December 2008 06:27 PM   Ignore ]   [ # 1 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  736
Joined  2008-08-18

Hi,

I’m not sure, but ICE release may use only one core on load (IEE is multithreaded). The load speed obviously depends on disk speed etc., and additionally on data contents (numerics are faster, texts slower). On average, I would expect something like 50 GB/h on one-threaded ICE.

Regards,

Signature 
Profile
 
Posted: 19 December 2008 05:19 AM   Ignore ]   [ # 2 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  916
Joined  2008-08-18

Using ICE (= 1 core) loading TPC-H part table 3 200 000 rows = 390 MB lasted 41 sec, what gives ~35 GB/sec. This table contains 3 varchar fields length 23, 25 and 55, they (their compression) slow down the load.

Profile
 
Posted: 19 December 2008 05:36 AM   Ignore ]   [ # 3 ]  
Newbie
Rank
Total Posts:  2
Joined  2008-12-18

Thanks a lot.
So I guess 30-40 Gb\h is a good estimate.
IMHO, single-threaded load usually just can’t stress I\O enough, so disk speed becomes irrelevant. CPU speed becomes really important at this point.
Any noted difference between Itaniums or Xeons? Anyone tried ICE on zlinux?

Any plans to start parallel datapumping?

Profile
 
Posted: 19 December 2008 05:42 AM   Ignore ]   [ # 4 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  916
Joined  2008-08-18

As I recall for numeric data like in TPS-DS it was more than 50GB/h. The numbers I gave were from Core2 Quad 2.4GHz. No idea about Itanium.

Parallel loading works already - in IEE

Profile
 
Posted: 19 December 2008 05:56 AM   Ignore ]   [ # 5 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  487
Joined  2008-08-18

Hello ykud,

I’m wondering whether we could make a finers comparison. I assume that you might not be able to give us some samples of your data. However, if there are more details you could share with us, it would be appreciated. Also, if you decide to test ICE using your data with respect to the load speed (and not only), we would be of course very happy.

I have also another question:

In the projects you’re referring to, can you observe a constant load speed or are there any slow-downs or speed-ups while adding new portions of the already existing data?

Finally, regarding your question:

ykud - 19 December 2008 05:36 AM

Any plans to start parallel datapumping?

We are aware that such an extension may turn out very useful. We’ll get back to you with an updated information on this topic.

Best greetings,

Dominik

Signature 
Profile
 
Posted: 19 December 2008 07:31 PM   Ignore ]   [ # 6 ]  
Member
RankRankRank
Total Posts:  218
Joined  2008-08-18

Hi ykud

One thing we also have to keep in mind that ICE compresses data during load. So it is apple to orange comparison (assuming you did not enable any type of compression in Oracle during load).

regards,

Signature 

Mahib

Profile
 
Posted: 20 December 2008 02:23 PM   Ignore ]   [ # 7 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  916
Joined  2008-08-18

Good point, Mahib! E.g. one of our databases has 1 TB of source data and takes up ~210 GB on disk in ICE. Some guys on the forum report even ~30 times compression if index sizes (no existing in ICE) are counted also.

Profile
 
Posted: 22 December 2008 11:17 AM   Ignore ]   [ # 8 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  487
Joined  2008-08-18

Hello Again,

I’m not sure but I guess that Oracle Essbase compresses data too. Sorry for so many questions but, ykud, could you please let us know whether you used data compression during load? It may be actually interesting to compare compression ratios as well.

Many thanks and best greetings,

Signature 
Profile
 
Posted: 11 May 2009 02:24 PM   Ignore ]   [ # 9 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

I’m doing due diligence on Infobright. I prepared a modified schema (to meet Infobright’s datatype requirements), prepared a load file using current data. Then I loaded up the data first in Infobright (ICE v. 3.1.1). The load took a really long time w/ICE. I know its doing compression but this was so long that it caused us to pause.  Here are the details:

ICE v. 3.1.1
MySQL v 5.0.51a (other server for the mysql test)

# of rows: 69838468
avg row lenght: 2
data_length: 192411829

ICE LOAD TIME:  Query OK, 69838468 rows affected (37 min 7.82 sec)
MySQL Load time: Query OK, 69838468 rows affected, 65535 warnings (6 min 8.96 sec)

These were run on different servers. 

It was suggested I may have hit a bug? should I upgrade my version of ICE?

thanks,
erin

Profile
 
Posted: 11 May 2009 02:49 PM   Ignore ]   [ # 10 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  505
Joined  2008-08-18

Hi Erin,

It looks like a lot of warnings are being generated (65535 warnings). I am guessing that the data doesn’t match your table definition. This could definitely slow down the load.

Can you do a “show warnings;” from the MySQL load, and it may show what’s wrong. The same problem may be affecting the Infobright load.

Thanks
Geoffrey

Signature 
Profile
 
Posted: 12 May 2009 04:11 AM   Ignore ]   [ # 11 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  916
Joined  2008-08-18

Hi!

Normally ICE loads faster that mysql. Apart from that your data does not match table definition, there can be other problems. Not seeing what you are loading, with what command and into what table - I cannot tell anything more…

Profile
 
Posted: 12 May 2009 07:48 AM   Ignore ]   [ # 12 ]  
Jr. Member
RankRank
Total Posts:  87
Joined  2008-08-18

Hi Erin,

What is the table definition that you are loading data to? I can see there are in average 2 bytes in row. Does it mean that there is only one column in the table? Is it char(2) or varchar(2)? I am asking because such columns are automatically converted to lookup columns and if there are many distinct values in such column then load may become slow.

Cheers

Profile
 
Posted: 12 May 2009 12:40 PM   Ignore ]   [ # 13 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

I fixed the schema, dumped the old data & reloaded into both the MySQL 5.0.51a instance and the Infobright 3.1.1 (diff servers).

The time saved is small.

Infobright:
Query OK, 69838468 rows affected (25 min 0.52 sec)
Records: 69838468 Deleted: 0 Skipped: 0 Warnings: 0

MySQL:
Query OK, 69838468 rows affected (5 min 46.19 sec)
Records: 69838468 Deleted: 0 Skipped: 0 Warnings: 0

I’m going to test this on another server to see if that will bring the Infobright load time down, but it concerns the folks higher up.

erin

Profile
 
Posted: 12 May 2009 12:45 PM   Ignore ]   [ # 14 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

Here’s the command to load (same on MySQL as Infobright)

LOAD DATA INFILE ‘/tmp/pub-stats_test.sql’  INTO TABLE publisher_stats FIELDS TERMINATED BY ‘,’;

There is one date field, 12 ints, one each: varchar(2), varchar(13), varchar(12), 4 decimal (18,7) and one datetime. No texts, No large varchar (and the 13, 12 one is originally an enum that I converted to varchar (the data was a string)). Both the MySQL & Infobright have identical schemas. I used the same create table for both.

erin

Profile
 
Posted: 12 May 2009 01:00 PM   Ignore ]   [ # 15 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  505
Joined  2008-08-18
eonarts - 12 May 2009 12:45 PM

There is one date field, 12 ints, one each: varchar(2), varchar(13), varchar(12), 4 decimal (18,7) and one datetime. No texts, No large varchar (and the 13, 12 one is originally an enum that I converted to varchar (the data was a string)). Both the MySQL & Infobright have identical schemas. I used the same create table for both.

Are you using lookup fields? If using lookup, make sure there aren’t more than 10000 distinct values in the column. This can slow down the load a lot.

Signature 
Profile
 
   
1 of 2
1