Joinutility seperatorLogin utility separator Infobright.com
   
2 of 2
2
Load Speed
Posted: 12 May 2009 01:05 PM   Ignore ]   [ # 16 ]  
Jr. Member
Avatar
RankRank
Total Posts:  96
Joined  2008-08-18

eonarts,

Please post some sample data and your full/complete table definitions.

Also, you should have the “enclosed by” clause.  For example, enclosed by nothing is:

LOAD DATA INFILE '/tmp/pub-stats_test.sql'  INTO TABLE publisher_stats  FIELDS TERMINATED BY ',' ENCLOSED BY 'NULL'

Any other information you can provide us will help us help you faster.

eonarts - 12 May 2009 12:45 PM

Here’s the command to load (same on MySQL as Infobright)

LOAD DATA INFILE ‘/tmp/pub-stats_test.sql’  INTO TABLE publisher_stats FIELDS TERMINATED BY ‘,’;

There is one date field, 12 ints, one each: varchar(2), varchar(13), varchar(12), 4 decimal (18,7) and one datetime. No texts, No large varchar (and the 13, 12 one is originally an enum that I converted to varchar (the data was a string)). Both the MySQL & Infobright have identical schemas. I used the same create table for both.

erin

[ Edited: 12 May 2009 01:15 PM by Brian Beharry]
Signature 

Brian Beharry, QA
Infobright

Profile
 
Posted: 12 May 2009 04:11 PM   Ignore ]   [ # 17 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

Here’s a sample FROM the infile:
2007-09-24,312,AL,162,0,192,122,368,3,0.0000000,0.0150000,0.0150000,\N,0,\N,\N,0,0,0,0,\N
2009-05-04,2295,VI,231758,1,39,0,0,2,0.0000001,0.0000500,0.0500000,\N,0,\N,\N,0,0,0,0,\N
2009-05-04,2295,VI,244994,0,39,0,1,0,0.1000000,0.0000050,0.0000050,\N,0,\N,\N,0,0,0,0,\N

It looks like two of my varchars are actually always null…..

the data doesn’t seem like it should take that long to load and it doesn’t in MySQL ....

Profile
 
Posted: 12 May 2009 05:35 PM   Ignore ]   [ # 18 ]  
Jr. Member
Avatar
RankRank
Total Posts:  96
Joined  2008-08-18

Eonarts,

I used your data and tried to guess your table definitions; and I came up with this script which I think you should look at:

warnings;

drop database if exists eonarts_database;
create database eonarts_database;
use 
eonarts_database;

set autocommit=1;


drop table if exists eonarts_table;
CREATE TABLE eonarts_table(
  
column1 DATE,
  
column2 BIGINT,
  
column3 VARCHAR(5532),
  
column4 BIGINT,
  
column5 BIGINT,
  
column6 BIGINT,
  
column7 BIGINT,
  
column8 BIGINT,
  
column9 BIGINT,
  
column10 DECIMAL(3,3),
  
column11 DECIMAL(3,3),
  
column12 DECIMAL(3,3),
  
column13 VARCHAR(5532),
  
column14 DECIMAL(3,3),
  
column15 VARCHAR(5532),
  
column16 VARCHAR(5532),
  
column17 DECIMAL(3,3),
  
column18 DECIMAL(3,3),
  
column19 DECIMAL(3,3),
  
column20 DECIMAL(3,3),
  
column21 VARCHAR(5532)
engine=brighthouse CHARACTER SET ascii COLLATE ascii_bin;


drop table if exists eonarts_table_mysql;
CREATE TABLE eonarts_table_mysql(
  
column1 DATE,
  
column2 BIGINT,
  
column3 VARCHAR(5532),
  
column4 BIGINT,
  
column5 BIGINT,
  
column6 BIGINT,
  
column7 BIGINT,
  
column8 BIGINT,
  
column9 BIGINT,
  
column10 DECIMAL(3,3),
  
column11 DECIMAL(3,3),
  
column12 DECIMAL(3,3),
  
column13 VARCHAR(5532),
  
column14 DECIMAL(3,3),
  
column15 VARCHAR(5532),
  
column16 VARCHAR(5532),
  
column17 DECIMAL(3,3),
  
column18 DECIMAL(3,3),
  
column19 DECIMAL(3,3),
  
column20 DECIMAL(3,3),
  
column21 VARCHAR(5532)
engine=myisam CHARACTER SET ascii COLLATE ascii_bin;


--  
set @bh_dataformat 'txt_variable';
LOAD DATA INFILE "/home/mysql/bbeharry_temp/eonarts_data.txt" INTO TABLE eonarts_table       FIELDS TERMINATED BY ',' enclosed by 'NULL' ESCAPED BY '\\' LINES TERMINATED BY '\n';
LOAD DATA INFILE "/home/mysql/bbeharry_temp/eonarts_data.txt" INTO TABLE eonarts_table_mysql FIELDS TERMINATED BY ',' enclosed by ''     ESCAPED BY '\\' LINES TERMINATED BY '\n';


select from eonarts_table       limit 10;
select from eonarts_table_mysql limit 10
Signature 

Brian Beharry, QA
Infobright

Profile
 
Posted: 12 May 2009 05:54 PM   Ignore ]   [ # 19 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  763
Joined  2008-08-18

Hi Brian,

DECIMAL(3,3) is not so good for this sample. Numbers like 0.0000050 should rather be stored in DECIMAL(7,7), or (safer) DECIMAL(10,7). Note that the first parameter in DECIMAL does not affect performance, only the second one does, but it is usually defined by actual data, like in this case.

On the other hand, VARCHAR(5532) looks a bit over-volumed. It should be fine tuned to real needs, as the query performance (and load speed) may depend on it.

Regards,

Signature 
Profile
 
Posted: 12 May 2009 08:54 PM   Ignore ]   [ # 20 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

Plus I don’t need bigints! really my schema is NO longer giving me warnings about my data. Once I got those to stop I did another test with the proper schema sizings and STILL it took too long for the Infobright (25MIN!) & about 5min+ for MySQL.

There is nothing unusual about the schema or the data. I don’t even have more than a varchar(2)! I mean really, this table should not be a problem.

erin

Profile
 
Posted: 13 May 2009 01:47 AM   Ignore ]   [ # 21 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  763
Joined  2008-08-18

Hi erin,

Would you, please, do an experiment and change all char/varchar(1) or char/varchar(2) into char/varchar(5)?  I’m trying to guess the reason.

BTW, bigint should perform just like ints, both in loading and queries.

Regards,

Signature 
Profile
 
Posted: 13 May 2009 04:02 PM   Ignore ]   [ # 22 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

?? I don’t have any varchar(1). I have country varchar(2) & 2 other varchars that are (12) & (13).

erin

Profile
 
Posted: 14 May 2009 11:23 AM   Ignore ]   [ # 23 ]  
Jr. Member
Avatar
RankRank
Total Posts:  96
Joined  2008-08-18

Erin, please provide:

1) your full/complete create table statements
2) how you set autocommit
3) your current full/complete load statement
4) Also, please confirm with us that nothing else is running on your system when you load into Infobright and nothing is using the disk at the same time.  Normally, you can have other things going on, but we’re trying to figure out what’s happening on your system.

Thanks.

Signature 

Brian Beharry, QA
Infobright

Profile
 
Posted: 27 May 2009 02:10 PM   Ignore ]   [ # 24 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

I found the answer to my long load times! whoa. The first load was done on a server with SATA disks with 2 spindle and mirrored. The mysql load happened on a server with SAS disks (6 spindles RAID 10). I then took the same schema, data, scripts & loaded it onto a beefier server with SAS disks (8 of them RAID 10 ). And it was nite & day!

Sample:
My first load into infobright:  11,629,420 rows affected (46 min 39.54 sec)
vs. load into mysql:  11,629,420 rows affected (2 min 30.66 sec)

Then the newer server: 11,629,420 rows affected (3 min 9.98 sec)

I know that Infobright says that you can use SATA disks but I find the load times too discouraging on them. Using a standard database type server the load was so close to MySQL that it makes Infobright look very promising.

erin

Profile
 
Posted: 27 May 2009 02:15 PM   Ignore ]   [ # 25 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  738
Joined  2008-08-18

Erin,

Glad to hear you found a workable solution.  In my experience, SATA drives are slow (typically spinning at 7,200 RPM) and have a short MTBF.  The tradeoff that often justifies them is their inexpensive cost.  It’s really a cost/benefit decision.

Best of luck going forward.

Signature 
Profile
 
Posted: 09 June 2009 01:36 PM   Ignore ]   [ # 26 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

I’ve now tested loading the same dummy data into the same schema on a number of servers, both in MySQL and Infobright & here’s what I discovered:
  number of spindles matter. SAS vs SATA does not make a difference with Queries. Loading was not affected by the type either, but by the number of disk spindles rather than SAS vs SATA.

(all infobright)
vm w/2GB of RAM, 2 disk spindles on SATA 17 GB file load time: 2 hours 13 min 34.24 sec
vm w/8GB of RAM, 6 disk spindles on SAS 17 GB file load time: 45 min 18.38 sec

Another test of 305M INFILE:
vm w/8GB of RAM, 6 disk spindles on SAS 305M file load time: 34.67 sec (Infobright)
server w/24GB of RAM, 8 disk spindles on SAS 305M file load time:21.46 sec (Infobright)
server w/24GB of RAM, 8 disk spindles on SATA 305M file load time: 21.51 sec (Infobright)
server w/24GB of RAM, 8 disk spindles on SATA 305M file load time: 12.11 sec (MySQL into MyISAM table)

Disk footprint was 157M (Infobright) vs 1.4 G (MySQL MyISAM tables). The data was almost all INTs, with a couple of Decimals. No CHARs or VARCHARs.

The larger the INFILE the longer the time to load. Keeping the INFILE under a GB is probably the best answer.

Profile
 
Posted: 09 September 2009 03:53 AM   Ignore ]   [ # 27 ]  
Newbie
Rank
Total Posts:  42
Joined  2009-03-26

Hi all,

Today I hear about the Load speed from SSIS (in SQL Server 2008), very very fast. In this test, they expected about 2.36TB per hour.
Of course, data may not be compressed in the database like ICE did. Please read the article for more detail:

http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx

Signature 

Thanh Nguyen

skype: thanhntvt
Website: Admicro.vn

Profile
 
Posted: 09 September 2009 01:07 PM   Ignore ]   [ # 28 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

SQL Server so does NOT meet our needs, for one it is NOT open source and does not scale well on commodity hardware. In the web world, most of the databases in use are MySQL. When it comes time to pay for a database most folks go to Oracle. Infobright, as a plugin engine to MySQL meets a lot of small dataware house needs. The data compression with infobright alone saves money that can go towards SAS disks.

Profile
 
Posted: 05 November 2009 04:39 PM   Ignore ]   [ # 29 ]  
Jr. Member
RankRank
Total Posts:  78
Joined  2009-04-28

Not sure what the question is? MySQL is free & Oracle costs a lot of money. Prior to ICE release, many companies had to go to Oracle for OLAP *if* they could afford it. Many companies could not. Now there’s ICE & I’m really finding it a great product, even with its restrictions.

Profile
 
   
2 of 2
2