Joinutility seperatorLogin utility separator Infobright.com
   
 
Data Load Time
Posted: 10 March 2010 11:20 AM   Ignore ]  
Newbie
Rank
Total Posts:  12
Joined  2010-03-09

We are loading a 43 MB file containing 2 million rows with just 3 columns - all integers. The following is the syntax we used

LOAD DATA LOCAL INFILE 'D:\DatasetFact.txt' INTO TABLE DatasetFact FIELDS
 TERMINATED BY 
',' LINES TERMINATED BY '\n'

The following are some Sample rows

15253,89,226338677
15253,165,226340065
15253,231,226342095

The following is the table script

CREATE TABLE `entitydatasetpairs` (
  `
EntityKeyint(11NOT NULL,
  `
CategoryKeyint(11NOT NULL,
  `
LogicalPrimaryKeyint(11NOT NULL
ENGINE=BRIGHTHOUSE DEFAULT CHARSET=latin1

This is running on the database server which has 12 GB of memory with a Quad Core processor running Windows Server 2008 Standard Edition.

To load this 2 million rows it has taken 31 minutes.  This seems like a very slow.

Is there something we can do to improve the performance of this loading.

Thanks
Subu

Profile
 
Posted: 10 March 2010 11:30 AM   Ignore ]   [ # 1 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  738
Joined  2008-08-18

Subu,

Can you please respond with your Edition and release/version of Infobright?  For example, Infobright Enterprise Edition version 3.3.0.

Signature 
Profile
 
Posted: 10 March 2010 12:42 PM   Ignore ]   [ # 2 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  735
Joined  2008-08-18

Hi,

Please try not to use “LOCAL”, i.e. to load it on the same machine the server is running. LOAD DATA LOCAL INFILE is not optimized. If you’re using IEE, please also provide explicitly your data format (e.g. “set @bh_dataformat=‘txt_variable’;”).

Regards,

Signature 
Profile
 
Posted: 11 March 2010 03:10 AM   Ignore ]   [ # 3 ]  
Newbie
Rank
Total Posts:  12
Joined  2010-03-09
David Lutz - 10 March 2010 11:30 AM

Subu,

Can you please respond with your Edition and release/version of Infobright?  For example, Infobright Enterprise Edition version 3.3.0.

The version I am using is 3.3.1 IEE Eval.

I am just evaluating InfoBright.

Profile
 
Posted: 11 March 2010 03:12 AM   Ignore ]   [ # 4 ]  
Newbie
Rank
Total Posts:  12
Joined  2010-03-09
Jakub Wroblewski - 10 March 2010 12:42 PM

Hi,

Please try not to use “LOCAL”, i.e. to load it on the same machine the server is running. LOAD DATA LOCAL INFILE is not optimized. If you’re using IEE, please also provide explicitly your data format (e.g. “set @bh_dataformat=‘txt_variable’;”).

Regards,

We have tried running the command after removing the LOCAL keyword. It is still taking 31 minutes for the same file.

Profile
 
Posted: 11 March 2010 04:56 AM   Ignore ]   [ # 5 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  912
Joined  2008-08-18

Hi !

Before loading, did you issue

set @bh_dataformat=‘txt_variable’

But even without this, 43 MB in 31 minutes is perhaps the slowest result I have heard of. Such slow execution can be caused by some misconfiguration or system load. Could you repeat the load, and show the process list in Task Manager - how much CPU and memory bhloader and other processes are using ? Sort the list once on CPU%, once on Mem Usage.

Profile
 
Posted: 11 March 2010 08:31 AM   Ignore ]   [ # 6 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  738
Joined  2008-08-18
Subu - 11 March 2010 03:10 AM

The version I am using is 3.3.1 IEE Eval.

Subu,

Thanks for that information.  I asked because of the differentiating features between the Community Edition and the Enterprise Edition that involve data loading.  Just so you know, there is a special Forum for evaluators here that gets the immediate attention of our Sales team members and also identifies which Edition of the product to which you are referring.

Janusz is correct (of course!) that you must set the @bh_dataformat variable in IEE as he describes.  Just so you know, this signals the database server to execute the Infobright High Performance Loader (process name bhloader) instead of using the default MySQL Loader.  This can result in multiple orders of magnitude performance gain during data loading.

This is described thoroughly in Chapter 7, Importing and Exporting Data in Infobright, on pages 45 and 46 of the 3.3.1 version of the Infobright User Guide which you should have access to along with the IEE Evaluation copy.

Best wishes and good luck.

Signature 
Profile
 
Posted: 11 March 2010 08:35 AM   Ignore ]   [ # 7 ]  
Newbie
Rank
Total Posts:  12
Joined  2010-03-09

Thanks for all your replies. That worked. That file now got loaded in just 3 minutes.

We now tried a text file that is around 14 GB. We are now getting an error message saying

ERROR 5 (HY000): Load process was terminated abnormally. This could happen if so
meone killed the bhloader process, killed the running load or due to unexpected
internal error inside bhloader process.

The CPU usage on the server never went beyond 30 percent.  We checked the memory consumed by bhloader.exe in the task manager and noticed that it did not exceed 900 MB even though the server has got 8 GB RAM available out of the total 12 GB.

We then checked the configuration file for Infobright. The value set for LoaderMainHeapSize was 800 MB which was the default setting. We increased the value to 8000 MB and ran the loader process again, the memory consumed by bhloader process went up to 4GB but then the same error happened.

The following are the other settings on the infobright configuration settings file.

ServerCompressedHeapSize= 500
ServerMainHeapSize= 6000

This new input file is for a different table and it has got more than 275 million rows.

Can you please suggest how we can get this huge file loaded.

[ Edited: 11 March 2010 08:48 AM by Subu]
Profile