i’ve got problem about load data.
usualy load data process is fast enough, but after i rewrite my table (drop then write again) twice, suddenly load data process is being slowly.
the size file only 1.2GB while the load data until 32 minute, it hasn’t been complete.
We haven’t observed similar problems before. We would like to help you but we need more information.
Do I understand correctly that you are loading the same file to the table with the same schema?
Didn’t you do similar operation (dropping and creating again) with other tables?
How long did it take to load the data before recreating the table?
How many IB tables do you have and how many columns is in the table you were trying to rewrite?
Could you try to restart ICE and repeat the same procedure. If it help then it will help us to find the cause.
Can you put your DROP TABLE, CREATE TABLE, LOAD DATA statements into a script and submit to the forum? My understanding is that if you run the script once, the load works fine. If you run the script again (without any change to it, without any other action and using the same load data) the load becomes slower - is that true?
First, I want to explain that my table consists of 10 files (sum of record are 62 million). So, it needs
10 times of file load to enter the data into the table.
At the first attempt, it was a success. The whole files were entered
successfully to the table. But then I needed to drop the table because
there was something wrong on the content of the data. After the table had
been dropped, I recreated the table and loaded the data. Again, it was a
success. Unfortunately, with the same reason as before, I needed to drop
it again.
At the third attempt, the load process was failed. From 10 files that
should have been loaded, only 1 file was successful. When it tried to load
the second file, it never completed.
the table has 44 columns. but after I rechecked my data (before load to the database) there was error. it wasn’t match with table datatype.
I think I have the same problem that is ICE load data too slow. After I upgraded to ICE 3.1 all things so Ok (so fast and very stable). Last week, I upgraded to ICE 3.1.1 too and ICE has no any problems before. But right now, I the problem occur, Load data very slow:
ex: before, Load Data InFile statement very fast, It take about <100ms per files (the sample file was attached in this post).
Right now, It take about more than 2s per files. I tried to reset the server, but Load data speed was not fast as I expected.
more info: I scheduled load data in 1 minute per time, and my load data infile script look like:
LOAD DATA INFILE '/glfs/adsdata//adv1241406361871.dat' INTO TABLE ad0409 FIELDS TERMINATED BY ' ' EN
I’m currently using lastest of ICE (revision=IB_3.1-1_r3718_3855(ice))
OS: Linux localhost 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
RAM: 48 Gb
processor : GenuineIntel Intel(R) Xeon(R) CPU X5550 @ 2.67GHz [GenuineIntel 16 cores Family 6 Model 26 Stepping 5]
For more detail about the problem issue, I attached the full log files, including: bh.err, brighthouse.ini, and sample file was used in load data infile script.
There are possibilities of:
1) corruption of database, can you try a fresh install, which wipe out all old database.
2) disable selinux and try again? Because we see similar Amazon EC2 problem.
3) reduce memory setting from 34000 -> 24000 to free more memory
4) some bad data, and bhloader not handling those bad data very well.
All lookup columns in my table were char and varchar(have got 7 lookup columns/25 total of columns), and the distinct values less than 5000 (In fact, I had read the article http://www.infobright.org/Forums/viewthread/88/ before the table was designed, then I think the lookup columns were not the main reason make the load data command too slow).
For more detail about the problem, I attach the brigthouse.log in this post.
(Right now, I tried to load data to another table (a new table with the same schema), load data too slow again)
As Jakub pointed out in your logs there are plenty of “Error: Wrong data or column definition…” error messages. Such message means that the load failed. The other thing is that I can see up to 3 such messages with in the same second. Does it mean that you for instance load the same file to 3 different tables in parallel?
I agree with you about “Wrong data in file may result in a slowdown, or even hang.”, I will fix the error after. But now, I can’t load data into my table (the ad0409 table), after load fail I saw some error in the brighthouse.log, especially :
Error: Expression evaluation failed!
that I can’t find any the same error in this forum at all.
Right now, I have to load data into another table (I created a new table with the same schema of ad0409), and with the same files, load data in to new table was fine, but with the ad0409 table was fail, always can’t to load.
I had read the article How does the Knowledge Grid handle corruption, so I’m very worried about my KG was corruption and no way to fix it. In this case, how can I check my KG status?
for more about fail to load data : currently the data size of the ad0409 in hard disk are about 20Gb. And the loader load files from mount disk (NFS mount).
I come back with the same problem. I have 4.3 GB data in the table then I try to Select data like these
Select callingnumber,callednumber,nodeid,count(callingnumber) from ascii2 where callingnumber=‘123456789101’ group by callingnumber,callednumber,nodeid;
It needs time to show the result 29 min 49.13 sec.
Every time you have a question about query performance, try setting “ControlMessages = 2” in “brighthouse.ini”. Then restart server and execute the queries again. A log file bh.err will be created next to the brighthouse.ini file. This log file will contain information how the queries have been processed. Please post this file, so we can see how each of the queries has been handled by the engine. Then some conclusions on the resulting performance could be derived.
However, from your query I can guess you are grouping on 2 x CHAR(12) + INT + counter + data structure overhead, so you need around 60 bytes per group. How many groups you have, how much memory you have, how big is your MainHeapSize? Maybe the result is not fitting in the memory and multipass processing or swapping occurs.
Why you do not store phone numbers as integers - it is more efficient.
Indeed, “bh.err” log should help us a lot. BTW, “count(callingnumber)” is a bit superfluous here, you can safely use “count(*)”. Note that these are equivalent as callingnumber is not null inside groups (nulls will fall to a separate group anyway). “Count(*)” is faster.