In my db I have 300 M records(BRIGHTHOUSE engine).
When I try to run query with group by
e.g. “select field from tabl_name group by field”
I’m waiting 2 min before I receive the result.
Please tell me how to speed up queries with group by.
Thank you.
Thanks to a great help from the community users, we were able to identify several drawbacks in 3.2 RC1, such as:
—count distinct (also with aggregations) —aggregations on the joined data tables —aggregations involving lookup columns —aggregations on time-related columns —....
Most of these things are already fixed and in the middle of testing. Performance of aggregations in the next 3.2 version will be more stable. However, there is no guarantee that the list of drawbacks is already complete. Therefore, I’m very interested in hearing more from Agunko and others.
OK, today I tried a very large group by query under 3.2rc1, and the db box got very unresponsive, with huge amounts of IO wait. At the point that we killed the query it had created over 200G of temp files in the /var/lib/mysql/etc/cache directory. Does this sound like it could running into the known issue of 3.2rc1 with group by performance?
The data sets involved are very large. One table is around 300 million rows, and the smaller is around 1.1 million rows. The tables are being joined on integer columns. I would estimate the result of the query should be approx 20,000 groups, with counts ranging from the many hundred million to the few thousand. Still over 200G of temp files in the cache directory seemed excessive.
Here is the actual query:
select section,pos,count(section) scount from spoof_2_section s2s join oas_log ol on s2s.spoof_id = ol.spoof_id group by section,pos order by scount;
oas_log is the 300+ million row table, spoof_2_section is the 1.1 million row table. The oas_log table contains the spoof (targeting string) and the pos (ad position) for each ad impression. Each spoof can belong to one or more sections, and we need to roll up the results to the section, pos level.
Is it possible to see bh.err log file for this query? The size of temporary files looks suspicious, indeed.
BTW, count(*) should be faster than count(column), and is nearly the same for this case.
Is it possible to see bh.err log file for this query? The size of temporary files looks suspicious, indeed.
BTW, count(*) should be faster than count(column), and is nearly the same for this case.
Regards,
Jakub, there was nothing in the bh.err file relevant to the query in question. And thanks for the tip on count(*), wasn’t sure how ICE would react to that. We have re-installed 3.1, and I will be running the same query against that once I have loaded the data. I will post my results here once I have them.
Jakub, there was nothing in the bh.err file relevant to the query in question.
Do you mean, no execution log for this query? Maybe you have not switched it on (“ControlMessages = 2” in brighthouse.ini file)? The log may be helpful, as e.g. all intermediate statistics are displayed there. We will see which part of query produces so large cache files.
OK, I have some more information to share. First, given the known group by issues of 3.2,, we rolled back to the previous version of ICE just in case 3.2 was causing the issues we have seen. Also, we set ‘ControlMessages = 2’ to generate query plan data. After re-loading the test data, I ran another series of test queries, with progressively larger data sets used. Yesterday we again saw the problem where the cache data directory was filled up with over 200Gig of temp files created during query execution. There appears to be a ‘tipping point’ in the data set size that triggers this massive creation of temp files.
First some more information on the data sets involved. I realise we are dealing with very large data sets, and may be pushing the limits of what ICE can do. But this is actual data, not test data, and the query involved is what is required by our business users. There are two tables involved, described below:
First there is a mapping table, that relates the ad target string (spoof) to the available ad targeting areas (section). I have created a ‘bucket’ column, to allow the queries to be split into smaller logical units without compromising the results, This is the section_segment column. Each section was assigned a value between 1 and 4, as we are running on a 4 core machine. The table also contains an integer spoof_id column so that we did not have to join on a varchar to the data table. The table is populated with 1129278 rows, with 1273 distinct values for section.
Here is the actual log data table. It currently contains just over 1 billion rows. The serve_datetime column is used to limit the values based on the date(s) we are trying to analyze.
select section,pos,age_kw,gender_kw,count(*) scount from spoof_2_section s2s join oas_log ol on s2s.spoof_id = ol.spoof_id where ol.serve_datetime between ‘2009-08-24 00:00:00’ and ‘2009-08-26 23:59:59’ and s2s.section_segment = 3 group by section, pos order by scount;
There are just over 800 million rows in the oas_log table for the date range specified above. The same query runs fine when the date range is limited to ‘2009-08-24 00:00:00’ and ‘2009-08-25 23:59:59’, which results in 500 million rows in the OAS log table.
I ran 4 copies of the above query simultaneously, with the section_segment limited to 1,2,3 and 4. That allows all 4 cores on the db box to be utilized. One of the queries ran to completion. The other 3 exited when the cache disk was consumed.
Here is the relevant output found in the br.err file:
2009-09-01 02:42:54 [31] Error: I/O operation on /var/lib/mysql/etc/cache/PS_CB_0000185E10A20x158cf930.bh_tmp failed with error - No space left on device File name /var/lib/mysql/etc/cache/PS_CB_0000185E10A20x158cf930.bh_tmp
2009-09-01 02:43:05 [35] Error: I/O operation on /var/lib/mysql/etc/cache/PS_CB_0000181F0x2aaab0223f00.bh_tmp failed with error - No space left on device File name /var/lib/mysql/etc/cache/PS_CB_0000181F0x2aaab0223f00.bh_tmp
2009-09-01 02:44:04 [34] Error: I/O operation on /var/lib/mysql/etc/cache/PS_CB_000062640x2aaaada87e20.bh_tmp failed with error - No space left on device File name /var/lib/mysql/etc/cache/PS_CB_000062640x2aaaada87e20.bh_tmp
As I said before, the same exact queries ran to completion using only a few Gig of temp space when the time constraint limited the log results to 500 million rows instead of 800 million.
Any help here would be greatly appreciated. I can run any additional queries needed to provide more data.
To find out if running the 4 queries simultaneously was triggering the runaway cache files, I ran one of the offending queries by itself. Sure enough after about 1.5 hours, it started eating all the 200+ Gig available on the cache drive. Here are the bh.err lines for that query: I have let it continue to run, but expect it to terminate shortly when it runs out of space.
To find out if running the 4 queries simultaneously was triggering the runaway cache files, I ran one of the offending queries by itself. Sure enough after about 1.5 hours, it started eating all the 200+ Gig available on the cache drive.