Joinutility seperatorLogin utility separator Infobright.com
   
 
SELECT with ORDER BY and LIMIT are very slow.
Posted: 08 February 2010 08:51 PM   Ignore ]  
Newbie
Rank
Total Posts:  5
Joined  2010-02-08

I have created 100M rows table (292 columns) and run several queries:
mysql> select C2, C10 from T where DAY = ‘2009-12-02’ order by C10 DESC limit 30;

output skipped
+—————————————————————————————————+—————————+
30 rows in set (5 min 1.15 sec)
mysql> select C2, C9 from T where DAY = ‘2009-12-02’ order by C9 DESC limit 30;

output skipped
+—————————————————————————————————+——————+
30 rows in set (0.52 sec)
mysql> SELECT C2, C12 FROM T
  ->  WHERE (DAY>=‘2009-11-27’) AND (DAY<=‘2009-12-10’)
  ->  AND C3 = ‘YYYY’
  ->  AND C4 = ‘04.26.08’
  ->  AND C7 = ‘XXX ’ AND
  ->  C8=‘310-410-55502’
  ->  AND ((C11 > 0 AND C11 IS NOT NULL))
  ->  ORDER BY C12 DESC LIMIT 30;
output skipped
+—————————————————————————————————+—————————-+
30 rows in set (14 min 3.55 sec)
The first query is on column of type DOUBLE with cardinality ~ 10000
Second query is on INTEGER with low cardinality (512) - and its fast.
The last query is one of our Top N queries. Iostat shows max 2MB/s disk read speed during any query execution for mysql and ‘top’ gives 100% CPU for mysql - this is unusual I would say. I hope that this is configuration (or misconfiguration ) issue.  I do observe some very light kswapd0 activity - do not think this might be the reason of a poor query performance. I understand that it is not easy to troubleshoot query performance remotely but may be someone have idea what should I look into?
Hardware:
8 CPU Xeon 54xx; RAM 16GB; 4 SATA 500GB drives (7200) 
Software : Linux version 2.6.18-53.el5 ((JavaScript must be enabled to view this email address)) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Wed Oct 10 16:34:19 EDT 2007
I have not touched any default IB configuration settings. Disks are fast enough - 75MB/s sequential read .
Below are table DDL
CREATE TABLE T (
  `C1` varchar(65) DEFAULT NULL,
  `C2` varchar(80) DEFAULT NULL, - - this is very large column ~ 3GB compressed
  `C3` varchar(256) DEFAULT NULL,
  `C4` varchar(128) DEFAULT NULL,
  `C5` int(11) DEFAULT NULL,
  `C6` varchar(8) DEFAULT NULL,
  `C7` varchar(128) DEFAULT NULL,
  `C8` varchar(16) DEFAULT NULL,
  `C9` int(11) DEFAULT NULL,
  `C10` double DEFAULT NULL,
  `C11` tinyint(4) DEFAULT NULL,
  `C12` tinyint(4) DEFAULT NULL,
  `DAY` date DEFAULT NULL,
+ 279 more
) ENGINE=BRIGHTHOUSE DEFAULT CHARSET=latin1
-Vlad

Profile
 
Posted: 09 February 2010 07:42 AM   Ignore ]   [ # 1 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  606
Joined  2008-08-18

Hi Vlad,

I have two suggestions:

1. Please try higher IB settings (brighthouse.ini file), especially main heap size - it should be about 10000-12000 for your hardware configuration, plus 2000 for compressed heap size.

2. Please run the slow query with logs enabled (they may help in performance troubleshooting) - please refer to:
http://www.infobright.org/wiki/Query_Execution_Log/

It would be helpful if you could post here the log for the slowest query.

Regards,

Signature 
Profile
 
Posted: 09 February 2010 02:26 PM   Ignore ]   [ # 2 ]  
Newbie
Rank
Total Posts:  5
Joined  2010-02-08

OK, I have enabled logging.

This is query:

SELECT IDSPEECHCALLDROPPED FROM IBFACT
 WHERE 
(DAY>='2009-11-27') AND (DAY<='2009-12-10'
     AND 
DEVICEMAKEMODEL 'Apple N88' 
     
AND DEVICEFIRMWAREREV '04.26.08' 
     
AND OPERATOR 'AT&T (US)' AND 
     
STARTING_LAI='310-410-55502' 
     
AND ((SPEECHCALLESTABLISHED AND SPEECHCALLESTABLISHED IS NOT NULL))  
     
ORDER BY SPEECHCALLDROPPED DESC LIMIT 30 

This is query - related part of bh.err (see bh.err attached)

2010-02-09 18:54:23 [1] T:-1 = TABLE_ALIAS(T:0,“IBFACT”)
T:-2 = TMP_TABLE(T:-1)
VC:-2.0 = CREATE_VC(T:-2,PHYS_COL(T:-1,A:1))
A:-1 = T:-2.ADD_COLUMN(VC:-2.0,LIST,“ID”,“ALL”)
VC:-2.1 = CREATE_VC(T:-2,PHYS_COL(T:-1,A:79))
A:-2 = T:-2.ADD_COLUMN(VC:-2.1,LIST,“SPEECHCALLDROPPED”,“ALL”)
T:-2.ADD_ORDER(VC:-2.1,DESC)
VC:-2.2 = CREATE_VC(T:-2,PHYS_COL(T:-1,A:262))
VC:-2.3 = CREATE_VC(T:-2,EXPR(“2009-11-27”))
F:0 = CREATE_FILTER(T:-2,VC:-2.2,>=,VC:-2.3,<null>,WHERE)
VC:-2.4 = CREATE_VC(T:-2,EXPR(“2009-12-10”))
F:0 = AND(F:0,VC:-2.2,<=,VC:-2.4,<null>)
VC:-2.5 = CREATE_VC(T:-2,PHYS_COL(T:-1,A:2))
VC:-2.6 = CREATE_VC(T:-2,EXPR(“Apple N88”))
F:0 = AND(F:0,VC:-2.5,=,VC:-2.6,<null>)
VC:-2.7 = CREATE_VC(T:-2,PHYS_COL(T:-1,A:3))
VC:-2.8 = CREATE_VC(T:-2,EXPR(“04.26.08”))
F:0 = AND(F:0,VC:-2.7,=,VC:-2.8,<null>)
VC:-2.9 = CREATE_VC(T:-2,PHYS_COL(T:-1,A:6))
VC:-2.10 = CREATE_VC(T:-2,EXPR(“AT&T (US)”))
F:0 = AND(F:0,VC:-2.9,=,VC:-2.10,<null>)
VC:-2.11 = CREATE_VC(T:-2,PHYS_COL(T:-1,A:7))
VC:-2.12 = CREATE_VC(T:-2,EXPR(“310-410-55502”))
F:0 = AND(F:0,VC:-2.11,=,VC:-2.12,<null>)
VC:-2.13 = CREATE_VC(T:-2,PHYS_COL(T:-1,A:62))
VC:-2.14 = CREATE_VC(T:-2,EXPR(“0”))
F:0 = AND(F:0,VC:-2.13,>,VC:-2.14,<null>)
F:0 = AND(F:0,VC:-2.13,IS NOT NULL,<null>,<null>)
T:-2.FILTER(F:0)
T:-2.MODE(LIMIT,0,30)
RESULT(T:-2)

2010-02-09 18:54:23 Type conversion for VC:3
2010-02-09 18:54:23 Type conversion for VC:4
2010-02-09 18:54:23 [1] Merging conditions…
2010-02-09 18:54:23 [1] Initial execution plan (non-join):
2010-02-09 18:54:23 [1] Cnd(0):  VC:11(t0a7) BET. VC:12(“310-410-55502”) AND VC:12(“310-410-55502”)  (17.41)
2010-02-09 18:54:23 [1] Cnd(1):  VC:9(t0a6) BET. VC:10(“AT&T (US)”) AND VC:10(“AT&T (US)”)    (17.77)
2010-02-09 18:54:23 [1] Cnd(2):  VC:5(t0a2) BET. VC:6(“Apple N88”) AND VC:6(“Apple N88”)    (17.77)
2010-02-09 18:54:23 [1] Cnd(3):  VC:7(t0a3) BET. VC:8(“04.26.08”) AND VC:8(“04.26.08”)      (17.77)
2010-02-09 18:54:23 [1] Cnd(4):  VC:13(t0a62) BET. VC:21(1) AND VC:22(+inf)  (17.77)
2010-02-09 18:54:23 [1] Cnd(5):  VC:2(t0a262) BET. VC:17 AND VC:20   (18.46)
2010-02-09 18:54:23 [1] Packs/packrows after KN evaluation:
2010-02-09 18:54:23 [1] (t0) Pckrows: 1601, susp. 1601 (0 empty 0 full). Packs opened in 6 cond.: 8005
2010-02-09 18:55:53 [1] Packrows after exact evaluation (WHERE):
2010-02-09 18:55:53 [1] (t0): 1601 all packrows, 1313 to open (including 0 full)
2010-02-09 18:55:53 [1] Sorter initialized for 9798 rows, 1+66 bytes each.
2010-02-09 19:08:27 [1] Sorted rows retrieved.
2010-02-09 19:08:27 [1] Displaying result: 30 rows.
2010-02-09 19:08:27 [1] Total data packs actually loaded (approx.): 8807
2010-02-09 19:08:27 [1]——————————————————————————————————————

Execution time is > 14min. I see nothing suspicious but I am not IB guru.

-Vladimir

File Attachments 
bh.txt  (File Size: 7KB - Downloads: 117)
Profile
 
Posted: 09 February 2010 05:36 PM   Ignore ]   [ # 3 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  606
Joined  2008-08-18

Thanks for the log. Looks like the only reason of slow performance is sorting - namely, decompressing ID column in order to find 30 top values. However, we actually need only SPEECHCALLDROPPED column to be completely decompressed for sorter. Optimization of sorting (in particular, for queries like this) is already planned and should be done in one of the next releases.

Regards,

Signature 
Profile
 
Posted: 09 February 2010 05:54 PM   Ignore ]   [ # 4 ]  
Newbie
Rank
Total Posts:  5
Joined  2010-02-08

Thanks, that was my initial guess as well. Too large and fat ID column in a query. The same query without ID column runs much faster - sometimes less than 20 secs. But loading/decompressing 2.8GB column into memory should not take 14 minutes, imho, of course.

-Vladimir

UPD: Does IB really sort the whole column? We need only top 30.

[ Edited: 09 February 2010 06:02 PM by Vladimir]
Profile
 
Posted: 10 February 2010 03:03 AM   Ignore ]   [ # 5 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  606
Joined  2008-08-18
Vladimir - 09 February 2010 05:54 PM

UPD: Does IB really sort the whole column? We need only top 30.

We must at least read (once) the whole column to find the top 30.

About decompression speed: it is quite slow especially on random and long text values. On the other hand, it is not slower (or not much) comparing to e.g. RAR.

Regards,

Signature 
Profile
 
Posted: 10 February 2010 02:21 PM   Ignore ]   [ # 6 ]  
Newbie
Rank
Total Posts:  5
Joined  2010-02-08

Jakub,

These are my schema changes:

1. Changed varchar2 ID to int ID (my application autoincrements this ID during fact loading stage)
2. Changed all varchar2 columns in WHERE clause to int types ( a dictionary is created for every such column)

Results:
  1. Compression ration was increased significantly for all modified columns
  2. The query execution time was reduced from 14+minutes to 5-15 sec .

So, basically, the advise is simple: avoid varchar2 in your tables as much as possible.

Thanks,

Vladimir

Profile
 
Posted: 23 February 2010 09:03 AM   Ignore ]   [ # 7 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  606
Joined  2008-08-18

Hi Vladimir,

Avoiding varchar when numerical types are sufficient is generally a good idea. In some cases it may be done automatically in Infobright, by declaring a char/varchar column as LOOKUP:
http://www.infobright.org/wiki/How_and_When_to_use_Lookups/

Lookups are efficient and easy to use (no external dictionary needed), although they may work well only if few distinct values are present (up to, say, 1000 values).

For wide varchar columns with many values I have good news: the performance problem with LIMIT is already fixed and probably will be released in the next version of ICE.

Regards,

Signature 
Profile
 
Posted: 23 February 2010 12:45 PM   Ignore ]   [ # 8 ]  
Newbie
Rank
Total Posts:  5
Joined  2010-02-08

Great news, Jakub. Thanks.

Profile