Joinutility seperatorLogin utility separator Infobright.com
   
 
strats with string search
Posted: 21 November 2012 08:46 PM   Ignore ]  
Newbie
Rank
Total Posts:  7
Joined  2012-11-19

We have table with 90 million records we want to search a column with string start with

Example :  SELECT COUNT(1) FROM TT WHERE COMPANYNAME LIKE ‘micro%’

We have sorted the this column and loaded the data but still this query takes lot of time. Please help us.

Thanks
Raja

Profile
 
Posted: 22 November 2012 02:29 AM   Ignore ]   [ # 1 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  811
Joined  2008-08-18

Hi,

What is the column definition for COMPANYNAME (in particular, collation settings)?
Could you, please, provide query execution log for this query? See e.g.: http://www.infobright.org/wiki/Query_Execution_Log/

Regards,

Signature 
Profile
 
Posted: 22 November 2012 10:44 AM   Ignore ]   [ # 2 ]  
Administrator
RankRankRank
Total Posts:  356
Joined  2012-01-18

This column sounds like it would be a good condidate for md5 hashing with a search.  Jakub, does that sound correct?

Signature 

Craig Trombly
Community Manager
60 Hazelwood Dr Champaign,  IL 61820
(JavaScript must be enabled to view this email address)

Profile
 
Posted: 26 November 2012 07:10 AM   Ignore ]   [ # 3 ]  
Newbie
Rank
Total Posts:  7
Joined  2012-11-19

Ok, How do i use MD5 for starts with search (SELECT COUNT(1) FROM TABLE WHERE NAME LIKE ‘GOOG%’ ?

For Exact match we can use MD5 what about above mentioned case ? , This column is sorted and we loaded the data,  this is huge requirement for us in the application , User can search with some text that start with a specific word

Thanks
Raja

Profile
 
Posted: 26 November 2012 08:29 AM   Ignore ]   [ # 4 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  1058
Joined  2008-08-18

Hi !

Without the Query Execution Log it is dificult to answer any query related to the performance. You have not answered what is the column collation, either.

Profile
 
Posted: 26 November 2012 11:05 AM   Ignore ]   [ # 5 ]  
Administrator
RankRankRank
Total Posts:  356
Joined  2012-01-18

Rajas,

We can overcome this issue that you are experiencing with a little bit of help from your side.  As indicated in the

http://www.infobright.org/wiki/Query_Execution_Log/

Please set your ControlMessages = 5

Then stop/restart the service, rerun your query that is taking too long and email us the bh.err file.

This will allow us to break down the performance issue and make a suggestion on how to best overcome this.

Signature 

Craig Trombly
Community Manager
60 Hazelwood Dr Champaign,  IL 61820
(JavaScript must be enabled to view this email address)

Profile
 
Posted: 26 November 2012 11:13 PM   Ignore ]   [ # 6 ]  
Newbie
Rank
Total Posts:  7
Joined  2012-11-19

Sorry guys , Please find the detailsbelow

1. Character set & Collation ” utf-8 , utf8-genral_ci   ( we need case insensitive column for string search)

2. bh.err file attached , Search string   “%” you will queries like “google%’ , ‘micro%’, “big%’  these are taking time.


Mean time i have tried split this column with 64 varchar and tried same start with search , seems faster read some post in infobright that knowledge grid /char mao will handle 64 varchar column efficiently. Yet to confirm this performance in production with multiple user access.

Waiting for your suggestion.

Thanks
Raja

File Attachments 
bh.zip  (File Size: 146KB - Downloads: 47)
Profile
 
Posted: 27 November 2012 10:54 AM   Ignore ]   [ # 7 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  811
Joined  2008-08-18

Hi,

If these searches are the slowest operations, I would add a column to store first e.g. 10 characters, normalized on load (e.g. lowercase) and defined as latin1_bin. Decompression time (which is the slowest operation here) is unproportionally lower on short strings, and any operations are faster on non-UTF columns. Then a query may be processed to cover both columns:

SELECT ........ WHERE short_col LIKE ‘google%’ AND orig_col LIKE ‘google%’....

The first condition will select suspected rows quickly, and the original column is used for security, to cover wide searches (more than 10 characters) and for easy access to the original UTF-8 data.

Regards,

Signature 
Profile