Joinutility seperatorLogin utility separator Infobright.com
   
 
Infobright Taking more time to return data
Posted: 20 November 2012 03:42 AM   Ignore ]  
Newbie
Rank
Total Posts:  3
Joined  2012-11-20

Hi All,

I have just created a table with following fields

CREATE TABLE `WORLDBASE` (
  `INTELLECTID` int(20) DEFAULT NULL,
  `DUNSNO` int(9) DEFAULT NULL,
  `YEAR1EMPLOYEEGROWTH` int(10) DEFAULT NULL,
  `YEAR1SLSGROWTH` int(20) DEFAULT NULL,
  `ACCOUNTINGFIRM` varchar(50) DEFAULT NULL,
  `PHYSICALADDRESS` varchar(150) DEFAULT NULL,
  `PHYSICALCITY` varchar(64) DEFAULT NULL,
  `PHYSICALCITYCODE` varchar(10) DEFAULT NULL,
  `PHYSICALSTATE` varchar(50) DEFAULT NULL,
  `PHYSICALSTATEPROVINCECODE` varchar(20) DEFAULT NULL,
  `PHYSICALCOUNTY` varchar(50) DEFAULT NULL,
  `PHYSICALZIPCODE` varchar(16) DEFAULT NULL,
  `PHYSICALCOUNTRY` varchar(100) DEFAULT NULL,
  `CARRIERROUTE` varchar(4) DEFAULT NULL,
  `MAILINGADDRESS` varchar(150) DEFAULT NULL,
  `MAILINGCITY` varchar(64) DEFAULT NULL,
  `MAILINGCITYCODE` varchar(10) DEFAULT NULL,
  `MAILINGSTATE` varchar(50) DEFAULT NULL,
  `MAILINGSTATEPROVINCECODE` varchar(20) DEFAULT NULL,
  `MAILINGCOUNTY` varchar(50) DEFAULT NULL,
  `MAILINGZIPCODE` varchar(16) DEFAULT NULL,
  `MAILINGCOUNTRY` varchar(100) DEFAULT NULL,
  `COMPANYNAME` varchar(120) DEFAULT NULL,
  `COMPANYTYPE` varchar(10) DEFAULT NULL,
  `COPIERACQUISITIONCHANNEL` varchar(1) DEFAULT NULL,
  `CONTROLSTARTYEAR` varchar(4) DEFAULT NULL,
  `CURRENCY` varchar(4) DEFAULT NULL,
  `DOMESTICSUBSIDIARIES` varchar(1) DEFAULT NULL,
  `FILLER` varchar(8) DEFAULT NULL,
  `DOMESTICULTPARENTDUNS` varchar(9) DEFAULT NULL,
  `DOMESTICULTPARENTNAME` varchar(120) DEFAULT NULL,
  `EMPLOYEESALLSITES` int(10) DEFAULT NULL,
  `EMPLOYEESTHISSITE` int(10) DEFAULT NULL,
  `EMPLOYEESHEREACCURACY` varchar(1) DEFAULT NULL,
  `EMPLOYEESHEREACCURACYFULL` varchar(2000) DEFAULT NULL,
  `EMPLOYEESTOTACCURACY` varchar(1) DEFAULT NULL,
  `EMPLOYEESTOTACCURACYFULL` varchar(2000) DEFAULT NULL,
  `EXPORTERINDICATOR` varchar(1) DEFAULT NULL,
  `EXPORTERINDICATORFULL` varchar(2000) DEFAULT NULL,
  `FACILITYSIZE` int(10) DEFAULT NULL,
  `FAMILYTREEDIASCODE` varchar(9) DEFAULT NULL,
  `FAMILYTREEHIERARCHYCODE` varchar(2) DEFAULT NULL,
  `FAMILYTREEMEMBERCOUNT` int(10) DEFAULT NULL,
  `FAMILYTREERELATIONTYPE` varchar(1) DEFAULT NULL,
  `FAXNO` varchar(25) DEFAULT NULL,
  `FEDERALEMPLOYERIDNO` varchar(10) DEFAULT NULL,
  `FIPSSTATECODE` varchar(4) DEFAULT NULL,
  `FIPSSTATENAME` varchar(100) DEFAULT NULL,
  `FORTUNE1000POSITION` int(10) DEFAULT NULL,
  `FRANCHISENAME` varchar(120) DEFAULT NULL,
  `FRANCHISEOPERATIONTYPE` varchar(1) DEFAULT NULL,
  `GLOBALULTDUNS` varchar(9) DEFAULT NULL,
  `GLOBALULTNAME` varchar(120) DEFAULT NULL,
  `HIGHESTRNKDOMESTICPARENT` varchar(120) DEFAULT NULL,
  `HOMEBASEDBUSINESSINDICATOR` varchar(1) DEFAULT NULL,
  `IMMEDIATEPARENTDUNS` varchar(9) DEFAULT NULL,
  `IMMEDIATEPARENTNAME` varchar(120) DEFAULT NULL,
  `ITEXPENDIUREPERCENTILE` varchar(1) DEFAULT NULL,
  `LATLONGACCURACY` varchar(1) DEFAULT NULL,
  `LATLONGACCURACYFULL` varchar(2000) DEFAULT NULL,
  `LATITUDE` varchar(12) DEFAULT NULL,
  `LEGALSTATUSCODE` varchar(3) DEFAULT NULL,
  `LEGALSTATUSDESC` varchar(100) DEFAULT NULL,
  `LINEOFBUSINESS` varchar(128) DEFAULT NULL,
  `LONGITUDE` varchar(12) DEFAULT NULL,
  `MANUFACTURERINDICATOR` varchar(1) DEFAULT NULL,
  `MANUFACTURERINDICATORFULL` varchar(2000) DEFAULT NULL,
  `MARKETINGPRESCREENSCORE` varchar(1) DEFAULT NULL,
  `MARKETINGPRESCREENSCOREFULL` varchar(2000) DEFAULT NULL,
  `MINORITYOWNEDINDICATOR` varchar(1) DEFAULT NULL,
  `MINORITYOWNEDINDICATORFULL` varchar(2000) DEFAULT NULL,
  `NAICCODE` varchar(6) DEFAULT NULL,
  `NAICDESC` varchar(150) DEFAULT NULL,
  `NOOFCOPIERSRANGE` varchar(1) DEFAULT NULL,
  `NOOFNODESRANGE` varchar(1) DEFAULT NULL,
  `NOOFPCS` varchar(1) DEFAULT NULL,
  `NOOFPRINTERSRANGE` varchar(1) DEFAULT NULL,
  `OFFICERFULLNAME` varchar(100) DEFAULT NULL,
  `OFFICERSUFFIX` varchar(20) DEFAULT NULL,
  `OWNSRENTSINDICATOR` varchar(1) DEFAULT NULL,
  `OWNSRENTSINDICATORFULL` varchar(2000) DEFAULT NULL,
  `PHONEAREACODE` varchar(9) DEFAULT NULL,
  `PHONENO` varchar(25) DEFAULT NULL,
  `POSTALDELIVERYPOINT` varchar(4) DEFAULT NULL,
  `PRINTVOLUMERANGE` varchar(1) DEFAULT NULL,
  `SLS` int(20) DEFAULT NULL,
  `SLSACCURACY` varchar(1) DEFAULT NULL,
  `SLSACCURACYFULL` varchar(2000) DEFAULT NULL,
  `SICCODE` varchar(8) DEFAULT NULL,
  `SICDESC` varchar(150) DEFAULT NULL,
  `SMALLBUSINESSINDICATOR` varchar(1) DEFAULT NULL,
  `SMALLBUSINESSINDICATORFULL` varchar(2000) DEFAULT NULL,
  `STATEOFINCORPORATION` varchar(20) DEFAULT NULL,
  `STATEPROVINCE` varchar(20) DEFAULT NULL,
  `STOCKEXCHANGE` varchar(50) DEFAULT NULL,
  `STOCKSYMBOL` varchar(12) DEFAULT NULL,
  `SUBSIDIARYSTATUS` varchar(1) DEFAULT NULL,
  `SUBSIDIARYSTATUSFULL` varchar(2000) DEFAULT NULL,
  `TRADESTYLE` varchar(120) DEFAULT NULL,
  `URL` varchar(255) DEFAULT NULL,
  `URLTYPE` varchar(1) DEFAULT NULL,
  `USMETROPOLITANAREA` varchar(4) DEFAULT NULL,
  `MSANAME` varchar(100) DEFAULT NULL,
  `VETERANOWNEDINDICATOR` varchar(1) DEFAULT NULL,
  `VETERANOWNEDINDICATORFULL` varchar(2000) DEFAULT NULL,
  `WOMANOWNEDINDICATOR` varchar(1) DEFAULT NULL,
  `WOMANOWNEDINDICATORFULL` varchar(2000) DEFAULT NULL,
  `YEAROFFOUNDING` varchar(4) DEFAULT NULL,
  `ZIPCODE` varchar(16) DEFAULT NULL,
  `LOCATIONTYPE` varchar(1) DEFAULT NULL,
  `LOCATIONTYPEFULL` varchar(2000) DEFAULT NULL,
  `COMPANYTYPEFULL` varchar(2000) DEFAULT NULL,
  `COPIERACQUISITIONCHANNELFULL` varchar(2000) DEFAULT NULL,
  `CURRENCYISOCODE` varchar(3) DEFAULT NULL,
  `FRANCHISEOPERATIONTYPEFULL` varchar(2000) DEFAULT NULL,
  `STATENAME` varchar(100) DEFAULT NULL,
  `URLTYPEFULL` varchar(2000) DEFAULT NULL,
  `TXTDUNSNO` varchar(36) DEFAULT NULL
) ENGINE=BRIGHTHOUSE DEFAULT CHARSET=latin1;

Whereas it contains 80Million records, when i use the following query

SELECT COMPANYNAME FROM WORLDBASE WHERE COMPANYNAME LIKE ‘MICRO%’

it is taking more time to get the result. I have checked in both community edition and trial enterprise edition.

Please let me know is there any wrong in database design and way am querying.

Thanks

Profile
 
Posted: 20 November 2012 05:26 AM   Ignore ]   [ # 1 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  1065
Joined  2008-08-18

Hi !

You say

it is taking more time to get the result

You mean more than what ?

Please try recreating the table and reloading it with

ENGINE=BRIGHTHOUSE DEFAULT CHARSET=latin1 COLLATE=latin1_bin 

The binary collation is faster and the Knowledge grid can be used for binary collation. However, comparisons using long strings are slow anyway. The IEE can be faster a few times for your query as it can decompress the string column in parallel - decompression is the main cost here.

If the company name is the main selection criteria, it will help if you sort the data on company name before loading into IB. If you load the data in portions e.g. 1 million rows per a load file, then each file can be sorted separately - it will help anyway.

Profile
 
Posted: 20 November 2012 09:21 AM   Ignore ]   [ # 2 ]  
Newbie
Rank
Total Posts:  3
Joined  2012-11-20

Hi !,

Thanks for your response…

I have used trial enterprise edition to test the same scenario, but no luck it is also taking more time. Whereas if i use integer then performance is good.

For e.g. SELECT COMPANYNAME FROM WORLDBASE WHERE SLS > 5000000


Thanks

Profile
 
Posted: 20 November 2012 01:10 PM   Ignore ]   [ # 3 ]  
Administrator
RankRankRank
Total Posts:  359
Joined  2012-01-18

Alot of it depends upon the sorting order of the data as it goes into Infobright, as to decide which column use.  For instance, when you use a numerical (INT) and the data is in numerical order of that column, then it will be very quick becuase Infbright would be able to use the Knowledge Nodes to determine which data packs NOT to look at in resolving the query.  So if you think about how you data is sorted (collation order), think about what columns would identify which data packs to look at quickly and which one’s to not look at.

Signature 

Craig Trombly
Community Manager
60 Hazelwood Dr Champaign,  IL 61820
(JavaScript must be enabled to view this email address)

Profile
 
Posted: 21 November 2012 06:17 AM   Ignore ]   [ # 4 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  1065
Joined  2008-08-18

Hi !

To make the things clear:
- IEE can be several times faster than ICE because IEE uses multithreading and can process a WHERE condition in parallel (depends on the number of available CPU cores)
- a condition on numbers is usually much faster to execute than a condition on strings
- a condition on strings with collation *_bin (e.g. latin1_bin) is usually a few times faster than a condition on strings with non-binary collation.

Profile