Joinutility seperatorLogin utility separator Infobright.com
   
 
How to effectively use infobright knowledge grid and optimizer ?
Posted: 10 August 2009 10:26 AM   Ignore ]  
Newbie
Rank
Total Posts:  27
Joined  2008-10-01

Hi all,

I am using Infobright 3.2 ICE as the database and pentaho BI suite to make reports and dashboards .  Ofcource its giving better performance than mysql.

Problem I am facing is that , when I am loading the chart for the second time.its not giving the performance as expected.


Time for the chart for 1st time loading: 42.78 sec

Time for the chart for 2nd time loading: 35.93 sec

As per my knowledge if I hit a query for first time a knowledge node will be created , which will be used for the subsequent execution of the query which will fetch the result set with lesser time than the first time.


So what I am expecting is the second load time for the chart at a lesser time. So can anyone suggest me some way to improve the chart loading performance for the second time and also to make use of infobright knowledge grid and optimizer effective?


Thanks and Regards

Dhanesh

[ Edited: 10 August 2009 10:38 AM by Dhanesh]
Profile
 
Posted: 11 August 2009 09:22 AM   Ignore ]   [ # 1 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  487
Joined  2008-08-18

Hello Dhanesh,

I was thinking how to reply but I need to admit that I don’t understand the following:

Dhanesh - 10 August 2009 10:26 AM

Problem I am facing is that , when I am loading the chart for the second time.its not giving the performance as expected.

Could you please tell me what you mean by loading the chart?

Many thanks and best greetings,

Dominik

Signature 
Profile
 
Posted: 11 August 2009 10:12 AM   Ignore ]   [ # 2 ]  
Newbie
Rank
Total Posts:  27
Joined  2008-10-01

Hello Dominik,

I apologize that I didn’t phrase my question properly. By “loading of chart ” I meant the time taken for the report to appear
in the UI.

I will make my queries more clear.

My understanding is as follows,

If I run an SQL query for the first time and subsequent times , From the second time the execution time for the query will be much lesser
than the first time in Infobright. When the same query is hit for the second time it will make use of the knowledge grid and will give the response at a lesser time.  But when I am doing the same I noticed that there is not much difference between the execution times between the first and the second time.  What could be the reason for this?

Could you please brief me if my understanding about the knowledge grid is wrong . What could be the best practices to make use of knowledge grid effectively ?  I have read in user guide that I can create knowledge nodes manually by using a select query joining two tables . Can you briefly tell me about this and where and how the knowledge nodes will be created? My prime goal is to get the queries at a lesser response time from the second time.

Hope my queries are clear this time and hoping for a positive reply,

Thanks and Regards,
Dhanesh.

Profile
 
Posted: 11 August 2009 10:59 AM   Ignore ]   [ # 3 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  487
Joined  2008-08-18

Hello Dhanesh,

Thanks a lot for clarification! I’m sorry that I didn’t get it before.

Unless you use join in your queries (please confirm), all the Knowledge Nodes are being built immediately during load. Hence, they will be used in the same way during the first run and the second run.

The speedup that you observe is most likely the result of keeping some data in memory. Namely, the parts of the data that were most recently used in the queries remain in memory and are faster accessible for the next queries. Therefore, if you repeat the same query without restarting the server in the meantime, you will see a speedup.

As a summary, the speedup is not because of Knowledge Nodes but rather because of data in memory. Certainly, in order to obtain more speedup you may try to increse memory settings, unless they are already set up to maximum for your machine. Further, it also depends on the queries. For the queries involving large portions of data (after application of Knowledge Grid), that cannot fit into memory to a large extent, the speedup may be not so visible.

We may of course look for other possibilities to try to improve the performance of your queries. To do this, we would probably need to learn more about your queries and the data.

Please kindly let me know whether it helps.

Best greetings and thanks again,

Dominik

Signature 
Profile
 
Posted: 11 August 2009 11:40 AM   Ignore ]   [ # 4 ]  
Newbie
Rank
Total Posts:  27
Joined  2008-10-01

Hi Dominik,

Thank you so much for your reply.

I am using MDX queries which is executed by the mondrian engine. This MDX queries will be internally converted by mondrian engine to SQL
queries and will be hitting the Infobright database. I have observed that SQL queries generated are using complex joins between four to five tables. So is there any need for me to manually create knowledge nodes for this? Also I would like to know where the Knowledge nodes will be created ?.

From the previous reply from you , my understanding is that if the queries are not using joins , I cant expect a gain in execution time for the second time since the knowledge nodes are created at the loading time itself. 

But what about if I am using queries with joins ? As per my understanding the knowledge nodes will be created for this only after the first execution of the query . So for the next execution of the same query ,can I expect a gain in execution time ?

Do I need to create knowledge nodes manually in this case ?

Hoping for a positive reply from you,

Thanks and Regards

Dhanesh

Profile
 
Posted: 11 August 2009 12:07 PM   Ignore ]   [ # 5 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  487
Joined  2008-08-18

Hello Dhanesh,

If you use joins, you might see the speedup. This is because Pack-To-Packs (Knowledge Nodes that are responsible for join execution) are generated not during load but during the first join of the given tables with the given join condition. You do not need to set up anything manually. Everything should be generated automatically, including refreshments of Knowledge Nodes whenever you load and then query some new portions of the data.

However, there is an issue: We did not include the modules generating Pack-To-Packs to 3.2 RC1. We will do it in one of the next versions, but for now you cannot observe how Pack-To-Packs contribute to the performance, unless you’re using 3.1.

Moreover, even in 3.1, Pack-To-Packs are sometimes not generated either. If one of the tables is below 64K rows, which may happen in case of, e.g., star schemas, the system does not create Pack-To-Packs at all. This will remain the same in 3.2.

From this perspective, I would be very glad to know more about the data and the queries. Would you be able to post bh.err and some basic information about particular tables? Then we may analyze whether Pack-to-Packs can help at all. Fortunately, there are also other join optimizations that we implemented. However, they may not be related to observing speedups.

Many thanks and best greetings,

Dominik

Signature 
Profile
 
Posted: 24 August 2009 10:16 AM   Ignore ]   [ # 6 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  505
Joined  2008-08-18

Hi Dhanesh,

We received your bh.err log. Thanks.

Did you notice the reports running more slowly with Infobright 3.2 as opposed to 3.1.1p1?

It would also be helpful if you could enable query logging. This will show the queries generated by Mondrian that are coming in to Infobright. To enable query logging,

1) Run the attached SQL script if necessary. This will create the tables needed for logging
2) Put the following line in the [mysqld] section of your /etc/my-ib.cnf:

log = /path to data directory/query_log.txt

Thanks
Geoffrey

[ Edited: 24 August 2009 11:09 AM by Geoffrey Falk]
File Attachments 
create_log_tables.sql  (File Size: 1KB - Downloads: 384)
Signature 
Profile