Joinutility seperatorLogin utility separator Infobright.com
   
 
How to avoid String function ‘UPPER’ for varchar field?
Posted: 30 April 2010 02:57 AM   Ignore ]  
Newbie
Rank
Total Posts:  17
Joined  2010-04-19

Hi,

I am trying to optimize month column in my fact table.

If month column is integer type then following query is generated by jasperserver which takes 8 sec and 313 milliseconds. This is fine.

select `fd`.`date_month` as `c0`
from `fact_details` as `fd`
where (`fd`.`date_year` = 2010)
and `fd`.`date_month` = 1
group by `c0`
order by `fd`.`date_month` ASC;

But i want to display String value for month field like ‘Jan’,‘Feb’ etc. Therefore i added one more column month_name varchar field in my fact table. Now jasperserver generates following query which takes 1 minute and 59 seconds.

select `fd`.`month_name` as `c0`
from `fact_details` as `fd`
where (`fd`.`date_year` = 2010)
and UPPER(`fd`.`month_name`) = UPPER(‘Jan’)
group by `c0`
order by `fd`.`month_name` ASC;

Now if i compare both the query then the difference is marked in red. This is due to UPPER function in the query.
Now if i run the second query without UPPER function as show below it takes 8 seconds and 547 milliseconds which is equivalent to first query. It means UPPER function is taking maximum time.

select `fd`.`month_name` as `c0`
from `fact_details` as `fd`
where (`fd`.`date_year` = 2010)
and (`fd`.`month_name`) = (‘Jan’)
group by `c0`
order by `fd`.`month_name` ASC;

Please suggest me how should i optimize this.
1. Can i avoid UPPER function so that jasperserver does not generate it for varchar fields?
2. Can i optimize UPPER function in infobright?
3. Is there any alternate way to handle the same scenario?

Profile
 
Posted: 30 April 2010 04:53 AM   Ignore ]   [ # 1 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  738
Joined  2008-08-18

arunsinghal01,

Adding the ‘date_month’ as an integer field to the fact table is sufficient to allow rough set optimization on the largest table.  I would suggest, as in my previous post and in a similar blog post, that you move the textual field ‘month_name’ to the date dimension and simply select it from that table.  Placing it in, and selecting it from, that table removes the burden of executing the UPPER function on every fact record. 

You could also save two fields, ‘month_name_lower, and ‘month_name_upper’ as ‘Jan’ and JAN’, respectively, to remove the UPPER processing altogether.  This is a common practice in date dimensions.

Signature 
Profile
 
Posted: 30 April 2010 05:50 AM   Ignore ]   [ # 2 ]  
Newbie
Rank
Total Posts:  17
Joined  2010-04-19

Hi David,

Thanks for your quick reply. Suppose i am using a date dimension dim_period which has a key field date_id and in my fact table i have a date key say request_date to map both the tables. Now if i write my OLAP schema for date dimension then it would be like

<Hierarchy name="period_h" hasAll="true" allMemberName="all_periods" allMemberCaption="All Periods" primaryKey="date_id">
    <
Table name="dim_period" alias="dp">
    </
Table>
    <
Level name="Year" column="year_name" type="Integer" uniqueMembers="false" levelType="TimeYears" hideMemberIf="Never">
    </
Level>
    <
Level name="Month" column="month" nameColumn="month_name" type="Integer" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never">
    </
Level>
    <
Level name="Day" column="day" nameColumn="day_name" type="Integer" uniqueMembers="false" levelType="TimeDays" hideMemberIf="Never">
    </
Level>
</
Hierarchy

But in this case i can not use date_month integer column in my fact table. It means jasperserver will generate a join with this table and month filter will not be use on my fact table.

My question is how can i use date_month integer column in fact table as filter on my fact table in conjunction with date dimension table? How it can be defined in OLAP?

Profile
 
Posted: 30 April 2010 10:30 AM   Ignore ]   [ # 3 ]  
Sr. Member
Avatar
RankRankRankRank
Total Posts:  738
Joined  2008-08-18

arunsinghal01,

Thanks for testing my suggestions.  I have to ‘punt’ a little bit here and I will explain why and how.

Dimensional data modeling - star and snowflake schemas - was developed in the 1990s for row-oriented databases using set-based algebraic optimizers.  For these technologies, moving all filters to the small dimensional tables and using those intermediate result sets, effectively, as additional constraints to the large fact table on integer surrogate keys was the highest performance method of analyzing large amounts of data.  (Certain performance problems arose with very large dimensions like ‘product’ and ‘customer’, if that were the case for that environment, or with the choice of non-integer join columns.)

And for that reason all of the ROLAP tools (OLAP engines running against relational tables) built their products to offer constraints from the dimension tables and generate SQL to place constraints against only those tables - the RDBMS would take care of constraining the fact records.

In Infobright, the fact table must be constrained by explicitly placing filters on that table.  Most OLAP tools don’t do that naturally but can be made to do it when necessary (think of constraining on a degenerate dimension column).

For that assistance, I can only suggest you reach out to Jaspersoft to get guidance on how to get the query generator to recognize the ‘date_month’ integer column on the fact table and place a constraint against it.

Signature 
Profile