Joinutility seperatorLogin utility separator Infobright.com
   
 
Convert vertical data
Posted: 09 November 2011 04:44 PM   Ignore ]  
Newbie
Rank
Total Posts:  4
Joined  2011-11-09

I have a table that store a key=value reference of arbitrary data from my main records, and each record could have N number of key=values.

How can I convert that data to fit on ICE and use the full potential of its optimizer?

[ Edited: 09 November 2011 04:50 PM by ernesto_vargas]
Profile
 
Posted: 09 November 2011 05:13 PM   Ignore ]   [ # 1 ]  
Administrator
RankRankRankRank
Total Posts:  516
Joined  2010-09-22

Hi Ernesto,

Can you illustrate your data a bit?  Does it resemble:

car1=> { color=blue, make=oldsmobile, model=intrigue }
car2=> {make=honda }
car3=> {color=red, model=cavalier, year=1999 }

Also, what will you be doing with this data?

Signature 

jeff kibler

Profile
 
Posted: 09 November 2011 05:36 PM   Ignore ]   [ # 2 ]  
Newbie
Rank
Total Posts:  4
Joined  2011-11-09

Thank you Jeff for your response.

Yes your illustration is very similar to my problem. But I will add that the main table records will be car_id,plate_id, miles, price, horizontal, and then the other table will have data as you illustrate it, vertical data.

I was thinking to add N columns to the main table and store them into ICE.

We could later aggregate and filter by color for example.

Profile
 
Posted: 09 November 2011 07:09 PM   Ignore ]   [ # 3 ]  
Administrator
RankRankRankRank
Total Posts:  516
Joined  2010-09-22

To further affirm my thoughts on your scenario, you basically have M columns that always contain data (ie: not null).  You then have N more columns that may or may not contain data (ie: what you call ‘vertical data’).  Is this true?

Are the number of potential columns finite?  Or could they grow and grow over time?

The data you’re illustrating is semi-structured.  Will you be performing analytics on the dataset, or will you be simply referencing the data?  Can you give me a rough idea as to the types of queries (and the expectation of speed) you’d have with your DB?

Thanks,

Jeff

Signature 

jeff kibler

Profile
 
Posted: 09 November 2011 07:20 PM   Ignore ]   [ # 4 ]  
Newbie
Rank
Total Posts:  4
Joined  2011-11-09

1. Correct

2. No more than 30 columns

3. Here is one of the most complex queries:

SELECT * FROM (SELECT “.$firstcolumn.”,backend.channel AS Channel,SUM(backend.lead) AS Lead,0 AS Browser,SUM(backend.shopper) AS Shopper,SUM(backend.conversion) AS Conversions,SUM(backend.alt_conversion) AS Alt_Conversions,{$revenue_column} AS Revenue,{$alt_revenue_column} AS Alt_Revenue,0 AS Cost,0 As ROAS, ‘’ AS PV_Account, diavv.value AS ab_testing, diavv.tracker_variable_id AS tvi, GROUP_CONCAT(DISTINCT diavv.value ORDER BY diavv.variableid) AS variables
                      FROM tmp_backend backend, data_intermediate_ad_variable_value diavv
                      WHERE backend.connection_id=diavv.data_intermediate_ad_connection_id
                      GROUP BY {$groupby}
                      ORDER BY channel, revenue DESC) t WHERE tvi = (SELECT variableid FROM campaigns_trackers ct, variables v WHERE ct.trackerid=v.trackerid AND ct.campaignid = {$campaignid}  AND ct.status = 4 AND v.name = ‘_sr_ab_testing’);

Profile
 
Posted: 10 November 2011 06:32 AM   Ignore ]   [ # 5 ]  
Super Duper Member
Avatar
RankRankRankRankRank
Total Posts:  811
Joined  2008-08-18

Hi,

Some remarks about the example query:

1. GROUP_CONCAT is not supported (i.e. will be very slow in ICE).

2. The outer “SELECT *” looks as an unnecessary complication. The WHERE condition on the end may be applied to the inside query instead.

Apart of that, creating a table with a set of properties may be a good idea. Something like (object_id, property_name, property_value). In this case I would suggest keeping that table ordered (or at least partially ordered) by object_id, and making property_name an integer id, or ‘lookup’ varchar column.

Regards,

Signature 
Profile
 
Posted: 10 November 2011 10:27 AM   Ignore ]   [ # 6 ]  
Newbie
Rank
Total Posts:  4
Joined  2011-11-09

Jakub thank you for taking the time to review the query.

1. Indeed the GROUP_CONCAT is what I am trying to avoid creating N columns with the main table schema, which will allow to aggregate on those values also.

2. That subquery can be eliminated fro sure.

Profile