I am trying to execute a query against a fact table of approximately 13 million rows. The query joins in one of its dimensions and a third table that will be used to map that dimension from a slowly changing type 2 to a type 1 dimension. Additionally, I am removing a degenerate dimension.
I know it sounds complicated, but the query is pretty straight forward. I originally was selecting into an outfile and let the query run for 12 hours before killing it. I then attempted to select the first 100 rows using limit 100, and it ran for an hour before I killed it.
All fields in the fact table are integers, except the degenerate dimension that is being removed. The fact table has roughly 13 million rows, the dimension has 250,000 rows, and the mapping table has 3200 rows. Any recommendations on how to make this query execute would be appreciated.
SELECT `date_dimension_id`
, `specialty_dimension_id`
, `geo_dimension_id`
, `max_ad_dimension_id`
, `customer_dimension_id`
, `page_dimension_id`
, `domain_dimension_id`
, `channel_dimension_id`
, sum(`impression_count`)
, sum(`click_count`)
FROM `ad_performance`
JOIN `ad_dimension` on `ad_dimension`.`id` = `ad_performance`.`ad_dimension_id`
JOIN `ad_dim_map` on `ad_dim_map`.`ad_id` = `ad_dimension`.`ad_id`
GROUP BY `date_dimension_id`
, `specialty_dimension_id`
, `geo_dimension_id`
, `ad_dimension_id`
, `customer_dimension_id`
, `page_dimension_id`
, `domain_dimension_id`
, `channel_dimension_id`
ORDER BY `date_dimension_id`
, `specialty_dimension_id`
, `geo_dimension_id`
, `ad_dimension_id`
, `customer_dimension_id`
, `page_dimension_id`
, `domain_dimension_id`
, `channel_dimension_id`;

