Saturday, October 19, 2013

select from multiple tables and sum vs join and sum

I am doing internship in an advertising company, i already implemented a tool to gather all the necessary data form facebook & import them on a database.

Now i am trying to manipulate that data, first by making some test cases & getting some results. The tables grow by 35k rows per day so after a month of using the tool i noticed that the query i use to obtain the sum of certain adcreatives clicks is starting to slow down.

i am asking if the query i use can speed up if i use it with a join & how.

here is the query i have for the sum of clicks per adcreative (with adgroup_id,campaign_id as connect to the other tables):

<!-- language-all: lang-sql -->SELECT t1.adgroup_id, t1.campaign_id, t1.creative_ids, SUM( t2.clicks ) AS clicksFROM adgroups t1, adgroup_stats t2WHERE t1.adgroup_id = t2.adgroup_idGROUP BY t1.creative_idsORDER BY clicks DESC 

currently the query takes 3 secs to complete on a dedicated server, i guess after 6 months it will be at more than 60 secs or so as the tables grow.

edit: here is the explain of the query ( although this is the first time i actually use it & not so sure what it means)

id  select_type table   type    possible_keys   key key_len ref rows    Extra1   SIMPLE  t2  ALL PRIMARY NULL    NULL    NULL    671549  Using temporary; Using filesort1   SIMPLE  t1  ref PRIMARY PRIMARY 8   fbads.t2.adgroup_id 358 Using index

That looks like a full table scan, & with that rapid growth small performance changes won’t make a huge difference on the long run. You need a different approach.

I would calculate aggregates for the previous months (days, etc) with a cron job, & when you need stats then merge that with the fresh results (using the query you already wrote). That why you only have to scan the fresh record, which means the queries is going to be fast.

Alternatively, you can keep up-to-date counters in the adgroups table, & update them on each click. Not sure if mysql is the right tool for this, I can recommend MongoDB, it can do very swift atomic increments on fields, & though it doesn’t donate you as strict guarantees (ACID) as a relational database, in this case it’s not a problem, ad clicks aren’t mission critical data, nobody is going to complain, if you lose < 0.01% percent of click information.

No comments:

Post a Comment