SQL优化实例,看我如何把一个耗时1分钟的请求优化到9秒。
这主要是一个数据统计的web页面,展现统计结果。由于当时需求很紧,要求尽快能出来查询结果就行,而且是说以后会重新对这部分功能进行规划再开发。所以也没有考虑任何性能相关,劈哩啪啦的一天把很多数据统计的功能实现了。
现在,数据量已然有点规模了,但是还没有再开发的计划,目前还使用这个统计程序,但是由于有些数据量了,并且没有考虑过优化,所以目前查询速度很慢。尤其是其中有一个查询,居然耗时1分多,问题很大啊,今天抽点时间,找找问题,优化一下,主要做的工作的是改进sql语句,添加合适的索引。
存储的数据是根据用户的ID末尾的值来简单分表的,所以共有0到9一共10个表,需要遍历这10个表,然后把最后的数据相加,再处理后得到最后结果。由于需求还算比较简单,所以没有采用定时脚本去处理前几日的数据后入库保存的做法,如果采用以上做法,速度会快更多。
以下是其中的一个处理的一条sql,运行看起来耗时11.76秒。
mysql> select sql_no_cache count(distinct(a.mpnumber)) as mpnumber,date_format(a.bindtime,'%Y-%m-%d') as gDate from new_bind_mp_weibo_user_info_7 a left join weibo_user_visit_info_7 b on a.mpnumber=b.mpnumber and a.apmac=b.apmac where b.id is NULL and a.bindtime between '2012-03-01' and '2012-03-27 23:59:59' group by gDate order by gDate;
+----------+------------+
| mpnumber | gDate |
+----------+------------+
| 22 | 2012-03-01 |
| 17 | 2012-03-02 |
| 16 | 2012-03-03 |
| 14 | 2012-03-04 |
| 12 | 2012-03-05 |
| 5 | 2012-03-06 |
| 8 | 2012-03-07 |
| 26 | 2012-03-08 |
| 10 | 2012-03-09 |
| 3 | 2012-03-10 |
| 2 | 2012-03-11 |
| 42 | 2012-03-12 |
| 48 | 2012-03-13 |
| 40 | 2012-03-14 |
| 61 | 2012-03-15 |
| 59 | 2012-03-16 |
| 70 | 2012-03-17 |
| 87 | 2012-03-18 |
| 44 | 2012-03-19 |
| 47 | 2012-03-20 |
| 66 | 2012-03-21 |
| 48 | 2012-03-22 |
| 12 | 2012-03-26 |
| 39 | 2012-03-27 |
+----------+------------+
24 rows in set (11.76 sec)
mysql> explain select count(distinct(a.mpnumber)) as mpnumber,date_format(a.bindtime,'%Y-%m-%d') as gDate from new_bind_mp_weibo_user_info_7 a left join weibo_user_visit_info_7 b on a.mpnumber=b.mpnumber and a.apmac=b.apmac where b.id is NULL and a.bindtime between '2012-03-01' and '2012-03-27 23:59:59' group by gDate order by gDate G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: a
type: ALL
possible_keys: search_key
key: NULL
key_len: NULL
ref: NULL
rows: 3855
Extra: Using where; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: b
type: ref
possible_keys: search_index4
key: search_index4
key_len: 45
ref: newwifi_visit_info.a.apmac
rows: 223
Extra: Using where; Not exists
2 rows in set (0.00 sec)
首先做的第一步,把连接的两个表做处理,分别只提取出需要的字段,并把对a表的时间限制放到连接前。
mysql> select sql_no_cache count(distinct(a.mpnumber)) as mpnumber, date(a.bindtime) as gDate from (select mpnumber,apmac,bindtime from new_bind_mp_weibo_user_info_7 where bindtime between '2012-03-01' and '2012-03-28' ) as a left join (select id,mpnumber,apmac from weibo_user_visit_info_7) as b on a.mpnumber=b.mpnumber and a.apmac=b.apmac where b.id is NULL group by gDate order by gDate;
+----------+------------+
| mpnumber | gDate |
+----------+------------+
| 22 | 2012-03-01 |
| 17 | 2012-03-02 |
| 16 | 2012-03-03 |
| 14 | 2012-03-04 |
| 12 | 2012-03-05 |
| 5 | 2012-03-06 |
| 8 | 2012-03-07 |
| 26 | 2012-03-08 |
| 10 | 2012-03-09 |
| 3 | 2012-03-10 |
| 2 | 2012-03-11 |
| 42 | 2012-03-12 |
| 48 | 2012-03-13 |
| 40 | 2012-03-14 |
| 61 | 2012-03-15 |
| 59 | 2012-03-16 |
| 70 | 2012-03-17 |
| 87 | 2012-03-18 |
| 44 | 2012-03-19 |
| 47 | 2012-03-20 |
| 66 | 2012-03-21 |
| 48 | 2012-03-22 |
| 12 | 2012-03-26 |
| 39 | 2012-03-27 |
+----------+------------+
24 rows in set (5.96 sec)
mysql> explain select sql_no_cache count(distinct(a.mpnumber)) as mpnumber, date(a.bindtime) as gDate from (select mpnumber,apmac,bindtime from new_bind_mp_weibo_user_info_7 where bindtime between '2012-03-01' and '2012-03-28' ) as a left join (select id,mpnumber,apmac from weibo_user_visit_info_7) as b on a.mpnumber=b.mpnumber and a.apmac=b.apmac where b.id is NULL group by gDate order by gDateG
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1196
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: <derived3>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 28099
Extra: Using where; Not exists
*************************** 3. row ***************************
id: 3
select_type: DERIVED
table: weibo_user_visit_info_7
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 28099
Extra:
*************************** 4. row ***************************
id: 2
select_type: DERIVED
table: new_bind_mp_weibo_user_info_7
type: ALL
possible_keys: search_key
key: NULL
key_len: NULL
ref: NULL
rows: 3856
Extra: Using where
4 rows in set (0.03 sec)
然后,可以看到,遍历b表的行数还是很多的,所以根据需求,添加合适的对b表的限定条件,缩减b表的行数。然后再运行的时候,就能看到,只需要1秒多了。
mysql> select sql_no_cache count(distinct(a.mpnumber)) as mpnumber, date(a.bindtime) as gDate from (select mpnumber,apmac,bindtime from new_bind_mp_weibo_user_info_7 where bindtime between '2012-03-01' and '2012-03-28' ) as a left join (select id,mpnumber,apmac from weibo_user_visit_info_7 where visittime between '2012-03-01' and '2012-03-28') as b on a.mpnumber=b.mpnumber and a.apmac=b.apmac where b.id is NULL group by gDate order by gDate;
+----------+------------+
| mpnumber | gDate |
+----------+------------+
| 22 | 2012-03-01 |
| 17 | 2012-03-02 |
| 16 | 2012-03-03 |
| 14 | 2012-03-04 |
| 12 | 2012-03-05 |
| 5 | 2012-03-06 |
| 8 | 2012-03-07 |
| 26 | 2012-03-08 |
| 10 | 2012-03-09 |
| 3 | 2012-03-10 |
| 2 | 2012-03-11 |
| 42 | 2012-03-12 |
| 48 | 2012-03-13 |
| 40 | 2012-03-14 |
| 61 | 2012-03-15 |
| 59 | 2012-03-16 |
| 70 | 2012-03-17 |
| 87 | 2012-03-18 |
| 44 | 2012-03-19 |
| 47 | 2012-03-20 |
| 66 | 2012-03-21 |
| 48 | 2012-03-22 |
| 12 | 2012-03-26 |
| 39 | 2012-03-27 |
+----------+------------+
24 rows in set (1.82 sec)
mysql> explain select sql_no_cache count(distinct(a.mpnumber)) as mpnumber, date(a.bindtime) as gDate from (select mpnumber,apmac,bindtime from new_bind_mp_weibo_user_info_7 where bindtime between '2012-03-01' and '2012-03-28' ) as a left join (select id,mpnumber,apmac from weibo_user_visit_info_7 where visittime between '2012-03-01' and '2012-03-28') as b on a.mpnumber=b.mpnumber and a.apmac=b.apmac where b.id is NULL group by gDate order by gDateG
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1196
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: <derived3>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 9129
Extra: Using where; Not exists
*************************** 3. row ***************************
id: 3
select_type: DERIVED
table: weibo_user_visit_info_7
type: ALL
possible_keys: search_index3
key: NULL
key_len: NULL
ref: NULL
rows: 28101
Extra: Using where
*************************** 4. row ***************************
id: 2
select_type: DERIVED
table: new_bind_mp_weibo_user_info_7
type: ALL
possible_keys: search_key
key: NULL
key_len: NULL
ref: NULL
rows: 3856
Extra: Using where
4 rows in set (0.02 sec)
没有写出来如何添加索引的,也就不写了,自己根据explain的结果看,需要哪些索引吧。
有一点要说的是,再对某datetime类型的列添加索引后,发现查询时没有使用索引,explain了一下,会遍历27K多条数据,使用force index再来explain下这条sql,类似: explain SELECT * from table force index(index_name) where datetime between ‘2011-03-01’ and ‘2012-03-28’ ;得到的结果是会遍历9K条数据,占上个结果的三分之一,所用时间同不用索引没啥差别,所以,可能是当数据量小到一定的比例时,索引才会被调用。
—–
2012年3月28日 PS:查询了下手册,发现了关于是否使用索引的说明
每个表的索引被查询,并且使用最好的索引,除非优化器认为使用表扫描更有效。是否使用扫描取决于是否最好的索引跨越超过30%的表。优化器更加复杂,其估计基于其它因素,例如表大小、行数和I/O块大小,因此固定比例不再决定选择使用索引还是扫描。
官方手册:http://dev.mysql.com/doc/refman/5.1/zh/optimization.html#where-optimizations











