Eliminate full table scan due to BETWEEN (and GROUP BY)
Description
The condition at issue:

    Y.YEAR BETWEEN 1900 AND 2009 AND
Code
Here is the query with the range condition:

    SELECT
      COUNT(1) AS MEASUREMENTS,
      AVG(D.AMOUNT) AS AMOUNT,
      Y.YEAR AS YEAR,
      MAKEDATE(Y.YEAR,1) AS AMOUNT_DATE
    FROM
      CITY C,
      STATION S,
      STATION_DISTRICT SD,
      YEAR_REF Y FORCE INDEX(YEAR_IDX),
      MONTH_REF M,
      DAILY D
    WHERE
      -- For a specific city ...
      --
      C.ID = 10663 AND

      -- Find all the stations within a specific unit radius ...
      --
      6371.009 *
        SQRT(
          POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
          (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
           POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= 50 AND

      -- Get the station district identification for the matching station.
      --
      S.STATION_DISTRICT_ID = SD.ID AND

      -- Gather all known years for that station ...
      --
      Y.STATION_DISTRICT_ID = SD.ID AND

      -- The data before 1900 is shaky; insufficient after 2009.
      --
      Y.YEAR BETWEEN 1900 AND 2009 AND

      -- Filtered by all known months ...
      --
      M.YEAR_REF_ID = Y.ID AND

      -- Whittled down by category ...
      --
      M.CATEGORY_ID = '003' AND

      -- Into the valid daily climate data.
      --
      M.ID = D.MONTH_REF_ID AND
      D.DAILY_FLAG_ID <> 'M'
    GROUP BY
      Y.YEAR
Update
The SQL is performing a full table scan, which results in MySQL doing a "copy to tmp table", as shown here:

    +----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
    | id | select_type | table | type   | possible_keys                     | key          | key_len | ref                           | rows   | Extra       |
    +----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
    |  1 | SIMPLE      | C     | const  | PRIMARY                           | PRIMARY      | 4       | const                         |      1 |             |
    |  1 | SIMPLE      | Y     | range  | YEAR_IDX                          | YEAR_IDX     | 4       | NULL                          | 160422 | Using where |
    |  1 | SIMPLE      | SD    | eq_ref | PRIMARY                           | PRIMARY      | 4       | climate.Y.STATION_DISTRICT_ID |      1 | Using index |
    |  1 | SIMPLE      | S     | eq_ref | PRIMARY                           | PRIMARY      | 4       | climate.SD.ID                 |      1 | Using where |
    |  1 | SIMPLE      | M     | ref    | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8       | climate.Y.ID                  |     54 | Using where |
    |  1 | SIMPLE      | D     | ref    | INDEX                             | INDEX        | 8       | climate.M.ID                  |     11 | Using where |
    +----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
Answer
Using:

    +----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
    | id | select_type | table | type   | possible_keys                     | key           | key_len | ref                           | rows | Extra                           |
    +----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
    |  1 | SIMPLE      | C     | const  | PRIMARY                           | PRIMARY       | 4       | const                         |    1 | Using temporary; Using filesort |
    |  1 | SIMPLE      | S     | ALL    | PRIMARY                           | NULL          | NULL    | NULL                          | 7795 | Using where                     |
    |  1 | SIMPLE      | SD    | eq_ref | PRIMARY                           | PRIMARY       | 4       | climate.S.STATION_DISTRICT_ID |    1 | Using index                     |
    |  1 | SIMPLE      | Y     | ref    | PRIMARY,STAT_YEAR_IDX             | STAT_YEAR_IDX | 4       | climate.S.STATION_DISTRICT_ID | 1650 | Using where                     |
    |  1 | SIMPLE      | M     | ref    | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX  | 8       | climate.Y.ID                  |   54 | Using where                     |
    |  1 | SIMPLE      | D     | ref    | INDEX                             | INDEX         | 8       | climate.M.ID                  |   11 | Using where                     |
    +----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
Related
- http://dev.mysql.com/doc/refman/5.0/zh-CN/how-to-avoid-table-scan.html
- http://dev.mysql.com/doc/refman/5.0/en/where-optimizations.html
- Optimizing SQL that uses a BETWEEN clause
Thanks!
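One request... it looks like you know your data. Add the keyword "STRAIGHT_JOIN" and see the results:

    SELECT STRAIGHT_JOIN ... rest of query ...

STRAIGHT_JOIN tells MySQL to join the tables in the order they are listed. Your CITY table is first in the FROM list, which indicates you want it as the primary table. In addition, the WHERE clause on CITY is an immediate filter. That being said, it will probably fly through the rest of the query...
Hope it helps. It has worked for me on government data with millions of records, joined to 10 lookup tables, where MySQL was doing the thinking for me.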
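Can you change from searching within a radius to searching within a bounding box?
You know the city, so you can calculate the bounding box in the application.
Maybe this

    S.LATITUDE_DECIMAL >= latitude_lower AND
    S.LATITUDE_DECIMAL <= latitude_upper AND
    S.LONGITUDE_DECIMAL >= longitude_lower AND
    S.LONGITUDE_DECIMAL <= longitude_upper

would be a bit faster?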
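The bounding box can be computed before the query runs. Here is a minimal sketch in Python, assuming the application knows the city's coordinates in decimal degrees and that the box does not cross a pole or the 180° meridian. The function name `bounding_box` and the sample coordinates are hypothetical; only the Earth radius (6371.009 km) and the 50 km limit come from the query in the question.

```python
import math

EARTH_RADIUS_KM = 6371.009  # same mean Earth radius the query uses


def bounding_box(lat_deg, lon_deg, radius_km):
    """Return (lat_lower, lat_upper, lon_lower, lon_upper) in degrees.

    Assumes the box does not cross a pole or the 180-degree meridian.
    """
    # Angular radius of the search circle, expressed in degrees of latitude.
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks by cos(latitude). Use the box edge
    # farthest from the equator so the box errs on the wide side.
    widest_lat = min(abs(lat_deg) + dlat, 89.9)
    dlon = dlat / math.cos(math.radians(widest_lat))
    return (lat_deg - dlat, lat_deg + dlat, lon_deg - dlon, lon_deg + dlon)


# Hypothetical city coordinates; 50 km matches the radius in the question.
latitude_lower, latitude_upper, longitude_lower, longitude_upper = bounding_box(
    49.25, -123.1, 50
)
print(latitude_lower, latitude_upper, longitude_lower, longitude_upper)
```

The four values would then be passed as query parameters in place of `latitude_lower` and friends. Because the box is slightly larger than the circle, the original distance expression could presumably stay in the WHERE clause as a second, cheaper-to-reach filter if exact radius semantics matter.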
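For BETWEEN queries to be efficient, you need a B-tree index on the YEAR column. For example:

    CREATE INDEX id_index USING BTREE ON YEAR_REF (YEAR);

A BTREE index allows efficient range queries. If this is in fact the underlying problem, having such an index should get rid of the full table scan and scan only the part of the table that falls within the range. Read more about B-trees on Wikipedia.
However, as with any optimization advice, you should measure to make sure you are not doing more harm than good.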
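One way to do that kind of check is to compare the query plan before and after creating the index. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in, since it runs without a server; SQLite's planner is not MySQL's, so this only illustrates the method. On MySQL you would run `EXPLAIN` on the real query instead. The lowercase table, the index name `year_idx`, and the generated rows are assumptions for the demo.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE year_ref ("
    " id INTEGER PRIMARY KEY, station_district_id INTEGER, year INTEGER)"
)
# Made-up data: 20 districts, one row per year from 1850 to 2010.
conn.executemany(
    "INSERT INTO year_ref (station_district_id, year) VALUES (?, ?)",
    [(d, y) for d in range(20) for y in range(1850, 2011)],
)

query = "SELECT id FROM year_ref WHERE year BETWEEN 1900 AND 2009"

plan_before = plan(conn, query)  # no index yet: the planner has to scan
conn.execute("CREATE INDEX year_idx ON year_ref (year)")
plan_after = plan(conn, query)   # the plan should now name year_idx

print(plan_before)
print(plan_after)
```

The exact wording of the plan lines varies between SQLite versions, so look for the index name rather than a fixed string. A plan that uses the index is still not proof of a faster query; timing the real query on real data remains the final check.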