Select max value on multiple tables, without counting them twice
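I am writing a query that lets me order recipes by score.
Table structure
The structure is that a flyer contains one or more […]
Example query
I want to get the recipe_id and the sum of the MAX price_weight of each ingredient that is part of the recipe (linked through ingredient_to_recipe), but if a recipe has several ingredients that belong to the same flyer_item, that flyer item should be counted only once.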
```sql
SELECT itr.recipe_id,
       SUM(itr.weight),
       SUM(max_price_weight),
       SUM(itr.weight + max_price_weight) AS score
FROM (
    SELECT MAX(itf.max_price_weight) AS max_price_weight,
           itf.flyer_item_id,
           itf.ingredient_id
    FROM (
        SELECT ifi.ingredient_id,
               MAX(i.price_weight) AS max_price_weight,
               ifi.flyer_item_id
        FROM flyer_items i
        JOIN ingredient_to_flyer_item ifi ON i.id = ifi.flyer_item_id
        WHERE i.flyer_id IN (1, 2)
        GROUP BY ifi.ingredient_id
    ) itf
    GROUP BY itf.flyer_item_id
) itf2
JOIN `ingredient_to_recipe` AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`
WHERE recipe_id = 5730
GROUP BY itr.`recipe_id`
ORDER BY score DESC
LIMIT 0,10
```
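The query almost works: most results are correct, but for some rows certain ingredients are ignored and are not counted in the score as they should be.
Test cases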
```
| recipe_id | 'score' with current query | what 'score' should be | explanation                                                                 |
|-----------|----------------------------|------------------------|-----------------------------------------------------------------------------|
| 8376      | 51                         | 51                     | Good result                                                                 |
| 3152      | 1                          | 18                     | Only 1 ingredient having a score of one is counted, should be 4 ingredients |
| 4771      | 41                         | 45                     | One ingredient worth score 4 is ignored                                     |
| 10230     | 40                         | 40                     | Good result                                                                 |
| 8958      | 39                         | 39                     | Good result                                                                 |
| 4656      | 28                         | 34                     | One ingredient worth 6 is ignored                                           |
| 11338     | 1                          | 10                     | 2 ingredients, worth 4 and 5 are ignored                                    |
```
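I am having a hard time finding a simple way to explain this. Let me know if anything else would help.
Here is a link to a demo database with the query, the test examples, and the test cases: https://nofile.io/f/F4YSEu8DWmT/meta.zip
Thank you very much.
Update (as asked by Rick James):
This is the furthest I could get. The results are always correct, in the subquery too, but I have removed the grouping by 'flyer_item_id' entirely. With this query I get correct scores, except that when several ingredients of a recipe belong to the same flyer_item, they are counted multiple times (for recipe_id = 10557 the score comes out as 59 instead of the correct 56, because 2 ingredients worth 3 are in the same flyer_item). The only thing I still need is to count a single MAX(price_weight) per flyer_item_id per recipe (which I originally tried to do by grouping by 'flyer_item_id' on top of the first GROUP BY on ingredient_id).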
```sql
SELECT itr.recipe_id,
       SUM(itr.weight) as total_ingredient_weight,
       SUM(itf.price_weight) as total_price_weight,
       SUM(itr.weight+itf.price_weight) as score
FROM (
    SELECT fi1.id,
           MAX(fi1.price_weight) as price_weight,
           ingredient_to_flyer_item.ingredient_id as ingredient_id,
           recipe_id
    FROM flyer_items fi1
    INNER JOIN (
        SELECT flyer_items.id as id,
               MAX(price_weight) as price_weight,
               ingredient_to_flyer_item.ingredient_id as ingredient_id
        FROM flyer_items
        JOIN ingredient_to_flyer_item ON flyer_items.id = ingredient_to_flyer_item.flyer_item_id
        GROUP BY id
    ) fi2 ON fi1.id = fi2.id AND fi1.price_weight = fi2.price_weight
    JOIN ingredient_to_flyer_item ON fi1.id = ingredient_to_flyer_item.flyer_item_id
    JOIN ingredient_to_recipe ON ingredient_to_flyer_item.ingredient_id = ingredient_to_recipe.ingredient_id
    GROUP BY ingredient_to_flyer_item.ingredient_id
) AS itf
INNER JOIN `ingredient_to_recipe` AS `itr` ON `itf`.`ingredient_id` = `itr`.`ingredient_id`
GROUP BY `itr`.`recipe_id`
ORDER BY `score` DESC
LIMIT 10
```
Here is the EXPLAIN output, though I am not sure it is useful since the last working piece is still missing:
```
| id | select_type | table                    | partitions | type   | possible_keys                 | key           | key_len | ref                                                   | rows   | filtered | Extra                           |
|----|-------------|--------------------------|------------|--------|-------------------------------|---------------|---------|-------------------------------------------------------|--------|----------|---------------------------------|
| 1  | PRIMARY     | itr                      | NULL       | ALL    | recipe_id,ingredient_id       | NULL          | NULL    | NULL                                                  | 151800 | 100.00   | Using temporary; Using filesort |
| 1  | PRIMARY     | <derived2>               | NULL       | ref    |                               |               | 4       | metadata3.itr.ingredient_id                           | 10     | 100.00   | NULL                            |
| 2  | DERIVED     | ingredient_to_flyer_item | NULL       | ALL    | NULL                          | NULL          | NULL    | NULL                                                  | 249    | 100.00   | Using temporary; Using filesort |
| 2  | DERIVED     | fi1                      | NULL       | eq_ref | id_2,id,price_weight          | id_2          | 4       | metadata3.ingredient_to_flyer_item.flyer_item_id      | 1      | 100.00   | NULL                            |
| 2  | DERIVED     | <derived3>               | NULL       | ref    |                               |               | 9       | metadata3.ingredient_to_flyer_item.flyer_item_id,m... | 10     | 100.00   | NULL                            |
| 2  | DERIVED     | ingredient_to_recipe     | NULL       | ref    | ingredient_id                 | ingredient_id | 4       | metadata3.ingredient_to_flyer_item.ingredient_id      | 40     | 100.00   | NULL                            |
| 3  | DERIVED     | ingredient_to_flyer_item | NULL       | ALL    | NULL                          | NULL          | NULL    | NULL                                                  | 249    | 100.00   | Using temporary; Using filesort |
| 3  | DERIVED     | flyer_items              | NULL       | eq_ref | id_2,id,flyer_id,price_weight | id_2          | 4       | metadata3.ingredient_to_flyer_item.flyer_item_id      | 1      | 100.00   | NULL                            |
```
Update 2
I managed to find a query that works, but now I need to make it faster: it takes more than 500 ms to run.
```sql
SELECT sum(ff.price_weight) as price_weight,
       sum(ff.weight) as weight,
       sum(ff.price_weight+ff.weight) as score,
       ff.recipe_id
FROM (
    SELECT DISTINCT itf.flyer_item_id as flyer_item_id,
           itf.recipe_id,
           itf.weight,
           aprice_weight AS price_weight
    FROM (
        SELECT itfin.flyer_item_id AS flyer_item_id,
               itfin.price_weight AS aprice_weight,
               itfin.ingredient_id,
               itr.recipe_id,
               itr.weight
        FROM (
            SELECT ifi2.flyer_item_id,
                   ifi2.ingredient_id as ingredient_id,
                   MAX(ifi2.price_weight) as price_weight
            FROM ingredient_to_flyer_item ifi1
            INNER JOIN (
                SELECT id,
                       MAX(price_weight) as price_weight,
                       ingredient_to_flyer_item.ingredient_id as ingredient_id,
                       ingredient_to_flyer_item.flyer_item_id
                FROM ingredient_to_flyer_item
                GROUP BY ingredient_id
            ) ifi2 ON ifi1.price_weight = ifi2.price_weight AND ifi1.ingredient_id = ifi2.ingredient_id
            WHERE flyer_id IN (1,2)
            GROUP BY ifi1.ingredient_id
        ) AS itfin
        INNER JOIN `ingredient_to_recipe` AS `itr` ON `itfin`.`ingredient_id` = `itr`.`ingredient_id`
    ) AS itf
) ff
GROUP BY recipe_id
ORDER BY `score` DESC
LIMIT 20
```
Here is the EXPLAIN output:
```
| id | select_type | table                    | partitions | type  | possible_keys                                | key           | key_len | ref                 | rows | filtered | Extra                           |
|----|-------------|--------------------------|------------|-------|----------------------------------------------|---------------|---------|---------------------|------|----------|---------------------------------|
| 1  | PRIMARY     | <derived2>               | NULL       | ALL   | NULL                                         | NULL          | NULL    | NULL                | 1318 | 100.00   | Using temporary; Using filesort |
| 2  | DERIVED     | <derived4>               | NULL       | ALL   | NULL                                         | NULL          | NULL    | NULL                | 37   | 100.00   | Using temporary                 |
| 2  | DERIVED     | itr                      | NULL       | ref   | ingredient_id                                | ingredient_id | 4       | itfin.ingredient_id | 35   | 100.00   | NULL                            |
| 4  | DERIVED     | <derived5>               | NULL       | ALL   | NULL                                         | NULL          | NULL    | NULL                | 249  | 100.00   | Using temporary; Using filesort |
| 4  | DERIVED     | ifi1                     | NULL       | ref   | ingredient_id,itx_full,price_weight,flyer_id | ingredient_id | 4       | ifi2.ingredient_id  | 1    | 12.50    | Using where                     |
| 5  | DERIVED     | ingredient_to_flyer_item | NULL       | index | ingredient_id,itx_full                       | ingredient_id | 4       | NULL                | 249  | 100.00   | NULL                            |
```
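I have been meaning to look at this, but unfortunately did not have the time until now. I think this query will give you the results you are looking for.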
```sql
SELECT recipe_id,
       SUM(weight) AS weight,
       SUM(max_price_weight) AS price_weight,
       SUM(weight + max_price_weight) AS score
FROM (SELECT recipe_id, ingredient_id, MAX(weight) AS weight, MAX(price_weight) AS max_price_weight
      FROM (SELECT itr.recipe_id, MIN(itr.ingredient_id) AS ingredient_id, MAX(itr.weight) AS weight, fi.id, MAX(fi.price_weight) AS price_weight
            FROM ingredient_to_recipe itr
            JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id
            JOIN flyer_items fi ON fi.id = itfi.flyer_item_id
            GROUP BY itr.recipe_id, fi.id) ri
      GROUP BY recipe_id, ingredient_id) r
GROUP BY recipe_id
ORDER BY score DESC
LIMIT 10
```
It first groups by […]
The query with a […] clause gives the following results, which match your "what 'score' should be" column above:
```
recipe_id  weight  price_weight  score
8376       10      41            51
4771       5       40            45
10230      10      30            40
8958       15      24            39
4656       15      19            34
3152       0       18            18
11338      0       10            10
```
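To make the de-duplication easier to follow, here is a toy sketch that runs the same query on a few hand-made rows. The column lists are simplified assumptions (only the columns the query touches), the data is invented for illustration, and it should be run in a scratch database rather than the real one. Ingredients 10 and 11 share flyer item 100, and ingredient 12 appears in two flyer items.

```sql
-- Simplified, assumed schemas: only the columns used by the query.
CREATE TABLE ingredient_to_recipe (recipe_id INT, ingredient_id INT, weight INT);
CREATE TABLE ingredient_to_flyer_item (ingredient_id INT, flyer_item_id INT);
CREATE TABLE flyer_items (id INT PRIMARY KEY, flyer_id INT, price_weight INT);

-- Invented sample rows for one recipe.
INSERT INTO ingredient_to_recipe VALUES (1, 10, 1), (1, 11, 1), (1, 12, 2);
INSERT INTO flyer_items VALUES (100, 1, 3), (101, 1, 4), (102, 2, 6);
INSERT INTO ingredient_to_flyer_item VALUES (10, 100), (11, 100), (12, 101), (12, 102);

SELECT recipe_id,
       SUM(weight) AS weight,
       SUM(max_price_weight) AS price_weight,
       SUM(weight + max_price_weight) AS score
FROM (SELECT recipe_id, ingredient_id, MAX(weight) AS weight, MAX(price_weight) AS max_price_weight
      -- Innermost level: one row per (recipe, flyer item), so a shared flyer item appears once.
      FROM (SELECT itr.recipe_id, MIN(itr.ingredient_id) AS ingredient_id, MAX(itr.weight) AS weight, fi.id, MAX(fi.price_weight) AS price_weight
            FROM ingredient_to_recipe itr
            JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id
            JOIN flyer_items fi ON fi.id = itfi.flyer_item_id
            GROUP BY itr.recipe_id, fi.id) ri
      -- Middle level: one row per (recipe, ingredient), keeping the best flyer item.
      GROUP BY recipe_id, ingredient_id) r
GROUP BY recipe_id
ORDER BY score DESC;
-- recipe_id 1: weight 3, price_weight 9, score 12
-- (a per-ingredient MAX alone would add 3 + 3 + 6 = 12 for price_weight)
```

Note that with this grouping the `weight` of the second ingredient sharing flyer item 100 is dropped as well (3 rather than 4); whether that is what the score should do depends on the scoring rules.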
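I am not sure how fast this query will run on your system; on my laptop it is comparable to your query (I expected it to be slower). I am fairly sure some optimizations are possible, but again, I have not had time to look into them thoroughly.
I hope this helps you further toward a workable solution.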
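Sounds like "explode-implode". This is where the query has […]
There are two common fixes; both involve doing the aggregation separately from […]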
Case 1:
This case gets clumsy if you need multiple aggregates from t2, because it allows only one at a time.
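Case 1 can be read as a scalar subquery in the SELECT list, which would explain why only one aggregate fits at a time. A minimal sketch under that assumption follows; the tables `t1(id, blah)` and `t2(t1_id, x)` and their rows are hypothetical (only the name `t2` comes from the text above), and the script is meant for a scratch database.

```sql
-- Hypothetical toy tables; the DROPs make the script re-runnable in a scratch database.
DROP TABLE IF EXISTS t2;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id INT PRIMARY KEY, blah VARCHAR(10));
CREATE TABLE t2 (t1_id INT, x INT);
INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
INSERT INTO t2 VALUES (1, 5), (1, 7);

-- The aggregate is computed per t1 row, so no JOIN multiplies rows before the SUM.
SELECT t1.id,
       t1.blah,
       (SELECT SUM(t2.x) FROM t2 WHERE t2.t1_id = t1.id) AS sum_x
FROM t1
ORDER BY t1.id;
-- id 1 -> sum_x 12; id 2 -> sum_x NULL (no matching rows in t2)
```

A second aggregate from `t2` (for example a `COUNT(*)`) would need its own subquery, which is the clumsiness mentioned above.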
Case 2:
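Case 2 can be read as aggregating in a derived table first and joining the already-aggregated rows afterwards. The sketch below assumes that reading; the tables `t1(id, blah)` and `t2(t1_id, x)` and their rows are hypothetical, and the script is meant for a scratch database.

```sql
-- Hypothetical toy tables; the DROPs make the script re-runnable in a scratch database.
DROP TABLE IF EXISTS t2;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id INT PRIMARY KEY, blah VARCHAR(10));
CREATE TABLE t2 (t1_id INT, x INT);
INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
INSERT INTO t2 VALUES (1, 5), (1, 7);

-- Aggregate t2 on its own, then join: each t1 row meets at most one row of s.
SELECT t1.id, t1.blah, s.sum_x, s.ct
FROM t1
JOIN (SELECT t1_id, SUM(x) AS sum_x, COUNT(*) AS ct
      FROM t2
      GROUP BY t1_id) AS s ON s.t1_id = t1.id
ORDER BY t1.id;
-- id 1 -> sum_x 12, ct 2; id 2 is absent because of the inner JOIN
```

Unlike the scalar-subquery form, the derived table can return several aggregates at once; a `LEFT JOIN` would keep `t1` rows that have no match in `t2`.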
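You have 2 […]

But I cannot help you, because you have not qualified […] with the table (or alias) it belongs to. (Actually, […]) So I will leave the rest as an "exercise for the reader".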
Indexes
```
itr: (ingredient_id, recipe_id)  -- for the JOIN and WHERE and GROUP BY
itr: (recipe_id, ingredient_id, weight)  -- for 1st Update
    (There is no optimization available for the ORDER BY and LIMIT)
flyer_items: (flyer_id, price_weight)  -- unless flyer_id is the PRIMARY KEY
ifi: (flyer_item_id, ingredient_id)
ifi: (ingredient_id, flyer_item_id)  -- for 1st Update
```
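As a sketch, the suggestions above could be turned into DDL along these lines. It assumes `itr` and `ifi` stand for `ingredient_to_recipe` and `ingredient_to_flyer_item`, as aliased in the question's queries; the index names are made up for illustration, and some of these indexes may already exist under other names.

```sql
-- Index names are hypothetical. Check SHOW INDEX FROM <table> first
-- to avoid duplicating an index that already exists.
ALTER TABLE ingredient_to_recipe
    ADD INDEX idx_itr_ingredient_recipe (ingredient_id, recipe_id),
    ADD INDEX idx_itr_recipe_ingredient_weight (recipe_id, ingredient_id, weight);

-- Skip this one if flyer_id is the PRIMARY KEY of flyer_items.
ALTER TABLE flyer_items
    ADD INDEX idx_fi_flyer_price (flyer_id, price_weight);

ALTER TABLE ingredient_to_flyer_item
    ADD INDEX idx_ifi_item_ingredient (flyer_item_id, ingredient_id),
    ADD INDEX idx_ifi_ingredient_item (ingredient_id, flyer_item_id);
```

Re-running EXPLAIN afterwards should show whether the optimizer actually picks the new indexes.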
Please provide `SHOW CREATE TABLE` for the relevant tables.
Please provide […]
If […]
Reformulate […]
to evaluate […]
to […]
and change the initial […] to use these computed sums.
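What version of MySQL/MariaDB are you running?
I am not sure I have fully understood the problem. It looks to me like you are grouping by the wrong column […]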
```sql
select
    itr.recipe_id,
    sum(itr.weight),
    sum(max_price_weight),
    sum(itr.weight + max_price_weight) as score
from (
    select
        ifi.ingredient_id,
        max(i.price_weight) as max_price_weight
    from flyer_items i
    join ingredient_to_flyer_item ifi on i.id = ifi.flyer_item_id
    where i.flyer_id in (1, 2)
    group by ifi.ingredient_id
) itf
join `ingredient_to_recipe` as itr on itf.`ingredient_id` = itr.`ingredient_id`
group by itr.`recipe_id`
order by score desc
limit 0,10;
```
Hope this helps.