How can I aggregate data with sum, mean and count for each column respectively?
I have a dataset named "order_product" that looks like this:
    order_id  product  order_sequence  reorder
    1         egg      1               1
    1         meat     2               0
    1         fruit    3               1
    1         meat     4               1
    2         egg      1               1
    2         egg      2               1
    2         fruit    3               0
    3         egg      1               0
    3         fruit    2               1
    3         fruit    3               1
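For reproducibility, the table above can be typed in as a data frame. This is a minimal sketch in base R; the object name `df` is an assumption, chosen only because the answers refer to the data as `df`.

```r
# Rebuild the example table row by row.
# The name `df` is an assumption chosen to match the answers.
df <- data.frame(
  order_id       = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  product        = c("egg", "meat", "fruit", "meat", "egg",
                     "egg", "fruit", "egg", "fruit", "fruit"),
  order_sequence = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3),
  reorder        = c(1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
)
```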
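I want to aggregate the data into a new data frame named "product", grouped by product. The variables in the aggregated dataset should show the total frequency, the reorder rate, and the mean sequence of each product. Each variable is computed as follows: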
    frequency:     product count
    reorder_rate:  sum of reorder / frequency
    mean_sequence: sum of order_sequence / frequency
So the result should look like this:
    product  frequency  reorder_rate  mean_sequence
    egg      4          3/4           5/4
    meat     2          1/2           3
    fruit    4          3/4           11/4
Can anyone help me do this in R? I tried the melt() function from the data.table package, but I don't know how to write the code.
This kind of calculation is easy with dplyr:
    library(dplyr)

    # n() counts the rows in each group; later summaries can reuse `frequency`
    df %>%
      group_by(product) %>%
      summarise(frequency = n(),
                reorder_rate = sum(reorder)/frequency,
                mean_sequence = sum(order_sequence)/frequency)

    # A tibble: 3 x 4
    #  product frequency reorder_rate mean_sequence
    #  <fct>       <int>        <dbl>         <dbl>
    #1 egg             4         0.75          1.25
    #2 fruit           4         0.75          2.75
    #3 meat            2         0.5           3
However, you can also use data.table:
    library(data.table)

    # .N is the number of rows in each group
    setDT(df)[, .(frequency = .N,
                  reorder_rate = sum(reorder)/.N,
                  mean_sequence = sum(order_sequence)/.N), by = product]
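Since a sum divided by the group count is just the group mean, the same table can also be built without any packages. The following is a minimal base R sketch and not part of the original answers; the data frame is retyped from the question, and the names `df`, `groups`, and `product` are illustrative assumptions.

```r
# Example data retyped from the question so the snippet runs on its own.
# The names `df`, `groups`, and `product` are assumptions for illustration.
df <- data.frame(
  order_id       = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  product        = c("egg", "meat", "fruit", "meat", "egg",
                     "egg", "fruit", "egg", "fruit", "fruit"),
  order_sequence = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3),
  reorder        = c(1, 0, 1, 1, 1, 1, 0, 0, 1, 1)
)

# Split the rows into one data frame per product (groups are sorted by name).
groups <- split(df, df$product)

# One summary row per product: count, mean of reorder, mean of order_sequence.
product <- data.frame(
  product       = names(groups),
  frequency     = vapply(groups, nrow, integer(1), USE.NAMES = FALSE),
  reorder_rate  = vapply(groups, function(g) mean(g$reorder),
                         numeric(1), USE.NAMES = FALSE),
  mean_sequence = vapply(groups, function(g) mean(g$order_sequence),
                         numeric(1), USE.NAMES = FALSE)
)

print(product)
```

Here `mean(g$reorder)` plays the role of `sum(reorder)/frequency`, so the values should agree with the dplyr output above, with the products listed alphabetically.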