Most frequently ordered size per user
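Unfortunately, I have another problem I can't solve on my own: I want to list the size each user has ordered most often. When two or more sizes occur equally often, it should write a "-" instead.
I've already tried it with data.table, but I keep getting stuck on how to solve it ;)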
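```r
library(data.table)

# Counts the orders of each (customerID, size) pair, but does not yet pick the most frequent size
setDT(DB)[, `:=` (mostorderedsize = .N), by = 'customerID,size']
```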
Expected result:
```r
mostorderedsize = c("m", "-", 42, "m", "m", 42, "-", "-", "m", "m")
```
Data:
```r
DB <- data.frame(
  orderID      = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  orderDate    = c("1.1.14", "1.1.14", "1.1.14", "1.1.14", "2.1.14", "2.1.14", "2.1.14", "2.1.14", "2.1.14", "2.1.14"),
  itemID       = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
  size         = c("m", "l", 42, "xxl", "m", 42, 39, "m", "xl", 44),
  customerID   = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
  ItemReturned = c(0, 0, 0, 1, 1, 0, 1, 0, 0, 0)
)
```
I hope you can tell me what's wrong, or suggest another way to solve the problem.
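For the data.table attempt in the question: grouping by `customerID,size` only counts each (customer, size) pair, so a second step per customer is still needed to pick the top count. The following is a minimal sketch of that idea, assuming the data.table package is installed. It recreates only the two columns the calculation needs, and the intermediate names `counts` and `modes` are hypothetical, chosen for illustration.

```r
library(data.table)

# Simplified copy of the question's data: only the columns the calculation needs
DB <- data.table(
  size = c("m", "l", "42", "xxl", "m", "42", "39", "m", "xl", "44"),
  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1)
)

# Step 1: what the original attempt computes, the number of orders per (customer, size) pair
counts <- DB[, .N, by = .(customerID, size)]

# Step 2: per customer, keep the size with the highest count, or "-" if several sizes tie
modes <- counts[, .(mostorderedsize = if (sum(N == max(N)) > 1) "-" else size[which.max(N)]),
                by = customerID]

# Step 3: join the per-customer result back onto every order row (update by reference)
DB[modes, mostorderedsize := i.mostorderedsize, on = "customerID"]

DB
```

The join in step 3 plays the same role as indexing a per-customer summary by `customerID`, but matches on the column values rather than on positions.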
Using base R, use `tapply` to summarise the sizes for each customer:
```r
tmp <- with(DB, tapply(size, customerID, function(x) {
  tbl <- table(x)                  # how often each size was ordered
  most <- which(tbl == max(tbl))   # every size tied for the highest count
  if (length(most) > 1) return('-') else return(names(tbl)[most])
}))
```
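To see what the anonymous function does for a single customer, here is a small sketch that pulls it out into a named helper. The name `most_frequent` is hypothetical; the inputs are each customer's sizes taken from the data in the question.

```r
# Hypothetical named version of the anonymous function passed to tapply()
most_frequent <- function(x) {
  tbl <- table(x)                   # counts per distinct size
  most <- which(tbl == max(tbl))    # positions of every size tied for the top count
  if (length(most) > 1) "-" else names(tbl)[most]
}

most_frequent(c("m", "xxl", "m", "xl", "44"))  # customer 1: "m" ordered twice
most_frequent(c("l", "39", "m"))               # customer 2: three-way tie, so "-"
most_frequent(c("42", "42"))                   # customer 3: only "42"
```

The three calls return `"m"`, `"-"` and `"42"`, which are exactly the three distinct values in the expected result.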
You can then broadcast the result back to every order by indexing it with the customerID column:
```r
# Index by name rather than position, so this also works if the IDs are not 1, 2, 3, ...
DB$mostorderedsize <- tmp[as.character(DB$customerID)]
DB
```

```
   orderID orderDate itemID size customerID ItemReturned mostorderedsize
1        1    1.1.14      2    m          1            0               m
2        2    1.1.14      3    l          2            0               -
3        3    1.1.14      2   42          3            0              42
4        4    1.1.14      5  xxl          1            1               m
5        5    2.1.14     12    m          1            1               m
6        6    2.1.14      4   42          3            0              42
7        7    2.1.14      2   39          2            1               -
8        8    2.1.14      3    m          2            0               -
9        9    2.1.14      1   xl          1            0               m
10      10    2.1.14      5   44          1            0               m
```
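A possible one-step alternative in base R, sketched here as a swapped-in technique rather than part of the original answer, is `ave()`: it applies a function to each customer's sizes and recycles the single result over that customer's rows, so the separate lookup step disappears. The sketch recreates only the two columns it needs.

```r
# Only the two columns the calculation needs; sizes kept as character so "-" can be stored
DB <- data.frame(
  size = c("m", "l", "42", "xxl", "m", "42", "39", "m", "xl", "44"),
  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
  stringsAsFactors = FALSE
)

# ave() splits size by customerID, applies FUN to each group and
# writes the (length-1) result back into every row of that group
DB$mostorderedsize <- ave(DB$size, DB$customerID, FUN = function(x) {
  tbl <- table(x)
  most <- which(tbl == max(tbl))
  if (length(most) > 1) "-" else names(tbl)[most]
})

DB
```

If `size` were a factor (the `data.frame()` default before R 4.0), assigning `"-"` would produce `NA` because it is not a factor level, so convert with `as.character()` first in that case.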