Only summarise some levels of a group [dplyr]
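I am trying to work out (using dplyr) how to summarise only one level of a grouping variable while leaving all the other rows unchanged. For example: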
    library(dplyr)

    dat <- starwars %>%
      select(height, hair_color) %>%
      filter(!is.na(hair_color))

    dat %>%
      group_by(hair_color) %>%
      summarise(mean_height = mean(height))
    #> `summarise()` ungrouping output (override with `.groups` argument)
    #> # A tibble: 12 x 2
    #>    hair_color    mean_height
    #>    <chr>               <dbl>
    #>  1 auburn                150
    #>  2 auburn, grey          180
    #>  3 auburn, white         182
    #>  4 black                  NA
    #>  5 blond                 177.
    #>  6 blonde                168
    #>  7 brown                  NA
    #>  8 brown, grey           178
    #>  9 grey                  170
    #> 10 none                   NA
    #> 11 unknown                NA
    #> 12 white                 156
This summarises every level of `hair_color`, whereas I only want to summarise the `blond` level.
I can see an approach using `split()` and `bind_rows()`:
    dat_split <- dat %>%
      mutate(is_blond = ifelse(hair_color %in% c("blond"), "blond", "not_blond")) %>%
      split(.$is_blond)

    d1 <- dat_split[["blond"]] %>%
      group_by(hair_color) %>%
      summarise(height = mean(height))
    #> `summarise()` ungrouping output (override with `.groups` argument)

    d2 <- dat_split[["not_blond"]] %>%
      select(-is_blond)

    dat_final <- bind_rows(d1, d2)
    dat_final
    #> # A tibble: 80 x 2
    #>    hair_color    height
    #>    <chr>          <dbl>
    #>  1 blond           177.
    #>  2 none            202
    #>  3 brown           150
    #>  4 brown, grey     178
    #>  5 brown           165
    #>  6 black           183
    #>  7 auburn, white   182
    #>  8 auburn, grey    180
    #>  9 brown           228
    #> 10 brown           180
    #> # ... with 70 more rows
However, this seems a bit verbose (and clunky). I am wondering whether this is something dplyr can do more directly.
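We can do this with `replace()`, which overwrites the `blond` heights with their mean and leaves every other row untouched: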
    library(dplyr)

    dat %>%
      mutate(height = replace(height, hair_color == 'blond',
                              mean(height[hair_color == 'blond'])))
    # A tibble: 82 x 2
    #   height hair_color
    #    <dbl> <chr>
    # 1   177. blond
    # 2   202  none
    # 3   150  brown
    # 4   178  brown, grey
    # 5   165  brown
    # 6   183  black
    # 7   182  auburn, white
    # 8   177. blond
    # 9   180  auburn, grey
    #10   228  brown
    # … with 72 more rows
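Since the title asks about "some levels", the same idea can be extended to several levels at once. The following is only a sketch on a made-up tibble: `toy`, its values, and the `targets` vector are hypothetical and not taken from `starwars`. It uses a grouped `mutate()` so that each selected level gets its own mean while all other rows keep their original value.

```r
library(dplyr)

# Hypothetical toy data, not taken from starwars
toy <- tibble(
  hair_color = c("blond", "brown", "blond", "black", "brown"),
  height     = c(170, 150, 180, 183, 165)
)

# Hypothetical choice of levels whose heights should be averaged
targets <- c("blond", "brown")

result <- toy %>%
  group_by(hair_color) %>%
  # within each group: use the group mean if the level is a target,
  # otherwise keep the original heights
  mutate(height = if (hair_color[1] %in% targets) mean(height) else height) %>%
  ungroup()

result
```

The row count and row order are unchanged; only the heights of the selected levels are replaced.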
In `data.table`, the same update can be done by reference:
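    library(data.table)
    # height is an integer column in starwars; cast it first so := keeps the fractional mean
    setDT(dat)[, height := as.numeric(height)][hair_color == 'blond', height := mean(height)]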
You can try:
    dat %>%
      mutate(valid = hair_color == "blond") %>%
      group_by(valid) %>%
      mutate(mean_h = ifelse(valid, mean(height), height), .keep = "unused")
This gives:
    # A tibble: 82 x 2
       hair_color    mean_h
       <chr>          <dbl>
     1 blond           177.
     2 none            202
     3 brown           150
     4 brown, grey     178
     5 brown           165
     6 black           183
     7 auburn, white   182
     8 blond           177.
     9 auburn, grey    180
    10 brown           228
    # ... with 72 more rows
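Note that the answers above keep all 82 rows and repeat the mean on every `blond` row, whereas the `split()`/`bind_rows()` attempt in the question collapses the `blond` rows into a single row. If the collapsed form is what is wanted, one possible sketch is to add a helper grouping key that is shared by the `blond` rows and unique for every other row. The tibble `toy`, its values, and the column name `key` are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical toy data, not taken from starwars
toy <- tibble(
  hair_color = c("blond", "brown", "blond", "black", "brown"),
  height     = c(170, 150, 180, 183, 165)
)

collapsed <- toy %>%
  # blond rows share key 0; every other row gets its own key (its row number)
  mutate(key = if_else(hair_color == "blond", 0L, row_number())) %>%
  group_by(hair_color, key) %>%
  # groups of size one are "summarised" to themselves
  summarise(height = mean(height), .groups = "drop") %>%
  select(-key)

collapsed
```

Be aware that `summarise()` returns the rows sorted by the grouping keys, so the original row order is not preserved in this version.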