Create new set of variables equal to the level of a factor in dplyr
I have a data.frame with 100 columns that follow this naming convention:

```r
df <- data.frame(apple = "57%", apple_answer = "22%", dog = "82%", dog_answer = "16%")
```
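I set the levels of these two factor variables like this:

```r
# factor() keeps each value and fixes the level order; assigning with
# levels(x) <- ... would relabel the existing level instead of reordering.
df$apple <- factor(df$apple, levels = c("66%", "57%", "48%", "39%", "30%", "22%", "12%"))
df$dog   <- factor(df$dog,   levels = c("82%", "71%", "60%", "49%", "38%", "27%", "16%"))
```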
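I am trying to compute a distance score, which is the distance between the numeric level of each variable and that of its corresponding `_answer` column.

So, for example, in the case of "apple", the level of the value is

```r
> which(levels(df$apple) == "57%")
[1] 2
```

and the level of the corresponding answer is

```r
> which(levels(df$apple) == "22%")
[1] 6
```

So the distance score in this case is 2 - 6 = -4.

How can I compute these distance scores for every variable in the dataset?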
You can also use the apply function, like this:

```r
# Row-wise: level index of the value minus level index of the answer
df$apple_dist <- apply(df[, 1:2], 1, function(x) {
  which(levels(df$apple) == x[1]) - which(levels(df$apple) == x[2])
})
df$dog_dist <- apply(df[, 3:4], 1, function(x) {
  which(levels(df$dog) == x[1]) - which(levels(df$dog) == x[2])
})

df
#   apple apple_answer dog dog_answer apple_dist dog_dist
# 1   57%          22% 82%        16%         -4       -6
```
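Because a factor stores each value as the integer index of its level, the same distance can probably be computed without `apply` at all. A minimal sketch, assuming the word column is a factor whose levels are in the intended order; the names `toy` and `apple_levels` and the second row of data are made up for illustration:

```r
# Illustrative data: `toy`, `apple_levels` and the second row are assumptions.
apple_levels <- c("66%", "57%", "48%", "39%", "30%", "22%", "12%")
toy <- data.frame(
  apple        = factor(c("57%", "66%"), levels = apple_levels),
  apple_answer = c("22%", "12%"),
  stringsAsFactors = FALSE
)

# as.integer() on a factor returns the level index of each value;
# match() looks up each answer's position in the same set of levels.
toy$apple_dist <- as.integer(toy$apple) - match(toy$apple_answer, levels(toy$apple))
toy$apple_dist
# [1] -4 -6
```

This works on the whole column at once, so it avoids the row-by-row function call, and an answer that is not among the levels yields `NA` rather than an error.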
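You can split the data into two groups, the words and their corresponding answers, and then use `Map` to compute the distance for each pair:

```r
answer_cols <- grep('_answer', names(df))
new_cols <- paste0(names(df)[-answer_cols], '_dist')
# match() gives the position of each value within the word column's levels
df[new_cols] <- Map(function(x, y) match(x, levels(x)) - match(y, levels(x)),
                    df[-answer_cols], df[answer_cols])
df
#  apple apple_answer dog dog_answer apple_dist dog_dist
#1   57%          22% 82%        16%         -4       -6
```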
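The `Map` call above pairs columns by position, so it relies on the word columns and the answer columns appearing in the same order. A sketch that pairs them by name instead, under the assumption that every answer column is named `<word>_answer`; the names `dat`, `lv_apple`, `lv_dog`, `answer_names` and `words` are illustrative:

```r
lv_apple <- c("66%", "57%", "48%", "39%", "30%", "22%", "12%")
lv_dog   <- c("82%", "71%", "60%", "49%", "38%", "27%", "16%")

# Columns are deliberately not in word/answer order.
dat <- data.frame(
  apple_answer = "22%",
  dog          = factor("82%", levels = lv_dog),
  apple        = factor("57%", levels = lv_apple),
  dog_answer   = "16%",
  stringsAsFactors = FALSE
)

# Derive each word from its answer column's name, then look both up by name.
answer_names <- grep("_answer$", names(dat), value = TRUE)
words <- sub("_answer$", "", answer_names)
stopifnot(all(words %in% names(dat)))  # every answer needs a matching word column

dat[paste0(words, "_dist")] <- Map(
  function(x, y) match(x, levels(x)) - match(y, levels(x)),
  dat[words], dat[answer_names]
)
```

Here `apple_dist` comes out as 2 - 6 = -4 and `dog_dist` as 1 - 7 = -6, regardless of how the columns are ordered in the data frame.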