Create new set of variables equal to the level of a factor in dplyr

I have a data.frame with 100 columns that follow the naming convention word, word_answer:

df <- data.frame(apple = "57%", apple_answer = "22%", dog = "82%", dog_answer = "16%")

I set the levels of the two factor variables above like this:

levels(df$apple) <- c("66%", "57%", "48%", "39%", "30%", "22%", "12%")
levels(df$dog) <- c("82%", "71%", "60%", "49%", "38%", "27%", "16%")

I am trying to compute a distance score: the difference between the numeric level of a word factor and the numeric level of its corresponding word_answer.

So, for example, for "apple", the first row of apple is "57%", which is the second level of that factor:

> which(levels(df$apple) =="57%")
[1] 2

The corresponding apple_answer value, "22%", is the sixth level:

> which(levels(df$apple) =="22%")
[1] 6

So the distance score in this case is 2 - 6 = -4.
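
The same arithmetic can be reproduced with a minimal sketch. It assumes both values are stored as factors built with factor(..., levels = ) on the word's level set, so that as.integer() returns each value's position in that set. The names apple_levels, apple_pos, answer_pos and apple_score are illustrative and not part of the original data.

```r
# Assumption: the word and its answer share one level set
apple_levels <- c("66%", "57%", "48%", "39%", "30%", "22%", "12%")

apple        <- factor("57%", levels = apple_levels)
apple_answer <- factor("22%", levels = apple_levels)

# as.integer() on a factor gives the position of each value in levels()
apple_pos  <- as.integer(apple)         # 2
answer_pos <- as.integer(apple_answer)  # 6

apple_score <- apple_pos - answer_pos   # -4
```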

How can I compute these distance scores for every variable in the dataset?


You can also use the apply function, like this:

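# apply() passes each row to the function as a character vector:
# x[1] is the word value, x[2] is the matching answer value.
df$apple_dist = apply(df[,1:2], 1, function(x) {
    which(levels(df$apple) == x[1]) - which(levels(df$apple) == x[2])
})

# Same calculation for the dog / dog_answer pair (columns 3 and 4)
df$dog_dist = apply(df[,3:4], 1, function(x) {
    which(levels(df$dog) == x[1]) - which(levels(df$dog) == x[2])
})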

> df
  apple apple_answer dog dog_answer apple_dist dog_dist
1   57%          22% 82%        16%         -4       -6

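You can split the data into two groups: the words and their corresponding answers. Use match to get the position of each value among the word's levels, subtract the answer position from the word position, and store the results in new columns.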

answer_cols <- grep('_answer', names(df))
new_cols <- paste0(names(df)[-answer_cols], '_dist')

df[new_cols] <- Map(function(x, y) match(x, levels(x)) - match(y, levels(x)),
                    df[-answer_cols], df[answer_cols])

df
#  apple apple_answer dog dog_answer apple_dist dog_dist
#1   57%          22% 82%        16%         -4       -6
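
A variant of the same idea, sketched under the assumption that the level sets are kept in a named list (level_sets, a hypothetical name) instead of being read from each column with levels(). It does not depend on the columns being factors, and match() returns NA for any value that is missing from its level set.

```r
df <- data.frame(apple = "57%", apple_answer = "22%",
                 dog = "82%", dog_answer = "16%",
                 stringsAsFactors = FALSE)

# Hypothetical lookup: one ordered level vector per word column
level_sets <- list(
  apple = c("66%", "57%", "48%", "39%", "30%", "22%", "12%"),
  dog   = c("82%", "71%", "60%", "49%", "38%", "27%", "16%")
)

for (w in names(level_sets)) {
  lv <- level_sets[[w]]
  # position of the word value minus position of its answer value
  df[[paste0(w, "_dist")]] <-
    match(df[[w]], lv) - match(df[[paste0(w, "_answer")]], lv)
}

df
```

With 100 columns, only level_sets needs to grow; the loop itself stays the same.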