R in Practice: Plotting ROC Curves

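Preface: When I used Matlab to plot ROC curves, everything depended on the toolbox: if it was installed I plotted, if not I didn't, and whenever I actually wanted a plot the toolbox happened to be missing, which was frustrating. Then I stumbled upon an article about plotting ROC curves in R, so I learned it right away and am sharing it here for future reference.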

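Let's first walk through an example to see what the parameters do. The data is the well-known Iris dataset, which ships with R.

Data preparation
The first step, of course, is to prepare the data. The default Iris dataset contains three species of iris, and as I currently understand it an ROC curve can only be drawn for a binary classification, so some preprocessing is needed: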

Input

# Data preparation
iris2 <- iris
iris2$label[iris2$Species == 'setosa'] <- 1
iris2$label[iris2$Species == 'versicolor'] <- 2
iris2 <- iris2[-which(iris2$Species == 'virginica'), ] # drop the rows of species virginica
iris2$Species <- NULL # remove the Species column
head(iris2,10) # show the first 10 rows
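Before fitting anything, it is worth confirming that exactly two classes remain. The following is a minimal sketch in base R. Building the subset with logical indexing is my own variation on the code above, not the original method; it avoids the pitfall that `-which(...)` drops every row when the condition matches nothing.

```r
# Rebuild the two-class subset with logical indexing (a variation, not the original code)
iris2 <- iris[iris$Species != "virginica", ]
iris2$label <- ifelse(iris2$Species == "setosa", 1, 2)  # 1 = setosa, 2 = versicolor
iris2$Species <- NULL

# Sanity checks: exactly two classes and no missing labels
class_counts <- table(iris2$label)
print(class_counts)
stopifnot(length(class_counts) == 2, !anyNA(iris2$label))
```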

Output

> head(iris2,10)
Sepal.Length Sepal.Width Petal.Length Petal.Width label
1 5.1 3.5 1.4 0.2 1
2 4.9 3.0 1.4 0.2 1
3 4.7 3.2 1.3 0.2 1
4 4.6 3.1 1.5 0.2 1
5 5.0 3.6 1.4 0.2 1
6 5.4 3.9 1.7 0.4 1
7 4.6 3.4 1.4 0.3 1
8 5.0 3.4 1.5 0.2 1
9 4.4 2.9 1.4 0.2 1
10 4.9 3.1 1.5 0.1 1

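Parameter settings

Taking the ROC curve of label against Sepal.Length as an example:

library(pROC) # roc() and the plot options used below come from the pROC package
auc1 <- roc(label~Sepal.Length, data=iris2, smooth=FALSE)

Next, let's look at some of the parameters of plot when drawing an ROC curve, using actual plots:

Default plot

plot(auc1)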

print.auc: displays the AUC value on the plot. The AUC equals the area under the ROC curve.

plot(auc1, print.auc=TRUE)
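The statement that the AUC equals the area under the curve can be checked by hand. The sketch below uses only base R and relies on the equivalent rank interpretation: the AUC is the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. `manual_auc` is a hypothetical helper name, not part of pROC, and treating label 2 as the case is an assumption. The result should agree with the AUC that pROC reports for Sepal.Length.

```r
# Two-class subset, as in the data preparation step
iris2 <- iris[iris$Species != "virginica", ]
iris2$label <- ifelse(iris2$Species == "setosa", 1, 2)

# Hypothetical helper: share of case/control pairs in which the case
# (label 2) has the higher score; ties count one half
manual_auc <- function(score, label) {
  controls <- score[label == 1]
  cases <- score[label == 2]
  diffs <- outer(cases, controls, "-")  # one entry per case/control pair
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

auc_sepal_length <- manual_auc(iris2$Sepal.Length, iris2$label)
print(round(auc_sepal_length, 4))
```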

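print.thres: displays a threshold on the ROC curve, roughly the point at the sharpest corner of the curve. With print.thres=TRUE, pROC marks the "best" threshold, which by default is the one that maximizes Youden's index (sensitivity + specificity).

plot(auc1, print.thres=TRUE)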
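To see where such a threshold comes from, here is a rough base-R sketch that scans candidate cutoffs on Sepal.Length and keeps the one with the largest Youden index J = sensitivity + specificity - 1. It assumes that the marked point is the Youden-optimal one and that label 2 is the case; the midpoint cutoffs are an illustrative choice and may not match pROC's exact candidates.

```r
# Two-class subset, as in the data preparation step
iris2 <- iris[iris$Species != "virginica", ]
iris2$label <- ifelse(iris2$Species == "setosa", 1, 2)

x <- iris2$Sepal.Length
is_case <- iris2$label == 2  # assumption: versicolor (label 2) is the case

# Candidate cutoffs: midpoints between consecutive distinct values
vals <- sort(unique(x))
cutoffs <- (head(vals, -1) + tail(vals, -1)) / 2

# Classify an observation as a case when x is above the cutoff
sens <- sapply(cutoffs, function(t) mean(x[is_case] > t))
spec <- sapply(cutoffs, function(t) mean(x[!is_case] < t))
youden <- sens + spec - 1

best <- which.max(youden)
print(c(threshold = cutoffs[best],
        specificity = spec[best],
        sensitivity = sens[best]))
```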

print.thres.col: sets the color of the threshold label.

plot(auc1, print.thres=TRUE, print.thres.col="blue")

col: sets the color of the ROC curve.

plot(auc1, col="blue")

identity.col: sets the color of the diagonal line.

plot(auc1, identity.col="blue")

identity.lty: sets the line type of the diagonal, given as a number (for example, 2 draws a dashed line).

plot(auc1, identity.lty=2)

identity.lwd: sets the line width of the diagonal; the default width is 1.

plot(auc1, identity.lwd=2)

Combining the parameters above gives the following plot:

auc1 <- roc(label~Sepal.Length, data=iris2, smooth=FALSE)
plot(auc1, print.auc=TRUE, print.thres=TRUE, main="Comparison of multiple ROC curves",
     col="blue", print.thres.col="blue", identity.col="blue",
     identity.lty=2, identity.lwd=1)

Comparing ROC curves for multiple variables

# Plot the ROC curves
auc1 <- roc(label~Sepal.Length, data=iris2, smooth=FALSE)
plot(auc1, print.auc=TRUE, print.thres=TRUE, main="Comparison of multiple ROC curves",
     col="blue", print.thres.col="blue", identity.col="blue",
     identity.lty=2, identity.lwd=1)
auc2 <- roc(label~Sepal.Width, data=iris2, smooth=FALSE)
auc3 <- roc(label~Petal.Length, data=iris2, smooth=FALSE)
auc4 <- roc(label~Petal.Width, data=iris2, smooth=FALSE)
lines(auc2, col="red")
lines(auc3, col="green")
lines(auc4, col="yellow")
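With four curves on one plot, a legend makes it easier to tell them apart. The sketch below is one possible way to add one, assuming the pROC package is installed (the code is skipped otherwise). The loop over variable names, the legend labels and the use of orange instead of yellow are my own choices, not part of the original code.

```r
# Two-class subset, as in the data preparation step
iris2 <- iris[iris$Species != "virginica", ]
iris2$label <- ifelse(iris2$Species == "setosa", 1, 2)

vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
cols <- c("blue", "red", "green", "orange")  # orange is easier to see than yellow on white

if (requireNamespace("pROC", quietly = TRUE)) {
  # One ROC object per predictor; pROC picks the direction automatically
  rocs <- lapply(vars, function(v) pROC::roc(iris2$label, iris2[[v]]))

  plot(rocs[[1]], col = cols[1], main = "Comparison of multiple ROC curves")
  for (i in 2:4) lines(rocs[[i]], col = cols[i])

  # Legend entries combine the variable name with its AUC
  auc_values <- sapply(rocs, function(r) as.numeric(pROC::auc(r)))
  legend("bottomright",
         legend = sprintf("%s (AUC = %.3f)", vars, auc_values),
         col = cols, lwd = 2)
}
```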

ROC test: DeLong's Test

Input

roc.test(auc1,auc2)

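Output

Note: the result below is labeled a bootstrap test, not DeLong's test. roc.test() chooses the method itself when none is given; pass method="delong" to request DeLong's test explicitly.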

Bootstrap test for two correlated ROC curves

data: auc1 and auc2
D = 0.19387, boot.n = 2000, boot.stratified = 1, p-value =
0.8463
alternative hypothesis: true difference in AUC is not equal to 0
sample estimates:
AUC of roc1 AUC of roc2
0.9326 0.9248
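A minimal sketch of requesting DeLong's test explicitly through the method argument, assuming the pROC package is installed (the code is skipped otherwise). Comparing Sepal.Length with Petal.Length is my own choice for illustration: both are smaller for setosa, so the two curves share the same direction. The explicit levels and direction arguments are likewise assumptions about how the labels should be read.

```r
# Two-class subset, as in the data preparation step
iris2 <- iris[iris$Species != "virginica", ]
iris2$label <- ifelse(iris2$Species == "setosa", 1, 2)

delong_result <- NULL
if (requireNamespace("pROC", quietly = TRUE)) {
  # Controls = label 1, cases = label 2; cases are expected to have larger values
  roc_sl <- pROC::roc(label ~ Sepal.Length, data = iris2,
                      levels = c(1, 2), direction = "<")
  roc_pl <- pROC::roc(label ~ Petal.Length, data = iris2,
                      levels = c(1, 2), direction = "<")

  # Ask for DeLong's test by name instead of relying on the automatic choice
  delong_result <- pROC::roc.test(roc_sl, roc_pl, method = "delong")
  print(delong_result)
}
```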

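Overall test across the four variables

Input

library(nsROC) # compareROCdep() is provided by the nsROC package
compareROCdep(iris2[,-5], iris2$label, method="auc") # overall test across the four variables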

Output

> compareROCdep(iris2[,-5], iris2$label, method="auc") # overall test across the four variables
In the considered database there are 50 controls and 50 cases.

Method considered: AUC comparison (DeLong, DeLong and Clarke-Pearson, 1988)

Progress bar: Estimation of statistic value in each variable (k = 4)
|=======================================================================================================================================| 100%
Error in solve.default(M) :
Lapack routine dgesv: system is exactly singular: U[3,3] = 0