found: org.apache.spark.sql.Dataset[(Double, Double)] required: org.apache.spark.rdd.RDD[(Double, Double)]
I'm getting the following error:
```
found   : org.apache.spark.sql.Dataset[(Double, Double)]
required: org.apache.spark.rdd.RDD[(Double, Double)]
    val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel)
```
for the following code:
```scala
val testScoreAndLabel = testResults.
  select("Label", "ModelProbability").
  map { case Row(l: Double, p: Vector) => (p(1), l) }
val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel)
```
From the error it looks like `testScoreAndLabel` is a `sql.Dataset`, but `BinaryClassificationMetrics` expects an `RDD`.
How do I convert a `sql.Dataset` to an `RDD`?
I would do something like this:
```scala
val testScoreAndLabel = testResults.
  select("Label", "ModelProbability").
  map { case Row(l: Double, p: Vector) => (p(1), l) }
```
Now simply calling `testScoreAndLabel.rdd` converts `testScoreAndLabel` to an `RDD`:
```scala
val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
```
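Putting the pieces together, a minimal end-to-end sketch might look like the following. This assumes a running `SparkSession` named `spark` and a DataFrame `testResults` with `Label` (Double) and `ModelProbability` (ml `Vector`) columns, as in the question; the imports and the `spark.implicits._` encoder import are my additions, not from the original post.

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.Row

// Needed so that map on a DataFrame has an implicit Encoder
// for the (Double, Double) result type.
import spark.implicits._

// Dataset[(Double, Double)] of (score, label) pairs:
// p(1) is the probability of the positive class.
val testScoreAndLabel = testResults.
  select("Label", "ModelProbability").
  map { case Row(l: Double, p: Vector) => (p(1), l) }

// .rdd turns the Dataset into an RDD[(Double, Double)],
// which is what the RDD-based mllib metrics API expects.
val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
println(s"Area under ROC = ${testMetrics.areaUnderROC()}")
```

Note the mixed APIs here: the DataFrame/Dataset side comes from `spark.ml`, while `BinaryClassificationMetrics` lives in the older RDD-based `spark.mllib` package, which is why the `.rdd` bridge is needed at all.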
API docs