Issue writing data from 2 RDDs (one with unicode data and one with normal data) into a CSV file in PySpark
I have two RDDs in PySpark:

rdd1:

```
[[u'a', u'b', u'c'], [u'c', u'f', u'a'], [u'ab', u'cd', u'gh'], ...]
```

rdd2:

```
[(10.1, 10.0), (23.0, 34.0), (45.0, 23.0), ...]
```
I want to write both into a single CSV file, like this:

```
a,b,c,10.0
c,f,a,34.0
ab,cd,gh,23.0
```
How can I do that?
Update: here is my current code:
```python
columns_num = [0, 1, 2, 4, 7]
rdd1 = rdd3.map(lambda row: [row[i] for i in columns_num])
rdd2 = rd.map(lambda tup: (tup[0], tup[1] + (tup[0] / 3))
              if tup[0] - tup[1] >= tup[0] / 3
              else (tup[0], tup[1]))

with open("output.csv", "w") as fw:
    writer = csv.writer(fw)
    for (r1, r2) in izip(rdd1.toLocalIterator(), rdd2.toLocalIterator()):
        writer.writerow(r1 + tuple(r2[1:2]))
```
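One likely failure in the snippet above can be reproduced without Spark at all: `r1` is built with a list comprehension, so it is a `list`, while `tuple(r2[1:2])` is a `tuple`, and Python refuses to concatenate the two. A minimal, hypothetical stand-in for one loop iteration (the sample values are taken from the question's data, not from a real run):

```python
r1 = [u'a', u'b', u'c']   # one row from rdd1 (a list)
r2 = (10.1, 10.0)         # one row from rdd2 (a tuple)

try:
    row = r1 + tuple(r2[1:2])   # list + tuple -> TypeError
except TypeError:
    row = r1 + list(r2[1:2])    # make both operands lists instead

print(row)  # ['a', 'b', 'c', 10.0]
```

Converting `r2[1:2]` with `list(...)` (or building `r1` as a tuple, as the answer below does) makes the concatenation type-consistent.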
I want to write the output to a local CSV file.
If by local you mean the driver's filesystem, then you can simply:
```python
import csv
import sys

if sys.version_info.major == 2:
    from itertools import izip
else:
    izip = zip

rdd1 = sc.parallelize([(10.1, 10.0), (23.0, 34.0), (45.0, 23.0)])
rdd2 = sc.parallelize([("a", "b", "c"), ("c", "f", "a"), ("ab", "cd", "gh")])

with open("output.csv", "w") as fw:
    writer = csv.writer(fw)
    for (r1, r2) in izip(rdd2.toLocalIterator(), rdd1.toLocalIterator()):
        writer.writerow(r1 + r2[1:2])
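The pairing logic of that loop can be checked without a Spark cluster by substituting plain Python iterators for `toLocalIterator()`. This is only a sketch under that assumption: `rdd1_rows` and `rdd2_rows` below stand in for the answer's `rdd1` and `rdd2`, and an in-memory buffer replaces the output file.

```python
import csv
import io

# Stand-ins for rdd1.toLocalIterator() / rdd2.toLocalIterator()
rdd1_rows = iter([(10.1, 10.0), (23.0, 34.0), (45.0, 23.0)])
rdd2_rows = iter([("a", "b", "c"), ("c", "f", "a"), ("ab", "cd", "gh")])

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
for (r1, r2) in zip(rdd2_rows, rdd1_rows):
    # keep every string column, append only the second number (r2[1:2])
    writer.writerow(r1 + r2[1:2])

print(buf.getvalue())
# a,b,c,10.0
# c,f,a,34.0
# ab,cd,gh,23.0
```

Note that `toLocalIterator()` pulls each partition to the driver one at a time, so this approach only works when the rows of both RDDs line up in the same order, as they do in this example.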