R write.csv with UTF-16 encoding
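I'm using …
Background: I'm trying to write out a CSV file from a data.frame for use in Excel. Excel Mac 2011 doesn't seem to like UTF-8 (if I specify UTF-8 during the text import, non-ASCII characters show up as underscores). I've always assumed that Excel would be happy with UTF-16LE encoding.
Here's the example data.frame: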
```
> foo
  a  b
1 á 羽
> Encoding(levels(foo$a))
[1] "UTF-8"
> Encoding(levels(foo$b))
[1] "UTF-8"
```
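To make the example reproducible, a data frame like `foo` can be rebuilt from Unicode escapes. This is a minimal sketch, assuming a single row with factor columns as in the printout above; `factor()` is called explicitly because `stringsAsFactors` defaults to `FALSE` since R 4.0.0:

```r
# Rebuild the example data frame from Unicode escapes, so the script does not
# depend on the encoding of the source file itself
foo <- data.frame(
  a = factor("\u00e1"),  # á (U+00E1)
  b = factor("\u7fbd")   # 羽 (U+7FBD)
)

# Strings created from \u escapes are marked as UTF-8
Encoding(levels(foo$a))
Encoding(levels(foo$b))
```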
So I tried writing out the data.frame like this:
```
f <- file("foo.csv", encoding = "UTF-16LE")
write.csv(foo, f)
```
This gives me an ASCII file that looks like:
```
"","
```
If I use …
If I use …
This is with the 64-bit build of R 2.12.2 on Mac OS X 10.6.6. What am I doing wrong?
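In more recent R versions, `write.csv()` passes a `fileEncoding` argument on to `write.table()`, which opens the output file with a re-encoding connection, so it is worth trying before any workaround. A minimal sketch, assuming a current R rather than the 2.12.2 build from the question; as far as I know R adds no byte-order mark for `"UTF-16LE"`, and on a non-UTF-8 locale non-ASCII characters may still be replaced by `<U+xxxx>` escapes before re-encoding:

```r
foo <- data.frame(a = factor("\u00e1"), b = factor("\u7fbd"))

# fileEncoding makes R convert the text to UTF-16LE while writing it
write.csv(foo, "foo.csv", fileEncoding = "UTF-16LE")

# Inspect the first bytes: each ASCII character becomes two bytes,
# the character itself followed by a NUL byte
readBin("foo.csv", what = "raw", n = 8)
```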
You could simply save the CSV as UTF-8 and then convert it to UTF-16LE with `iconv` in the terminal.
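The terminal conversion can also be driven from R with `system2()`. A minimal sketch, assuming the `iconv` command-line tool is on the `PATH` (it ships with macOS and most Linux distributions); the file names `foo-utf8.csv` and `foo-utf16.csv` are placeholders:

```r
foo <- data.frame(a = "\u00e1", b = "\u7fbd")

# 1. Save the CSV as UTF-8
write.csv(foo, "foo-utf8.csv", fileEncoding = "UTF-8")

# 2. Convert it with the iconv command-line tool; the output is redirected to
#    a file, and UTF-16LE (unlike plain "UTF-16") is written without a BOM
status <- system2("iconv",
                  c("-f", "UTF-8", "-t", "UTF-16LE", "foo-utf8.csv"),
                  stdout = "foo-utf16.csv")
stopifnot(status == 0)
```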
If you insist on doing this in R, the following might work, although it seems that R's …
```
> x <- c("foo", "bar")
> iconv(x, "UTF-8", "UTF-16LE")
Error in iconv(x, "UTF-8", "UTF-16LE") :
  embedded nul in string: 'f\0o\0o\0'
```
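Later versions of R added a `toRaw` argument to `iconv()`, which returns the converted bytes as raw vectors instead of character strings and so avoids the "embedded nul" error. A minimal sketch, assuming an R version in which `toRaw` is available:

```r
x <- c("foo", "bar")

# One raw vector per element instead of a string containing NUL bytes
utf16 <- iconv(x, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)

utf16[[1]]
# [1] 66 00 6f 00 6f 00
```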
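As you can see, the patch linked above really is needed. I haven't tested it, but if you want to keep it simple (and nasty): after saving the table, just use …
Something like this might work (…)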
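The "save first, convert afterwards" idea can also stay entirely inside R by combining it with `iconv(..., toRaw = TRUE)` and `writeBin()`. A minimal sketch; the helper name `write_csv_utf16le` is hypothetical, and it prepends the UTF-16LE byte-order mark (`FF FE`) on the assumption that Excel uses it to detect the encoding:

```r
# Hypothetical helper: write a data frame as a UTF-16LE CSV file with a BOM
write_csv_utf16le <- function(df, file, ...) {
  # 1. write an ordinary UTF-8 CSV to a temporary file
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  write.csv(df, tmp, fileEncoding = "UTF-8", ...)

  # 2. read it back as raw bytes and convert them to UTF-16LE
  bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
  utf16 <- iconv(rawToChar(bytes), from = "UTF-8", to = "UTF-16LE",
                 toRaw = TRUE)[[1]]

  # 3. prepend the little-endian byte-order mark and write the bytes out
  writeBin(c(as.raw(c(0xff, 0xfe)), utf16), file)
}

write_csv_utf16le(data.frame(a = "\u00e1", b = "\u7fbd"), "foo.csv")
```

Any extra arguments are passed on to `write.csv()`, e.g. `row.names = FALSE`.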
```
#' Convert a character vector to UTF-8 encoding
#'
#' @param x the character vector to be converted
#' @export
toUTF8 <- function(x) {
  # enc2utf8() converts strings marked as latin1 or native ("unknown")
  # to UTF-8 and leaves strings that are already UTF-8 unchanged
  enc2utf8(x)
}

#' Write a CSV file with UTF-8 characters (even under Windows)
#'
#' @param df data frame to be written to file
#' @param file file name / path where to put the data
#' @export
write_utf8_csv <- function(df, file) {
  # header line: every column name wrapped in double quotes
  firstline <- paste('"', names(df), '"', sep = "", collapse = ",")

  # convert every character column to UTF-8
  char_columns <- which(vapply(df, is.character, logical(1)))
  for (i in char_columns) {
    df[[i]] <- toUTF8(df[[i]])
  }

  # one line per row, every value wrapped in double quotes
  data <- apply(df, 1, function(x) {
    paste('"', x, '"', sep = "", collapse = ",")
  })

  # useBytes = TRUE writes the UTF-8 bytes as they are, without
  # translating them to the native encoding first
  writeLines(c(firstline, data), file, useBytes = TRUE)
}

#' Read a CSV file with UTF-8 characters (even under Windows) that was
#' written by the function above
#'
#' @param file file name / path of the file to read
#' @return a data frame with one character column per field
#' @export
read_utf8_csv <- function(file) {
  # read the lines and mark them as UTF-8
  content <- readLines(file, encoding = "UTF-8")

  # split each line on the separator and strip the double quotes
  content <- strsplit(content, ",", fixed = TRUE)
  content <- lapply(content, gsub, pattern = '"', replacement = "",
                    fixed = TRUE)

  # the first line holds the column names; drop empty names
  content_names <- content[[1]][content[[1]] != ""]
  content <- content[-1]

  # build the data frame column by column (the dummy column fixes the
  # number of rows and is dropped at the end)
  df <- data.frame(dummy = seq_along(content), stringsAsFactors = FALSE)
  for (j in seq_along(content_names)) {
    tmp <- vapply(content, `[[`, character(1), j)
    Encoding(tmp) <- "UTF-8"
    df[[content_names[j]]] <- tmp
  }

  df[, -1, drop = FALSE]
}
```
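The functions above hinge on `useBytes = TRUE`: `writeLines()` then passes the UTF-8 bytes to the file unchanged instead of translating them to the native encoding, and `readLines(encoding = "UTF-8")` marks them as UTF-8 again on the way back. A minimal round-trip check of just that mechanism (the temporary file is only for illustration):

```r
x <- enc2utf8("\u7fbd")  # 羽, marked as UTF-8
path <- tempfile(fileext = ".csv")

# Write the UTF-8 bytes without re-encoding them
writeLines(x, path, useBytes = TRUE)

# Read the line back and declare it to be UTF-8 (this marks, it does not convert)
y <- readLines(path, encoding = "UTF-8")

Encoding(y)
# [1] "UTF-8"
charToRaw(y)
# [1] e7 be bd
```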