RStudio not picking the encoding I'm telling it to use when reading a file
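I'm trying to read the following UTF-8 encoded file in R, but whenever I read it, the Unicode characters are not decoded correctly:
The script I'm using to process the file is as follows: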
```r
defaultEncoding <- "UTF8"

detalheVotacaoMunicipioZonaTypes <- c("character","character","factor","factor","factor","factor","factor",
                                      "factor","factor","factor","factor","factor","numeric",
                                      "numeric","numeric","numeric","numeric","numeric",
                                      "numeric","numeric","numeric","numeric","numeric",
                                      "numeric","character","character")

readDetalheVotacaoMunicipioZona <- function( fileName ) {
  fileConnection = file(fileName, encoding=defaultEncoding)
  contents <- readChar(fileConnection, file.info(fileName)$size)
  close(fileConnection)
  contents <- gsub('"', "", contents)

  columnNames <- c("data_geracao","hora_geracao","ano_eleicao","num_turno","descricao_eleicao","sigla_uf","sigla_ue",
                   "codigo_municipio","nome_municipio","numero_zona","codigo_cargo","descricao_cargo","qtd_aptos",
                   "qtd_secoes","qtd_secoes_agregadas","qtd_aptos_tot","qtd_secoes_tot","qtd_comparecimento",
                   "qtd_abstencoes","qtd_votos_nominais","qtd_votos_brancos","qtd_votos_nulos","qtd_votos_legenda",
                   "qtd_votos_anulados","data_ult_totalizacao","hora_ult_totalizacao")

  read.csv(text=contents,
           colClasses=detalheVotacaoMunicipioZonaTypes,
           sep=";",
           col.names=columnNames,
           fileEncoding=defaultEncoding,
           header=FALSE)
}
```
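I read the file as UTF-8, remove all the quotes (even the numbers are quoted, so I need to clean them up), and then pass the contents to `read.csv`.
What can I do to make it read this file as UTF-8?
I'm using RStudio on OS X, if that makes any difference.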
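This problem is caused by the wrong locale being set, whether in RStudio or in command-line R:
If the problem only happens in RStudio and not in command-line R, go to RStudio->Preferences:General, tell us what "Default text encoding:" is set to, click "Change" and try Windows-1252, UTF-8 or ISO8859-1 ("latin1") (or, if you would rather be prompted every time, click "Ask"). Screenshot attached at bottom. Let us know which one worked!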
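Before changing anything, it can help to see which locale and character set the R session is actually running under, and to compare the output between RStudio and command-line R. A minimal sketch using only base R (the variable names `ctype` and `info` are just for illustration):

```r
# Inspect the locale the current R session is using.
ctype <- Sys.getlocale("LC_CTYPE")   # character-handling locale, e.g. "en_US.UTF-8" or "C"
info  <- l10n_info()                 # list with flags such as MBCS, `UTF-8` and `Latin-1`

cat("LC_CTYPE:       ", ctype, "\n")
cat("UTF-8 locale:   ", info[["UTF-8"]], "\n")
cat("Latin-1 locale: ", info[["Latin-1"]], "\n")
```

If the two environments print different values, that difference is a likely place to start looking.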
If the problem also happens in command-line R, do the following:
Does your Mac have
For either of these two locales, try changing to that locale:
```r
# first try Windows CP1252, although that's almost surely not supported on Mac:
Sys.setlocale("LC_ALL", "pt_PT.1252")   # Do not omit the "LC_ALL" first argument, or the call will fail.
Sys.setlocale("LC_ALL", "pt_PT.CP1252") # the name might need to be 'CP1252'

# next try ISO8859-1 (/'latin1'), this works for me:
Sys.setlocale("LC_ALL", "pt_PT.ISO8859-1")

# Try "pt_PT.UTF-8" too...

# in your program, make sure the Sys.setlocale worked, sprinkle this assertion in your code before attempting to read.csv:
stopifnot(Sys.getlocale('LC_CTYPE') == "pt_PT.ISO8859-1")
```
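That should work. Strictly speaking,
Let us know which one worked! I'm trying to document this more thoroughly so that we can figure out the right enhancement.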
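Since which locale names exist differs from machine to machine, one way to run through the suggestions is to try each name in turn and stop at the first one the system accepts. This is only a sketch: `trySetLocale` is a hypothetical helper, and the candidate list and its order are assumptions based on the names mentioned in this thread. It relies on `Sys.setlocale()` returning an empty string (with a warning) when a locale cannot be set.

```r
# Candidate locale names from this thread; availability depends on the OS.
candidates <- c("pt_PT.UTF-8", "pt_PT.ISO8859-1", "pt_PT.CP1252", "pt_PT.1252")

# Hypothetical helper: returns the first candidate the system accepts,
# or NA if none of them can be set.
trySetLocale <- function(candidates, category = "LC_ALL") {
  for (loc in candidates) {
    # Sys.setlocale() returns "" (and warns) when the request cannot be honoured.
    result <- suppressWarnings(Sys.setlocale(category, loc))
    if (nzchar(result)) return(loc)
  }
  NA_character_
}

chosen <- trySetLocale(candidates)
if (is.na(chosen)) {
  message("None of the candidate locales is available on this system")
} else {
  message("Locale set to ", chosen, "; LC_CTYPE is now ", Sys.getlocale("LC_CTYPE"))
}
```

Whether the locale that gets picked actually fixes the display of the file still has to be checked by reading the file afterwards.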
Works fine for me.
Have you tried changing/resetting your locale?
In my case, it works with
```r
Sys.setlocale(category = "LC_ALL", locale = "Portuguese_Portugal.1252")
d <- read.table(text = readClipboard(), header = TRUE, sep = ';')
head(d)
# 1 25/04/2014 22:29:30 2012 1 ELEIÇÃO MUNICIPAL 2012 PB 20419 20419 ITAPORANGA 33 13 VEREADOR 17157
# 2 25/04/2014 22:29:30 2012 1 ELEIÇÃO MUNICIPAL 2012 PB 20770 20770 MALTA 51 11 PREFEITO 4677
# 3 25/04/2014 22:29:30 2012 1 ELEIÇÃO MUNICIPAL 2012 PB 21091 21091 OLHO D'ÁGUA 32 13 VEREADOR 6653
# 4 25/04/2014 22:29:30 2012 1 ELEIÇÃO MUNICIPAL 2012 PB 21113 21113 OLIVEDOS 23 13 VEREADOR 3243
# ...
```
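Separately from the locale, it may be worth checking how the file is read. `read.csv` already strips double quotes around fields by default, so the manual `gsub` step should not be needed, and its `encoding` argument marks the strings as UTF-8 without re-encoding them (whereas `fileEncoding` re-encodes the input into the session's native encoding). A self-contained sketch, assuming the file really is UTF-8; the sample row, the temporary file and the three column names used here are made up for illustration and are not the asker's data:

```r
# Build a tiny UTF-8 sample file so the sketch runs on its own (made-up row).
sampleLine <- "\"2012\";\"ELEI\u00c7\u00c3O MUNICIPAL 2012\";\"PB\""
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeBin(c(charToRaw(enc2utf8(sampleLine)), charToRaw("\n")), con)  # raw UTF-8 bytes
close(con)

# Read it back: quotes are handled by read.csv itself, and encoding = "UTF-8"
# marks the character data as UTF-8 instead of converting it.
d <- read.csv(tmp,
              sep = ";",
              header = FALSE,
              encoding = "UTF-8",
              colClasses = "character",
              col.names = c("ano_eleicao", "descricao_eleicao", "sigla_uf"))

print(Encoding(d$descricao_eleicao))
unlink(tmp)
```

How the accented characters are then shown on screen still depends on the console and locale, so this complements the locale settings discussed in the answers rather than replacing them.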
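I had the same problem with the Portuguese locale in R (Mac OS 10.12.3). I tried everything along the lines suggested above, and none of it worked. Then I found this web page: https://docs.moodle.org/dev/table_of_locales and just tried
You should try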