Charset conversion from XXX to UTF-8, command line

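I have a bunch of text files encoded in ISO-8859-2 (they contain some Polish characters). Is there a Linux/Mac command-line tool that I can run from a shell script to convert them to a saner UTF-8?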


Use iconv, for example:

iconv -f ISO-8859-2 -t UTF-8 input.txt > output.txt

More information:

  • You can specify UTF-8//TRANSLIT instead of plain UTF-8. Quoting the man page:


    If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.

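  • For the full list of encodings that iconv accepts, run iconv -l.

  • The example above uses shell redirection. Make sure you are not using a shell that mangles encodings on redirection; that is, do not use PowerShell for this.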
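Since the question is about a whole batch of files, the single-file iconv command above can be wrapped in a loop. The following is only a minimal sketch: it assumes the files end in .txt, live in a directory called legacy (a hypothetical name), and are all really encoded in ISO-8859-2 (Latin-2, the encoding that covers Polish). The sample file is created only so the sketch can be run as-is.

```shell
#!/bin/sh
# Assumption: the files to convert are ./legacy/*.txt (hypothetical layout).
dir=legacy
mkdir -p "$dir"

# Demo input only: the Polish word "żółw" as ISO-8859-2 bytes (octal escapes).
printf '\277\363\263w\n' > "$dir/sample.txt"

for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue            # the glob matched nothing
    tmp="$f.utf8.tmp"
    # Convert into a temporary file first; replace the original only on success.
    if iconv -f ISO-8859-2 -t UTF-8 "$f" > "$tmp"; then
        mv "$tmp" "$f"
    else
        echo "conversion failed: $f" >&2
        rm -f "$tmp"
    fi
done
```

Note that running such a loop twice over the same files would convert already-converted text a second time and garble it, so it is probably wise to keep a backup or to write the results into a separate directory.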

recode latin2..utf8 myfile.txt

This will overwrite myfile.txt with the new version. You can also use recode without a filename, as a filter in a pipeline.
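The filter form might look like the sketch below. It assumes recode is installed (it often is not by default, hence the check), and the output file name recode-out.txt is just a placeholder; the input is a few sample ISO-8859-2 bytes fed in through a pipe.

```shell
#!/bin/sh
# Assumption: recode may not be installed, so check for it first.
if command -v recode >/dev/null 2>&1; then
    # With no file argument, recode reads stdin and writes stdout,
    # so the original data is left untouched.
    # Demo input: "żółw" as ISO-8859-2 bytes (octal escapes).
    printf '\277\363\263w\n' | recode latin2..utf8 > recode-out.txt
else
    echo "recode is not installed" >&2
fi
```

Used this way, recode behaves much like the iconv command with redirection, which avoids the in-place overwrite.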


GNU "libiconv" should be able to do the job.