Charset conversion from XXX to utf-8, command line
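I have a bunch of text files encoded in ISO-8859-2 (they contain some Polish characters). Is there a command-line tool for Linux/Mac that I can run from a shell script to convert them to a saner UTF-8?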
Use iconv:

    iconv -f ISO-8859-2 -t UTF-8 input.txt > output.txt
More information:

You can specify UTF-8//TRANSLIT instead of plain UTF-8. To quote the man page:
"If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output."

For the full list of encodings that iconv accepts, run iconv -l.

The example above uses shell redirection. Make sure you are not using a shell that mangles encodings on redirection; in particular, do not use PowerShell for this.
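Since the question is about a whole batch of files, the single iconv call can be wrapped in a loop. What follows is only a sketch under stated assumptions: the function name convert_to_utf8 and the ".utf8" output suffix are made up for illustration, and every input file is assumed to really be ISO-8859-2.

```shell
# Convert each file given as an argument from ISO-8859-2 to UTF-8.
# Assumption: all inputs are ISO-8859-2; the ".utf8" suffix is arbitrary.
# Originals are left untouched; output goes to NAME.utf8 next to each input.
convert_to_utf8() {
    rc=0
    for src in "$@"; do
        dst="$src.utf8"
        if iconv -f ISO-8859-2 -t UTF-8 "$src" > "$dst"; then
            printf 'converted: %s\n' "$src"
        else
            # Drop the partial output so a failed run leaves no half-written file.
            printf 'failed: %s\n' "$src" >&2
            rm -f -- "$dst"
            rc=1
        fi
    done
    return "$rc"
}
```

From a script, something like convert_to_utf8 *.txt would then process every matching file in the current directory. Writing to a separate file rather than over the input keeps the originals available if the source encoding turns out to be something else.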
    recode latin2..utf8 myfile.txt
This will overwrite myfile.txt with the new version.
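Because recode rewrites the file in place, it may be worth keeping a copy of the original first. A minimal sketch, assuming recode is installed; the function name recode_with_backup and the ".orig" suffix are hypothetical choices, not part of recode itself.

```shell
# Save a copy of each file, then let recode convert it in place.
# Assumption: inputs are ISO-8859-2 (latin2); ".orig" is an arbitrary suffix.
recode_with_backup() {
    for f in "$@"; do
        # Stop at the first problem so no file is converted without a backup.
        cp -p -- "$f" "$f.orig" || return 1
        recode latin2..utf8 "$f" || return 1
    done
}
```

Called as recode_with_backup myfile.txt, this should leave the converted text in myfile.txt and the untouched bytes in myfile.txt.orig.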
GNU "libiconv" should be able to do the job.