Using .NET, how do I convert ISO 8859-1 encoded text files that contain Latin-1 accented characters to UTF-8?

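I am being sent text files saved in ISO 8859-1 format that contain accented characters from the Latin-1 range (as well as plain ASCII A-Z, etc.). How do I convert these files to UTF-8 using C# so that the single-byte accented characters in ISO 8859-1 become valid UTF-8 characters?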

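I have tried using a StreamReader with ASCII encoding, then converting the ASCII string to UTF-8 by instantiating an ASCII encoding and a UTF-8 encoding and calling Encoding.Convert(ascii, utf8, ascii.GetBytes(asciiString)), but the accented characters come out as question marks.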

What step am I missing?
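The symptom can be reproduced without any files. The sketch below assumes the file contains "café" saved as ISO 8859-1, so é is the single byte 0xE9; the class and method names are hypothetical and exist only for illustration. It decodes the same bytes once with ASCII, as in the question, and once with ISO 8859-1.

```csharp
using System.Text;

public static class DecodeDemo
{
    // Hypothetical helper: decodes raw file bytes with ASCII, as the question does.
    // ASCIIEncoding replaces every byte above 0x7F with '?', so accented
    // characters are lost at this step, before any later conversion runs.
    public static string DecodeAsAscii(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }

    // Hypothetical helper: decodes the same bytes with the encoding
    // the file was actually saved in.
    public static string DecodeAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }
}
```

With the bytes { 0x63, 0x61, 0x66, 0xE9 }, DecodeAsAscii returns "caf?" and DecodeAsLatin1 returns "café". Once the string holds a literal '?', re-encoding it as UTF-8 can only ever produce a question mark.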

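You need to get the correct Encoding object. ASCII is exactly what its name says: ASCII, which means it only supports 7-bit characters. Every byte above 0x7F, which is where the Latin-1 accented characters live, is decoded as '?', so the damage is done before Encoding.Convert ever runs. If all you want to do is convert files, this is probably easier than dealing with byte arrays directly: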


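```csharp
using System.IO;
using System.Text;

// Decode the input as ISO 8859-1 and write it back out as UTF-8.
// Encoding.UTF8 makes StreamWriter emit a byte order mark; pass new UTF8Encoding(false) to omit it.
using (StreamReader reader = new StreamReader(fileName, Encoding.GetEncoding("iso-8859-1")))
{
    // Uses the (path, append, encoding) overload; append: false overwrites an existing file.
    using (StreamWriter writer = new StreamWriter(outFileName, false, Encoding.UTF8))
    {
        writer.Write(reader.ReadToEnd());
    }
}
```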

If you would rather work with the byte array yourself, Encoding.Convert makes that simple enough:

```csharp
// data holds the raw bytes of the ISO 8859-1 file.
byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"), Encoding.UTF8, data);
```

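The caveat, though, is that if you go down this path you should not use an encoding-aware reader such as StreamReader for your file IO, because it decodes the bytes into characters (with whatever encoding it was given) before you ever see them. FileStream is a better fit, since it reads the file's actual bytes.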

To round this out, something like the following would work:


```csharp
using System.IO;
using System.Text;

using (FileStream input = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
    // Read the file's raw bytes; no decoding happens here.
    byte[] buffer = new byte[input.Length];

    int readLength = 0;
    while (readLength < buffer.Length)
    {
        int read = input.Read(buffer, readLength, buffer.Length - readLength);
        if (read == 0)
            throw new EndOfStreamException("File ended before its reported length was read.");
        readLength += read;
    }

    // Interpret the bytes as ISO 8859-1 and produce the equivalent UTF-8 bytes.
    byte[] converted = Encoding.Convert(Encoding.GetEncoding("iso-8859-1"),
                                        Encoding.UTF8, buffer);

    using (FileStream output = new FileStream(outFileName, FileMode.Create, FileAccess.Write))
    {
        output.Write(converted, 0, converted.Length);
    }
}
```

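In this example, the buffer variable is filled with the file's actual bytes as a byte[], so no conversion happens on the way in. Encoding.Convert takes the source and destination encodings and stores the converted bytes in a variable named, well, converted, which is then written directly to the output file.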

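As I said, if this is all you are doing, the first option using StreamReader and StreamWriter is much simpler, but the latter example should give you a better idea of what is actually going on.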
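To make the byte-level change concrete: in ISO 8859-1 every character is one byte, so "café" is 63 61 66 E9, while UTF-8 keeps the ASCII letters as-is and encodes é (U+00E9) as the two bytes C3 A9. The following is a minimal in-memory sketch of what Encoding.Convert does; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class ConvertBytesDemo
{
    // Hypothetical helper: converts ISO 8859-1 bytes to UTF-8 bytes.
    // Encoding.Convert does not prepend a byte order mark.
    public static byte[] Latin1ToUtf8(byte[] latin1Bytes)
    {
        return Encoding.Convert(Encoding.GetEncoding("iso-8859-1"), Encoding.UTF8, latin1Bytes);
    }

    // Formats bytes as space-separated hex for inspection.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }
}
```

ConvertBytesDemo.ToHex(ConvertBytesDemo.Latin1ToUtf8(new byte[] { 0x63, 0x61, 0x66, 0xE9 })) returns "63 61 66 C3 A9": the output is one byte longer than the input because é needs two bytes in UTF-8.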