Using PowerShell to write a file in UTF-8 without the BOM
Out-File seems to force a BOM when writing UTF-8:

    $MyFile = Get-Content $MyPath
    $MyFile | Out-File -Encoding "UTF8" $MyPath

How can I write a file in UTF-8 without a BOM using PowerShell?
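Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:

    $MyFile = Get-Content $MyPath
    # $False tells UTF8Encoding not to emit a BOM
    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    [System.IO.File]::WriteAllLines($MyPath, $MyFile, $Utf8NoBomEncoding)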
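One way to confirm that an approach really omits the BOM is to inspect the first three bytes of the output. The sketch below is only an illustration: Test-Utf8Bom is a hypothetical helper, and the scratch file names in the temp directory are assumptions, not part of the question.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true if the file
# starts with the UTF-8 BOM bytes EF BB BF.
function Test-Utf8Bom {
  param([Parameter(Mandatory)] [string] $LiteralPath)
  $stream = [System.IO.File]::OpenRead($LiteralPath)
  try {
    $head = New-Object byte[] 3
    $read = $stream.Read($head, 0, 3)
  } finally {
    $stream.Dispose()
  }
  return ($read -eq 3 -and $head[0] -eq 0xEF -and $head[1] -eq 0xBB -and $head[2] -eq 0xBF)
}

# Illustrative scratch files; full paths are used so that .NET and
# PowerShell agree on the location.
$tempDir = [System.IO.Path]::GetTempPath()
$withBom = Join-Path $tempDir 'bom-demo-with.txt'
$noBom   = Join-Path $tempDir 'bom-demo-without.txt'
$lines   = [string[]] @('first line', 'second line')

# $true asks UTF8Encoding to emit a BOM, $false suppresses it.
[System.IO.File]::WriteAllLines($withBom, $lines, (New-Object System.Text.UTF8Encoding $true))
[System.IO.File]::WriteAllLines($noBom,   $lines, (New-Object System.Text.UTF8Encoding $false))

$withBomResult = Test-Utf8Bom -LiteralPath $withBom
$noBomResult   = Test-Utf8Bom -LiteralPath $noBom
"With BOM requested:    $withBomResult"
"Without BOM requested: $noBomResult"
```

The same helper can be pointed at the output of any of the other approaches in this thread to check what they actually wrote.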
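The proper way as of now is to use the solution recommended by @Roman Kuzmin in the comments to @M. Dudley's answer:

    [IO.File]::WriteAllLines($filename, $content)

(I also shortened it by dropping the unnecessary System namespace prefix.)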
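I figured this wouldn't be UTF, but I found a pretty simple solution that seems to work:

    Get-Content path/to/file.ext | Out-File -Encoding ASCII targetFile.ext

For me this results in a UTF-8 file without a BOM, regardless of the source format. Note that -Encoding ASCII replaces every non-ASCII character with "?", so the output is valid BOM-less UTF-8 only when the content is pure ASCII.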
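To see why the ASCII route is only safe for pure-ASCII content, it helps to compare the bytes that the ASCII and UTF-8 encodings produce for the same string. This is a small sketch using an arbitrary sample word; the variable names are illustrative.

```powershell
# Sample text with one non-ASCII character (e with acute accent, U+00E9).
# The character is built from its code point so the script itself stays ASCII.
$text = "caf$([char]0xE9)"

# ASCII cannot represent U+00E9 and substitutes a question mark (byte 63).
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$roundTrip  = [System.Text.Encoding]::ASCII.GetString($asciiBytes)

# UTF-8 (without BOM) encodes U+00E9 as the two bytes C3 A9.
$utf8Bytes = (New-Object System.Text.UTF8Encoding $false).GetBytes($text)

"ASCII bytes: $($asciiBytes -join ' ')"
"ASCII round trip: $roundTrip"
"UTF-8 bytes: $($utf8Bytes -join ' ')"
```

For ASCII-only input the two byte sequences are identical, which is why the trick appears to work until the first accented or non-Latin character shows up.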
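Note: This answer applies to Windows PowerShell; by contrast, in the cross-platform PowerShell Core edition, UTF-8 without a BOM is the default encoding.
To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):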
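For convenience, here is an advanced function, Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:
- You can use it just like Out-File in a pipeline.
- Input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.

Example:

    (Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath

Note that (Get-Content $MyPath) is enclosed in parentheses, which ensures that the file is read in full and closed before its content is sent down the pipeline; this is what makes it possible to write back to the same file.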
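A note on memory use:
- M. Dudley's own answer requires the entire file content to be built up in memory first, which can be problematic with large files.
- The function below improves on this only slightly: all input objects are still buffered first, but their string representations are then generated and written to the output file one by one.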
    <#
    .SYNOPSIS
      Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).

    .DESCRIPTION
      Mimics the most important aspects of Out-File:
      * Input objects are sent to Out-String first.
      * -Append allows you to append to an existing file, -NoClobber prevents
        overwriting of an existing file.
      * -Width allows you to specify the line width for the text representations
        of input objects that aren't strings.
      However, it is not a complete implementation of all Out-File parameters:
      * Only a literal output path is supported, and only as a parameter.
      * -Force is not supported.

      Caveat: *All* pipeline input is buffered before writing output starts,
              but the string representations are generated and written to the
              target file one by one.

    .NOTES
      The raison d'être for this advanced function is that, as of PowerShell v5,
      Out-File still lacks the ability to write UTF-8 files without a BOM:
      using -Encoding UTF8 invariably prepends a BOM.
    #>
    function Out-FileUtf8NoBom {

      [CmdletBinding()]
      param(
        [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
        [switch] $Append,
        [switch] $NoClobber,
        [AllowNull()] [int] $Width,
        [Parameter(ValueFromPipeline)] $InputObject
      )

      #requires -version 3

      # Make sure that the .NET framework sees the same working dir. as PS
      # and resolve the input path to a full path.
      [System.IO.Directory]::SetCurrentDirectory($PWD) # Caveat: .NET Core doesn't support [Environment]::CurrentDirectory
      $LiteralPath = [IO.Path]::GetFullPath($LiteralPath)

      # If -NoClobber was specified, throw an exception if the target file already
      # exists.
      if ($NoClobber -and (Test-Path -LiteralPath $LiteralPath)) {
        Throw [IO.IOException] "The file '$LiteralPath' already exists."
      }

      # Create a StreamWriter object.
      # Note that we take advantage of the fact that the StreamWriter class by default:
      # - uses UTF-8 encoding
      # - without a BOM.
      $sw = New-Object IO.StreamWriter $LiteralPath, $Append

      $htOutStringArgs = @{}
      if ($Width) {
        $htOutStringArgs += @{ Width = $Width }
      }

      # Note: By not using begin / process / end blocks, we're effectively running
      #       in the end block, which means that all pipeline input has already
      #       been collected in automatic variable $Input.
      #       We must use this approach, because using | Out-String individually
      #       in each iteration of a process block would format each input object
      #       with an individual header.
      try {
        $Input | Out-String -Stream @htOutStringArgs | % { $sw.WriteLine($_) }
      } finally {
        $sw.Dispose()
      }

    }
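If the input consists only of strings (for example, lines from Get-Content), the buffering described above can be avoided by writing each line as it arrives in a process block. The following is a minimal sketch under that assumption, not a replacement for the function above: Write-Utf8NoBomLine is a hypothetical name, it does no Out-String formatting, and it always overwrites the target file.

```powershell
# Hypothetical streaming variant for string input only.
function Write-Utf8NoBomLine {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, Position = 0)] [string] $LiteralPath,
    [Parameter(ValueFromPipeline)] [string] $Line
  )
  begin {
    # Resolve the path the way PowerShell would, then open the writer once.
    # StreamWriter(path, append) defaults to UTF-8 without a BOM.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($LiteralPath)
    $sw = New-Object IO.StreamWriter $fullPath, $false
  }
  process {
    # Runs once per pipeline object, so nothing is collected in memory.
    $sw.WriteLine($Line)
  }
  end {
    $sw.Dispose()
  }
}

# Usage with an illustrative scratch file in the temp directory.
$streamDemoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo.txt'
'alpha', 'beta', 'gamma' | Write-Utf8NoBomLine -LiteralPath $streamDemoPath
$streamDemoBytes = [System.IO.File]::ReadAllBytes($streamDemoPath)
$streamDemoLines = [System.IO.File]::ReadAllLines($streamDemoPath)
```

One limitation of this sketch: if an upstream command throws a terminating error, the end block does not run and the writer is not disposed, which is one reason the function above wraps its writing in try / finally instead.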
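When using Set-Content instead of Out-File, you can specify the encoding Byte, which writes a byte array to the file. Combined with a custom UTF-8 encoding that does not emit a BOM, this gives the desired result:

    # This variable can be reused
    $utf8 = New-Object System.Text.UTF8Encoding $false

    # -Raw reads the whole file as a single string
    $MyFile = Get-Content $MyPath -Raw
    Set-Content -Value $utf8.GetBytes($MyFile) -Encoding Byte -Path $MyPath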
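Unlike [IO.File]::WriteAllLines() and similar methods, Set-Content resolves the path the way PowerShell does, so relative paths and PowerShell drive paths work as expected.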
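The Byte value of -Encoding exists only in Windows PowerShell; PowerShell 6 and later provide the -AsByteStream switch on Set-Content for the same purpose. A version-aware sketch of the byte-writing approach might look like this, with the scratch file name and sample text being illustrative assumptions:

```powershell
# Encode the text ourselves, without a BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
$byteDemoPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'set-content-bytes-demo.txt'
$byteDemoBytes = $utf8NoBom.GetBytes("line one`nline two`n")

if ($PSVersionTable.PSVersion.Major -ge 6) {
  # PowerShell 6+: raw bytes are written via -AsByteStream.
  Set-Content -LiteralPath $byteDemoPath -Value $byteDemoBytes -AsByteStream
} else {
  # Windows PowerShell: raw bytes are written via -Encoding Byte.
  Set-Content -LiteralPath $byteDemoPath -Value $byteDemoBytes -Encoding Byte
}

$byteDemoWritten = [System.IO.File]::ReadAllBytes($byteDemoPath)
"Bytes written: $($byteDemoWritten.Length)"
```

Because the bytes are produced by the encoding object and written verbatim, the cmdlet adds neither a BOM nor a trailing newline of its own.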
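Starting with version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.
So in the above example it should simply be like this:

    $MyFile | Out-File -Encoding UTF8NoBOM $MyPath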
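For scripts that must run on both Windows PowerShell and PowerShell 6+, one option is to pick the mechanism by version. The wrapper below is a sketch only; Write-LinesUtf8NoBom is a hypothetical name and the scratch file is an assumption for the demonstration.

```powershell
# Hypothetical wrapper: uses the built-in utf8NoBOM encoding where it exists
# and falls back to .NET on Windows PowerShell.
function Write-LinesUtf8NoBom {
  param(
    [Parameter(Mandatory)] [string] $LiteralPath,
    [Parameter(Mandatory)] [AllowEmptyString()] [string[]] $Lines
  )
  if ($PSVersionTable.PSVersion.Major -ge 6) {
    $Lines | Out-File -LiteralPath $LiteralPath -Encoding utf8NoBOM
  } else {
    # .NET needs a full path, resolved against PowerShell's current location.
    $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($LiteralPath)
    [System.IO.File]::WriteAllLines($fullPath, $Lines, (New-Object System.Text.UTF8Encoding $false))
  }
}

$versionDemoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'version-aware-demo.txt'
Write-LinesUtf8NoBom -LiteralPath $versionDemoPath -Lines 'hello', 'world'
$versionDemoBytes = [System.IO.File]::ReadAllBytes($versionDemoPath)
```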
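This script converts all .txt files in DIRECTORY1 to UTF-8 without a BOM and writes them to DIRECTORY2:

    # Note: .NET resolves relative paths against the process working directory,
    # which may differ from PowerShell's current location; use full paths if in doubt.
    foreach ($i in ls -name DIRECTORY1\*.txt)
    {
        $file_content = Get-Content "DIRECTORY1\$i";
        # WriteAllLines without an encoding argument writes UTF-8 without a BOM
        [System.IO.File]::WriteAllLines("DIRECTORY2\$i", $file_content);
    }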
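If you want to use [System.IO.File]::WriteAllLines(), you should cast the second parameter to String[] (if the type of $MyFile is Object[]), and also specify an absolute path with $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), like:

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
    [System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)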
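The reason for resolving the path explicitly is that PowerShell's current location and the process working directory that .NET uses for relative paths are tracked separately. The sketch below shows both resolutions side by side; the scratch directory and the file name foobar.csv are illustrative, and whether the two results differ depends on how the session was started.

```powershell
# Create a scratch directory and make it PowerShell's current location.
$scratch = Join-Path ([System.IO.Path]::GetTempPath()) ('path-demo-' + [Guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $scratch | Out-Null
Push-Location -LiteralPath $scratch
try {
  $psDir = $PWD.ProviderPath
  # Resolved against PowerShell's current location; the file need not exist.
  $psPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath('foobar.csv')
  # Resolved against the process working directory, which Push-Location
  # does not necessarily change.
  $netPath = [System.IO.Path]::GetFullPath('foobar.csv')
} finally {
  Pop-Location
  Remove-Item -LiteralPath $scratch -Recurse -Force
}

"PowerShell resolves to: $psPath"
".NET resolves to:       $netPath"
```

Passing the PowerShell-resolved path to [System.IO.File] methods avoids files landing in an unexpected directory.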
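If you want to use [System.IO.File]::WriteAllText(), you sometimes need to pipe the content through Out-String first, so that a line break is added to the end of each line explicitly (especially when combining it with ConvertTo-Csv):

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
    [System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)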
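Or you can use [Text.Encoding]::UTF8.GetBytes() with Set-Content -Encoding Byte:

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    # GetBytes() never emits a BOM, so [Text.Encoding]::UTF8 works here as well as $Utf8NoBomEncoding
    Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"

See: How to write the result of ConvertTo-Csv to a file in UTF-8 without BOM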
    [System.IO.FileInfo] $file = Get-Item -Path $FilePath

    # Read the first three bytes of the file
    $sequenceBOM = New-Object System.Byte[] 3
    $reader = $file.OpenRead()
    $bytesRead = $reader.Read($sequenceBOM, 0, 3)
    $reader.Dispose()

    # A UTF-8 file with a BOM starts with the following three bytes.
    # Hex: 0xEF 0xBB 0xBF, Decimal: 239 187 191
    if ($bytesRead -eq 3 -and $sequenceBOM[0] -eq 239 -and $sequenceBOM[1] -eq 187 -and $sequenceBOM[2] -eq 191)
    {
        $utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
        [System.IO.File]::WriteAllLines($FilePath, (Get-Content $FilePath), $utf8NoBomEncoding)
        Write-Host "Remove UTF-8 BOM successfully"
    }
    Else
    {
        Write-Warning "Not UTF-8 BOM file"
    }

Source: How to remove the UTF-8 byte order mark (BOM) from a file using PowerShell
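For whatever reason, the WriteAllLines calls were still producing a BOM for me, so I stripped the first three bytes instead:

    # Read the file as raw bytes (Windows PowerShell syntax)
    $bytes = gc -Encoding byte BOMthetorpedoes.txt
    # Write everything back except the first three bytes
    [IO.File]::WriteAllBytes("$(pwd)\BOMthetorpedoes.txt", $bytes[3..($bytes.length-1)])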
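I had to make the file path absolute for it to work; otherwise it wrote the file to my Desktop. Also, I suppose this only works if you know your BOM is 3 bytes. I have no idea how reliable it is to expect a given BOM format or length based on the encoding.
Also, as written, this probably only works if your file fits into a PowerShell array, whose maximum length appears to be limited on my machine.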
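On the question of BOM length: .NET exposes the BOM of each encoding through GetPreamble(), so the length does not have to be guessed. The sketch below lists the preamble lengths for a few common encodings and then strips a UTF-8 BOM only when it is actually present. Remove-Utf8BomIfPresent is a hypothetical helper, and like the answer above it still loads the whole file into memory.

```powershell
# BOM (preamble) lengths as reported by .NET.
$preambleLengths = [ordered]@{
  'UTF-8'     = [System.Text.Encoding]::UTF8.GetPreamble().Length
  'UTF-16 LE' = [System.Text.Encoding]::Unicode.GetPreamble().Length
  'UTF-16 BE' = [System.Text.Encoding]::BigEndianUnicode.GetPreamble().Length
  'UTF-32 LE' = [System.Text.Encoding]::UTF32.GetPreamble().Length
}
$preambleLengths

# Hypothetical helper: removes the UTF-8 BOM if the file starts with one.
# Returns $true if a BOM was removed, $false otherwise.
function Remove-Utf8BomIfPresent {
  param([Parameter(Mandatory)] [string] $LiteralPath)
  $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($LiteralPath)
  $bytes = [System.IO.File]::ReadAllBytes($fullPath)
  $bom = [System.Text.Encoding]::UTF8.GetPreamble()
  if ($bytes.Length -lt $bom.Length) { return $false }
  for ($i = 0; $i -lt $bom.Length; $i++) {
    if ($bytes[$i] -ne $bom[$i]) { return $false }
  }
  # Write everything after the BOM; an offset avoids slicing the array.
  $stream = [System.IO.File]::Create($fullPath)
  try {
    $stream.Write($bytes, $bom.Length, $bytes.Length - $bom.Length)
  } finally {
    $stream.Dispose()
  }
  return $true
}

# Usage with an illustrative scratch file that starts with a BOM.
$bomDemoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'strip-bom-demo.txt'
[System.IO.File]::WriteAllText($bomDemoPath, 'payload', (New-Object System.Text.UTF8Encoding $true))
$firstPass  = Remove-Utf8BomIfPresent -LiteralPath $bomDemoPath
$secondPass = Remove-Utf8BomIfPresent -LiteralPath $bomDemoPath
$bomDemoBytes = [System.IO.File]::ReadAllBytes($bomDemoPath)
```

Checking the bytes before stripping means that running the helper twice, or on a file that never had a BOM, leaves the content untouched.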
Change multiple files by extension to UTF-8 without a BOM:

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
    # Rewrite every .java file below the current directory in place
    foreach ($i in ls -recurse -filter "*.java") {
        $MyFile = Get-Content $i.fullname
        [System.IO.File]::WriteAllLines($i.fullname, $MyFile, $Utf8NoBomEncoding)
    }
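One technique I use is to redirect output to an ASCII file using the Out-File cmdlet.
For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output is written in UTF-16, which SQL*Plus does not recognize. To work around this:

    sqlplus -s / as sysdba "@create_sql_script.sql" |
    Out-File -FilePath new_script.sql -Encoding ASCII -Force

The generated script can then be executed in another SQL*Plus session without any Unicode worries:

    sqlplus / as sysdba "@new_script.sql" |
    tee new_script.log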
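Had the same issue. This did the trick for me:

    $MyFile | Out-File -Encoding Oem $MyPath

Opening the file with Visual Studio Code or Notepad++ shows it as UTF-8. (This holds only for pure-ASCII content; Oem writes non-ASCII characters in the OEM code page, not in UTF-8.)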
You can use the following to get UTF-8 without a BOM (for ASCII-only content):

    $MyFile | Out-File -Encoding ASCII $MyPath
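This one works for me (use "Default" instead of "UTF8"):

    $MyFile = Get-Content $MyPath
    $MyFile | Out-File -Encoding "Default" $MyPath

The result is ASCII without a BOM. (In Windows PowerShell, Default is the system's ANSI code page, so the output matches UTF-8 only for pure-ASCII content.)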