Here's the deal: You copied a bunch of files, and somewhere along the way, one of the applications screwed up. Instead of producing actual Unicode file names, it misinterpreted the UTF-8 sequences as CodePage 1252, resulting in something dreadful, where a name such as café.txt ends up as cafÃ©.txt.
And now you'd like to have a quick way to convert the 1252-interpreted UTF-8 to actual UTF-8. So you look around thinking that, surely, someone must have done something to sort this annoyance, but the only thing you can find is a UNIX Perl script called convmv, which isn't really helpful. Why hasn't anyone crafted a quick PowerShell script to do the same on Windows already?
Well, it turns out that, because of PowerShell's limitations, and Windows getting in the way of a proper conversion of 1252 to UTF-8, producing such a script is actually a minor pain in the ass. Still, now, someone has produced such a thing:
#region Parameters
param(
  # (Optional) The directory
  [string]$Dir = "."
)
#endregion

# You'll need to have your console set to CP 65001 AND use NSimSun as your
# font if you want any hope of displaying CJK characters in your console...
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8

$files = Get-ChildItem -File -Path $Dir -Recurse -Name
foreach ($f in $files) {
  # Recover the original bytes by encoding the mangled name as CP 1252,
  # then decode those bytes as the UTF-8 they were all along
  $bytes = [System.Text.Encoding]::GetEncoding(1252).GetBytes($f)
  $nf = [io.path]::GetFileName([System.Text.Encoding]::UTF8.GetString($bytes))
  # Skip names the conversion leaves unchanged (e.g. plain ASCII), since
  # Rename-Item complains when source and destination are identical
  if ($nf -ceq [io.path]::GetFileName($f)) { continue }
  Write-Host "$f" → "$nf"
  # -Name returns paths relative to $Dir, so resolve them against it.
  # Must use -LiteralPath else files that contain '[' or ']' in their name produce an error
  Rename-Item -LiteralPath (Join-Path $Dir $f) -NewName "$nf"
}

# Produce a "Press any key" message when run with right click
$auxRegKey='\SOFTWARE\Classes\Microsoft.PowerShellScript.1\Shell\0\Command'
$auxRegVal=(get-itemproperty -literalpath HKLM:$auxRegKey).'(default)'
$auxRegCmd=$auxRegVal.Split(' ',3)[2].Replace('%1', $MyInvocation.MyCommand.Definition)
if ("`"$($myinvocation.Line)`"" -eq $auxRegCmd) {
  Write-Host "`nPress any key to exit..."
  $null = $Host.UI.RawUI.ReadKey('NoEcho,IncludeKeyDown')
}
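To see why the two-step conversion in the script works, here is a minimal sketch of the round trip on a single string. The function names ConvertTo-Mojibake and Repair-Mojibake are made up for this illustration and are not part of the script. The sample name is built from [char] codes so the result doesn't depend on how the .ps1 file itself is saved. It assumes code page 1252 is available through GetEncoding(1252), as it is in Windows PowerShell.

```powershell
# Hypothetical helpers, for illustration only (not part of the script)
$cp1252 = [System.Text.Encoding]::GetEncoding(1252)
$utf8   = [System.Text.Encoding]::UTF8

function ConvertTo-Mojibake([string]$Name) {
    # What the misbehaving application did: read UTF-8 bytes as CP 1252
    return $cp1252.GetString($utf8.GetBytes($Name))
}

function Repair-Mojibake([string]$Name) {
    # What the script does: recover the original bytes, then decode as UTF-8
    return $utf8.GetString($cp1252.GetBytes($Name))
}

$original = "caf$([char]0xE9).txt"        # "café.txt"
$mangled  = ConvertTo-Mojibake $original  # é is C3 A9 in UTF-8, i.e. "Ã©" in CP 1252
$repaired = Repair-Mojibake $mangled

Write-Host "$original -> $mangled -> $repaired"
```

Note that the repair only makes sense for names that really were mangled. Feeding an already correct name such as café.txt to Repair-Mojibake produces invalid UTF-8, which .NET decodes with the U+FFFD replacement character, so checking the result for [char]0xFFFD before renaming is one possible safeguard.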
If you save this script as something like utf8_rename.ps1 in the top directory where you have your misconverted files, and then use Run with PowerShell in Explorer's context menu, you should see one "mangled name → fixed name" line for each file being renamed. That is, provided your console is set to codepage 65001 (a.k.a. UTF-8) and you select a font that actually supports CJK characters, such as NSimSun. (Microsoft will really have to explain how they have no trouble displaying CJK with NSimSun but still can't, or won't, do it with Lucida Console.)
Eventually, your file names should have been converted to their expected value, and all will be well.
That is, until someone who thinks it's okay to not properly support UTF-8 absolutely EVERYWHERE (Hey Microsoft, how about some UTF-8 Win32 APIs already?) screws up and forces people to manually unscrew their codepage handling yet again...