Pete's Blog: 2020

2020-12-16

UEFI Hexdump

If you're developing UEFI firmware content, sooner or later you're going to want to dump binary data using the debug facility.

And so, without further ado:

(...)

#include <Library/BaseLib.h>
#include <Library/PrintLib.h>

(...)

STATIC
VOID 
DumpBufferHex (
  VOID* Buf,
  UINTN Size
)
{
  UINT8* Buffer = (UINT8*)Buf;
  UINTN  i, j, k;
  char Line[80] = "";

  for (i = 0; i < Size; i += 16) {
    if (i != 0) {
      DEBUG ((DEBUG_INFO, "%a\n", Line));
    }
    Line[0] = 0;
    AsciiSPrint (&Line[AsciiStrLen (Line)], 80 - AsciiStrLen (Line), "  %08x  ", i);
    for (j = 0, k = 0; k < 16; j++, k++) {
      if (i + j < Size) {
        AsciiSPrint (&Line[AsciiStrLen (Line)], 80 - AsciiStrLen (Line), "%02x", Buffer[i + j]);
      } else {
        AsciiSPrint (&Line[AsciiStrLen (Line)], 80 - AsciiStrLen (Line), "  ");
      }
      AsciiSPrint (&Line[AsciiStrLen (Line)], 80 - AsciiStrLen (Line), " ");
    }
    AsciiSPrint (&Line[AsciiStrLen (Line)], 80 - AsciiStrLen (Line), " ");
    for (j = 0, k = 0; k < 16; j++, k++) {
      if (i + j < Size) {
        if ((Buffer[i + j] < 32) || (Buffer[ i + j] > 126)) {
          AsciiSPrint (&Line[AsciiStrLen (Line)], 80 - AsciiStrLen (Line), ".");
        } else {
          AsciiSPrint (&Line[AsciiStrLen (Line)], 80 - AsciiStrLen (Line), "%c", Buffer[i + j]);
        }
      }
    }
  }
  DEBUG ((DEBUG_INFO, "%a\n", Line));
}

2020-08-08

Updating XML files with PowerShell

Say you have the following file.xml:

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <item name="Item 1" id="0" />
  <item name="Item 2" id="1001" />
  <item name="Item 3" id="0" />
  <item name="Item 4" id="1002" />
  <item name="Item 5" id="1005" />
  <item name="Item 6" id="0" />
</data>

And you want to replace all those "0" id attributes with incremental values.

If you have PowerShell, this can be accomplished pretty easily with the following commands:

$xml = New-Object xml
$xml.Load("$PWD\file.xml")
$i = 2001; foreach ($item in $xml.data.item) { if ($item.id -eq 0) { $item.id = [string]$i; $i++ } }
$xml.Save("$PWD\updated.xml")

Now your output (updated.xml) looks like:

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <item name="Item 1" id="2001" />
  <item name="Item 2" id="1001" />
  <item name="Item 3" id="2002" />
  <item name="Item 4" id="1002" />
  <item name="Item 5" id="1005" />
  <item name="Item 6" id="2003" />
</data>

Easy-peasy...

2020-07-08

(Ab)using Microsoft's symbol servers, for fun and profit

Since I find myself doing this on regular basis (Hail Ghidra!), and can never quite remember the commands.

Say you have a little Microsoft executable, such as the latest ARM64 version of usbxhci.sys, that you want to investigate using Ghidra.

Of course, one thing that can make the whole difference between hours of "Where the heck is the function call for the code I am after?" and a few of seconds of "Judging by its name, this call is the most likely candidate" is the availability of the .pdb debug symbols for the Windows executable you are analysing.

You may also know that, because of the huge corporate ecosystem they have where such information might be critical (as well as some government pressure to make it public), it so happens that Microsoft does make available a lot of the debug information that was generated during the compilation of Windows components. Now, since it can amount to a large volume of data (one can usually expect a .pdb to be 3 to 5 times larger than the resulting code) this debug information is not usually provided with Windows, unless you are running a Debug/Checked build.

But it can "easily" be retrieved from Microsoft's servers. Here's how.

First of all, you need to ensure that you have the Windows SDK or Windows Driver Kit installed. If you have Visual Studio 2019 (remember, the Community Edition of VS2019 is free) with the C++ development environment, these should already have been installed for you. But really it's up to you to sort that out and alter the paths below as needed.

With this prerequisite taken care of, you should find a commandline executable called symchk.exe somewhere in C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\. This is the utility that can connect to the Microsoft's servers to fetch the symbol files, i.e. the compilation .pdb's that Microsoft has made public.

So, let's say we have copied our ARM64 xHCI driver (USBXHCI.SYS - Why Microsoft suddenly decided to YELL ITS NAME is unknown) to some directory. All you need to do to retrieve its associated .pdb then is issue the command:

"C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\symchk.exe" /s srv*https://msdl.microsoft.com/download/symbols /ocx .\ USBXHCI.SYS

The /s flag indicates where the symbols should be retrieved from (here the Microsoft's remote server) and the /ocx flag, followed by a folder, indicates where the .pdb should be copied (here, the same directory as the one where we have our driver).

If everything goes well, the output of the command should be:

SYMCHK: FAILED files = 0
SYMCHK: PASSED + IGNORED files = 1

with the important part being that the number of PASSED files is not zero, and you should find a newly created usbxhci.pdb in your directory. Neat!

"Hello, my name is Mr Snrub"

So, what do you do with that?

Well, I did mention Ghidra, and as a comprehensive disassembly/decompiler utility, Ghidra does of course have the ability to work with debug symbols if they happen to be available (sadly, it doesn't seem to have the ability to look them up automatically like IDA, or if it does, I haven't found where this can be configured), which helps turn an obtuse FUN_1c003ac90() function name, into a much more indicative XilRegister_ReadUlong64()...

For instance, let's say you happen to have been made aware that the reason why you currently can't use the rear USB-A ports for Windows 10 on the Raspberry Pi 4 is because Broadcom/VIA (most likely Broadcom, because they've already done everyone a number with implementing a DMA controller that chokes past 3 GB on the Bcm2711) have screwed up 64-bit PCIe accesses, and they end up returning garbage in the high 32-bit DWORD unless you only ever attempt to read 64-bit QWORDs as two sequential DWORDs instead of a single QWORD.

As a result of this, you may be exceedingly interested to find out if there exists something in the function calls used by Microsoft's usbxhci.sys driver, that can set 64-bit xHCI register accesses to be enacted as two 32-bit ones.

Obviously then, if, after using the .pdb we've just retrieved above, Ghidra helpfully tells you that there does exist a function call at address 1c003ac90 called XilRegister_ReadUlong64, you are going to be exceedingly interested in having a look at that call:

undefined8 XilRegister_ReadUlong64(longlong param_1,undefined8 *param_2)
{
  undefined8 local_30 [6];
  
  local_30[0] = 0;
  if (*(char *)(*(longlong *)(param_1 + 8) + 0x219) == '\0') {
    DataSynchronizationBarrier(3,3);
    if ((*(ulonglong *)(*(longlong *)(param_1 + 8) + 0x150) & 1) == 0) {
      // 64-bit qword access
      local_30[0] = *param_2;
    } else {
      DataSynchronizationBarrier(3,3);
      // 2x32-bit dword access
      local_30[0] = CONCAT44(*(undefined4 *)((longlong)param_2 + 4),*(undefined4 *)param_2);
    }
  } else {
    Register_ReadSecureMmio(param_1,param_2,3,1,local_30);
  }
  return local_30[0];
}

NB: The comments were not added by Ghidra. Ghidra may be good at what it does, but it's not that good...

Guess what? It so happens that there exists an attribute somewhere, that Microsoft uses the bit 0 of, to decide whether 64-bit xHCI registers should be read using two 32-bit access. Awesome, this looks exactly like what we're after.

The corresponding disassembly also tells us that this if condition is ultimately encoded as a tbnz ARM64 instruction. So if we revert that logic, by using a tbz instead of tbnz, we should be able to force the failing 64-bit reads to be enacted as 2x32-bit, which may fix our xHCI driver woes...

Let's do just that then, by editing USBXHCI.SYS and changing the EA 00 00 37 sequence at address 0x03a0d0 to EA 00 00 36 (tbnz → tbz) and, for good measure, do the same for XilRegister_WriteUlong64 at address 0x005b34, by also changing 0A 01 00 37 into 0A 01 00 36 to reverse the logic. "Yes that'll do".

"I like the way Snrub thinks!"

Well, we may have patched our driver and tried to fool the system by reversing some stuff, but, as the Simpsons have long attested, it's not going to do much unless you have a trusted sidekick to help you out.

Obviously, since we broke the signature of that driver the minute we changed even a single bit, we're going to have to tell Windows to disable signature enforcement for the boot on our target ARM64 platform, which can be done by setting nointegritychecks on in the BCD. And while we're at it we may want to enable test signing as well. Now, most of the commands you'll see are for the local BCD, but that's not what we are after here, since we want to modify a USB installed version of Windows, where, in our case, the BCD is located at S:\EFI\Microsoft\Boot\BCD. So the trick to achieving that (from a command prompt running elevated) is:

bcdedit /store S:\EFI\Microsoft\Boot\BCD /set {default} testsigning on
bcdedit /store S:\EFI\Microsoft\Boot\BCD /set {default} nointegritychecks on

However, if you only do that and (after taking ownership and granting yourself full permissions so that you can replace the existing driver) copy the altered USBXHCI.SYS to Windows\System32\drivers\ you will still be greeted by an obnoxious

Recovery

Your PC/Device needs to be repaired

The operating system couldn't be loaded because a critical system driver is missing or contains errors.

File: \Windows\System32\drivers\USBXHCI.SYS
Error code: 0xc0000221

Oh noes!

The problem, which is what generates the 0xc0000221 (STATUS_IMAGE_CHECKSUM_MISMATCH) error code, is that the optional PE checksum field, used by Windows to validate critical boot executables, has not been updated after we altered USBXHCI.SYS. Therefore checksum validation fails, and this is precisely what the Windows boot process is complaining about.

Fixing this is very simple: Just download PEChecksum64.exe (e.g. from here) and issue the command:

D:\Dis\>PEChecksum64.exe USBXHCI.SYS
USBXHCI.SYS: Checksum updated from 0x0008D39B to 0x0008D19B

For good measure, you will also need to self-sign that driver, so that you can avoid Windows booting into recovery mode with an obnoxious 0xc000000f from winload.exe (though you can still proceed to full boot from there).

Now we finally have all the pieces we need.

For instance, we can replace USBXHCI.SYS on a fast USB 3.0 flash drive containing an Raspberry Pi Windows 10 ARM64 installation created using WOR (and if you happen to have the latest EEPROM flashed as well as a version of the Raspberry Pi 4 UEFI firmware that includes this patch, you can actually boot the whole thing straight from USB), and, while we are at it, remove the 1 GB RAM limit that the Pi 4 had to have when booting from USB-C port (since we're not going to use that USB controller), by issuing, from an elevated prompt:

bcdedit /store Y:\EFI\Microsoft\Boot\BCD /deletevalue {default} truncatememory

Do all of the above and, who knows, you might actually end up with a usable Windows 10 ARM64 OS, running from one of the rear panel's fast USB 3.0 ports with a whooping 3 GB of RAM, on your Raspberry Pi 4.

Now, isn't that something?

But this is just a post about using Microsoft's symbol servers.

It's not a post about running full blown Windows 10 on the Raspberry Pi 4, right?

Addendum: In case you don't want to have to go through the taking of ownership, patching, updating of the PE checksum and digitally re-signing of the file yourself, may I also interest you in winpatch?

2020-06-23

Et tu, Microsoft

It's a beautiful Saturday afternoon.

Everything is going as peachy as could be, with the satisfaction of having released a new version of your software, just a couple days ago, that wasn't short lived due to the all too common subsequent realisation that you managed to introduce a massive "oops", such as including completely wrong drivers for a specific architecture (courtesy of Rufus 3.10) or having your ext formatting feature break when a partition is larger than 4 GB (courtesy of Rufus 3.8)... Sometimes I have to wonder if Rufus isn't suffering from the same curse as the original Star Trek movie releases (albeit inverted in our case).

Thus, basking in the contentment of a job well done, you fire up your trusty Windows 10, which you upgraded to the 2004 release just a couple weeks ago (along with that Storage Space array you use), and go on your merry way, doing inconsequential Windows stuff, such as deciding to rename one folder.

And that's when all hell breaks lose...

Suddenly, your file explorer freezes, every disk access becomes unresponsive and you are seeing your most important disk (the one that contains, among other things, all the ISOs you accumulated for testing with Rufus, and which you made sure to set up with redundancy using Storage Spaces, along with an ReFS file system where file integrity had been enabled) undergoing constant access, with no application in sight seemingly performing those...

Oh and rebooting (provided you are patient enough to wait the 10 minutes it takes to actually reboot) doesn't help in the slightest. If anything, it appears to make the situation worse as Windows now takes forever to boot, with the constant disk access issue of your Storage Space drive still in full swing.

Yet the Storage Spaces control panel reports that all of the underlying HDDs are fine, a short SMART test on those also reports no issue and even a desperate attempt to try to identify what specific drive might be the source of the trouble, by trying each combination of 3 our 4 HDDs, yields nothing. If nothing else, it would confirm the idea that Microsoft did a relatively solid job with Storage Spaces, at least in terms of hardware gotchas, considering that every other parity solution I know of, such as the often decried Intel RAID, would scream bloody murder if you removed another drive before it got through the super time consuming rebuilding of the whole array (which is the precise reason I swore off using Intel RAID and moved to Storage Spaces).

An ReFS issue then? If that's the case, talk of a misnomer for something that's supposed to be resilient...

Indeed, the Event Viewer shows a flurry of ReFS errors, ultimately culminating in this ominous message, that gets repeated many times as the system attempts to access the drive, as you end up finding that your drive has been "remounted" as RAW:

Volume D: is formatted as ReFS but ReFS is unable to mount it;
ReFS encountered status The volume repair was not successful...

Someone at Microsoft may want to look up the definition of resiliency...

Ugh, that's the second ReFS drive I lose in about a month (earlier was an SSD that hosted all my VMs, and that Windows mysteriously overwrote as a Microsoft Reserved Partition)! If that's indicative of a trend, I think that Microsoft might want to weather-test their data oriented solutions a little better. Things used to be rock-stable, but I can't say I've been impressed by Windows 10's prowess on the matter lately...

And yes, I do have some backups of course (well, I didn't for those VMs, but that was data I could afford to lose) but they are spread all over the place on account that I am not made of money, dammit!

See, the whole point of entrusting my data to a 10 TB parity array made of 4x4 TB HDDs was that I could reuse drives that I (more or less) had lying around, and you'd better believe those were the cheapest 4 TB drives I'd been able to lay my hands on. In other words, Seagate, since HDD manufacturers have long decided, or, should I say, colluded, that they should stop trying to compete on price, as further evidenced by the fact that I still paid less for an 8 TB HDD, two frigging years ago, than the cheapest price I could find for the exact same model today.

"Storage is getting cheaper", my ass!

Oh and since we're talking about Seagate and reliability, I will also state that, in about 20 years of using almost exclusively Seagate drives, on account that they are constantly on the cheaper side (though Seagate and other manufacturers may want to explain why on earth it is cheaper to buy a USB HDD enclosure, with cable, PSU and SATA ↔ USB converter, than the same bare model of HDD), I have yet to experience a single drive failure for any Seagates I use in my active RAID arrays.

So when people say Seagate is too unreliable, I beg to anecdotally differ since, for the price, Seagate's more than reliable enough. I mean, between paying exactly 0 € for 10 TB with parity vs. between 500 to 700 € (current price, at best) for a parity or mirrored NAS array, there's really no contest. I don't mind that a lot of people appear to have semi-bottomless pockets, and can't see themselves go with less than a mirroring solution with brand new NAS drives. But that's no reason to look down on people who do use parity along with cheap non NAS drives, because price is far from being an inconsequential factor when it comes to the preservation of their data...

And it's even more true here as the issue at hand has nothing to do with using cheap hardware and that everyone knows that a parity or mirroring solution is worth nothing if you don't also combine it with offline backups, which means even more disks, preferably of large capacity, and therefore even more budget to provision...

All this to say that there's a good reason why I don't have a single 8 or 10 TB HDD lying around, with all my backups for the array that went offline, and why, as much as I wish otherwise, there are going to be gaps in the data I restore... So yeah, count me less than thrilled with a data loss that wasn't incurred by a hardware failure or my own carelessness (the only ever two valid causes for losing data).

Alas, with the Windows 10 2004 feature update, it appears that the good folks at Microsoft decided that there just weren't enough ways in which people could kill their data. So they created a brand new one.

Enters KB4570719.

The worst part of it is that I've seen reports indicating that this, as well as other corollary issues, was pointed out to Microsoft by Windows Insiders as far back as September 2019. So why on earth was something that should instantly have been flagged as a super critical data loss issue, included in the May 2020 update?

Oh and of course, at the time of this post, i.e. about one month after the data-destructive Windows update was released, there's still no solution in sight... though, from what I have found, non extensible parity Storage Spaces may be okay to use, as long as these were created using PowerShell commands to make them non dynamically extensible, rather than through the UI which forces extensible.

If this post seems like a rant, it's because it mostly is, considering that I am less than thrilled at having had to waste one week trying to salvage what I could of my data. But since we need to conclude this little story, let me impart the following two truths upon you:

1. EVERYTHING, and I do mean EVERYTHING is actively trying to murder your data.
Do not trust the hardware. Do not trust yourself. And especially, do not trust the Operating System not to lounge a sharp blade straight through your data's toga, during the Ides of June.

2. (Since this is something I am all too commonly facing with Rufus' user reports) It doesn't matter how large and well established a software company is compared to an Independent Software Developer; the OS can still very much be the one and only reason why third party software appears to be failing, and you should always be careful never to consider the OS above suspicion. There is no more truth to "surely a Microsoft (or an Apple or a Google for that matter) would not to ship an OS that contains glaring bugs" today as there has been in the past, or as there will be in the future.
The OS can and does fail spectacularly at times (and I have plenty more examples besides this one, that I could provide). So don't fail to account for that possibility.

2020-05-15

Why is my Samba connection failing?

Or how nice it is to have a problem that has long eluded you finally explained.

Part 1: The horror

You see, I've been using a FriendlyArm/FriendlyElec RK3399-based NanoPC-T4 to run Linux services, such as a staging web server for Rufus, network print host, various other things as well as a Samba File Server...

However, this Samba functionality seemed to be plagued like there was no tomorrow: Almost every time I tried to fetch a large file from it, Windows would freeze during transfer, with no recovery and not even the possibility of cancelling unless the Samba service was restarted manually on the server.

But what was more vexing is that these problems with Samba did not manifest themselves until I switched from using an old Lubuntu distribution, that was provided by the manufacturer of that device, to a more up to date Armbian. With Lubuntu, Samba seemed rock-solid, but with Armbian, it was hopeless.

This became so infuriating that I had to giveup on using Samba on that machine altogether and, considering that things usually seemed to be okay-ish after the service had restarted, I dismissed it as a pure Samba/arch64 bug, that newer versions of Samba or Debian had triggered, and that would eventually get fixed. But of course, that long awaited fix never seemed to manifest itself and I had better things to do than invest time I didn't have trying to troubleshoot a functionality that wasn't that critical to my workflow.

Besides, the Samba logs were all but useless. Nothing in there seemed to provide any indication that Samba was even remotely unhappy. And of course, you can forget about Windows giving you any clue about why the heck your Samba file transfers are freezing...

Part 2: The light at the end of the tunnel

Recently however, in the course of the Raspberry Pi 4 UEFI firmware experiments, it turns out that I was using that same server to test UEFI HTTP boot of a large (900 MB) ISO, that was being served from the Apache server running on that NanoPC machine, and had no joy with getting the full transfer complete either. Except, there, it wasn't freezing. It just seemed to produce a bunch of TcpInput: received a checksum error packet before giving up on the transfer altogether...

URI: http://10.0.0.7/~efi/ubuntu.iso
File Size: 916357120 Bytes
Downloading...1%
TcpInput: received a checksum error packet
TcpInput: Discard a packet
TcpInput: received a checksum error packet
TcpInput: Discard a packet
TcpInput: received a checksum error packet
TcpInput: Discard a packet
TcpInput: received a checksum error packet
TcpInput: Discard a packet
TcpInput: received a checksum error packet
TcpInput: Discard a packet
TcpInput: received a checksum error packet
TcpInput: Discard a packet
HttpTcpReceiveNotifyDpc: Aborted!

Error: Server response timeout.

Yet, serving the same content from the native python3 HTTP server (python3 -m http.server 80, which is a super convenient command to know as it acts as an HTTP server and serves any content from the current directory through the specified port) appeared to be okay, albeit with the occasional checksum errors. This is suddenly starting to look like a lot of compounded network errors... Could this be related to that Samba issue?

Now, the first thing you do when you get reports of TCP checksum errors, is try a different cable, a different switch and so on, to make sure that this is not a pure hardware problem. But I had of course tried that during the process of trying to troubleshoot the failing Samba server, and, once again, the results of switching equipment and cabling around were all negative.

But at least a bunch of checksum errors does give you something to start to work with.

For one thing, you can monitor these errors with tcpdump (tcpdump -i eth0 -vvv tcp | grep incorrect) and, more importantly, you may find some very relevant articles that point you to the very root of the problem.

Long story short, if tcpdump -i eth0 -vvv tcp | grep incorrect produces loads of checksum errors on the platform you serve content from, you may want to look into disabling offloading from the network adapter with something like:

ethtool -K eth0 rx off tx off

Or you may continue to hope that the makers of your distro will take action, but that might just turn out to be wishful thinking...

Pete's Blog