2011-03-13

Troubleshooting USB device development...

...or how I learned to stop worrying and love Windows' Code 10 ("This device cannot start.")

Let's face it, when it comes to USB device development, the USB stack on Windows is a lot more unforgiving than its Linux counterpart. Now, the problem of course is, Windows being Windows, it does its hardest at hiding what the technical details of a problem are, but instead will spew generic error codes that are only just a notch above useless.

Today's error: the dreaded "This device cannot start. (Code 10)" from the device manager (or CM_PROB_FAILED_START if you're using a WDK related app)  whenever it sees an USB device that it doesn't like. As Tim Roberts puts it here,
CM_PROB_FAILED_START means that your driver loaded, passed through DriverEntry, passed through AddDevice, and then failed during IRP_MN_START_DEVICE.
In layman's terms, this means that the issue occurs during the USB transfers that Windows issues with the device, after the driver has been setup and the device instance has been (re)created. Indeed, Windows does not simply stop communicating with the USB device once it's been instantiated internally; there are a few additional commands that are issued, such as retrieving the String Descriptors, as well setting the Configuration or setting the interface, if Alt Interfaces are provided. Officially of course, Windows does not offer a user API to control the latter, but it still needs to issue these control commands to the device, in order to place it in a known state. And if your device firmware is improperly setup (eg. stalls on any of these commands), Windows will fail the device altogether.

So now of course the question becomes, how can I tell if any of the later Control Transfers are failing?


LogMan and NetMon

To do that, you want an USB analyzer (or you can try the USB consortium's USB Command Verifier, however, despite asking for $2000 a pop for a VID, the USB IF are cheapskate enough not to provide an app that comes with signed Windows drivers, so you have to run a pretty intrusive EHCI driver in test mode). If you're on a 64 bit system, or a recent version of Windows, and you want a free solution, this can be problematic, as SnoopyPro doesn't seem to work that well there. However, in recent versions of Windows, you can get some decent snooping of the USB transfers through NetMon and the CodePlex USB Parsers. The whole process is detailed in this post.

Once you have these tools, run the following from an elevated command prompt to start a trace (NB: you can dump hub traffic as well, as per the Microsoft post, but if you're troubleshooting a standard USB device, you probably don't care much about that part):
Logman start Usbtrace -p Microsoft-Windows-USB-USBPORT -ets -nb 128 640 -bs 128 -o dump.etl
Then just plug in your device, so that you get the Code 10 / CM_PROB_FAILED_START error, and then stop the trace with:
Logman stop Usbtrace -ets


Analysing the trace

And now you have a trace in NetMon. Now what? Well, a couple of things.
First, you might want to isolate the Control Transfers that deal with your device only. If you're using a device with VID 0x1234, this can be done as follows (There's probably a way to simplify this - haven't found it yet...):
USBPort.USBPORT_ETW_EVENT_DEVICE_INITIALIZE.fid_USBPORT_Device.idVendor == 0x1234 OR
USBPort.USBPORT_ETW_EVENT_DISPATCH_URB_FUNCTION_CONTROL_TRANSFER.fid_USBPORT_Device.idVendor == 0x1234 OR
USBPort.USBPORT_ETW_EVENT_COMPLETE_URB_FUNCTION_CONTROL_TRANSFER_DATA.fid_USBPORT_Device.idVendor == 0x1234 OR
USBPort.USBPORT_ETW_EVENT_DISPATCH_URB_FUNCTION_CONTROL_TRANSFER_EX.fid_USBPORT_Device.idVendor == 0x1234 OR
USBPort.USBPORT_ETW_EVENT_COMPLETE_URB_FUNCTION_CONTROL_TRANSFER_EX_DATA.fid_USBPORT_Device.idVendor == 0x1234 OR
USBPort.USBPORT_ETW_EVENT_COMPLETE_INTERNAL_URB_FUNCTION_CONTROL_TRANSFER.fid_USBPORT_Device.idVendor == 0x1234 OR
USBPort.USBPORT_ETW_EVENT_ENDPOINT_OPEN.fid_USBPORT_Device.idVendor == 0x1234
That's better. But more importantly, you want to check for errors, which could be done like this for a Pipe Stall:

USBPort.USBPORT_ETW_EVENT_COMPLETE_INTERNAL_URB_FUNCTION_CONTROL_TRANSFER.ControlTransfer.Urb.fid_URB_Hdr_Status == 0xc0000004

Or by the much more generic error filter from Load Filters -> Standard Filters -> USB -> USB Error Events, which translates to the following:
(USBPort AND NetEvent.Header.Descriptor.Opcode == 34) OR
(USBHub AND NetEvent.Header.Descriptor.Opcode == 11) OR
(NetEvent.Header.Descriptor.Level == 0x2) OR
(USBHub AND NetEvent.Header.Descriptor.Id == 210) OR
(Property.USBUrbStatus != "success" && Property.USBUrbStatus != "")

If you're lucky, then you might find that your device has raised something like USBD_STATUS_STALL_PID during one of the post Device_Initialize Control transfers. If so, then you're halfway there: find the corresponding section of code in your firmware, fix it, and Windows should be a lot happier about your device...

1 comment:

  1. Thanks for the useful article!
    About filtering in netmon: I've found it useful to first locate USBPortPath:
    USBPort.USBPORT_ETW_EVENT_DEVICE_INFORMATION.fid_USBPORT_Device.idVendor == 0x1234 // and look at USB Port Path column (may need to add it first via 'choose columns')

    And then use it for filtering:
    Property.USBPortPath == "1,2,3,1"

    Hope this comment will help someone, someday...

    ReplyDelete