Debugging the debug session

A project log for Debugging on a Teensy, the open source way

There comes a time when debugging your code requires more than Serial.print();

Vedran, 04/05/2024 at 16:28, 0 Comments

I tried several other approaches with breakpoints.

It's likely something obvious that I'm missing, since I'm by no means knowledgeable in this area. So it's time to learn the basics of what's going on under the hood. I found a nice article to get me started on GDB, its remote serial protocol (used to talk to a debug probe), and breakpoints in general: https://interrupt.memfault.com/blog/cortex-m-breakpoints


With what I learned there, I could print out the GDB commands and conclude that installing a breakpoint is not what causes the problem. Rather, the failure happens consistently after a breakpoint is hit, while GDB is trying to read the general registers.

Following the printout:

  1. Code is paused (Ctrl-c)
  2. Temporary breakpoint is installed in function loop() at line 23 ('tbreak ...')
  3. Request made to resume code execution ('(gdb) c')
  4. Request made to install a hardware breakpoint ($Z1) at address 0x47a
  5. Request made to continue execution ( '$c...' )
  6. MCU reports a breakpoint has been hit ('Packet received: T05')
  7. Request made to read registers ('$g...')
  8. Request fails
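
The '$...' strings in the printout are remote serial protocol packets: a '$', the payload, a '#', and a two-digit hex checksum that is the modulo-256 sum of the payload bytes. Below is a minimal sketch of that framing, just to make the printout easier to read. The function names are my own, and this is not how GDB or the blackmagic firmware implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Modulo-256 sum of the payload bytes (the protocol's checksum). */
static unsigned char rsp_checksum(const char *payload)
{
    unsigned char sum = 0;
    for (const char *p = payload; *p != '\0'; ++p)
        sum = (unsigned char)(sum + (unsigned char)*p);
    return sum;
}

/* Frame a payload as "$<payload>#<two hex digits>".
 * Hypothetical helper for illustration; returns the frame length. */
static size_t rsp_frame(const char *payload, char *out, size_t out_size)
{
    /* Room for '$', the payload, '#', two checksum digits and the NUL. */
    assert(out_size >= strlen(payload) + 5);
    return (size_t)snprintf(out, out_size, "$%s#%02x", payload,
                            rsp_checksum(payload));
}
```

With this, the register read from step 7 is framed as "$g#67" and the continue request from step 5 as "$c#63". A hardware breakpoint request would have a payload along the lines of "Z1,47a,2", where the trailing kind field of 2 is my assumption and not taken from the printout.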

Playing with the debug firmware - A breakthrough!

To further isolate the problem after a few days of banging my head against the wall, I went on to poke at the blackmagic firmware. I could see above that the error always happens when the registers are requested after a breakpoint, during the "$g" command. So I opened https://github.com/vedranMv/blackmagic/blob/kinetis_mk20/src/gdb_main.c#L117 and removed the handling of that command from the firmware, replacing it with a fixed "00" reply as shown below:

/* execute gdb remote command stored in 'pbuf'. returns immediately, no busy waiting. */
int gdb_main_loop(target_controller_s *tc, char *pbuf, size_t pbuf_size, size_t size, bool in_syscall)
{
    bool single_step = false;

    /* GDB protocol main loop */
    switch (pbuf[0]) {
    /* Implementation of these is mandatory! */
    case 'g': { /* 'g': Read general registers */
        ERROR_IF_NO_TARGET();
        const size_t reg_size = target_regs_size(cur_target);
        /*if (reg_size) {
            uint8_t *gp_regs = alloca(reg_size);
            target_regs_read(cur_target, gp_regs);
            gdb_putpacket(hexify(pbuf, gp_regs, reg_size), reg_size * 2U);
        } else {
            gdb_putpacketz("00");
        }*/
        gdb_putpacketz("00");
        break;
    }

After flashing this, the issue was gone. I could hit a breakpoint and resume from it without errors. The only problem now is that the registers are no longer being updated.

Conclusion

After getting help from the maintainers of the blackmagic project, the issue turned out to be in GDB itself. My platformio project somehow ended up using a very old version of GDB with a bug that makes it fail in special cases when reading registers from the MCU. A bug report in GDB's issue tracker was the final confirmation.

In short, the bug was in how GDB handles the response it receives from a debug probe when querying target registers. If the first three characters of the register data happen to look like an error code ('Exx...'), GDB falsely interprets this as the probe reporting an error and simply terminates the session. In some of the screenshots above, there was no "remote failure" as GDB states. It was just an unlucky situation in which the first register returned in the reply had a value that was converted to "E09...".
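
To make the collision concrete, here is a small sketch of how an ordinary register value can end up looking like an error reply. Register contents are sent as hex digits in target byte order, so on a little-endian Cortex-M a first register holding 0x000090E0 (a value I made up for illustration) goes out as the bytes E0 90 00 00. Both check functions are hypothetical simplifications of the behaviour described above, not GDB's actual code, and the uppercase hex digits are an assumption chosen to match the "E09..." seen in my log.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hex-encode raw register bytes, two digits per byte.
 * Uppercase digits are an assumption made to match "E09..." in the log. */
static void hex_encode(const uint8_t *data, size_t len, char *out, size_t out_size)
{
    static const char digits[] = "0123456789ABCDEF";
    assert(out_size >= len * 2 + 1);
    for (size_t i = 0; i < len; ++i) {
        out[2 * i] = digits[data[i] >> 4];
        out[2 * i + 1] = digits[data[i] & 0x0F];
    }
    out[2 * len] = '\0';
}

/* Hypothetical model of the faulty check: anything that starts with
 * 'E' and two hex digits is treated as an error, whatever follows. */
static bool looks_like_error_prefix(const char *reply)
{
    return reply[0] == 'E' && isxdigit((unsigned char)reply[1]) &&
           isxdigit((unsigned char)reply[2]);
}

/* A stricter check (one possible approach, not necessarily GDB's fix):
 * only a reply that is exactly "Exx" counts as an error. */
static bool is_error_reply(const char *reply)
{
    return looks_like_error_prefix(reply) && reply[3] == '\0';
}
```

Encoding the bytes E0 90 00 00 gives "E0900000". The prefix-only check flags it as an error even though it is valid register data, while the stricter check accepts it and still rejects a genuine three-character reply such as "E09".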

The fix was simply to upgrade GDB to a version where the bug is fixed. Since in platformio GDB is part of the compiler package, that meant updating the compiler itself. To bump the compiler in platformio, add the following line to the project configuration (platformio.ini):

platform_packages = platformio/toolchain-gccarmnoneeabi@1.90301.200702

Any of the supported versions works, as long as it's newer than 1.50401.190816.