We have been using EM Microelectronic's EM7180 motion co-processor for absolute orientation estimation with some success. The EM7180 embeds a 10 MHz ARC processor with a single-precision floating-point unit (FPU) optimized for fast fusion calculations using fusion algorithms developed by PNI Corporation. The EM7180 off-loads the management of the sensors and the computationally intensive fusion calculations from the host MCU, so that accurate absolute orientation estimates in the form of quaternions or Euler angles can be read from the EM7180 by the host via simple I2C register reads. This means even a poky 8 MHz Arduino Pro Mini can obtain heading data accurate to better than 2 degrees when using the EM7180 co-processor and a suitably accurate sensor suite like the MPU9250 or LSM6DSM+LIS2MDL.
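As a minimal host-side sketch of what one typically does with a quaternion once it has been read over I2C, the Python snippet below converts a unit quaternion to yaw, pitch, and roll using the common aerospace (Z-Y-X) convention. The (w, x, y, z) component order, the function names, and the sign conventions are assumptions for illustration; they are not taken from the EM7180 register map, so check them against the device's actual output order and reference frame before relying on the result.

```python
import math

def quaternion_to_euler_deg(w, x, y, z):
    """Return (yaw, pitch, roll) in degrees for a unit quaternion (w, x, y, z).

    Assumes the Z-Y-X (yaw, then pitch, then roll) rotation order.
    """
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    # Clamp to [-1, 1] so rounding error near +/-90 degrees cannot break asin()
    s = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(s)
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    return tuple(math.degrees(a) for a in (yaw, pitch, roll))

def heading_deg(w, x, y, z):
    """Map yaw onto a 0..360 degree range (hypothetical helper)."""
    return quaternion_to_euler_deg(w, x, y, z)[0] % 360.0
```

Whether positive yaw corresponds to a clockwise compass heading depends on the sensor frame, and a true-north heading would additionally need the local magnetic declination; neither is handled in this sketch.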

This may seem trivial when low-cost, high-performance MCUs like the STM32L4, nRF52832, ESP32 and others abound, and when computationally efficient, open-source fusion methods have been available for years, making it easy for anyone to get quaternions (see here and here and here and here) with these host MCUs.

However, accurate sensor calibration and fusion methods are not trivial to implement, so the demonstrated performance (< 2 degree rms heading accuracy) of the EM7180 is quite attractive. Also, the idea of a co-processor (commonplace in SoC computers in the form of sensor hubs and GPUs, for example) is to off-load specialized sensor management and computationally intensive data processing tasks from the host. This allows the host's full attention to be placed elsewhere, and/or allows a power-gulping host to sleep through sensor management tasks, thereby conserving power. And for hosts with embedded connectivity like the nRF52 (BLE) and ESP32 (BLE and Wi-Fi), the radio stack usually commands top priority, so off-loading computationally intensive tasks when possible minimizes contention with the radio for processor time.

In addition to sparing the host and obtaining superb heading accuracy, the advantages of using the EM7180 as a motion co-processor include small size (1.6 mm x 1.6 mm WLCSP-16), automatic gyro and magnetometer calibration, simple I2C serial output, and ultra-low power usage. Quaternions can be updated at the rate of the gyro (up to 400 Hz guaranteed), and there is some flexibility in the form of generic user registers, fusion tuning parameters, and the ability to use RAM patches to customize the fusion algorithms to some degree (for example, we have added fusion of the accelerometer and barometer to provide a drift-corrected altitude estimate). There is also a warm start capability that allows the EM7180 to start with the last session's calibration parameters on subsequent power up.

There are some disadvantages to using the EM7180, of course. These include the fixed, "black-box" nature of PNI Corp.'s algorithms stored in ROM, the need to load the sensor-specific firmware into RAM on each power up, the very small amount of free RAM that limits customization, and behaviors of the fusion algorithm (especially the dynamic magnetometer calibration) that cannot be adjusted or turned off. While the EM7180 is sensor agnostic, meaning it can use the input of almost any I2C accelerometer, gyro, and magnetometer to produce fused quaternions, the drivers for the sensors have to be created and compiled using a deprecated compiler. The ~24 kBytes of compiled firmware have to be stored on an EEPROM for loading into the EM7180 RAM on each power up, or loaded from the host. Lastly, the EM7180 is designed to manage I2C sensors, so devices with SPI or UART serial interfaces cannot be used directly and require a translator.

With the announcement of MAXIM Integrated's DARWIN family of MCUs, especially the MAX32660, we have an opportunity to design a motion co-processor that offers all of the advantages of the EM7180 with few, if any, of the disadvantages.

The MAX32660 comes in a 1.6 mm x 1.6 mm MAX32660GWE+ WLCSP-16 variant just like the EM7180. However, using inexpensive fab design rules (like those of OSH Park) we cannot make use of this 0.35-mm-pitch package, which would normally require via-in-pad methods costing $100 per square inch or more (although with the advent of zglue, this might change*). Fortunately, on the MAX32660 roadmap is a 1.6 mm x 1.6 mm MAX32660GWEBL TQFN-16 package that OSH Park can manage. In the meantime, we are making use of the MAX32660GTG+ 3 mm x 3 mm 0.35-mm-pitch TQFN-24 package for development and prototyping. This is still just about one third the area of, say, the 5 mm x 5 mm STM32L432 MCU, which could serve as well but would not let us maintain the small PCB area solution we desire. The TQFN-16 package will use ~1/10th of the PCB area of the L432!
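The area comparisons above are simple arithmetic on the package body dimensions quoted in the text. A quick Python check, assuming square package outlines and ignoring the pads, keep-out, and routing area that a real layout also needs (the helper name is hypothetical, for illustration only):

```python
def footprint_ratio(side_mm, reference_side_mm):
    """Ratio of two square package body areas, given their side lengths in mm."""
    return (side_mm ** 2) / (reference_side_mm ** 2)

# Package body sizes as quoted in the text
tqfn24_vs_l432 = footprint_ratio(3.0, 5.0)  # 3 mm x 3 mm TQFN-24 vs 5 mm x 5 mm STM32L432
small_vs_l432 = footprint_ratio(1.6, 5.0)   # 1.6 mm x 1.6 mm package vs STM32L432

print(f"3 mm x 3 mm body:     {tqfn24_vs_l432:.2f} of the L432 body area")
print(f"1.6 mm x 1.6 mm body: {small_vs_l432:.2f} of the L432 body area")
```

The ratios come out at 0.36 and 0.10, consistent with the "one third" and "~1/10th" figures.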

The MAX32660 has 256 kBytes of flash and 96 kBytes of SRAM, runs at 96 MHz, and is built on a Cortex-M4F core, so it has a single-precision floating-point unit. It also offers four channels of fast DMA, two hardware I2C buses (one for the host and one for the slave sensors) that support 3.4 MHz bus speeds, and 16 kB of instruction cache. With this kind of horsepower, we would expect to be able to obtain quaternion updates at the maximum rate of the gyro (6664 Hz, not that this is necessary!).
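To put the update-rate expectation in rough numbers, the sketch below computes how many CPU clock cycles are available per quaternion update, using only the clock speeds and update rates quoted in the text. It is a back-of-the-envelope upper bound under stated assumptions: it ignores I2C transfer time, interrupt overhead, and flash wait states, and cycle counts are not directly comparable between the ARC and Cortex-M4F cores. The function name is hypothetical.

```python
def cycles_per_update(cpu_hz, update_hz):
    """Upper bound on CPU cycles available between successive fusion updates."""
    return cpu_hz / update_hz

EM7180_HZ = 10e6     # 10 MHz ARC core (from the text)
MAX32660_HZ = 96e6   # 96 MHz Cortex-M4F (from the text)

cases = [
    ("EM7180 at 400 Hz", EM7180_HZ, 400),
    ("MAX32660 at 400 Hz", MAX32660_HZ, 400),
    ("MAX32660 at 6664 Hz", MAX32660_HZ, 6664),
]
for label, cpu_hz, rate_hz in cases:
    print(f"{label}: {cycles_per_update(cpu_hz, rate_hz):,.0f} cycles per update")
```

At the full 6664 Hz gyro rate the per-update budget is roughly 14,400 cycles, so the time spent moving sensor data over I2C (ideally via DMA) matters as much as the fusion arithmetic itself.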

There is plenty of memory to hold the firmware, which resides in flash and doesn't need to be loaded into the MAX32660 at each power up. Programming is via the SWD port using standard tools like Eclipse, Mbed, or GCC. We can hold warm start parameters in 2 kBytes of emulated EEPROM to reduce the need for calibration on each use.
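One way a warm start record could be laid out is sketched below in Python as a host-side model, not as MAX32660 firmware. The record contents (a gyro bias vector and a magnetometer offset vector), the magic value, and the function names are all assumptions for illustration; the actual calibration parameters and storage format are not specified here. The point of the sketch is the integrity check: a CRC lets the firmware fall back to a cold start if the stored record is blank or corrupted.

```python
import struct
import zlib

EEPROM_BYTES = 2048            # emulated EEPROM size quoted in the text
MAGIC = 0x57535430             # hypothetical record tag
_BODY = struct.Struct("<I6f")  # magic, gyro bias x/y/z, mag offset x/y/z (float32)
_CRC = struct.Struct("<I")     # CRC-32 of the body, little-endian

def pack_warm_start(gyro_bias, mag_offset):
    """Serialize calibration parameters into a CRC-protected byte record."""
    body = _BODY.pack(MAGIC, *gyro_bias, *mag_offset)
    blob = body + _CRC.pack(zlib.crc32(body))
    if len(blob) > EEPROM_BYTES:
        raise ValueError("record does not fit in emulated EEPROM")
    return blob

def unpack_warm_start(blob):
    """Return (gyro_bias, mag_offset), or None if the record is invalid."""
    if len(blob) != _BODY.size + _CRC.size:
        return None
    body, crc_bytes = blob[:_BODY.size], blob[_BODY.size:]
    (stored_crc,) = _CRC.unpack(crc_bytes)
    if stored_crc != zlib.crc32(body):
        return None  # corrupted or never written: do a cold start instead
    magic, *values = _BODY.unpack(body)
    if magic != MAGIC:
        return None
    return tuple(values[:3]), tuple(values[3:])
```

In this layout the record is 32 bytes, a small fraction of the 2 kBytes available, which would leave room for larger calibration data such as soft-iron matrices.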

We will use our own computationally efficient fusion algorithms, which we are currently developing on an STM32L4 host processor using scaled sensor data from the EM7180. The idea is to find the calibration and fusion methods that produce the best (most consistently accurate) results and make use of them in the MAX32660 firmware. In addition to matching or exceeding the ~1 degree rms heading accuracy we currently obtain using the EM7180 + ST sensors, we expect to achieve faster fusion rates when needed, at lower overall power usage, and without the need for an external EEPROM. The size (and price) of our popular Ultimate Sensor Fusion Solution can stay the same; i.e., tiny.
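For readers unfamiliar with what a per-sample fusion step involves, the sketch below shows only the generic gyro-propagation part common to most quaternion filters: first-order integration of the quaternion kinematic equation followed by renormalization. It is not the fusion algorithm under development here, and it omits the accelerometer and magnetometer corrections that remove gyro drift. The (w, x, y, z) ordering, body-frame rates in rad/s, and the function name are assumptions for illustration.

```python
import math

def integrate_gyro(q, gyro_rad_s, dt):
    """Advance unit quaternion q = (w, x, y, z) by one gyro sample.

    Uses q_dot = 0.5 * q * (0, gx, gy, gz) (quaternion product) with a
    first-order Euler step, then renormalizes to unit length.
    """
    w, x, y, z = q
    gx, gy, gz = gyro_rad_s
    dw = 0.5 * (-x * gx - y * gy - z * gz)
    dx = 0.5 * (w * gx + y * gz - z * gy)
    dy = 0.5 * (w * gy - x * gz + z * gx)
    dz = 0.5 * (w * gz + x * gy - y * gx)
    w, x, y, z = w + dw * dt, x + dx * dt, y + dy * dt, z + dz * dt
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    return (w / norm, x / norm, y / norm, z / norm)

# Example: rotate at 90 deg/s about z for one second at a 400 Hz update rate
q = (1.0, 0.0, 0.0, 0.0)
for _ in range(400):
    q = integrate_gyro(q, (0.0, 0.0, math.radians(90.0)), 1.0 / 400.0)
yaw_deg = math.degrees(2.0 * math.atan2(q[3], q[0]))
print(f"yaw after 1 s: {yaw_deg:.3f} degrees")
```

The step is only a handful of multiplies and one square root, which is why faster update rates are mostly limited by sensor I/O and by the cost of the correction terms rather than by this propagation itself.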

Using the MAX32660 should allow us to match the size, performance (heading accuracy), and power usage advantages of the EM7180-based solutions while avoiding the limited memory, obsolete compiler, and lack of transparency. Furthermore, we expect to be able to extend the co-processor applications to other types of sensors like the PMW3901 optical flow velocity sensor, which uses an SPI serial interface, lidar and ultrasonic sensors that use UART as the serial interface, and even audio microphones.

* Thanks for the tip Drix!