
Stuff and non-sensor

A project log for The Water Watcher

Monitoring the pilot light on my water heater.

WJCarpenter, 10/15/2022 at 22:54

[Edit: Forget all this. See next project log article.]

In July 2021, the TSL2591 ambient light sensor I was using went bad. I replaced it, and things went back to normal. In the last couple of weeks, that replacement sensor went bad. It's probably due to the somewhat warm conditions the sensor lives in. I have replaced it again with another Adafruit breakout board, and now things are back to normal. I guess replacing the sensor every 14-15 months is not too bad, all things considered.

Before the sensor went completely bad (giving a reading of 0xFFFF on one of its two channels), it sputtered for a while. By that, I mean that I could reboot the device, either with a software reboot of the ESP32 or with a power cycle of the entire setup, and the sensor would sometimes come back to operating normally. It also had a failure mode where both channels reported 0 readings. Partly as an educational exercise for myself, and partly to while away the time waiting for the replacement part to arrive, I explored detecting the problem and automatically rebooting the ESP32. That's not too hard in ESPHome, but it's also not immediately obvious unless you go digging into the C++ platform APIs. I don't think ESPHome has any built-in "on error, do this" mechanism.

I set up a recurring check with the ESPHome "interval" component. It runs an ESPHome script once a minute:

interval:
  - interval: 60s
    then:
      - script.execute: periodic_reboot

The script itself is imperfect, but it might still serve as a useful example.

script:
  - id: periodic_reboot
    then:
      - if:
          condition:
            # We delay 10 minutes before rebooting. Otherwise, if the sensor is broken, we'll
            # instantly reboot after a reboot, and it will be very difficult to do a
            # firmware update or anything else.
            lambda: 'id(device_uptime).update(); return id(i_tsl2591).is_failed()  &&  id(device_uptime).state > 600;'
          then:
            - logger.log: {level: ERROR, format: "TSL2591 sensor is FAILED."}
            - switch.toggle: i_restart

The script checks for an error condition on the TSL2591 component. If it finds one, it does the software equivalent of pressing the reset button on the ESP32 board. The first time I ran this, I didn't account for the fact that the check also ran right at boot time, so an error in initializing the TSL2591 led to a boot loop. It's a bit tricky to update the firmware on the ESP32 when it's in that condition. If I had already had the replacement TSL2591 on hand, I could have plugged it in and eased my pain. Instead, I just kept retrying the update until I got lucky. The updated script includes the check that the device has been up for at least 10 minutes (600 seconds) before forcing the reboot.

The script above is not good enough, because sometimes the sensor would appear to initialize OK and raise no error during a read cycle, yet just give back readings of 0. I also had a version of the script that checked for a value of 0 in the global variable CURRENT_FULL_SPECTRUM. That had the problem of causing reboots during the "dazzle" period that is a normal part of the flame cycle. I've thought a bit about various ways to make the check better, but in the meantime my replacement sensor arrived, and I've got better things to do, and yadda, yadda, yadda.

An interesting thing happened after I installed the new TSL2591 sensor: the BME280 climate sensor on the same I2C bus went bad. That sensor is not housed in the same heated environment as the TSL2591, so having it go bad at about the same time seemed a bit suspicious. I've now replaced that sensor (I had a couple of spares on hand). My hypothesis right now is that the dying TSL2591 sensor put some kind of spurious signals on the I2C bus that caused the BME280 to lose its mind, but beyond that I don't know much about what might have happened.
