"Past results offer no guarantees for the future"
About this blog

These are the ramblings of Matthijs Kooijman, concerning the software he hacks on, hobbies he has and occasionally his personal life.

Most content on this site is licensed under the WTFPL, version 2 (details).

Questions? Praise? Blame? Feel free to contact me.

My old blog (pre-2006) is also still available.

See also my Mastodon page.

Interrupts, sleeping and race conditions on Arduino


My book about Arduino and XBee includes a chapter on battery power and sleeping. When I originally wrote it, it ended up at over twice the number of pages planned for it, so I had to severely cut down the content. Among the content removed was a large section about interrupts, sleeping and race conditions. Since I am not aware of any other online sources that cover this subject as thoroughly, I decided to publish this content separately as a blog post, which is what you're looking at now.

In this blogpost, I will first explain interrupts and race conditions using a number of examples. Then sleeping is added into the mix, which again results in some interesting race conditions. All these examples have been written for Arduino boards using the AVR architecture, but the general concepts apply equally well to other platforms.

The basics of interrupts and sleeping on AVR are not covered in detail here. If you have no experience with this, I recommend these excellent articles on interrupts and on sleeping by Nick Gammon, which cover interrupts, sleeping and other power-saving techniques in a lot of detail.


In this post, I will show relevant snippets of the code, but omit trivial things like constant definitions or pin mode settings. The full example sketches can be downloaded as a tarball and each will be separately linked below as well.

Blinky light using interrupts

This first example will explore the use of interrupts, starting without sleeping; sleeping will be added later. The sketch will light up the built-in LED whenever a button is pressed and keep it lit until the button has not been pressed for four seconds. Button presses will be detected using an external interrupt, through the Arduino attachInterrupt() function.

Turning the LED on

When an interrupt happens, an ISR function will be called. Note that when using attachInterrupt(), you are not defining a real ISR, but just a normal function that gets called by the ISR that is hidden inside the Arduino code. This "ISR" looks like this:

// Time of the last buttonpress
volatile uint32_t last_press = 0;

// On a buttonpress: turn on led and record time
void buttonPress() {
  digitalWrite(LED_BUILTIN, HIGH);
  last_press = millis();
}

Pressing the button turns on the LED, and remembers the timestamp of the buttonpress. If the LED is already on, the LED will be unchanged, but the timestamp will be updated.

To keep this timestamp, a global variable is defined, so it can be accessed both from the interrupt handler and the loop. Note that it is declared with the keyword volatile. This keyword tells the compiler that the variable can change at any time (for example from inside an interrupt handler), so the compiler should not optimize away access to this variable. Formally, it is a bit more complicated than that, but in most cases it is enough to remember to use volatile on all variables that are written to inside an interrupt handler and read or written outside an interrupt handler.

To make sure that this function is called when the button is pressed, it has to be registered with the Arduino code:

void setup () {
  // Set up button
  pinMode(BUTTON_PIN, INPUT_PULLUP);
  attachInterrupt(BUTTON_INT, buttonPress, FALLING);

  // Set up led
  pinMode(LED_BUILTIN, OUTPUT);
}

This sets up the input and output pins and registers the buttonPress() interrupt handler. It should get called on every falling edge (so when the button is pressed).

(De)bouncing

Note that this example completely ignores switch bouncing (which is the effect that when you push or release a switch, it will very swiftly connect and disconnect a few times, causing the interrupt handler to trigger multiple times). Bouncing is not a problem for this example, but check out this article for more info on connecting switches, including some strategies for debouncing.

Turning the LED off

So, now you have a way to turn the light on, but you also need to turn it off when a timeout has passed. This is handled by the loop() function:

void loop () {
  if (millis() - last_press >= TIMEOUT)
    digitalWrite(LED_BUILTIN, LOW);
}

Here, (millis() - last_press) is the time since the last button press. Whenever that time reaches TIMEOUT or more, the LED is turned off (and, until the button is pressed again, it will continue being turned off every loop, which is not terribly useful, but won't hurt either).

Note that this particular way of handling things correctly handles millis() overflow. For more info, see this article.
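To illustrate why the subtraction survives the overflow, here is a small host-side sketch in plain C++ (not Arduino code). The helper names elapsedSince(), timedOut() and wraparoundExample() are made up for this example; only the unsigned 32-bit arithmetic is the same as in millis() - last_press.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: time elapsed between two readings of a 32-bit
// millisecond counter. Unsigned subtraction wraps modulo 2^32, so the
// result is still right if the counter overflowed once in between.
uint32_t elapsedSince(uint32_t now, uint32_t then) {
  return now - then;
}

// Hypothetical helper mirroring the check done in loop().
bool timedOut(uint32_t now, uint32_t lastPress, uint32_t timeout) {
  return elapsedSince(now, lastPress) >= timeout;
}

// Worked example: a press 16 ms before the counter wraps around,
// checked again 5 ms after the wrap.
void wraparoundExample() {
  assert(elapsedSince(5, UINT32_C(0xFFFFFFF0)) == 21);
  assert(!timedOut(5, UINT32_C(0xFFFFFFF0), 4000));
}
```

Comparing against a precomputed deadline instead (something like now >= lastPress + timeout) can give the wrong answer around the wraparound, which is why the elapsed time is computed first and compared afterwards.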

If you upload the sketch to your Arduino board, you will have a button-controlled blinky light with timeout. Perfect! Or is there perhaps still a problem?

Race conditions

Perhaps, while playing with your brand new blinky toy, you noticed it did not always work as expected. If not, go ahead and press the button repeatedly. As long as you press it at least once every four seconds, the LED should always remain on, right? If you keep pressing the button (as fast as you want), you will see that sometimes the LED actually turns off anyway. It might need a couple dozen presses, but it should happen eventually.

How is this possible? To understand what is going on, you will have to understand that the AVR microcontroller is an 8-bit microcontroller. This means that most of its operations, and in particular accessing memory, happen one byte (8 bits) at a time.

Note that the last_press variable is a uint32_t variable, meaning it is 32 bits, or 4 bytes long. So when the loop() function needs to read it, each of these bytes is read from memory, one by one, using separate instructions.

Now consider what happens when the pin interrupt triggers in the middle of this operation. loop() will have fetched some of the bytes, but then the interrupt handler will change the variable in memory, after which loop() continues to fetch the remaining bytes from memory. The result is that some of the bytes come from the old value, while the rest come from the new value. This can cause the comparison to go completely wrong and return true even when the timeout has not expired yet.

If this seems rather unlikely: you have seen it happening, perhaps even multiple times if you kept going for a while. Since the microcontroller is not doing a whole lot except for checking the last_press variable over and over again, the chance of the interrupt triggering at the exact right time is actually fairly significant.
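The mixed-up read described above can be imitated on a regular computer. The following is only a host-side simulation in plain C++, not AVR code: tornRead() and tornReadExample() are hypothetical names, and the low-byte-first read order is an assumption made for illustration (the compiler is free to pick another order).

```cpp
#include <cassert>
#include <cstdint>

// Simulates reading a 32-bit value one byte at a time (low byte first,
// by assumption), where an interrupt replaces oldValue by newValue
// after bytesBefore bytes have already been read.
uint32_t tornRead(uint32_t oldValue, uint32_t newValue, unsigned bytesBefore) {
  assert(bytesBefore <= 4);
  uint32_t result = 0;
  for (unsigned i = 0; i < 4; ++i) {
    uint32_t source = (i < bytesBefore) ? oldValue : newValue;
    result |= source & (UINT32_C(0xFF) << (8 * i));
  }
  return result;
}

// Worked example: last_press changes from 255 to 256 after loop()
// has read just the lowest byte.
void tornReadExample() {
  uint32_t seen = tornRead(0x000000FF, 0x00000100, 1);
  assert(seen == 0x000001FF);  // 511: neither the old nor the new value

  // With millis() at 256, the "elapsed time" wraps around to a huge
  // number, so a four second timeout appears to have expired.
  uint32_t elapsed = UINT32_C(256) - seen;
  assert(elapsed >= 4000);
}
```

In this example the value seen by loop() lies in the future, so the unsigned subtraction wraps around and the timeout check passes even though the button was just pressed.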

What you are seeing here is what is commonly called a race condition. Generally speaking, a race condition is present when two things need to happen in a certain order but it depends on chance whether they will actually happen in the right or wrong order. Typically, when a race condition is present, the events usually happen in the correct order, and only rarely the incorrect order occurs. This often makes race conditions particularly hard to reproduce, with problems occurring occasionally in your software in production, but never on the developer's system where they could be diagnosed. Because of this, recognizing race conditions early is a big win. Whether you are dealing with interrupts in a microcontroller, multiple threads or processes in a bigger operating system, or true concurrency with multiple processors or cores, wherever there are multiple concurrent threads of execution, race conditions will be lurking around the corner.

In this case, the correct ordering of events is that the interrupt should be handled either before or after all four bytes of last_press are loaded, but not in between the loading of the bytes. Another term often used for this is to say that last_press must be loaded atomically, meaning it should not be possible to be interrupted halfway.

If you look closely, you might find there is a second race condition. Consider what happens when the interrupt triggers after the last_press variable was loaded, but before the LED is disabled.

In this case, the LED will be turned on by the ISR, but it is immediately disabled again by loop() (which had already decided it would turn off the LED). So instead of staying on for 4 more seconds, the LED stays off. Triggering this bug requires pressing the button at the exact moment the light is about to turn off, so it is very unlikely that you will trigger this bug in your testing. But if you designed a device containing this code and produced a million of them, some of your users would probably see the bug. It is common to say that the window for triggering this bug is very small.

The most common way to fix this is fairly simple: disable interrupts around code that needs to be executed atomically. This ensures that an interrupt cannot occur during the code, guaranteeing correct ordering. If any interrupt triggers while interrupts are disabled, the CPU will queue the interrupt (by setting an interrupt flag bit in a register) and as soon as interrupts are enabled again, the interrupt handler will run.
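The queueing behaviour can be pictured with a tiny host-side model in plain C++. This is a sketch under assumptions, not AVR code: InterruptModel and queuedInterruptExample() are hypothetical names, and the model ignores details such as the exact moment the handler starts. It does show one consequence of using a single flag bit: several triggers while interrupts are disabled result in only one handler run.

```cpp
#include <cassert>

// Host-side model of a single interrupt source with one flag bit.
struct InterruptModel {
  bool globalEnable = true;  // models the global interrupt enable bit
  bool flag = false;         // models the flag bit of this interrupt
  int handlerRuns = 0;       // how often the handler has run

  // The interrupt condition occurs (e.g. a falling edge on the pin).
  void trigger() {
    flag = true;
    serviceIfPossible();
  }

  void disable() { globalEnable = false; }

  void enable() {
    globalEnable = true;
    serviceIfPossible();
  }

 private:
  // Runs the handler once if interrupts are enabled and the flag is set.
  void serviceIfPossible() {
    if (globalEnable && flag) {
      flag = false;
      ++handlerRuns;
    }
  }
};

// Worked example: two triggers while disabled lead to one handler run.
void queuedInterruptExample() {
  InterruptModel irq;
  irq.disable();
  irq.trigger();
  irq.trigger();
  assert(irq.handlerRuns == 0);  // nothing runs while disabled
  irq.enable();
  assert(irq.handlerRuns == 1);  // the flag remembers only one event
}
```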

Usually, you should try to only disable interrupts for a very short time. The longer interrupts are disabled, the longer any queued interrupts will have to wait, which can cause problems with timing-sensitive applications (just like interrupt handlers themselves must be short).

So, what does this mean for the code? Just disable interrupts before the if and re-enable them afterward. This ensures that loading the last_press value, but also the decision and turning off the led now happen atomically, forbidding the interrupt to trigger halfway. This looks like this:

void loop () {
  noInterrupts();
  if (millis() - last_press >= TIMEOUT)
    digitalWrite(LED_BUILTIN, LOW);
  interrupts();
}

This uses the noInterrupts() and interrupts() functions defined by Arduino to disable and re-enable interrupts globally (meaning all interrupts are disabled). You might also encounter cli() (clear interrupt bit) and sei() (set interrupt bit), which are the AVR-specific versions of the same functions. Using the Arduino versions makes it easier to port the code to other architectures too.

With this change applied to the sketch, you should be able to keep punching the button over and over again, with the led staying on (until you stop pressing for four seconds, of course).

Sleeping

Now that you have an interrupt-controlled button working, it is time to add sleeping. One of the reasons to use interrupts is that (only) an interrupt can wake an Arduino from its sleep. This means that with the code shown above, once the LED is off and the Arduino is waiting for the next button press, it can just go to sleep, knowing it will be woken up when a button is pressed:

void loop () {
  noInterrupts();
  if (millis() - last_press >= TIMEOUT) {
    digitalWrite(LED_BUILTIN, LOW);
    doSleep();
  }
  interrupts();
}

This is just the previous loop() function, with the doSleep() call added after the led is turned off (the doSleep() function is shown below). Since a button press wakes up the microcontroller, it will end up waiting for a button press in slumber and only resuming with the code after doSleep() when a button was pressed.

Note that doSleep() is called with interrupts disabled. You might think it would be good to re-enable interrupts after turning off the LED, to keep them disabled as short as possible. However, this would introduce another race condition: Consider what would happen if the button interrupt would trigger after turning off the led, but before going to sleep? In this case, the interrupt handler would turn on the led, and then the microcontroller goes to sleep. During sleep the loop() function will not run to detect that four seconds have passed, so the led will stay on indefinitely (until pressing the button wakes up the microcontroller again).

By keeping interrupts disabled when calling doSleep(), this is avoided. However, the interrupts cannot remain disabled when actually going to sleep, since then the microcontroller can never wake up again. So they should be re-enabled just before going to sleep:

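#include <avr/sleep.h>

void doSleep() {
  set_sleep_mode(SLEEP_MODE_PWR_DOWN);

  sleep_enable();
  // Re-enable interrupts only directly before the sleep instruction
  interrupts();
  sleep_cpu();
  // Execution continues here after an interrupt woke up the CPU
  sleep_disable();
}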

But what if a button press happens between disabling interrupts and re-enabling them again just before sleeping? As interrupts are disabled, the interrupt handler cannot run, but a flag will be set and the ISR will run after interrupts are enabled again. Now, the AVR architecture guarantees that after interrupts are enabled, at least one instruction runs uninterrupted. In this case, this means that the sleep_cpu() function (which translates to a single sleep instruction) always runs. If an interrupt is pending, it will be processed after the sleep instruction, causing the microcontroller to wake up immediately again (which is not terribly efficient, but it is correct).
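To see why the position of the interrupt-enable matters, the two orderings can be compared in a small host-side simulation in plain C++. Everything here (Op, State, run(), sleepRaceExample()) is a hypothetical model written for this illustration, not AVR code. It assumes, as described above, that the instruction directly after enabling interrupts always runs before a pending handler, and that a pending interrupt wakes the CPU from sleep.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { LedOff, EnableInterrupts, Other, Sleep };

struct State {
  bool irqEnabled = false;  // loop() starts with interrupts disabled
  bool pending = false;     // button interrupt flag
  bool ledOn = true;
  bool asleep = false;
};

// Runs the program (which should end in Sleep). The button press
// arrives just before instruction number pressBefore.
State run(const std::vector<Op>& program, std::size_t pressBefore) {
  State s;
  bool justEnabled = false;  // true for one instruction after enabling
  for (std::size_t i = 0; i < program.size(); ++i) {
    if (i == pressBefore) s.pending = true;
    // A pending handler runs between instructions, except directly
    // after the instruction that enabled interrupts.
    if (s.irqEnabled && s.pending && !justEnabled) {
      s.pending = false;
      s.ledOn = true;
    }
    justEnabled = false;
    switch (program[i]) {
      case Op::LedOff: s.ledOn = false; break;
      case Op::EnableInterrupts: s.irqEnabled = true; justEnabled = true; break;
      case Op::Other: break;
      case Op::Sleep: s.asleep = true; break;
    }
  }
  // An interrupt still pending while asleep wakes the CPU, then runs.
  if (s.irqEnabled && s.pending) {
    s.pending = false;
    s.asleep = false;
    s.ledOn = true;
  }
  return s;
}

void sleepRaceExample() {
  const std::vector<Op> racy = {Op::LedOff, Op::EnableInterrupts, Op::Other, Op::Sleep};
  const std::vector<Op> safe = {Op::LedOff, Op::Other, Op::EnableInterrupts, Op::Sleep};
  State a = run(racy, 1);
  assert(a.asleep && a.ledOn);   // stuck asleep with the LED on
  State b = run(safe, 1);
  assert(!b.asleep && b.ledOn);  // woken up again right away
}
```

In the racy ordering the handler runs before the sleep instruction, so nothing is left to wake the CPU. In the safe ordering the press stays pending until the CPU is asleep and wakes it up immediately.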

So, with this sketch, the Arduino is sleeping while the LED is off, significantly reducing power usage. However, when waiting for these 4 seconds to pass, the Arduino is still running and consuming power. To also sleep while waiting, you will need some way to wake up after 4 seconds have passed.

Timed wakeups

With what you have seen so far, you could add an external timer that pulls a line high or low after some time. If you connect this line to an Arduino interrupt pin, you can have the Arduino wake up at the right moment. In fact, this approach is sometimes used combined with a real-time clock (RTC) module, which is particularly good at keeping accurate time and tracking long timeouts.

Fortunately, the AVR microcontrollers also feature a number of internal timers that can be used for this purpose. The most power efficient timer is the watchdog timer. Its original purpose is to automatically reset the microcontroller when it is locked up, by keeping a counter and resetting the system if the software does not regularly reset this counter.

However, the watchdog module has a second mode, where it does not reset the entire microcontroller, but just fires an interrupt. Since the watchdog timer runs on a private 128kHz oscillator, it can also run while in power-down mode (where all other clocks are disabled) and its interrupt can cause a wakeup. The price for being low-power is that this oscillator is not very accurate, and the watchdog only supports a fixed set of intervals rather than arbitrary ones. If you need a more precise timing source, you should look at using power-save mode and running timer2 in asynchronous mode, using a secondary crystal.
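As a rough illustration of the limited set of intervals, here is a host-side sketch in plain C++ (not AVR code). It assumes the prescaler scheme of ATmega328P-class chips, where setting n (0 to 9) gives a nominal timeout of 2048 * 2^n cycles of the 128kHz oscillator, so roughly 16 ms * 2^n. The function names are made up for this example, and other chips may differ, so check the datasheet.

```cpp
#include <cassert>
#include <cstdint>

// Nominal watchdog timeout in milliseconds for prescaler setting 0..9,
// assuming 2048 * 2^n cycles at 128kHz (an ATmega328P-style watchdog).
uint32_t nominalWatchdogMs(uint8_t prescaler) {
  assert(prescaler <= 9);
  return UINT32_C(16) << prescaler;
}

// Smallest prescaler setting whose nominal timeout is at least ms,
// or -1 when even the longest interval is too short.
int smallestPrescalerFor(uint32_t ms) {
  for (uint8_t p = 0; p <= 9; ++p) {
    if (nominalWatchdogMs(p) >= ms) return p;
  }
  return -1;
}
```

Under these assumptions, a requested 4000 ms ends up at setting 8, which is nominally 4096 ms, and anything longer than about 8 seconds cannot be covered by a single watchdog interval.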

To use the watchdog timer for the four-second timeout, the buttonPress() interrupt handler should be modified as follows:

void buttonPress() {
  digitalWrite(LED_BUILTIN, HIGH);
  wdt_reset();
  enable_wdt_interrupt(WDTO_4S);
}

As before, this turns on the led. However, instead of keeping a last_press timestamp, this resets the watchdog timer (restarting its counter if it was already counting) and it enables the watchdog and configures it to count up to four seconds in case it was not running yet.

Note that this uses the enable_wdt_interrupt() function to enable the watchdog timer interrupt, but not the watchdog timer itself. Unfortunately, avr-libc does not provide a function for this, so this uses a custom wdt_interrupt.h file that you need to put alongside the sketch.

Now, when the watchdog timer expires, an interrupt is triggered. This wakes up the microcontroller and then runs the WDT_vect interrupt handler, which is defined as follows:

ISR(WDT_vect) {
  digitalWrite(LED_BUILTIN, LOW);
  wdt_disable();
}

When the timeout happens, the led is turned off, and the watchdog is disabled again (it will be re-enabled when you press the button).

Since both of the interrupt handlers above already take care of all behavior needed, there is nothing left for the loop() function to do other than to sleep:

void loop () {
  doSleep();
}

With this sketch, all the race conditions you fixed earlier can no longer occur, since the loop() function no longer makes any decisions about whether to sleep, and the two interrupt handlers cannot interrupt each other (so each already runs atomically).

So, hopefully after reading this post, you will have a better idea of the challenges and race conditions involved when making an Arduino sleep. If you have further questions or remarks, feel free to leave them below!

 
Copyright by Matthijs Kooijman - most content WTFPL