Running Neural Networks on Microcontrollers

3D face recognition, voice control and image recognition of faulty or fake parts were all the rage at Embedded World 2019. The trend towards running these applications offline – without access to powerful compute servers in the cloud – was obvious. However, these applications still require powerful CPUs with equally powerful GPUs to run predictions on deep neural networks.

I was interested in solutions that run (deep) neural networks (NNs) on microcontrollers (MCUs), and Embedded World didn’t disappoint! Renesas showed how to predict motor failure by monitoring current and vibration. STMicroelectronics had a demo that counts passing cars by the characteristic sound of the Doppler effect. That’s what I would call the “edge”!

Renesas: Predicting Motor Failure

Renesas demonstrated how to predict motor failure in home appliances like washing machines, refrigerators and air conditioners. The board and motor on the right-hand side of Figure 1 simulate the rotating drum of a washing machine. The board, which features an RX66T MCU, uses two sensor inputs: the current drawn by the motor and the vibration measured by an accelerometer.

If the friction in the rotating drum increases, for example because the bearings are slowly wearing out, the motor draws more current. A washing machine starts wandering around when the motor vibrates too strongly. Another indicator of an impending failure may be strange or especially loud noises from the motor.

Figure 1: Failure prediction for the motor on the right-hand side.
Renesas has a video of the demo.

The goal is to detect motor problems long before the motor fails. Maintenance or an early repair is much less expensive than replacing the motor. Renesas trained a fairly simple neural network on a PC to detect motor problems early. The neural network is shown in the middle of Figure 1.

The TensorFlow and Caffe runtimes for performing forward propagation on a neural network, which is the step that predicts the motor’s health, are far too resource-hungry for an MCU. Hence, Renesas provides a tool, e-AI Translator, that generates code from a trained neural network produced by TensorFlow or Caffe. According to the Renesas engineer doing the demo, the RX66T MCU doesn’t use any hardware acceleration for prediction, although it has DSP instructions. I would expect e-AI Translator to take advantage of these DSP instructions in the future.
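To make "forward propagation" concrete, here is a minimal sketch of what running a tiny, fully connected network boils down to: a handful of multiply-accumulate loops over fixed weight tables. It is written in Python for readability, whereas a code generator for an MCU would emit C. The layer sizes, the weights and the function name `predict_fault_probability` are made up for illustration; they are not taken from the Renesas demo or from e-AI Translator's output.

```python
import math

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W * x + b), row by row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical weights; a real network would get these from training on a PC.
W1 = [[0.8, -0.2], [-0.5, 0.9]]   # hidden layer: 2 inputs -> 2 neurons
B1 = [0.0, 0.1]
W2 = [[1.2, 0.7]]                 # output layer: 2 neurons -> 1 output
B2 = [-0.6]

def predict_fault_probability(current, vibration):
    """Forward propagation: normalised sensor values in, score in (0, 1) out."""
    hidden = dense([current, vibration], W1, B1, relu)
    return dense(hidden, W2, B2, sigmoid)[0]

if __name__ == "__main__":
    print(predict_fault_probability(0.0, 0.0))
    print(predict_fault_probability(1.0, 1.0))
```

Because the weights are constants and the loops have fixed bounds, code of this shape needs no dynamic memory and no framework runtime, which is presumably why generating it ahead of time suits an MCU.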

Figure 2: e-AI Translator generates code to run the trained neural network on the MCU
(see Development Environment & Downloads for more information).

A second tool, e-AI Checker, checks whether the target MCU has enough ROM and RAM for the neural-network runtime. It also calculates the time the target MCU needs to run forward propagation, that is, one prediction, on the neural network.
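A back-of-the-envelope version of such a check can be sketched as follows. This is not how e-AI Checker works internally, which the demo did not reveal; it is a rough estimate for a fully connected network under stated assumptions: weights and biases live in ROM, activations are kept in two alternating RAM buffers, and every multiply-accumulate (MAC) costs a fixed number of cycles. The defaults for `bytes_per_value`, `cycles_per_mac` and `clock_hz` are illustrative placeholders, not RX66T data.

```python
def estimate_footprint(layer_sizes, bytes_per_value=4,
                       cycles_per_mac=4, clock_hz=100_000_000):
    """Rough ROM/RAM/time estimate for a dense network such as [2, 8, 4, 1]."""
    pairs = list(zip(layer_sizes, layer_sizes[1:]))
    # One MAC per weight; parameters are weights plus one bias per neuron.
    macs = sum(n_in * n_out for n_in, n_out in pairs)
    params = macs + sum(n_out for _, n_out in pairs)
    # Assumption: an input and an output buffer alternate between layers,
    # so RAM is set by the largest input-plus-output pair.
    ram_values = max(n_in + n_out for n_in, n_out in pairs)
    return {
        "rom_bytes": params * bytes_per_value,
        "ram_bytes": ram_values * bytes_per_value,
        "macs": macs,
        "seconds": macs * cycles_per_mac / clock_hz,
    }

def fits(estimate, rom_budget_bytes, ram_budget_bytes):
    """True if the estimate stays within the given memory budgets."""
    return (estimate["rom_bytes"] <= rom_budget_bytes
            and estimate["ram_bytes"] <= ram_budget_bytes)

if __name__ == "__main__":
    est = estimate_footprint([2, 8, 4, 1])
    print(est, fits(est, rom_budget_bytes=1024, ram_budget_bytes=256))
```

The estimate ignores the code size of the runtime itself, the stack, and any sensor preprocessing, so a real tool would report larger numbers.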

STM: Counting Cars by Sound

STMicroelectronics (STM) and Bluewind, an Italian engineering company, showed how to count the number of cars driving by – using the characteristic sound of the Doppler effect. The demo setup is shown in Figure 3. The sound of the cars is played through the loudspeakers.

The board in the middle of Figure 3 carries an MCU from the STM32 family. It acquires the sound through its microphone and preprocesses it with DSP functions for further analysis. The neural network detects whether a car drove by and whether it drove from left to right or from right to left. This information is displayed on the tablet at the bottom of Figure 3.
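The last step, turning the network's output into a car count, can be sketched like this. The demo did not show how this is done, so the sketch assumes that the network classifies one short audio frame at a time and that a car shows up as a run of consecutive frames with the same direction label. The label names and the `min_frames` threshold are hypothetical.

```python
from itertools import groupby

DIRECTIONS = ("left_to_right", "right_to_left")

def count_cars(frame_labels, min_frames=3):
    """Count one car per run of identical direction labels.

    Runs shorter than min_frames are treated as misclassifications
    and ignored (a simple form of debouncing).
    """
    counts = {direction: 0 for direction in DIRECTIONS}
    for label, run in groupby(frame_labels):
        if label in counts and len(list(run)) >= min_frames:
            counts[label] += 1
    return counts

if __name__ == "__main__":
    labels = (["none"] * 2 + ["left_to_right"] * 4 + ["none"] * 3
              + ["right_to_left"] * 2 + ["none"] + ["right_to_left"] * 5)
    print(count_cars(labels))
```

In the example, the two-frame "right_to_left" run is dropped as noise, so the result is one car in each direction.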

You might be surprised that sound is used instead of video. Detecting cars from video would require a far more powerful and expensive microprocessor like the NVIDIA Jetson. Sound detection also works on a microcontroller, which is much less powerful but also much cheaper.

Figure 3: An STM32 MCU uses sound to count how many cars drive by
(see also the video).

STMicroelectronics provides a toolset, X-CUBE-AI, to generate C code for a trained neural network produced by deep-learning frameworks like Keras and Caffe (TensorFlow not yet supported). X-CUBE-AI also contains a tool to estimate the RAM, ROM and CPU consumption of the neural network for a given MCU.
