Posts | Comments

Planet Arduino

Archive for the ‘Keyword Spotting’ Category

As Jallson Suryo discusses in his project, adding voice controls to our appliances typically involves an internet connection and a smart assistant device such as Amazon Alexa or Google Assistant. This means extra latency, security concerns, and increased expenses due to the additional hardware and bandwidth requirements. This is why he created a prototype based on an Arduino Nicla Voice that can provide power for up to four outlets using just a voice command.

Suryo gathered a dataset by repeating the words “one,” “two,” “three,” “four,” “on,” and “off” into his phone and then uploaded the recordings to an Edge Impulse project. From here, he split the files into individual words before rebalancing his dataset to ensure each label was equally represented. The classifier model was trained for keyword spotting and used Syntiant NDP120-optimal settings for voice to yield an accuracy of around 80%.

Apart from the Nicla Voice, Suryo incorporated a Pro Micro board to handle switching the bank of relays on or off. When the Nicla Voice detects the relay number, such as “one” or “three”, it then waits until the follow-up “on” or “off” keyword is detected. With both the number and state now known, it sends an I2C transmission to the accompanying Pro Micro which decodes the command and switches the correct relay.

To see more about this voice-controlled power strip, be sure to check out Suryo’s Edge Impulse tutorial.

The post Controlling a power strip with a keyword spotting model and the Nicla Voice appeared first on Arduino Blog.

Speech recognition is everywhere these days, yet some languages, such as Shakhizat Nurgaliyev and Askat Kuzdeuov’s native Kazakh, lack sufficiently large public datasets for training keyword spotting models. To make up for this disparity, the duo explored generating synthetic datasets using a neural text-to-speech system called Piper, and then extracting speech commands from the audio with the Vosk Speech Recognition Toolkit.

Beyond simply building a model to recognize keywords from audio samples, Nurgaliyev and Kuzdeuov’s primary goal was to also deploy it onto an embedded target, such as a single-board computer or microcontroller. Ultimately, they went with the Arduino Nicla Voice development board since it contains not just an nRF52832 SoC, a microphone, and an IMU, but an NDP120 from Syntiant as well. This specialized Neural Decision Processor helps to greatly speed up inferencing times thanks to dedicated hardware accelerators while simultaneously reducing power consumption. 

With the hardware selected, the team began to train their model with a total of 20.25 hours of generated speech data spanning 28 distinct output classes. After 100 learning epochs, it achieved an accuracy of 95.5% and only consumed about 540KB of memory on the NDP120, thus making it quite efficient.

To read more about Nurgaliyev and Kuzdeuov’s project and how they deployed an embedded ML model that was trained solely on generated speech data, check out their write-up here on Hackster.io.

The post Small-footprint keyword spotting for low-resource languages with the Nicla Voice appeared first on Arduino Blog.



  • Newsletter

    Sign up for the PlanetArduino Newsletter, which delivers the most popular articles via e-mail to your inbox every week. Just fill in the information below and submit.

  • Like Us on Facebook