Processors for Edge AI

Hardware plays a critical role at the edge: it is what interacts with the external world. This hardware ranges from cheap, low-power microcontrollers (thin edge) to GPU-based accelerators and edge servers (thick edge). Let us dive deeper into the microcontrollers, microprocessors, SoCs, FPGAs, and ASICs that make the Edge AI magic possible.

Architecture of an Edge Device

A high-level architecture of Microprocessor Unit (MPU) edge hardware consists of the components below.

[Figure: high-level and detailed views of the architecture]

The application processor is the main component: it runs all the algorithms and logic that make up the application. The main processor sometimes has co-processors for specific tasks, such as an FPU (floating-point unit) for fast floating-point calculations. The processor has RAM (volatile memory) as working memory during program execution.

Components that sit on the same piece of silicon are called on-die, and those on separate silicon are called off-die. On-die components can interact with each other more efficiently, so packing components together increases efficiency; however, there is a limit to the size of the silicon, and the larger it gets, the more power-hungry the circuit becomes. It is common to have a small amount of on-die RAM for quick computations, with attached off-die RAM to spill into when the on-die memory is not enough, or to buffer and store information needed later.

RAM is fast memory, but it consumes a lot of energy and takes up space, so it is important to provision just the right amount for the application.
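To make the on-die/off-die split concrete, here is a minimal C sketch of how firmware typically steers a hot working buffer into fast on-die RAM and a bulky buffer into off-die RAM using GCC section attributes. The section names ".dtcm" and ".sdram" are assumptions for illustration; the real names depend on the part and its linker script.

```c
#include <stdint.h>

/* Illustrative sketch: placing buffers in on-die vs. off-die RAM via GCC
 * section attributes. The section names below are hypothetical; the actual
 * names come from your device's linker script. */

/* Small, hot working buffer: on-die (tightly coupled) RAM for fast access. */
__attribute__((section(".dtcm")))
static int16_t fft_workspace[1024];

/* Large capture buffer: off-die SDRAM, slower but plentiful. */
__attribute__((section(".sdram")))
static uint8_t frame_buffer[320 * 240 * 2];
```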

Types of Processors

Microcontroller Units (MCUs) are built for single-purpose applications. They do not have an operating system; their software, called firmware, runs directly on the hardware, as sketched below. They are built on a single piece of silicon and come equipped with flash memory to store data, RAM, and peripherals for communicating with sensors and other devices. In effect, an MCU integrates on one chip everything that an MPU-based system spreads across separate components.
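A minimal sketch of what "firmware runs directly on the hardware" means in practice: after reset, execution lands in main() and stays in a "super loop" forever. The register address and the read_sensor()/transmit() helpers are hypothetical placeholders, not a real part's memory map.

```c
#include <stdint.h>

/* Bare-metal firmware sketch: no OS, just a super loop after reset.
 * The register address below is an assumed placeholder. */
#define SENSOR_DATA_REG (*(volatile uint32_t *)0x40001000u)

static uint32_t read_sensor(void) { return SENSOR_DATA_REG; }
static void transmit(uint32_t sample) { (void)sample; /* e.g., write to a UART register */ }

int main(void)
{
    /* hardware init (clocks, pins) would go here */
    for (;;) {                  /* the super loop never exits */
        uint32_t sample = read_sensor();
        transmit(sample);       /* hand the data to a more capable device */
    }
}
```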

The classes of processor below are compared on architecture, clock speed, flash memory, RAM, current draw, and cost.
  • Low-end MCUs – 4-16-bit architecture, <100 MHz clock speed, 2-64 KB flash memory, 64 bytes to 2 KB RAM, a draw of milliamps when working and microamps when idle, costing $1-2. With so little memory, they are not suited to large data sets or complex signal processing, and they lack FPU hardware, which makes running AI algorithms on them hard. Usually, low-end MCUs capture sensor data and pass it on to more sophisticated devices for processing.
  • High-end MCUs – 32-bit architecture, <1000 MHz clock speed, 16 KB-2 MB flash, 2 KB-1 MB RAM, an FPU, SIMD (single instruction, multiple data) instructions, optionally additional processor cores, a draw of low milliamps when operational and microamps when sleeping, low tens of dollars per unit. SIMD lets the processor run multiple computations in parallel (see the sketch after this list). These are the MCUs mostly used in Edge AI and embedded machine learning: they have enough power to run deep learning models, including ones that process visual information.
  • Digital Signal Processors (DSPs) – special microcontrollers built to transform digital signals. They run specific mathematical operations quickly, such as multiply-accumulates and Fourier transforms. Voice assistants use a DSP chip to run an always-on keyword (wake word) spotting model without hurting battery life.
  • Heterogeneous compute – the combination of the above processors in a single product: low-end MCUs for simple tasks and high-end ones for signal processing and machine learning workloads. This allows the hardware to be used more efficiently.
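The sketch below shows the multiply-accumulate (MAC) loop at the heart of both DSP and embedded ML workloads, first as plain C, then with SIMD. The SIMD version assumes a Cortex-M4/M7-class MCU with the ARM DSP extension and the CMSIS __SMLAD intrinsic, which performs two 16-bit MACs per instruction; the header include is also an assumption (real projects usually pull it in via the device header).

```c
#include <stdint.h>

/* Scalar dot product: one multiply-accumulate per loop step. */
int32_t dot_scalar(const int16_t *a, const int16_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

#if defined(__ARM_FEATURE_DSP)
#include "cmsis_gcc.h"   /* assumed CMSIS header providing __SMLAD */

/* SIMD dot product: each 32-bit word packs two int16 values, so
 * __SMLAD retires two multiply-accumulates per instruction. */
int32_t dot_simd(const uint32_t *a2, const uint32_t *b2, int n_pairs)
{
    int32_t acc = 0;
    for (int i = 0; i < n_pairs; i++)
        acc = __SMLAD(a2[i], b2[i], acc);
    return acc;
}
#endif
```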

System on Chip (SoC) – 64-bit architecture; >1 GHz clock speed; multiple processor cores; external RAM and flash, generally in GBs; a GPU; wireless networking; a draw of hundreds of milliamps; tens of dollars per unit.

A microcontroller is a stripped-down version of a computer, while an SoC squeezes all the functionality of a computer onto a single chip. On microcontrollers, the software interacts directly with the hardware, whereas an SoC runs a traditional OS that abstracts developers from the hardware, so they can code in high-level languages like C/C++, Python, or Rust. SoCs are less energy-efficient than microcontrollers but more efficient than full computer systems.

SoCs have been very useful in industry. They power TVs, mobile phones (Qualcomm Snapdragon, etc.), car entertainment systems, industrial hardware, security systems, and IoT gateways, putting a general-purpose computer into a small form factor. They usually run Linux, which means development looks much like it does on any other computer, as in the example below.
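A small C sketch of that OS abstraction: instead of poking hardware registers, you read the SoC's temperature through the standard Linux thermal sysfs interface. The thermal zone path is typical but may differ from board to board.

```c
#include <stdio.h>

/* Read the SoC temperature via Linux's thermal sysfs interface.
 * The OS, not the application, talks to the hardware. */
int main(void)
{
    FILE *f = fopen("/sys/class/thermal/thermal_zone0/temp", "r");
    if (!f) {
        perror("open thermal zone");
        return 1;
    }
    long millideg = 0;
    if (fscanf(f, "%ld", &millideg) == 1)   /* value is in millidegrees C */
        printf("SoC temperature: %.1f C\n", millideg / 1000.0);
    fclose(f);
    return 0;
}
```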

Deep Learning Accelerators – Both microcontrollers and SoCs are general-purpose computers. DLAs are integrated circuits that trade the flexibility of general-purpose computing for a few specific operations that run extremely fast. Deep learning is built on linear algebra, so DLAs, also called Neural Processing Units (NPUs), are designed to perform linear algebra efficiently. Some NPUs are extremely efficient because the algorithm is baked into the silicon, whereas others stay more general-purpose at the cost of some efficiency.

Generally, these are paired with a microcontroller or SoC: the conventional processor runs the application logic while the NPU runs the deep learning workload, such as the dense-layer loop sketched below.
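To show what "deep learning is linear algebra" means, here is a minimal C sketch of a quantized fully connected layer: a matrix-vector multiply, i.e., rows of multiply-accumulates. This is the kind of loop an NPU implements in silicon and the host CPU offloads to it; the function and its layout conventions are illustrative, not any particular accelerator's API.

```c
#include <stdint.h>

/* Quantized dense (fully connected) layer: output = weights * input + bias.
 * Each output is a row of multiply-accumulates over the input vector. */
void dense_layer(const int8_t *weights,  /* out_n x in_n, row-major */
                 const int8_t *input,    /* in_n elements */
                 const int32_t *bias,    /* out_n elements */
                 int32_t *output,        /* out_n elements */
                 int in_n, int out_n)
{
    for (int o = 0; o < out_n; o++) {
        int32_t acc = bias[o];
        for (int i = 0; i < in_n; i++)
            acc += (int32_t)weights[o * in_n + i] * input[i];
        output[o] = acc;   /* activation/requantization would follow here */
    }
}
```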

FPGAs and ASICs – Field-Programmable Gate Arrays are silicon integrated circuits that can be reprogrammed on demand to implement custom hardware designs. They let you create a custom processor design that implements the required algorithm as efficiently as possible and then load it onto the device. The designs are written in hardware description languages (HDLs).

ASICs, or Application-Specific Integrated Circuits, are customized for a particular application and cannot be reprogrammed; the logic is written into the silicon. Applications are typically prototyped on FPGAs at low volumes for testing and tuning, then moved to ASICs for high volumes. Projects like CFU Playground and Tensil.ai, which provide machine-learning compilers and hardware generators, are making FPGAs easier to work with.

Edge Servers – These are conventional server hardware running in a centralized location near the edge, away from the cloud; they might be on the same premises, even on the shop floor, and need not be cloud-based. Their power means they can provide many of the benefits of cloud computing while maintaining the security, privacy, and convenience that come with keeping data on-site.

[Figure: a quick summary of which devices are capable of processing which kinds of data]

Real-World Implementation

Real-world scenarios combine several of these devices to solve a problem. For example, in a smart speaker a DSP runs always-on keyword spotting; when it hears the wake word, it wakes an application processor, which streams the audio to a cloud server that performs speech recognition and natural language processing to come up with an appropriate response. A sketch of this flow follows.
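Here is that pipeline as a small C state machine on the application processor. The dsp_keyword_detected(), stream_audio_to_cloud(), and play_response() helpers are hypothetical stand-ins for the real DSP, network, and audio interfaces, and the loop is bounded so the demo terminates.

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum { SLEEPING, STREAMING, RESPONDING } state_t;

/* Hypothetical stubs for the DSP, cloud, and audio interfaces. */
static bool dsp_keyword_detected(void) { return true; }
static bool stream_audio_to_cloud(void) { return true; } /* true when the cloud replies */
static void play_response(void) { puts("<speak response>"); }

int main(void)
{
    state_t state = SLEEPING;            /* app processor sleeps; DSP listens */
    for (int step = 0; step < 3; step++) {
        switch (state) {
        case SLEEPING:                   /* always-on DSP wakes us on the wake word */
            if (dsp_keyword_detected())
                state = STREAMING;
            break;
        case STREAMING:                  /* ship audio to the cloud for ASR + NLP */
            if (stream_audio_to_cloud())
                state = RESPONDING;
            break;
        case RESPONDING:
            play_response();
            state = SLEEPING;            /* back to low-power listening */
            break;
        }
    }
    return 0;
}
```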

Summary

In this post, we learned about the different hardware devices that make Edge AI possible and saw how many of them can be combined to create value in a real-world situation. The key is to keep an eye on the BLERP analysis (bandwidth, latency, economics, reliability, privacy) and understand which kind of device suits which kind of data.
