Introduction
Internet of Things (IoT) applications, such as smart wearable devices, smart cities, smart medical care, and smart homes, are becoming increasingly common. As the IoT industry expands, demand for microcontrollers and microprocessors has grown steadily. Unlike general-purpose processors, which are designed for a wide range of applications, the microcontrollers and microprocessors used in IoT systems are application-specific, and they must process different kinds of data in different application scenarios. For example, text sequence analysis mainly deals with one-dimensional data, image processing with two-dimensional data, and video processing with three-dimensional data.
This paper proposes a new benchmark, IoTBench. Its workloads cover three types of algorithms commonly used in IoT applications: matrix processing, list operation, and convolution. We also propose the concept of an evaluation subspace. Because the data used in different scenarios has different characteristics, we divide the data space into evaluation subspaces according to data type, data dimension, and data scale. Each combination of a data scale, a dimension, and a type defines one evaluation subspace, so the data space can be divided into arbitrarily many subspaces. In practice, users only need to select the evaluation subspace that matches their scenario and run the benchmark in it; the three parameters of a subspace can be changed by editing their definitions.
IoTBench uses several metrics to evaluate processor performance, including the ratio of iterations to running time (Iterations/Sec), Cycles Per Instruction (CPI), and cache miss rate. Table 1 compares IoTBench with two popular benchmarks, CoreMark and Dhrystone.
Characteristic | CoreMark | Dhrystone | IoTBench |
---|---|---|---|
Written in C, portable | Y | Y | Y |
Provides a single, easily reportable score; concise and intuitive | Y | Y | Y |
Results are independent of libraries and compilers | Y | N | Y |
Cover convolution algorithm | N | N | Y |
Various data types can be evaluated | N | N | Y |
Various data dimensions can be evaluated | N | N | Y |
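The three metrics mentioned above are ratios of raw counts. The following C sketch shows how they could be computed from measured values. The struct and function names are hypothetical and are not IoTBench's API; in a gem5 setup, cycle, instruction, and cache counts would normally come from the simulator's statistics rather than from the benchmark itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the raw counts of one benchmark run
 * (sketch only; not part of IoTBench). */
typedef struct {
    uint64_t iterations;     /* benchmark iterations completed */
    double   seconds;        /* running time in seconds */
    uint64_t cycles;         /* CPU cycles spent */
    uint64_t instructions;   /* instructions executed */
    uint64_t cache_accesses; /* accesses to the cache of interest */
    uint64_t cache_misses;   /* misses in that cache */
} run_counters;

/* Iterations/Sec: completed iterations divided by running time. */
double iterations_per_sec(const run_counters *c) {
    assert(c->seconds > 0.0);
    return (double)c->iterations / c->seconds;
}

/* CPI: cycles divided by instructions. */
double cycles_per_instruction(const run_counters *c) {
    assert(c->instructions > 0);
    return (double)c->cycles / (double)c->instructions;
}

/* Cache miss rate: misses as a fraction of all accesses to that cache. */
double cache_miss_rate(const run_counters *c) {
    assert(c->cache_accesses > 0);
    return (double)c->cache_misses / (double)c->cache_accesses;
}
```

Higher Iterations/Sec is better, while lower CPI and lower miss rate are better, so the three metrics should be read in different directions.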
In the experiments, we first analyze the impact of data type, data dimension, and data scale on processor performance. The results show that each of these factors has a distinct effect on performance; that is, data features are important factors in IoT benchmarking. We then use IoTBench to compare ARM with RISC-V, and MinorCPU with O3CPU, and find that the ARM ISA is more efficient than RISC-V under the same micro-architecture configuration. Finally, we explore the performance of processors with different architecture configurations in different evaluation subspaces and identify the optimal architecture for each evaluation subspace.
Contributions
- We design and implement IoTBench, which covers three types of algorithms commonly used in IoT applications: matrix processing, list operation, and convolution. We propose the concept of an evaluation subspace, defined by a combination of data scale, data dimension, and data type.
- We analyze the impact of data type, data dimension, and data scale on processor performance and show that each has a distinct effect. We also use IoTBench to compare ARM with RISC-V, and MinorCPU with O3CPU, and find that the ARM ISA is more efficient than RISC-V under the same micro-architecture configuration.
- We explore the performance of processors with different architecture configurations in different evaluation subspaces and identify the optimal architecture for each evaluation subspace.
Implementation
IoTBench consists of list processing, matrix processing, and convolution workloads. List processing is a basic operation widely used in IoT scenarios: when a sensor receives data, the data is usually cleaned and preprocessed first, and then some simple statistical analysis is performed. Search and sort operations on lists are used heavily in this process. Typical IoT scenarios also involve tasks such as voice control, image processing, text processing, and face recognition, which depend heavily on machine learning and deep learning. We therefore also selected the most basic operators of machine learning and deep learning, namely convolution and matrix processing. Table 2 lists the workloads of IoTBench.
Category | Workload | Data type | Data scale | Data dimension |
---|---|---|---|---|
list processing | list search | INT/FLOAT | any | 1/2/3 |
list processing | list sort | INT/FLOAT | any | 1/2/3 |
matrix processing | matrix add constant | INT/FLOAT | any | 1/2/3 |
matrix processing | matrix multiply constant | INT/FLOAT | any | 1/2/3 |
matrix processing | matrix multiply matrix | INT/FLOAT | any | 1/2/3 |
convolution | convolution | INT/FLOAT | any | 1/2/3 |
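To make the workload categories in Table 2 concrete, here is a minimal sketch of a list search, a list sort, a matrix-multiply-matrix kernel, and a one-dimensional convolution. It is not IoTBench's source code: `elem_t` is a hypothetical stand-in for the configurable data type, insertion sort is used only as a simple example of a sorting kernel, and the convolution is a "valid" convolution without padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the configurable data type (int or float). */
typedef int elem_t;

/* List search: index of the first element equal to key, or -1 if absent. */
long list_search(const elem_t *list, size_t n, elem_t key) {
    for (size_t i = 0; i < n; i++) {
        if (list[i] == key) {
            return (long)i;
        }
    }
    return -1;
}

/* List sort: in-place insertion sort in ascending order. */
void list_sort(elem_t *list, size_t n) {
    for (size_t i = 1; i < n; i++) {
        elem_t key = list[i];
        size_t j = i;
        while (j > 0 && list[j - 1] > key) {
            list[j] = list[j - 1]; /* shift larger elements right */
            j--;
        }
        list[j] = key;
    }
}

/* Matrix multiply matrix: c = a * b for n x n row-major matrices. */
void matrix_mul(const elem_t *a, const elem_t *b, elem_t *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            elem_t sum = 0;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* 1-D valid convolution: out receives n - m + 1 elements.
 * The kernel is flipped, as in the mathematical definition. */
void convolve_1d(const elem_t *in, size_t n,
                 const elem_t *kernel, size_t m, elem_t *out) {
    assert(m > 0 && m <= n);
    for (size_t i = 0; i + m <= n; i++) {
        elem_t sum = 0;
        for (size_t k = 0; k < m; k++) {
            sum += in[i + k] * kernel[m - 1 - k];
        }
        out[i] = sum;
    }
}
```

Table 2 also lists 2-D and 3-D variants, FLOAT versions, and the matrix-add-constant and matrix-multiply-constant workloads; the sketch covers only the simplest cases.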
Every workload runs in a chosen evaluation subspace. The data scale, data dimension, and data type together select one subspace of the data space, and users choose the subspace that matches their scenario by changing these three parameters (DATA_SIZE, DATA_DIM, and DATA_TYPE) in their definitions.
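The following is a minimal sketch of how the three subspace parameters could be expressed as C preprocessor definitions. The macro names DATA_SIZE, DATA_TYPE, and DATA_DIM come from this document, but their exact form in IoTBench is an assumption: here DATA_TYPE is assumed to name a C type directly, and NUM_ELEMS and bench_buffer are hypothetical names. The sketch illustrates that a subspace fixes a byte budget, so the element count depends on the element type.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed defaults (hypothetical form); each can be overridden when
 * compiling, e.g. -DDATA_SIZE=12288 -DDATA_TYPE=float -DDATA_DIM=2 */
#ifndef DATA_SIZE
#define DATA_SIZE 6144 /* bytes of benchmark data */
#endif
#ifndef DATA_TYPE
#define DATA_TYPE int  /* element type: int or float */
#endif
#ifndef DATA_DIM
#define DATA_DIM 1     /* number of dimensions: 1, 2, or 3 */
#endif

/* Number of elements that fit in DATA_SIZE bytes (hypothetical name). */
#define NUM_ELEMS (DATA_SIZE / sizeof(DATA_TYPE))

/* Reject invalid subspaces at compile time (C11 static_assert). */
static_assert(DATA_DIM >= 1 && DATA_DIM <= 3, "DATA_DIM must be 1, 2, or 3");
static_assert(DATA_SIZE % sizeof(DATA_TYPE) == 0,
              "DATA_SIZE must be a multiple of the element size");

/* Backing storage for the benchmark data (hypothetical name). */
static DATA_TYPE bench_buffer[NUM_ELEMS];

/* Number of elements allocated for the selected subspace. */
size_t bench_num_elems(void) {
    return sizeof bench_buffer / sizeof bench_buffer[0];
}
```

Fixing the budget in bytes rather than in elements keeps the memory footprint of a subspace constant when only the data type changes, which matters when the subspaces are compared across cache sizes.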
Experiments
Simulator
We chose ARM and RISC-V because they are the mainstream ISAs used in IoT. We also cover both in-order and out-of-order processors, the two typical processor architectures, using gem5's MinorCPU (in-order) and O3CPU (out-of-order) models. Cache sizes were chosen with reference to commercial processors, such as those from SiFive. All of these settings are applied through gem5 command-line options, following the gem5 documentation. Table 3 shows the gem5 simulator configurations.
Parameter | Values |
---|---|
ISA | ARM, RISC-V |
CPU Model | Minor, O3 |
L1 ICache Size | 64kB, 32kB, 16kB, 8kB, 4kB, 2kB |
L1 DCache Size | 64kB, 32kB, 16kB, 8kB, 4kB, 2kB |
L2 Cache Size | 1024kB, 512kB, 0kB (no L2 cache) |
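The parameters in Table 3 span a space of simulator configurations. The C sketch below enumerates that space. Table 3 does not say whether every combination was simulated; the sketch assumes a full sweep purely for illustration, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parameter values from Table 3. */
static const char *const isas[] = {"ARM", "RISC-V"};
static const char *const cpu_models[] = {"Minor", "O3"};
static const unsigned l1_sizes_kb[] = {64, 32, 16, 8, 4, 2}; /* L1I and L1D */
static const unsigned l2_sizes_kb[] = {1024, 512, 0};        /* 0 = no L2 */

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: visit every combination of ISA, CPU model,
 * L1I size, L1D size, and L2 size (assumes a full sweep).
 * If out is non-NULL, one line per configuration is printed to it.
 * Returns the number of configurations visited. */
size_t enumerate_configs(FILE *out) {
    size_t count = 0;
    for (size_t i = 0; i < COUNT(isas); i++)
        for (size_t c = 0; c < COUNT(cpu_models); c++)
            for (size_t li = 0; li < COUNT(l1_sizes_kb); li++)
                for (size_t ld = 0; ld < COUNT(l1_sizes_kb); ld++)
                    for (size_t l2 = 0; l2 < COUNT(l2_sizes_kb); l2++) {
                        if (out != NULL) {
                            fprintf(out, "%s %s L1I=%ukB L1D=%ukB L2=%ukB\n",
                                    isas[i], cpu_models[c], l1_sizes_kb[li],
                                    l1_sizes_kb[ld], l2_sizes_kb[l2]);
                        }
                        count++;
                    }
    return count;
}
```

Calling `enumerate_configs(stdout)` prints one line per configuration; under the full-sweep assumption there are 2 × 2 × 6 × 6 × 3 = 432 of them.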
Evaluation Subspace
We use the C types int and float (INT and FP32 in Table 4) and data dimensions from 1 to 3. Because the amount of data processed by a microprocessor is generally small, we use two data scales, 6144 bytes and 12288 bytes. Setting DATA_SIZE, DATA_TYPE, and DATA_DIM in the definitions to each combination of these values yields 2 × 3 × 2 = 12 evaluation subspaces, listed in Table 4.
Evaluation Subspace | DATA_SIZE/Bytes | DATA_DIM | DATA_TYPE |
---|---|---|---|
A | 6144 | 1 | INT |
B | 6144 | 2 | INT |
C | 6144 | 2 | FP32 |
D | 6144 | 1 | FP32 |
E | 12288 | 1 | INT |
F | 12288 | 2 | INT |
G | 12288 | 2 | FP32 |
H | 12288 | 1 | FP32 |
I | 12288 | 3 | FP32 |
J | 12288 | 3 | INT |
K | 6144 | 3 | INT |
L | 6144 | 3 | FP32 |
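The 12 subspaces in Table 4 are the full cross product of two data sizes, three dimensions, and two data types. The C sketch below enumerates them and reports how many elements each holds. The struct, enum, and function names are hypothetical, and the enumeration order differs from the A to L labels of Table 4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation of an evaluation subspace. */
typedef enum { TYPE_INT, TYPE_FP32 } data_type;

typedef struct {
    size_t size_bytes; /* data scale */
    int dim;           /* data dimension: 1, 2, or 3 */
    data_type type;    /* data type */
} subspace;

static size_t elem_size(data_type t) {
    return t == TYPE_INT ? sizeof(int) : sizeof(float);
}

/* Write up to cap subspaces into out; return the total number of
 * combinations (2 sizes x 3 dimensions x 2 types). */
size_t enumerate_subspaces(subspace *out, size_t cap) {
    static const size_t sizes[] = {6144, 12288};
    static const data_type types[] = {TYPE_INT, TYPE_FP32};
    size_t n = 0;
    for (size_t s = 0; s < 2; s++)
        for (int d = 1; d <= 3; d++)
            for (size_t t = 0; t < 2; t++) {
                if (n < cap) {
                    out[n] = (subspace){sizes[s], d, types[t]};
                }
                n++;
            }
    return n;
}

/* Number of elements in a subspace's data buffer. */
size_t subspace_elems(const subspace *sp) {
    assert(sp->size_bytes % elem_size(sp->type) == 0);
    return sp->size_bytes / elem_size(sp->type);
}
```

With 4-byte int and float, the two scales correspond to 1536 and 3072 elements respectively, regardless of the data dimension.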
Other Settings
We cross-compile with aarch64-linux-gnu-gcc and riscv64-linux-gnu-gcc, targeting the Arm64 and RV64GC instruction sets, respectively. We use gem5 version 21.2.1.0 and run all experiments from the gem5 directory in syscall emulation (SE) mode.
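Because the same source is cross-compiled for two ISAs, it can help to have each binary report which target it was built for, so that results are never attributed to the wrong ISA. The sketch below relies on the compiler-predefined macros `__aarch64__`, `__riscv`, and `__riscv_xlen`; the function names are hypothetical and not part of IoTBench.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: label for the ISA this file was compiled for,
 * based on predefined macros (__aarch64__ for AArch64,
 * __riscv and __riscv_xlen for RISC-V). */
const char *target_isa(void) {
#if defined(__aarch64__)
    return "Arm64";
#elif defined(__riscv) && defined(__riscv_xlen) && __riscv_xlen == 64
    return "RV64";
#else
    return "other";
#endif
}

/* Hypothetical helper: nonzero if the binary was built for the expected label. */
int built_for(const char *expected) {
    assert(expected != NULL);
    return strcmp(target_isa(), expected) == 0;
}
```

For example, a benchmark run could check `built_for("RV64")` before measuring, to catch a binary produced by the wrong cross-compiler.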
Results
We first analyze the impact of data type, data dimension, and data scale on processor performance. The results show that each of these factors has a distinct effect on performance; that is, data features are important factors in IoT benchmarking.
We then use IoTBench to compare ARM with RISC-V, and MinorCPU with O3CPU, and find that the ARM ISA is more efficient than RISC-V under the same micro-architecture configuration.
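Finally, we explore how performance in each evaluation subspace varies across architecture configurations and identify the optimal configuration for each subspace. Table 5 lists the optimal configuration for each subspace and its Iterations/Sec. In every subspace, the optimal configuration uses the ARM ISA and the O3 CPU model; the best L1 DCache size is 16 kB for every 6144-byte subspace and 32 kB for every 12288-byte subspace, while the best L1 ICache size varies between 8 kB and 32 kB.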
Subspace | ISA | CPU Model | L1 DCache/kB | L1 ICache/kB | Iterations/Sec |
---|---|---|---|---|---|
A | ARM | O3 | 16 | 8 | 28328.61 |
B | ARM | O3 | 16 | 16 | 11695.91 |
C | ARM | O3 | 16 | 8 | 12121.21 |
D | ARM | O3 | 16 | 16 | 28571.43 |
E | ARM | O3 | 32 | 32 | 13386.88 |
F | ARM | O3 | 32 | 8 | 5173.31 |
G | ARM | O3 | 32 | 16 | 5181.35 |
H | ARM | O3 | 32 | 8 | 13458.95 |
I | ARM | O3 | 32 | 32 | 5837.71 |
J | ARM | O3 | 32 | 8 | 5621.14 |
K | ARM | O3 | 16 | 32 | 10548.52 |
L | ARM | O3 | 16 | 16 | 10964.91 |
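Selecting the optimal configuration for a subspace amounts to taking the run with the highest Iterations/Sec among all configurations simulated for that subspace. A minimal C sketch of that selection step follows; the record layout and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one simulated run of a subspace under one configuration. */
typedef struct {
    const char *isa;           /* "ARM" or "RISC-V" */
    const char *cpu_model;     /* "Minor" or "O3" */
    unsigned l1d_kb;           /* L1 DCache size in kB */
    unsigned l1i_kb;           /* L1 ICache size in kB */
    double iterations_per_sec; /* measured score */
} run_result;

/* Hypothetical helper: index of the run with the highest Iterations/Sec.
 * Ties keep the earliest run. Requires at least one run. */
size_t best_run(const run_result *runs, size_t n) {
    assert(runs != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (runs[i].iterations_per_sec > runs[best].iterations_per_sec) {
            best = i;
        }
    }
    return best;
}
```

Applying `best_run` once per subspace, over that subspace's runs only, yields one row per subspace in the form of Table 5.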
Download