Introduction

Internet of Things (IoT) applications such as smart wearable devices, smart cities, smart medical care, and smart homes are becoming increasingly common. As the IoT industry expands, demand for microcontrollers and microprocessors has risen steadily. Unlike general-purpose processors, which are designed for a wide range of applications, the microcontrollers and microprocessors used in IoT systems are application-specific, so the data they process varies with the application scenario. For example, applications for text sequence analysis mainly deal with one-dimensional data, applications for image processing deal with two-dimensional data, and applications for video processing deal with three-dimensional data.

This paper proposes a new benchmark, IoTBench. Its workloads cover three types of algorithms commonly used in IoT applications: matrix processing, list processing, and convolution. The paper also proposes the concept of an evaluation subspace. Because the data used in different scenarios have different characteristics, the data space is divided into evaluation subspaces according to data type, data dimension, and data scale. Each combination of scale, dimension, and type defines one evaluation subspace, so the data space can be divided into arbitrarily many of them. In practice, users select only the evaluation subspaces that match their scenario and run the benchmark on those; the three parameters are set in the benchmark's definitions. Several indicators are used to evaluate processor performance, including the ratio of iterations to running time (Iterations/Sec), Cycles Per Instruction (CPI), and cache miss rate. Table 1 shows the differences between IoTBench and two popular benchmarks, CoreMark and Dhrystone.

Table 1: Comparison of IoTBench, CoreMark, and Dhrystone.
| Characteristic | CoreMark | Dhrystone | IoTBench |
| --- | --- | --- | --- |
| Written in C language, portable | Y | Y | Y |
| Provide a single easily reportable score, concise and intuitive | Y | Y | Y |
| Results are independent from libraries and compilers | Y | N | Y |
| Cover convolution algorithm | N | N | Y |
| Various data types can be evaluated | N | N | Y |
| Various data dimensions can be evaluated | N | N | Y |

In the experiments, we first analyze the impact of different data types, data dimensions, and data scales on processor performance. The results show that data type, data dimension, and data scale each have a distinct effect on performance; that is, data features are important factors for IoT benchmarking. We then compare ARM with RISC-V and MinorCPU with O3CPU using IoTBench, and find that the ARM ISA is more efficient than RISC-V under the same micro-architecture configuration. Finally, we explore the performance of processors with different architecture configurations in each evaluation subspace and identify the optimal architecture for each subspace.

Contributions

  • We design and implement IoTBench, which covers three types of algorithms commonly used in IoT applications: matrix processing, list processing, and convolution. We propose the concept of an evaluation subspace, which is defined by a combination of data scale, data dimension, and data type.

  • We analyze the impact of different data types, data dimensions, and data scales on processor performance. The results show that data type, data dimension, and data scale each have a distinct effect on performance. We also compare ARM with RISC-V and MinorCPU with O3CPU using IoTBench, and find that the ARM ISA is more efficient than RISC-V under the same micro-architecture configuration.

  • We explore the performance of processors with different architecture configurations in each evaluation subspace and identify the optimal architecture for each subspace.

Implementation

IoTBench comprises list processing, matrix processing, and convolution. List processing is a basic operator that is widely used in IoT scenarios: when a sensor receives data, the data is usually cleaned and preprocessed first, and then some simple statistical analysis is carried out. Searching and sorting lists are widely used in this process. Typical IoT scenarios also involve tasks such as voice control, image processing, text processing, and face recognition, which depend heavily on machine learning and deep learning. We therefore selected the most basic operators of machine learning and deep learning, namely convolution and matrix processing. Table 2 lists the workloads of IoTBench.

Table 2: Workloads and data space.
| Category | Workload | Data type | Data scale | Data dimension |
| --- | --- | --- | --- | --- |
| list processing | list search | INT/FLOAT | any | 1/2/3 |
| list processing | list sort | INT/FLOAT | any | 1/2/3 |
| matrix processing | matrix add constant | INT/FLOAT | any | 1/2/3 |
| matrix processing | matrix multiply constant | INT/FLOAT | any | 1/2/3 |
| matrix processing | matrix multiply matrix | INT/FLOAT | any | 1/2/3 |
| convolution | convolution | INT/FLOAT | any | 1/2/3 |
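
As a rough illustration of two of the workloads in Table 2, the sketch below shows a one-dimensional convolution and a matrix-multiply-constant kernel. It is not IoTBench's source: the names `data_t`, `conv1d_valid`, and `matrix_mul_const` are hypothetical, the element type is fixed to float for brevity, and the "valid" (no padding) output range is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type; the benchmark selects INT or FLOAT per subspace. */
typedef float data_t;

/* "Valid" 1-D convolution with the kernel flipped:
 * out[i] = sum over k of in[i + k] * kernel[klen - 1 - k], for i in [0, n - klen].
 * Returns the number of output elements written (0 if the kernel does not fit). */
size_t conv1d_valid(const data_t *in, size_t n,
                    const data_t *kernel, size_t klen,
                    data_t *out)
{
    assert(in != NULL && kernel != NULL && out != NULL);
    if (klen == 0 || klen > n)
        return 0;

    size_t out_len = n - klen + 1;
    for (size_t i = 0; i < out_len; ++i) {
        data_t acc = 0;
        for (size_t k = 0; k < klen; ++k)
            acc += in[i + k] * kernel[klen - 1 - k];
        out[i] = acc;
    }
    return out_len;
}

/* Matrix multiply constant: scale every element of a rows x cols
 * matrix stored contiguously in row-major order. */
void matrix_mul_const(data_t *m, size_t rows, size_t cols, data_t c)
{
    assert(m != NULL);
    for (size_t i = 0; i < rows * cols; ++i)
        m[i] *= c;
}
```

Both kernels are simple loops over contiguous data, so their behavior on a given core is driven largely by element type and working-set size, which is what the evaluation subspaces vary.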

The entire data space is divided into evaluation subspaces according to data scale, data dimension, and data type. Each combination of scale, dimension, and type defines one evaluation subspace, so the data space can be divided into arbitrarily many of them. In practice, users select only the evaluation subspaces that match their scenario and run the benchmark on those. The three parameters of an evaluation subspace are set in the benchmark's definitions.
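
One way such definitions could look is sketched below. Only the parameter names DATA_SIZE, DATA_DIM, and DATA_TYPE appear in this paper; the encoding constants `TYPE_INT` and `TYPE_FLOAT`, the `data_t` typedef, the `bench_data` buffer, and `bench_element_count` are assumptions made for illustration and may differ from the actual IoTBench headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoding of the data type parameter (hypothetical constants). */
#define TYPE_INT   0
#define TYPE_FLOAT 1

/* The three subspace parameters, overridable at compile time. */
#ifndef DATA_SIZE
#define DATA_SIZE 6144          /* working-set size in bytes */
#endif
#ifndef DATA_DIM
#define DATA_DIM 1              /* 1, 2, or 3 dimensions */
#endif
#ifndef DATA_TYPE
#define DATA_TYPE TYPE_INT
#endif

#if DATA_DIM < 1 || DATA_DIM > 3
#error "DATA_DIM must be 1, 2, or 3"
#endif

/* Element type chosen by DATA_TYPE. */
#if DATA_TYPE == TYPE_FLOAT
typedef float data_t;
#else
typedef int data_t;
#endif

/* Reject sizes that are not a whole number of elements (C11 static_assert). */
static_assert(DATA_SIZE % sizeof(data_t) == 0,
              "DATA_SIZE must be a multiple of the element size");

/* Working buffer sized from the selected subspace. */
static data_t bench_data[DATA_SIZE / sizeof(data_t)];

/* Number of elements in the working buffer. */
size_t bench_element_count(void)
{
    return sizeof(bench_data) / sizeof(bench_data[0]);
}
```

With this layout, a different subspace would be selected without editing the source, for example by passing `-DDATA_SIZE=12288 -DDATA_DIM=2 -DDATA_TYPE=1` to the compiler.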

Experiments

Simulator

We chose ARM and RISC-V because they are the mainstream ISAs used in IoT. We chose gem5's in-order MinorCPU and out-of-order O3CPU models because in-order and out-of-order execution are the two typical processor architectures. Cache sizes are set with reference to commercial processor manufacturers such as SiFive. All settings are passed on the command line, following the gem5 documentation. Table 3 shows the configurations of the gem5 simulator.

Table 3: Configuration of Simulator.
| Parameter | Values |
| --- | --- |
| ISA | ARM, RISC-V |
| CPU Model | Minor, O3 |
| L1 ICache Size | 64kB, 32kB, 16kB, 8kB, 4kB, 2kB |
| L1 DCache Size | 64kB, 32kB, 16kB, 8kB, 4kB, 2kB |
| L2 Cache Size | 1024kB, 512kB, 0kB |

Evaluation Subspace

We use the INT and FLOAT data types of the C language (FLOAT is listed as FP32 in Table 4) and data of one to three dimensions. Because the data processed by microprocessors is generally small, we use two data scales, 6144 bytes and 12288 bytes. Setting DATA_SIZE, DATA_DIM, and DATA_TYPE in the definition to each combination of these values (2 scales × 3 dimensions × 2 types) yields the 12 evaluation subspaces listed in Table 4.

Table 4: The data format of each evaluation subspace.
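| Evaluation Subspace | DATA_SIZE/Bytes | DATA_DIM | DATA_TYPE |
| --- | --- | --- | --- |
| A | 6144 | 1 | INT |
| B | 6144 | 2 | INT |
| C | 6144 | 2 | FP32 |
| D | 6144 | 1 | FP32 |
| E | 12288 | 1 | INT |
| F | 12288 | 2 | INT |
| G | 12288 | 2 | FP32 |
| H | 12288 | 1 | FP32 |
| I | 12288 | 3 | FP32 |
| J | 12288 | 3 | INT |
| K | 6144 | 3 | INT |
| L | 6144 | 3 | FP32 |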
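
The 12 subspaces in Table 4 are the full cross product of two scales, three dimensions, and two types. The sketch below enumerates that product and derives an element count per subspace. The `subspace` struct, the `elem_type` enum, and both function names are hypothetical, and the enumeration order is not the A to L labeling of Table 4.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ELEM_INT, ELEM_FP32 } elem_type;

/* One evaluation subspace: a (scale, dimension, type) triple. */
typedef struct {
    size_t    size_bytes;  /* DATA_SIZE */
    int       dim;         /* DATA_DIM  */
    elem_type type;        /* DATA_TYPE */
} subspace;

/* Write every (scale, dimension, type) combination into out, up to cap
 * entries. Returns the number of entries written. */
size_t enumerate_subspaces(subspace *out, size_t cap)
{
    static const size_t    sizes[] = { 6144, 12288 };
    static const int       dims[]  = { 1, 2, 3 };
    static const elem_type types[] = { ELEM_INT, ELEM_FP32 };
    size_t n = 0;

    assert(out != NULL);
    for (size_t s = 0; s < sizeof sizes / sizeof sizes[0]; ++s)
        for (size_t d = 0; d < sizeof dims / sizeof dims[0]; ++d)
            for (size_t t = 0; t < sizeof types / sizeof types[0]; ++t) {
                if (n == cap)
                    return n;
                out[n].size_bytes = sizes[s];
                out[n].dim = dims[d];
                out[n].type = types[t];
                ++n;
            }
    return n;
}

/* Number of elements that fit in a subspace's working set,
 * using the host's sizes for int and float. */
size_t subspace_elements(const subspace *s)
{
    assert(s != NULL);
    size_t elem = (s->type == ELEM_INT) ? sizeof(int) : sizeof(float);
    return s->size_bytes / elem;
}
```

On a target where int and float are both 4 bytes, the two scales correspond to 1536 and 3072 elements. How those elements are arranged across two or three dimensions is not stated here, so the sketch leaves the shape out.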

Other Settings

The cross-compilers are aarch64-linux-gnu-gcc and riscv64-linux-gnu-gcc. The ARM instruction set is Arm64 and the RISC-V instruction set is RV64GC. The gem5 version is 21.2.1.0, and the experiments are run from the gem5 directory in SE (syscall emulation) mode.

Results

We first analyze the impact of different data types, data dimensions, and data scales on processor performance. The results show that data type, data dimension, and data scale each have a distinct effect on performance; that is, data features are important factors for IoT benchmarking.

We then compare ARM with RISC-V and MinorCPU with O3CPU using IoTBench. We find that the ARM ISA is more efficient than RISC-V under the same micro-architecture configuration.

We explore how performance in each evaluation subspace varies with the architecture configuration and identify the optimal architecture for each subspace. The results are shown in Table 5.

Table 5: Optimal configuration for subspace A-L.
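| Subspace | ISA | CPU Model | L1 DCache/kB | L1 ICache/kB | Iterations/Sec |
| --- | --- | --- | --- | --- | --- |
| A | ARM | O3 | 16 | 8 | 28328.61 |
| B | ARM | O3 | 16 | 16 | 11695.91 |
| C | ARM | O3 | 16 | 8 | 12121.21 |
| D | ARM | O3 | 16 | 16 | 28571.43 |
| E | ARM | O3 | 32 | 32 | 13386.88 |
| F | ARM | O3 | 32 | 8 | 5173.31 |
| G | ARM | O3 | 32 | 16 | 5181.35 |
| H | ARM | O3 | 32 | 8 | 13458.95 |
| I | ARM | O3 | 32 | 32 | 5837.71 |
| J | ARM | O3 | 32 | 8 | 5621.14 |
| K | ARM | O3 | 16 | 32 | 10548.52 |
| L | ARM | O3 | 16 | 16 | 10964.91 |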
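
The Iterations/Sec score reported in Table 5, together with the CPI and cache miss rate indicators, follows from raw counters by simple ratios. The sketch below uses the standard definitions; the function names are hypothetical, and how IoTBench or gem5 collects the counters is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations/Sec: completed benchmark iterations divided by elapsed time. */
double iterations_per_sec(uint64_t iterations, double seconds)
{
    assert(seconds > 0.0);
    return (double)iterations / seconds;
}

/* CPI: elapsed core cycles divided by committed instructions. */
double cycles_per_instruction(uint64_t cycles, uint64_t instructions)
{
    assert(instructions > 0);
    return (double)cycles / (double)instructions;
}

/* Cache miss rate: misses divided by total accesses, in [0, 1]. */
double cache_miss_rate(uint64_t misses, uint64_t accesses)
{
    assert(accesses > 0);
    assert(misses <= accesses);
    return (double)misses / (double)accesses;
}
```

Higher Iterations/Sec is better, while lower CPI and lower miss rate are better, so the three indicators are best read side by side rather than merged into one number.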

Download

IoTBench