IoTBench | A Data Centrical and Configurable IoT Benchmark Suite

Introduction

Internet of Things (IoT) applications, such as smart wearable devices, smart cities, smart medical care, and smart homes, are becoming increasingly common. With the expansion of the IoT industry, the demand for microcontrollers and microprocessors has increased steadily. Unlike general-purpose processors, which are designed for a wide range of applications, microcontrollers and microprocessors used in IoT systems are application-specific. These processors need to process different kinds of data in different application scenarios. For example, applications for text sequence analysis mainly deal with one-dimensional data, applications for image processing deal with two-dimensional data, and applications for video processing deal with three-dimensional data.

This paper proposes a new benchmark, IoTBench. The IoTBench workloads cover three types of algorithms commonly used in IoT applications: matrix processing, list operation, and convolution. We also propose the concept of an evaluation subspace. Because the data used in different scenarios have different characteristics, the data space is divided into multiple evaluation subspaces according to data type, data dimension, and data scale. Each combination of data scale, dimension, and type defines one evaluation subspace, so the data space can be divided into an arbitrarily large number of subspaces. In practice, users only need to select the evaluation subspace that matches their scenario and run the benchmark on it; the three parameters of the evaluation subspace can be modified in the definition. Processor performance is then evaluated with several indicators, such as the ratio of iterations to running time (Iterations/Sec), Cycles Per Instruction (CPI), and Cache Miss Rate. Table 1 shows the differences between IoTBench and two popular benchmarks, CoreMark and Dhrystone.

Table 1: Comparison of IoTBench, CoreMark, and Dhrystone.
Characteristic	CoreMark	Dhrystone	IoTBench
Written in C language, portable	Y	Y	Y
Provide a single easily reportable score, concise and intuitive	Y	Y	Y
Results are independent from libraries and compilers	Y	N	Y
Cover convolution algorithm	N	N	Y
Various data types can be evaluated	N	N	Y
Various data dimensions can be evaluated	N	N	Y
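
The three evaluation indicators named in the introduction are simple ratios of raw counters. The following is a minimal sketch of how they could be computed, not IoTBench's own reporting code: the `run_counters` struct and its field names are hypothetical, and the counters would in practice come from the simulator or hardware performance counters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the raw counters of one benchmark run.
 * Field names are illustrative and not taken from IoTBench. */
typedef struct {
    uint64_t iterations;      /* completed benchmark iterations */
    double   seconds;         /* running time in seconds */
    uint64_t cycles;          /* elapsed CPU cycles */
    uint64_t instructions;    /* committed instructions */
    uint64_t cache_accesses;  /* total cache accesses */
    uint64_t cache_misses;    /* cache accesses that missed */
} run_counters;

/* Iterations/Sec: iterations divided by running time. */
double iterations_per_sec(const run_counters *r)
{
    assert(r->seconds > 0.0);
    return (double)r->iterations / r->seconds;
}

/* CPI: cycles divided by committed instructions. */
double cpi(const run_counters *r)
{
    assert(r->instructions > 0);
    return (double)r->cycles / (double)r->instructions;
}

/* Cache Miss Rate: misses divided by total accesses. */
double cache_miss_rate(const run_counters *r)
{
    assert(r->cache_accesses > 0);
    return (double)r->cache_misses / (double)r->cache_accesses;
}
```

A higher Iterations/Sec is better, while lower CPI and a lower miss rate are better, so the three indicators should be read together rather than collapsed into one number without care.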

In the experiments, we first analyze the impact of different data types, data dimensions, and data scales on processor performance. The results show that data type, data dimension, and data scale affect the performance distinctly. That is to say, the data features are important factors for IoT benchmarking. We then compare ARM with RISC-V and MinorCPU with O3CPU using IoTBench. We find that the ARM ISA is more efficient than RISC-V with the same micro-architecture configuration. Finally, we explore the performance of processors with different architecture configurations in different evaluation subspaces and identify the optimal architecture for each evaluation subspace.

Contributions

We design and implement IoTBench, which covers three types of algorithms commonly used in IoT applications: matrix processing, list operation, and convolution. We propose the concept of evaluation subspace, which is defined by a set of data scales, dimensions, and types.
We analyze the impact of different data types, data dimensions, and data scales on processor performance. The results show that data type, data dimension, and data scale affect the performance distinctly. We also compare ARM with RISC-V and MinorCPU with O3CPU using IoTBench. We find that the ARM ISA is more efficient than RISC-V with the same micro-architecture configuration.
We explore the performance of processors with different architecture configurations in different evaluation subspaces and identify the optimal architecture for each evaluation subspace.

Implementation

IoTBench comprises three workload categories: list processing, matrix processing, and convolution. List processing is a basic operator that is widely used in IoT scenarios. When a sensor receives data, data cleaning and preprocessing are often performed first, followed by some simple statistical analysis. Searching and sorting on lists are widely used in this process. Typical IoT scenarios also involve tasks such as voice control, image processing, text processing, and face recognition, which depend heavily on machine learning and deep learning. We therefore selected the most basic operators of machine learning and deep learning, namely convolution and matrix processing. Table 2 lists the workloads of IoTBench.

Table 2: Workloads and data space.
Category	Workload	Data type	Data scale	Data dimension
list processing	list search	INT/FLOAT	any	1/2/3
list processing	list sort	INT/FLOAT	any	1/2/3
matrix processing	matrix add constant	INT/FLOAT	any	1/2/3
matrix processing	matrix multiply constant	INT/FLOAT	any	1/2/3
matrix processing	matrix multiply matrix	INT/FLOAT	any	1/2/3
convolution	convolution	INT/FLOAT	any	1/2/3
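
To make the workload categories in Table 2 concrete, here is a minimal sketch of two of them in one dimension: a convolution and a matrix-multiply-constant kernel. This is an illustration under stated assumptions, not IoTBench's source code. The `elem_t` typedef and both function names are hypothetical, and the convolution uses the "valid", non-flipped form common in deep learning, which the text does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: element type is chosen at build time; use float for the
 * FLOAT variants of the workloads. */
typedef int elem_t;

/* 1D "valid" convolution without kernel flipping (cross-correlation).
 * Reads n input elements and k kernel taps, writes n - k + 1 outputs. */
void conv1d_valid(const elem_t *in, size_t n,
                  const elem_t *kernel, size_t k, elem_t *out)
{
    assert(k > 0 && n >= k);
    for (size_t i = 0; i + k <= n; ++i) {
        elem_t acc = 0;
        for (size_t j = 0; j < k; ++j)
            acc += in[i + j] * kernel[j];
        out[i] = acc;
    }
}

/* Matrix multiply constant: scales a row-major rows x cols matrix in place. */
void matrix_mul_const(elem_t *m, size_t rows, size_t cols, elem_t c)
{
    for (size_t i = 0; i < rows * cols; ++i)
        m[i] *= c;
}
```

For example, convolving the input {1, 2, 3, 4, 5} with the kernel {1, 1} yields {3, 5, 7, 9}. Keeping the element type behind a single typedef is one simple way to let the same kernel serve both the INT and FLOAT columns of the table.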

The entire data space is divided into evaluation subspaces according to data scale, data dimension, and data type. Each combination of data scale, dimension, and type defines one evaluation subspace, so the data space can be divided into an arbitrarily large number of subspaces. In practice, users only need to select the evaluation subspace that matches their scenario and run the benchmark on it. The three parameters of the evaluation subspace can be modified in the definition.

Experiments

Simulator

We use the gem5 simulator. We chose ARM and RISC-V because they are mainstream ISAs used in IoT. We model both an in-order core (MinorCPU) and an out-of-order core (O3CPU), since in-order and out-of-order are two typical processor architectures. We set the cache sizes with reference to commercial processor manufacturers such as SiFive. These settings are applied through the command line, following the gem5 documentation. Table 3 shows the configurations of the gem5 simulator.

Table 3: Configuration of Simulator.
Parameter	Options
ISA	ARM, RISC-V
CPU Model	Minor, O3
L1 ICache Size	64kB, 32kB, 16kB, 8kB, 4kB, 2kB
L1 DCache Size	64kB, 32kB, 16kB, 8kB, 4kB, 2kB
L2 Cache Size	1024kB, 512kB, 0kB
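
The options in Table 3 span a sizeable configuration space. The sketch below enumerates it, assuming every option is combined with every other (a full cross product, which the text does not state explicitly). The `sim_config` struct and the `config_count` and `config_at` helpers are hypothetical names for illustration and are not part of gem5 or IoTBench.

```c
#include <assert.h>
#include <stddef.h>

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Options copied from Table 3; the same size list is used for L1 I and D. */
static const char *const ISAS[]       = {"ARM", "RISC-V"};
static const char *const CPU_MODELS[] = {"Minor", "O3"};
static const unsigned    L1_SIZES_KB[] = {64, 32, 16, 8, 4, 2};
static const unsigned    L2_SIZES_KB[] = {1024, 512, 0};

/* Hypothetical record for one simulator configuration. */
typedef struct {
    const char *isa;
    const char *cpu_model;
    unsigned    l1i_kb;
    unsigned    l1d_kb;
    unsigned    l2_kb;
} sim_config;

/* Number of configurations under the full cross-product assumption. */
size_t config_count(void)
{
    return COUNT_OF(ISAS) * COUNT_OF(CPU_MODELS) *
           COUNT_OF(L1_SIZES_KB) * COUNT_OF(L1_SIZES_KB) *
           COUNT_OF(L2_SIZES_KB);
}

/* Decode a flat index into one configuration (mixed-radix decoding,
 * with the L2 size varying fastest and the ISA slowest). */
sim_config config_at(size_t index)
{
    sim_config c;
    assert(index < config_count());
    c.l2_kb = L2_SIZES_KB[index % COUNT_OF(L2_SIZES_KB)];
    index /= COUNT_OF(L2_SIZES_KB);
    c.l1d_kb = L1_SIZES_KB[index % COUNT_OF(L1_SIZES_KB)];
    index /= COUNT_OF(L1_SIZES_KB);
    c.l1i_kb = L1_SIZES_KB[index % COUNT_OF(L1_SIZES_KB)];
    index /= COUNT_OF(L1_SIZES_KB);
    c.cpu_model = CPU_MODELS[index % COUNT_OF(CPU_MODELS)];
    index /= COUNT_OF(CPU_MODELS);
    c.isa = ISAS[index];
    return c;
}
```

Under this assumption there are 2 x 2 x 6 x 6 x 3 = 432 configurations, so a loop from 0 to `config_count() - 1` calling `config_at` would visit each one once when driving a sweep.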

Evaluation Subspace

We use the C data types INT and FLOAT (listed as FP32 in Table 4), and data dimensions from 1 to 3. Because the data scale processed by a microprocessor is generally small, we use two data scales: 6144 bytes and 12288 bytes. Modifying DATA_SIZE, DATA_TYPE, and DATA_DIM in the definition yields the 12 evaluation subspaces listed in Table 4.

Table 4: The data format of each evaluation subspace.
Evaluation Subspace	DATA_SIZE/Bytes	DATA_DIM	DATA_TYPE
A	6144	1	INT
B	6144	2	INT
C	6144	2	FP32
D	6144	1	FP32
E	12288	1	INT
F	12288	2	INT
G	12288	2	FP32
H	12288	1	FP32
I	12288	3	FP32
J	12288	3	INT
K	6144	3	INT
L	6144	3	FP32
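
As a rough illustration of how one subspace from Table 4 might be selected at build time, the sketch below defines the three parameters as overridable macros and derives the element count from the byte size. The macro names DATA_SIZE, DATA_TYPE, and DATA_DIM come from the text, but their exact form in IoTBench's definition is an assumption, as are the 4-byte `int32_t` stand-in for INT and the hypothetical `element_count` helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Defaults correspond to subspace B (6144 bytes, 2 dimensions, INT).
 * Override on the compiler command line, e.g. -DDATA_SIZE=12288. */
#ifndef DATA_SIZE
#define DATA_SIZE 6144        /* total data size in bytes */
#endif
#ifndef DATA_DIM
#define DATA_DIM 2            /* 1, 2, or 3; how elements are split across
                                 dimensions is not specified in the text */
#endif
#ifndef DATA_TYPE
#define DATA_TYPE int32_t     /* assumption: INT is 4 bytes; use float for FP32 */
#endif

/* Number of elements that fit in a data set of the given byte size. */
size_t element_count(size_t data_size_bytes, size_t elem_bytes)
{
    assert(elem_bytes > 0 && data_size_bytes % elem_bytes == 0);
    return data_size_bytes / elem_bytes;
}
```

With 4-byte elements, the two data scales correspond to 6144 / 4 = 1536 and 12288 / 4 = 3072 elements, regardless of whether the type is INT or FP32.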

Other Settings

The cross-compilers used are aarch64-linux-gnu-gcc and riscv64-linux-gnu-gcc. The ARM instruction set is Arm64 and the RISC-V instruction set is RV64GC. The gem5 version is 21.2.1.0. The experiments are run from the gem5 directory in syscall emulation (SE) mode.

Results

We first analyze the impact of different data types, data dimensions, and data scales on processor performance. The results show that data type, data dimension, and data scale affect the performance distinctly. That is to say, the data features are important factors for IoT benchmarking.

We then compare ARM with RISC-V, and MinorCPU with O3CPU, using IoTBench. We find that, with the same micro-architecture configuration, the ARM ISA is more efficient than RISC-V.

We explore how performance in each evaluation subspace varies with the architecture configuration and find that the optimal architecture differs between evaluation subspaces. The results are shown in Table 5.

Table 5: Optimal configuration for subspace A-L.
Subspace	ISA	CPU Model	L1 DCache/kB	L1 ICache/kB	Iterations/Sec
A	ARM	O3	16	8	28328.61
B	ARM	O3	16	16	11695.91
C	ARM	O3	16	8	12121.21
D	ARM	O3	16	16	28571.43
E	ARM	O3	32	32	13386.88
F	ARM	O3	32	8	5173.31
G	ARM	O3	32	16	5181.35
H	ARM	O3	32	8	13458.95
I	ARM	O3	32	32	5837.71
J	ARM	O3	32	8	5621.14
K	ARM	O3	16	32	10548.52
L	ARM	O3	16	16	10964.91

Download

IoTBench
