
BenchCouncil: International Open Benchmark Council

 

Numbers

We report performance and energy numbers using BenchCouncil benchmarks.

All numbers are available from AIBench numbers.

Methodology and Metrics

We agree with DawnBench's (DawnBench paper) choice of the time-to-accuracy metric, because some optimizations immediately improve traditional performance metrics like throughput while adversely affecting the quality of the final model, which can only be observed by running an entire training session (MLPerf latest report). Unfortunately, training to a state-of-the-art accuracy requires a great deal of execution time. However, we believe that execution cost alone cannot justify including only a few benchmarks. In fact, the execution cost of other benchmark suites (e.g., HPC benchmarks, or SPEC CPU on a simulator) is also prohibitively high, yet the representativeness and coverage of a widely accepted benchmark suite are of paramount importance. For example, SPEC CPU2017 contains 43 benchmarks; other examples include PARSEC 3.0 (30) and TPC-DS (99).
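As a rough illustration of the metric (not AIBench code), time-to-accuracy can be measured by training until a target validation accuracy is reached and recording the elapsed wall-clock time. The `train_epoch` and `evaluate` callbacks below are hypothetical stand-ins for a real training framework:

```python
import time

def time_to_accuracy(train_epoch, evaluate, target_accuracy, max_epochs=1000):
    """Train until validation accuracy reaches the target; return elapsed seconds.

    `train_epoch` and `evaluate` are hypothetical callbacks standing in for
    a real training loop and validation pass; they are not part of AIBench.
    """
    start = time.monotonic()
    for _ in range(max_epochs):
        train_epoch()                       # one full pass over the training set
        if evaluate() >= target_accuracy:   # validation accuracy check
            return time.monotonic() - start
    raise RuntimeError("target accuracy not reached within max_epochs")
```

Because the clock stops only when the model actually reaches the target, optimizations that raise throughput but hurt final model quality show up as a longer (or unbounded) time-to-accuracy.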

AIBench therefore adopts a different strategy. We include a more diverse set of benchmarks (16 problem domains, with face forensics being added). For performance ranking, however, we choose only a few representative benchmarks (fewer than MLPerf) to reduce the cost, just as the HPC Top500 ranking reports only HPL, HPCG, and Graph500 (three benchmarks out of 20+ representative HPC benchmarks such as HPCC and NPB).

We evaluate CPUs, GPUs, and other AI accelerators using AIBench, mobile and IoT chips using AIoT Bench, and HPC AI systems using HPC AI 500. BenchCouncil will publish the performance numbers periodically, and more intelligent chips and accelerators will be evaluated. BenchCouncil welcomes everyone interested in the performance of AI systems and architectures to join the benchmarking effort and submit their results.

AIBench Number Details

BenchCouncil reports performance numbers, performance & accuracy numbers (time-to-accuracy), and energy consumption numbers (energy-to-accuracy) using AIBench.

  • Time-to-accuracy numbers of a few representative benchmarks from AIBench.

    Since training to a state-of-the-art accuracy requires a great deal of execution time, for performance & accuracy ranking we choose only a few representative benchmarks from AIBench to reduce the cost, just as the HPC Top500 ranking reports only three benchmarks.

    Time-to-accuracy numbers available soon.

  • Energy-to-accuracy numbers of a few representative benchmarks from AIBench.

    Since training to a state-of-the-art accuracy requires a great deal of execution time, for energy ranking we choose only a few representative benchmarks from AIBench to reduce the cost, just as the HPC Top500 ranking reports only three benchmarks.

    Energy-to-accuracy numbers available soon.

  • Throughput numbers of the full AIBench benchmarks.

    We evaluate CPUs, GPUs, and other AI accelerators using the full AIBench benchmarks. We run the benchmarks with optimized parameter settings to achieve the accuracy of the referenced paper, and report the throughput.

    Throughput numbers are published at http://www.benchcouncil.org/AIBench/number.html.
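As a minimal sketch of how a throughput number is typically obtained (this is an illustration, not AIBench's harness), one times a fixed number of batches after a short warmup and reports samples per second. The `run_batch` callable is a hypothetical stand-in for one training or inference step:

```python
import time

def measure_throughput(run_batch, batch_size, num_batches, warmup=2):
    """Return samples/second over `num_batches` timed iterations.

    `run_batch` is a hypothetical callable standing in for one step;
    warmup iterations are excluded to avoid startup effects (JIT, caches).
    """
    for _ in range(warmup):
        run_batch()
    start = time.monotonic()
    for _ in range(num_batches):
        run_batch()
    elapsed = time.monotonic() - start
    return batch_size * num_batches / elapsed
```

Note that a high throughput number alone says nothing about final model quality, which is exactly why the time-to-accuracy and energy-to-accuracy rankings above are reported separately.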

For workload characterization, since the TBD paper and our previous work find that each iteration has the same computation logic and that the iteration count has little impact on micro-architectural behaviors, we first adjust the parameters (e.g., batch size) and train each benchmark until it approaches the accuracy stated in the referenced paper; we then use the same parameter settings and sample dozens of epochs to obtain the micro-architectural results.
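The sampling step above can be sketched as follows. Under the stated assumption that every iteration shares the same computation logic, profiling a sorted subset of epoch indices is representative of the full run; the helper below is a hypothetical illustration, not part of the AIBench tooling:

```python
import random

def sample_epochs(total_epochs, num_samples, seed=0):
    """Pick a sorted subset of epoch indices to profile.

    Assumes, per the characterization methodology, that iterations are
    computationally identical, so a few dozen sampled epochs suffice
    for stable micro-architectural measurements.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return sorted(rng.sample(range(total_epochs), num_samples))
```

A profiler (e.g., a hardware-counter tool) would then be attached only during the selected epochs, keeping the characterization cost far below that of instrumenting the entire training session.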