BigDataBench 5.0 is released.
Download User Manual
BigDataBench 5.0 User Manual [BigDataBench-UserManual]
BigDataBench JStorm User Manual [BigDataBench-JStorm-UserManual]
BigDataBench Spark Streaming User Manual [BigDataBench-SparkStreaming-UserManual]
Download data sets
Table 1: Summary of Data Sets
No. | Data set | Data size | Scalable data set |
1 | Wikipedia Entries | 4,300,000 English articles (unstructured text) | Text Generator of BDGS |
2 | Amazon Movie Reviews | 7,911,684 reviews (semi-structured text) | Text Generator of BDGS |
3 | Google Web Graph | 875,713 nodes, 5,105,039 edges (unstructured graph) | Graph Generator of BDGS |
4 | Facebook Social Network | 4,039 nodes, 88,234 edges (unstructured graph) | Graph Generator of BDGS |
5 | E-commerce Transaction Data | First table: 4 columns, 38,658 rows; second table: 6 columns, 242,735 rows (structured table) | Table Generator of BDGS |
6 | ProfSearch Person Resumes | 278,956 resumes (semi-structured table) | Table Generator of BDGS |
7 | CIFAR-10 | 60,000 color images of 32×32 pixels | Ongoing development |
8 | ImageNet (1 GB, 10 GB) | ILSVRC2014 DET image dataset (unstructured image) | Ongoing development |
9 | LSUN | One million labelled images, classified into 10 scene categories and 20 object categories | Ongoing development |
10 | TED Talks | Translated TED talks provided by the IWSLT evaluation campaign | Ongoing development |
11 | SoGou Data (search data processed from SogouT) | Corpus and search query data from Sogou Labs (unstructured text) | Ongoing development |
12 | MNIST | Handwritten digit database with 60,000 training examples and 10,000 test examples (unstructured image) | Ongoing development |
13 | MovieLens Dataset | User ratings of movies, with 9,518,231 training examples and 386,835 test examples (semi-structured text) | Ongoing development |
14 | WMT English-German | WMT English-German translation dataset | Ongoing development |
15 | MS COCO2014 | A large-scale object detection, segmentation, and captioning dataset | Ongoing development |
16 | Cityscapes | A new large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities | Ongoing development |
17 | LibriSpeech | A corpus of approximately 1000 hours of 16 kHz read English speech | Ongoing development |
18 | VGGFace2 | A large-scale face recognition dataset | Ongoing development |
Download software
We provide two options: download the full software package at once, or download the components one by one. Before using BigDataBench, you need to download and deploy the prerequisite software packages described in the user manual; install these first. The running platform is Linux.
BigDataBench Download
Full software packages of different implementations are available from the following links.
Micro Benchmark:
- BigDataBench_V5.0_BigData_MicroBenchmark:
http://159.226.41.254:8090/BigDataBench/BigDataBench_V5.0_BigData_MicroBenchmark
Component Benchmark:
- BigDataBench_V5.0_BigData_ComponentBenchmark:
http://159.226.41.254:8090/BigDataBench/BigDataBench_V5.0_BigData_ComponentBenchmark
- BigDataBench_V5.0_Graph:
http://159.226.41.254:8090/BigDataBench/BigDataBench_V5.0_Graph
- BigDataBench_V5.0_Streaming:
http://159.226.41.254:8090/BigDataBench/BigDataBench_V5.0_Streaming
- BigDataBench_V5.0_Multimedia_MPI:
http://159.226.41.254:8090/BigDataBench/BigDataBench_V5.0_Multimedia_MPI
BDGS: Big Data Generator Suite in BigDataBench
Description | Generators | Download |
BDGS generates big data on the basis of six raw data sets | Text, Graph, Table | BigDataGeneratorSuite.tar.gz (size: 40 MB) |