Overview
As a subset of BigDataBench, BigDataBench-DCA is China’s first industry-standard big data benchmark suite. It was released by the Telecom Research Institute of the Ministry of Industry and Information Technology together with ICT, CAS, Huawei, China Mobile, Sina, ZTE, Intel (China), Microsoft (China), IBM CDL, Baidu, Inspur, 21viane and UCloud. The specifications of BigDataBench-DCA have been submitted to, and are currently under review by, China’s Ministry of Industry and Information Technology.
Benchmarks
There are six data sets and ten workloads in BigDataBench-DCA (Read, Write and Scan count as three separate workloads). Table 1 summarizes the real-world data sets and the scalable data generation tools included in BigDataBench-DCA; Table 2 presents the workloads of BigDataBench-DCA.
Table 1: The Summary of Data Sets
No. | Data set | Raw data size | Scalable data set generator |
1 | Wikipedia Entries | 4,300,000 English articles (unstructured text) | Text Generator of BDGS of BigDataBench |
2 | Amazon Movie Reviews | 7,911,684 reviews (semi-structured text) | Text Generator of BDGS of BigDataBench |
3 | Google Web Graph | 875,713 nodes, 5,105,039 edges (unstructured graph) | Graph Generator of BDGS of BigDataBench |
4 | Facebook Social Network | 4,039 nodes, 88,234 edges (unstructured graph) | Graph Generator of BDGS of BigDataBench |
5 | E-commerce Transaction Data | Table 1: 4 columns, 38,658 rows; Table 2: 6 columns, 242,735 rows (structured tables) | Table Generator of BDGS of BigDataBench |
6 | ProfSearch Person Resumes | 278,956 resumes (semi-structured table) | Table Generator of BDGS of BigDataBench |
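Table 1 pairs each real data set with a BDGS generator that scales it up. The Python sketch below illustrates only the general idea of growing a small text seed into more synthetic text while keeping its word-frequency profile. It assumes a simple unigram (word-frequency) model chosen for brevity; this is not the model BDGS actually uses, and the function names `fit_unigram` and `generate_text` are hypothetical.

```python
import random
from collections import Counter


def fit_unigram(corpus):
    """Count word frequencies in a small seed corpus (a list of documents).

    Assumption for illustration: a unigram model, not the BDGS text model.
    """
    counts = Counter()
    for doc in corpus:
        counts.update(doc.lower().split())
    return counts


def generate_text(counts, num_docs, words_per_doc, seed=0):
    """Sample synthetic documents whose words follow the seed word frequencies."""
    rng = random.Random(seed)  # fixed seed so repeated runs give identical output
    vocab = list(counts)
    weights = [counts[word] for word in vocab]
    return [
        " ".join(rng.choices(vocab, weights=weights, k=words_per_doc))
        for _ in range(num_docs)
    ]
```

For example, `generate_text(fit_unigram(seed_docs), num_docs=1000, words_per_doc=200)` would produce 1,000 synthetic documents from any short list of seed strings `seed_docs`; the output size is controlled independently of the seed size, which is the property a scalable generator needs.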
Table 2: The Summary of Workloads in BigDataBench-DCA
Operation or Algorithm | Type | Data Source | Data Generator Suite |
TeraSort | IO-Intensive | Text | From Hadoop |
WordCount | CPU-Intensive | Text | From BigDataBench |
PageRank | Hybrid | Graph | From BigDataBench |
K-means | CPU-Intensive | Graph | From BigDataBench |
NaiveBayes | CPU-Intensive | Text | From BigDataBench |
Join | Hybrid | Table | From BigDataBench |
Aggregation | Hybrid | Table | From BigDataBench |
Read/Write/Scan | IO-Intensive | Table | From BigDataBench |
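To make two of the workloads in Table 2 concrete, the Python sketch below shows what WordCount and PageRank compute on a single machine. It is not the BigDataBench Hadoop code: `word_count` spells out the map, shuffle and reduce phases that Hadoop would distribute across a cluster, and `pagerank` assumes a damping factor of 0.85, a fixed 50 iterations, and even redistribution of rank held by pages without out-links. All of these are illustrative choices.

```python
from collections import defaultdict


def word_count(lines):
    """WordCount as map -> shuffle -> reduce over an in-memory list of lines."""
    # Map: emit (word, 1) for every token.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    # Reduce: sum the values for each key.
    return {word: sum(ones) for word, ones in groups.items()}


def pagerank(edges, damping=0.85, iterations=50):
    """Iterative PageRank over a directed edge list of (src, dst) pairs.

    Illustrative defaults (assumptions): damping=0.85, iterations=50.
    """
    nodes = {node for edge in edges for node in edge}
    out_links = {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each node passes an equal share of its rank along every out-link.
        for src, dsts in out_links.items():
            share = damping * rank[src] / len(dsts)
            for dst in dsts:
                new_rank[dst] += share
        rank = new_rank
    return rank
```

For example, `word_count(["to be or not to be"])` returns `{'to': 2, 'be': 2, 'or': 1, 'not': 1}`, and on a three-page cycle `[("a", "b"), ("b", "c"), ("c", "a")]` every page keeps a rank of 1/3.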
Download
- Hadoop-1.0.2 version: BigDataBench_DCA_1.tar.gz
- Hadoop-2.2.0 version: BigDataBench_DCA_2.tar.gz
- BigDataBench_DCA user manual: [user manual]
Contributors
- Kai Wei, CAICT
- Chunyu Jiang, CAICT
- Jianfeng Zhan, ICT, CAS
- Lei Wang, ICT, CAS
- Jingwei Li
- Kai Chen, CAICT
- Shu Wang, Huawei
- Yang Lu, CMCC
- Lan Yi, Intel
- Ning Zou, Intel
- Guomao Xin, Inspur
- Jing He, Inspur
- Yixian Xu, Microsoft China
- Na Zhang, Microsoft China
- Liming Zhou, ZTE
- Dongjie Wei, IBM China
- Xiaoyi Wang, IBM China
- Lei Cong, Sina
- Xiangyu Jiang, Baidu
- Fei Yang, Baidu
- He Kang, 21viane
- Xingjian Zhou, 21viane
- Dongdong Wang, UCloud
For Citations
If you use BigDataBench-DCA in your work, please cite the following paper:
BigDataBench: a Big Data Benchmark Suite from Internet Services.
Lei Wang, Jianfeng Zhan, Chunjie Luo, Yuqing Zhu, Qiang Yang, Yongqiang He, Wanling Gao, Zhen Jia, Yingjie Shi, Shujie Zhang, Cheng Zhen, Gang Lu, Kent Zhan, Xiaona Li, and Bizhu Qiu. The 20th IEEE International Symposium on High Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando, Florida, USA.
License
BigDataBench-DCA is available to researchers interested in big data. It is open source under the Apache License, Version 2.0; please use all files in compliance with the License. The software components bundled with BigDataBench-DCA are themselves open-source software governed by their own licensing terms, and researchers who use BigDataBench-DCA must understand and abide by the licensing terms of each component.
Software developed externally (not by the BigDataBench group)
- GCC: http://gcc.gnu.org/releases.html
- GSL: http://www.gnu.org/software/gsl/
- Hadoop: http://www.apache.org/licenses/LICENSE-2.0
- HBase: http://hbase.apache.org/
- Hive: http://hive.apache.org/
- MySQL: http://www.mysql.com/
- Mahout: http://www.apache.org/licenses/LICENSE-2.0
- Zookeeper: http://zookeeper.apache.org/
Software developed internally (by the BigDataBench group)
BigDataBench-DCA License
BigDataBench-DCA Suite Copyright (c) 2013-2015, ICT, Chinese Academy of Sciences. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistribution of source code must comply with the license and notice disclaimers
- Redistribution in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimers in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE ICT CHINESE ACADEMY OF SCIENCES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.