Overview
Data generated by modern scientific instruments has grown to an unprecedented scale. Moreover, the data formats and computational behaviors of scientific big data workloads are much more complex than those of Internet services. These two facts pose a serious challenge to scientific data management and analytics. Among the many concerns, the first is how to build a comprehensive and representative scientific big data benchmark suite. Previous benchmark efforts either focus on Internet services (e.g., BigDataBench) or target a single scientific domain (e.g., GenBase). We present BigDataBench-S, a comprehensive scientific big data benchmark suite.
Benchmarks
There are three data sets and 17 workloads in BigDataBench-S. Table 1 summarizes the real-world data sets and workloads of BigDataBench-S.
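| Domain | Dataset | Workload category | Workload |
| --- | --- | --- | --- |
| High Energy Physics | ATLAS Dataset | Data Manipulation Queries | Selection: select events based on filter conditions |
| High Energy Physics | ATLAS Dataset | Classification | SVM |
| High Energy Physics | ATLAS Dataset | Classification | k-Nearest Neighbor |
| High Energy Physics | ATLAS Dataset | Classification | LDA |
| High Energy Physics | ATLAS Dataset | Regression | Boosted decision trees |
| High Energy Physics | ATLAS Dataset | Regression | Maximum likelihood fit |
| Astronomy | Simulated Dataset Using Generator from SS-DB | Data Manipulation Queries | Selection: Select images in given time and space ranges |
| Astronomy | Simulated Dataset Using Generator from SS-DB | Data Manipulation Queries | Aggregation: Compute average value of cells of images to find average background noise |
| Astronomy | Simulated Dataset Using Generator from SS-DB | Data Manipulation Queries | Join: Select values of each cell of images with the average noise of this cell |
| Astronomy | Simulated Dataset Using Generator from SS-DB | Complex Analysis | Intersection of images |
| Astronomy | Simulated Dataset Using Generator from SS-DB | Complex Analysis | Sigma-Clipping |
| Genomics | Simulated Dataset Using Generator from GenBase | Data Manipulation Queries | Selection: Select genes based on filter conditions |
| Genomics | Simulated Dataset Using Generator from GenBase | Data Manipulation Queries | Aggregation: Compute average expression value of genes |
| Genomics | Simulated Dataset Using Generator from GenBase | Data Manipulation Queries | Join: Join genes with gene ontologies |
| Genomics | Simulated Dataset Using Generator from GenBase | Complex Analysis | QR decomposition |
| Genomics | Simulated Dataset Using Generator from GenBase | Complex Analysis | SVD |
| Genomics | Simulated Dataset Using Generator from GenBase | Complex Analysis | Covariance |
| Microbe | http://prof.ict.ac.cn/bdb_uploads/GCM-Bench/ | | |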
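To make one of the less self-explanatory entries in the table concrete, the following is a minimal sketch of sigma-clipping as it is commonly defined: values that lie more than a chosen number of standard deviations from the mean are discarded, and the procedure repeats until nothing more is removed. It is not taken from the BigDataBench-S implementation. The function name `sigma_clip`, its parameters, and the sample pixel values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def sigma_clip(values, sigma=3.0, max_iters=5):
    """Iteratively drop values farther than `sigma` standard deviations from the mean.

    Hypothetical helper for illustration; not the BigDataBench-S code.
    """
    kept = list(values)
    for _ in range(max_iters):
        if len(kept) < 2:
            break
        center = mean(kept)
        spread = pstdev(kept)  # population standard deviation of the current sample
        if spread == 0:
            break  # all remaining values are identical, nothing left to clip
        survivors = [v for v in kept if abs(v - center) <= sigma * spread]
        if len(survivors) == len(kept):
            break  # converged: no value was rejected in this pass
        kept = survivors
    return kept


# Made-up pixel values for one image cell across several exposures;
# 50.0 stands in for an outlier such as a cosmic-ray hit.
pixels = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0]
clipped = sigma_clip(pixels, sigma=2.0)
print(clipped)
```

The example uses a threshold of 2 rather than the default of 3 because, with only seven samples, a single outlier inflates the standard deviation so much that it can never sit more than about 2.27 standard deviations from the mean.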
Downloads
GCM-Bench: A General Benchmark for RDF Data Management System (https://github.com/renfliu/gcm-bench)
For Citations
If you need a citation for BigDataBench-S, please cite the following papers that relate to your work:
BigDataBench-S: An Open-source Scientific Big Data Benchmark Suite
Xinhui Tian, Shaopeng Dai, Zhihui Du, Wanling Gao, Rui Ren, Yaodong Cheng, Zhifei Zhang, Zhen Jia, Peijian Wang, and Jianfeng Zhan. BigDataBench-S: An Open-source Scientific Big Data Benchmark Suite. The 3rd IEEE International Workshop on High-Performance Big Data Computing (HPBDC), IPDPSW 2017.
People
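- Jianfeng Zhan (詹剑锋), Institute of Computing Technology, Chinese Academy of Sciences
- Jianhui Li (黎建辉), Computer Network Information Center, Chinese Academy of Sciences
- Xiaofeng Meng (孟小峰), Renmin University of China
- Zhihui Du (都志辉), Tsinghua University
- Lei Zou (邹磊), Peking University
- Yong Qi (齐勇), Xi'an Jiaotong University
- Zhihong Shen (沈志宏), Computer Network Information Center, Chinese Academy of Sciences
- Peijian Wang (王培健), Xi'an Jiaotong University
- Li Zha (查礼), Institute of Computing Technology, Chinese Academy of Sciences
- Yaodong Cheng (程耀东), Institute of High Energy Physics, Chinese Academy of Sciences
- Jungang Xu (徐俊刚), University of Chinese Academy of Sciences
- Zhifei Zhang (张知非), Capital Medical University
- Zhen Jia (贾禛), Princeton University
- Xinhui Tian (田昕晖), Institute of Computing Technology, Chinese Academy of Sciences
- Shaopeng Dai (戴绍鹏), Institute of Computing Technology, Chinese Academy of Sciences
- Wanling Gao (高婉铃), Institute of Computing Technology, Chinese Academy of Sciences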