In conjunction with CCF Big Data Technology Conference 2013 on Big Data

Introduction

This is the third workshop addressing the challenges of benchmarking, performance optimization, and emerging hardware for Big Data systems and applications, held in conjunction with CCF Big Data Technology Conference 2013. The theme of this workshop is benchmarking and optimization of Big Data systems and cloud computing. Big Data has emerged as a strategic asset of nations and organizations, and researchers from enterprises and scientific research organizations are working to extract meaning and value from it. Big Data consists of high-volume, high-velocity, and high-variety information assets that require new forms of processing, which makes it challenging to extract value from them. Owners of Big Data can hardly choose which system best suits their specific requirements; they also face the problems of optimizing data processing and evaluating existing Big Data systems. In addition, as new techniques in system architecture, operating systems, and programming models are put forward, the infrastructures and processing algorithms of Big Data systems are changing accordingly. Research on data management and data processing based on emerging hardware platforms and systems is well worth discussing, for example, analysis of the proper hardware and software platforms for Big Data.

Highlights

Bring together big data researchers from the architecture, operating systems, and data management communities. We will discuss the mutual influences of architectures, systems, and data management in the context of big data. The workshop places particular emphasis on concrete research and application cases.
Bridge the gap between industry and academia in big data research and practice. Researchers from universities, institutes, and companies will attend this workshop.
The workshop is based on invited talks by pioneers and leaders in the field of big data. There are no papers; all talks and discussions are available on the web page.

Topics

This workshop welcomes research and industry work that addresses fundamental issues in benchmarking, characterizing, designing, and optimizing Big Data systems based on novel hardware and software applications.

Topics of interest include, but are not limited to:

Big Data benchmarking
Performance and energy efficiency evaluations of big data hardware platforms
Benchmarks, performance analysis and optimization of cloud computing systems
Workload characteristics analysis of data centers and CPU design
Practice report of evaluating and optimizing industrial big data systems

Organization

TBD

Program

December 6, 2013, Beijing, China

Industry Standard Benchmarks: Past, Present and Future
Raghunath Nambiar, Distinguished Engineer, Cisco Systems, Inc.

Abstract: Industry standard benchmarks have played, and continue to play, a crucial role in the advancement of the computing industry. Historically they have enabled healthy competition that resulted in product improvements and the evolution of new technologies and products. The industry landscape is changing at a rapid pace. Big Data and analytics have become important across all major industry verticals, life sciences, and government. This session will give a brief history of industry standard benchmarks, some of their required characteristics, associated challenges, and various industry activities in progress to develop standards for measuring the effectiveness of hardware and software systems dealing with Big Data.

Raghunath Nambiar is a well-known expert in system performance and benchmarking. He is currently a Distinguished Engineer at Cisco Systems, Inc., responsible for emerging technologies and big data strategy. He has 18 years of technical accomplishments with significant expertise in system architecture and performance engineering. He has served on several industry standard committees for performance evaluation and on program committees of leading academic conferences. He is a member of the board of directors of the Transaction Processing Performance Council (TPC) and chair of its International Conference Series on Performance Evaluation and Benchmarking. He has published four books and over 30 papers. Raghu holds master's degrees from the University of Massachusetts and Goa University, and completed an advanced management program at Stanford University. He has been elected to chair the newly formed TPC-BD Work Group, formed to develop industry standards for benchmarking Big Data systems.

Impact of Networking Technologies and Protocols on Hadoop
D.K. Panda, Professor, Ohio State University

Abstract: The Hadoop framework is used extensively for Big Data processing and analytics. Commodity clusters offer a range of interconnects and protocols: 10GigE, InfiniBand with RDMA capability at various speeds (QDR, FDR and dual-FDR), IPoIB emulation over InfiniBand at various speeds, and RoCE (RDMA over Converged Enhanced Ethernet). In addition to the standard socket-based Hadoop design, RDMA-based Hadoop designs are emerging. In this talk, we will focus on the impact of the above-mentioned networking technologies and protocols on the overall performance of Hadoop-socket and Hadoop-RDMA designs. In-depth performance results and their trends will be presented, using a range of low-level micro-benchmarks (OSU Hadoop Micro-Benchmarks) and higher-level benchmarks from the BigDataBench, PUMA, and SWIM suites.

Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. His research interests include parallel computer architecture, high performance networking, InfiniBand, exascale computing, programming models, GPUs and accelerators, high performance file systems and storage, virtualization, cloud computing and Big Data. He has published over 300 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, High-Speed Ethernet and RDMA over Converged Enhanced Ethernet (RoCE). The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X software libraries, developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 2,085 organizations worldwide (in 71 countries). This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 188,000 downloads of this software have taken place from the project's website alone. This software package is also available with the software stacks of many network and server vendors, and Linux distributors. The new Hadoop-RDMA package, consisting of acceleration for HDFS, MapReduce and RPC, is publicly available from http://hadoop-rdma.cse.ohio-state.edu. Dr. Panda's research has been supported by funding from the US National Science Foundation, the US Department of Energy, and several companies including Intel, Cisco, SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE Fellow and a member of ACM. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda

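Towards Benchmarking on Industrial Big Data
Jianmin Wang, Professor, Tsinghua University

Abstract: Industrial big data often refers to the massive time series data generated by various manufacturing engineering systems. Such data bring new challenges for many existing big data technologies. In this talk, we analyze typical application scenarios of industrial big data and summarize some of its properties as follows: 1) data features: large volume, dynamic schemas, skewed data distribution; 2) workload characteristics: fast data feeds, real-time schema mapping, batch updates; 3) system operation and maintenance requirements: scalability, workload balancing, etc. We will introduce several benchmark cases for evaluating systems under test with real industrial data. The framework for the benchmark and some testing results will also be provided.

Jianmin Wang (王建民) is a professor and doctoral supervisor at Tsinghua University. He was selected for the Ministry of Science and Technology's program for leading young and middle-aged science and technology talents, and is a recipient of the National Natural Science Foundation of China's Distinguished Young Scholars fund, the second prize of the National Science and Technology Progress Award, the first prize of the Ministry of Education's Science and Technology Progress Award, and the Ministry of Education's New Century Excellent Talents support program. He has seven times been named a "Good Mentor and Helpful Friend" of graduate students at Tsinghua University. His research focuses on big data and knowledge engineering, including (1) analysis and measurement of process and behavior data, (2) unstructured data management, (3) product lifecycle management, and (4) data management and testing techniques. Since 2008 he has published more than 120 papers in journals and conferences including IEEE TKDE, VLDB, ICDE, AAAI, ACM Multimedia, and CVPR, and has been granted more than 10 invention patents.
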
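BigDataBench: A big data benchmark suite for architecture and system Communities
Jianfeng Zhan, Professor, ICT, CAS

Abstract: As the architecture, system, and data management communities pay great attention to innovative big data (hardware) systems and architectures, the pressure to benchmark and evaluate the performance of these systems rises. However, the complexity, diversity, frequently changing workloads, and rapid evolution of big data systems raise great challenges in big data benchmarking. This talk presents our joint research efforts on this issue with several industrial partners. Our big data benchmark suite, BigDataBench, available from http://prof.ict.ac.cn/BigDataBench, not only covers broad application scenarios but also includes diverse and representative data sets. I will present several case studies using BigDataBench.

Jianfeng Zhan (詹剑锋) is a professor and doctoral supervisor at the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS). He is a recipient of the second prize of the National Science and Technology Progress Award, the CAS Outstanding Achievement Award (group award), the Huawei cooperation contribution award, the IISWC 2013 Best Paper Award, and ICT's "Star of Excellence" award. His research covers computer architecture, operating systems, and big data systems. The cluster operating system he developed for the Dawning (曙光) series of high-performance computers, machines of which ranked 10th and 2nd on the TOP500 list, has been commercialized. He developed a Chinese-language search system for professionals covering all disciplines (http://prof.ict.ac.cn). He is currently working on a big data benchmark suite (BigDataBench), a data center benchmark suite (DCBench), an operating system for Internet services (RainForest), and DC-OS, a data center operating system based on RainForest. More than 20 operating system patents related to RainForest have been filed or are being filed. He has published 55 papers in journals and international conferences including TC, TPDS, IEEE Micro, and DSN. More information is available on his home page, http://prof.ict.ac.cn/jfzhan
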
A study on performance comparison of SQL-on-Hadoop systems
Yueguo Chen, Associate Professor, Renmin University of China

Abstract: In this talk, we focus on one of the most important types of big data, relational data (or business data). We conduct performance tests on some popular SQL-on-Hadoop systems. The TPC-DS benchmark, which focuses on decision support tasks for large-scale data warehousing applications, is applied in this study. The experiments are conducted on a cluster of 100 virtual nodes (20GB main memory each). The systems are evaluated under various dataset and cluster sizes. We will also discuss the limitations of the TPC-DS benchmark for testing systems for real-time applications.

Towards Benchmarking Online Social Media Analytical Queries
Weining Qian, Professor, East China Normal University

Abstract: Database benchmarking is the basis of database system selection. Its essence is application modeling and testing for performance evaluation. As a combination of time series, graph data, and unstructured data, social media is a typical kind of Big Data, and plays an important role in applications such as opinion analysis, online advertisement, and customer relationship management. However, existing database benchmarks are not suitable for the management and processing of social media data, which has the characteristics of unstructured data. Thus, an open specification along with related tools would be helpful for testing various existing social media data management and analysis technologies and systems, while stimulating research on new methods. In this talk, we introduce BSMA, a benchmark for analytical queries over online social media data, from the perspectives of benchmark architecture, data generation, workload generation, and measurement specification. The challenges of timeline query and social network query processing are also discussed.

Benchmarking storage and data access for big data
Xiao Zhang, Associate Professor, Northwestern Polytechnic University

Abstract: Big data typically requires high performance from the underlying storage system for data generation, data storage, and data access, as well as high bandwidth and support for highly concurrent access. In this talk, we will introduce a benchmarking framework for petabyte-scale storage systems. A benchmarking tool under development, as well as some testing results, will be presented. We summarize the approaches for benchmarking storage systems for big data. We will also discuss how to optimize the storage system based on the benchmarking results.

Contact Us

For workshop issues, please email zhanjianfeng@ict.ac.cn

For website issues, please email zhuwei@ict.ac.cn

BPOE-3
