Introduction

Big data has emerged as a strategic asset for nations and organizations, and there is a pressing need to generate value from it. However, the sheer volume of big data demands significant storage capacity, transmission bandwidth, computation, and power. Systems of unprecedented scale are expected to resolve the problems caused by the variety and daunting volume of big data. Nevertheless, it is very difficult for big data owners to choose the system best suited to their specific requirements. They also face challenges in optimizing these systems and their solutions. Meanwhile, system researchers are working on new hardware architectures, operating systems, programming systems, and data management systems to improve performance in dealing with big data.

This workshop, the fourth in its series, aims to bring together researchers and practitioners in related areas to discuss the research issues at the intersection of these areas, and to draw the attention of the architecture, systems, programming, and data management research communities to this new and highly promising field.

Topics

The workshop seeks papers that address hot topics in benchmarking, designing, and optimizing big data systems. Early-stage work, new ideas, and unconventional approaches are encouraged. Specific topics of interest include, but are not limited to:

Big data workload characterization and benchmarking
Innovative computer and memory architecture for big data
Emerging hardware technologies in big data systems
Innovative operating systems and programming systems for big data
Interactions among architecture, systems and data management
Performance analysis and optimization of big data systems
Innovative prototypes of big data infrastructures
Practice reports on evaluating and optimizing large-scale big data systems

Papers should present original research. As big data spans many disciplines, papers should provide sufficient background material to make them accessible to the broader community.

Venue Information

Salt Lake City, Utah, USA

Important dates

Submission Deadline: Extended to January 14, 2014 (11:59pm PST)

Author Notification: January 25, 2014

Final Copy Due: February 17, 2014

Workshop: Saturday, March 1, 2014

Workshop home page:

Submission Web page: https://www.easychair.org/conferences/?conf=bpoe4

Organization

Steering committee:

Christos Kozyrakis, Stanford
Xiaofang Zhou, University of Queensland
Dhabaleswar K Panda, Ohio State University
Aoying Zhou, East China Normal University
Raghunath Nambiar, Cisco
Lizy K John, University of Texas at Austin
Xiaoyong Du, Renmin University of China
Ippokratis Pandis, IBM Almaden Research Center
Xueqi Cheng, ICT, Chinese Academy of Sciences
Bill Jia, Facebook
Lidong Zhou, Microsoft Research Asia
H. Peter Hofstee, IBM Austin Research Laboratory
Haibo Chen, Shanghai Jiaotong University
Alexandros Labrinidis, University of Pittsburgh
Cheng-Zhong Xu, Wayne State University
Jianfeng Zhan, ICT, Chinese Academy of Sciences

Program Chairs:

Jianfeng Zhan, ICT, Chinese Academy of Sciences
Chuliang Weng, Shannon (IT) Lab, Huawei

Web Chair:

Lei Wang, ICT, Chinese Academy of Sciences

Publicity Chairs:

Yuqing Zhu (Data management), ICT, CAS
Gang Lu (Operating systems), ICT, CAS
Zhen Jia (Architecture), ICT, CAS

Program Committee

Onur Mutlu, Carnegie Mellon University
Xu Liu, Rice University
Yunquan Zhang, ICT, Chinese Academy of Sciences
Meikel Poess, Oracle Corporation
Dejun Jiang, ICT, Chinese Academy of Sciences
Yueguo Chen, Renmin University
Rene Mueller, IBM
Xiaoyi Lu, Ohio State University
Yongqiang He, Dropbox
Edwin Sha, University of Texas at Dallas
Kun Wang, IBM Research China
Rong Chen, Shanghai Jiaotong University
Jens Teubner, TU Dortmund University
Yinliang Yue, ICT, Chinese Academy of Sciences
Mauricio Breternitz, AMD Research
Seetharami Seelam, IBM
Zhenyu Guo, MSRA
Farhan Tauheed, EPFL
Gansha Wu, Intel
Bingsheng He, Nanyang Technological University
Zhibin Yu, SIAT, Chinese Academy of Sciences
Lei Wang, ICT, Chinese Academy of Sciences
Yuanchun Zhou, CNIC, Chinese Academy of Sciences
Tilmann Rabl, University of Toronto
Weijia Xu, TACC, University of Texas at Austin
Mingyu Chen, ICT, Chinese Academy of Sciences
Jian Ouyang, Baidu
Wentao Qu, Google, US

Paper Submission

Online submission site: https://www.easychair.org/conferences/?conf=bpoe4

Papers must be submitted in PDF and be no more than 6 pages in the standard two-column SIGPLAN conference format, including figures and tables but not including references. Submissions will be judged on the merit of their ideas rather than their length. Submissions must be made through the online submission site. Final papers and presentations will be accessible from the workshop website, but to facilitate resubmission to more formal venues, no archival proceedings will be published, and papers will not be sent to the ACM Digital Library. After the workshop, revised papers will be published by Springer LNCS (www.springer.com/lncs, indexed by EI).

Program

March 1, 2014

Morning

Session chair: Professor Jianfeng Zhan

Time	Topic	People
9:00-9:05	Opening remarks [PPT]	Professor Jianfeng Zhan
9:05-10:00	Keynote presentation: Big Data Workloads: An Architect’s Perspective [PPT]	Professor Lizy K John, University of Texas at Austin
10:00-10:30	Morning break
10:30-11:25	Keynote presentation: Accelerating Big Data Processing with RDMA-Enhanced Apache Hadoop [PDF]	Professor Dhabaleswar K. (DK) Panda, The Ohio State University
11:25-11:45	On Big Data Benchmarking [PDF]	Rui Han and Xiaoyi Lu
11:45-12:05	Performance Benefits of DataMPI: A Case Study with BigDataBench [PDF]	Liang Fan, Feng Chen, Lu Xiaoyi and Xu Zhiwei

Afternoon

Session chair: Dr. Chuliang Weng

Time	Topic	People
14:00-14:55	Keynote presentation: Resource Efficient Cloud Computing [PDF]	Professor Christos Kozyrakis, Stanford
15:00-15:30	Afternoon break
15:30-16:25	Keynote presentation: Power Technology For a Smarter Future	Dr. Jeff Stuecheli, IBM
16:25-16:45	Exploring Opportunities for Non-Volatile Memories in Big Data Applications [PDF]	Wei Wei, Dejun Jiang, Jin Xiong and Mingyu Chen
16:45-17:05	Tuning Hadoop map slot value using CPU metric [PDF]	Kamal Kc and Vincent Freeh
17:05-17:25	I/O Characterization of Big Data Workloads in Data Centers [PDF]	Fengfeng Pan, Yinliang Yue and Jin Xiong
17:25-17:45	Characterizing Workload of Web Applications on Virtualized Servers [PDF]	Xiajun Wang, Song Huang, Song Fu and Krishna Kavi
17:45-18:00	Benchmarking Trajectory Data for Trip Recommendation [PDF]	Kuien Liu, Yaguang Li, Shuo Shang and Kai Zheng
18:00-18:05	Closing remarks	Dr. Chuliang Weng

Keynote Speakers

1. Big Data Workloads: An Architect’s Perspective

Professor Lizy Kurian John

University of Texas at Austin

http://users.ece.utexas.edu/~ljohn/

Abstract:

Much of modern big data is unstructured, as opposed to traditional table-based structured data. While the traditional relational database may not disappear for years to come, it is clear that other computing paradigms for processing unstructured (and semi-structured) data will gain momentum in light of the nature of emerging big data. Applications that perform advanced analytics and machine learning on graphs will become more prevalent. Analysis of web-scale graphs from social networks will become important due to its commercial significance. An understanding of emerging big data computing workloads is required in order to drive hardware and software development. Several questions need to be answered in order to design appropriate computing systems that support these applications in a performance-, power-, and energy-efficient fashion. It would be interesting to analyze communication patterns in these applications and study their relevance for mechanisms supporting localized communications (fog computing as opposed to cloud). This talk will describe our ongoing research investigating these and similar questions. Efforts in workload characterization and big data performance evaluation will be described.

Biography:

Lizy Kurian John is the B. N. Gafford Professor of Electrical and Computer Engineering at UT Austin. She received her Ph.D. in Computer Engineering from the Pennsylvania State University. Her research interests include workload characterization, performance evaluation, benchmarking, and high performance processor and memory architectures for emerging workloads. Her honors include the NSF CAREER Award, the UT Austin Engineering Foundation Faculty Award, the Halliburton, Brown and Root Engineering Foundation Young Faculty Award, the University of Texas Alumni Association (Texas Exes) Teaching Award, and The Pennsylvania State University Outstanding Engineering Alumnus award. She has coauthored a book on Digital Systems Design using VHDL (Thomson Publishers) and has edited 4 books, including a book on Computer Performance Evaluation and Benchmarking. She holds 8 US patents and is a Fellow of the IEEE.

2. Accelerating Big Data Processing with RDMA-Enhanced Apache Hadoop

Professor Dhabaleswar K. (DK) Panda

The Ohio State University

http://www.cse.ohio-state.edu/~panda/

Abstract:

The Hadoop framework has become one of the most popular open-source solutions for Big Data processing. Traditionally, Hadoop communication calls are implemented over sockets and do not deliver the best performance on modern clusters with high performance interconnects. This talk will examine opportunities and challenges in accelerating Hadoop with Remote DMA (RDMA) support, as available with InfiniBand, RoCE (RDMA over Converged Enhanced Ethernet), and other modern interconnects. The talk will start with an overview of the RDMA for Apache Hadoop project (http://hadoop-rdma.cse.ohio-state.edu). Then, high-performance designs using RDMA to accelerate the Hadoop framework on InfiniBand and RoCE clusters will be demonstrated. Specific designs and case studies to accelerate multiple components of Hadoop (such as HDFS, MapReduce, RPC, and HBase) will be presented. In-depth performance results and their trends will be presented using a range of low-level micro-benchmarks (OSU Hadoop Micro-Benchmarks) and higher-level benchmarks from the BigDataBench, PUMA, and SWIM suites.

Biography:

Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. His research interests include parallel computer architecture, high performance networking, InfiniBand, exascale computing, Big Data, programming models, GPUs and accelerators, high performance file systems and storage, virtualization, and cloud computing. He has published over 300 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, High-Speed Ethernet, and RDMA over Converged Enhanced Ethernet (RoCE). The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X software libraries, developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 2,100 organizations worldwide (in 71 countries). This software has enabled several InfiniBand clusters to enter the TOP500 ranking during the last decade. More than 202,000 downloads of this software have taken place from the project’s website alone. This software package is also available with the software stacks of many network and server vendors and Linux distributors. The new RDMA for Apache Hadoop package, consisting of acceleration for HDFS, MapReduce, and RPC, is publicly available from http://hadoop-rdma.cse.ohio-state.edu. Dr. Panda’s research has been supported by funding from the US National Science Foundation, the US Department of Energy, and industry, including Intel, Cisco, Cray, SUN, Mellanox, QLogic, NVIDIA, and NetApp. He is an IEEE Fellow and a member of ACM. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.

3. Resource Efficient Cloud Computing

Professor Christos Kozyrakis

Stanford University

http://csl.stanford.edu/~christos/

Abstract:

Cloud computing promises flexibility, high performance, and cost efficiency for both users and operators. Nevertheless, users frequently experience high variability in performance, and most cloud facilities operate at very low utilization, greatly reducing their cost effectiveness. The primary sources of low utilization and performance jitter are the poor understanding of the relationship between resource usage and performance, the interference that makes it difficult to share resources between workloads, and the hardware heterogeneity in modern datacenters. Moreover, neither user-level applications nor the system software stack is structured in a manner that helps alleviate these issues. This talk will discuss techniques to improve resource efficiency in cloud facilities. We will focus primarily on cluster management challenges such as resource provisioning and allocation, workload co-scheduling, and cluster-level energy management. We will also briefly discuss how resource efficiency interacts with benchmarking and workload evaluation.

Biography:

Christos Kozyrakis is an Associate Professor of Electrical Engineering & Computer Science at Stanford University. He works on architectures, runtime environments, and programming models for parallel computing systems. At Berkeley, he developed the IRAM architecture, a novel media-processor system that combined vector processing with embedded DRAM technology. At Stanford, he co-led the Transactional Coherence and Consistency (TCC) project, which developed hardware and software mechanisms for programming with transactional memory. He also led the Raksha project, which developed practical hardware support and security policies to deter high-level and low-level security attacks against deployed software. Dr. Kozyrakis is currently working on hardware and software techniques for resource efficient cloud computing. He is also a member of the Pervasive Parallelism Lab at Stanford, a multi-faculty effort to make parallel computing practical for the masses.

Christos received a BS degree from the University of Crete (Greece) and a PhD degree from the University of California at Berkeley (USA), both in Computer Science. He is the Willard R. and Inez Kerr Bell faculty scholar at Stanford and a senior member of the ACM and the IEEE. Christos has received the NSF Career Award, an IBM Faculty Award, the Okawa Foundation Research Grant, and a Noyce Family Faculty Scholarship.

4. Power Technology For a Smarter Future

Dr. Jeff Stuecheli

STSM POWER Systems Hardware Architect

http://www.linkedin.com/pub/jeff-stuecheli/2/664/a0a

Abstract:

TBA.

Biography:

Jeff is a computer architect on the POWER line of systems. His primary area of focus is the development of performance enhancements for the memory subsystem, including the caches, prefetch, networks, memory, and IO structures in the system. Day to day, Jeff works closely with research peers, bringing new ideas and suggestions to the development organization. Through his leadership in the design and development of POWER products, he has helped IBM clients around the world innovate in a myriad of commercial and scientific realms.

Previous events

Workshop	Dates	Location
BPOE-1	October 7, 2013	IEEE BigData Conference, San Jose, CA
BPOE-2	October 31, 2013	CCF HPC China, Guilin, China
BPOE-3	December 5, 2013	CCF Big Data Technology Conference 2013, Beijing, China

Next event

Workshop	Dates	Location
BPOE-5	September 5, 2014	VLDB 2014, Hangzhou, Zhejiang Province, China

BPOE-4: The Fourth workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware
