Introduction
Big data has emerged as a strategic asset of nations and organizations, and there is a pressing need to generate value from it. However, the sheer volume of big data demands significant storage capacity, transmission bandwidth, computation, and power. Systems of unprecedented scale are expected to resolve the problems caused by the variety and daunting volume of big data. Nevertheless, it is very difficult for big data owners to choose the system best suited to their specific requirements, and they also face challenges in optimizing the systems and their solutions. Meanwhile, system researchers are working on new hardware architectures, operating systems, programming systems, and data management systems to improve performance in dealing with big data.
This workshop, the fourth in its series, aims to bring together researchers and practitioners in related areas to discuss the research issues at the intersection of these areas, and to draw the attention of the architecture, systems, programming, and data management research communities to this new and highly promising field.
Topics
The workshop seeks papers that address topical issues in benchmarking, designing, and optimizing big data systems. Early-stage work, new ideas, and unconventional approaches are encouraged. Specific topics of interest include, but are not limited to:
- Big data workload characterization and benchmarking
- Innovative computer and memory architecture for big data
- Emerging hardware technologies in big data systems
- Innovative operating systems and programming systems for big data
- Interactions among architecture, systems and data management
- Performance analysis and optimization of big data systems
- Innovative prototypes of big data infrastructures
- Practice reports on evaluating and optimizing large-scale big data systems
Papers should present original research. As big data spans many disciplines, papers should provide sufficient background material to make them accessible to the broader community.
Venue Information
Salt Lake City, Utah, USA
Important dates
Submission Deadline: Extended to January 14, 2014 (11:59pm PST)
Author Notification: January 25, 2014
Final Copy Due: February 17, 2014
Workshop: Saturday, March 1, 2014
Submission Web page: https://www.easychair.org/conferences/?conf=bpoe4
Organization
Steering committee:
- Christos Kozyrakis, Stanford
- Xiaofang Zhou, University of Queensland
- Dhabaleswar K Panda, Ohio State University
- Aoying Zhou, East China Normal University
- Raghunath Nambiar, Cisco
- Lizy K John, University of Texas at Austin
- Xiaoyong Du, Renmin University of China
- Ippokratis Pandis, IBM Almaden Research Center
- Xueqi Cheng, ICT, Chinese Academy of Sciences
- Bill Jia, Facebook
- Lidong Zhou, Microsoft Research Asia
- H. Peter Hofstee, IBM Austin Research Laboratory
- Haibo Chen, Shanghai Jiaotong University
- Alexandros Labrinidis, University of Pittsburgh
- Cheng-Zhong Xu, Wayne State University
- Jianfeng Zhan, ICT, Chinese Academy of Sciences
Program Chairs:
- Jianfeng Zhan, ICT, Chinese Academy of Sciences
- Chuliang Weng, Shannon (IT) Lab, Huawei
Web Chair:
- Lei Wang, ICT, Chinese Academy of Sciences
Publicity Chairs:
- Yuqing Zhu (Data management), ICT, CAS
- Gang Lu (Operating systems), ICT, CAS
- Zhen Jia (Architecture), ICT, CAS
Program Committee
- Onur Mutlu, Carnegie Mellon University
- Xu Liu, Rice University
- Yunquan Zhang, ICT, Chinese Academy of Sciences
- Meikel Poess, Oracle Corporation
- Dejun Jiang, ICT, Chinese Academy of Sciences
- Yueguo Chen, Renmin University
- Rene Mueller, IBM
- Xiaoyi Lu, Ohio State University
- Yongqiang He, Dropbox
- Edwin Sha, University of Texas at Dallas
- Kun Wang, IBM Research China
- Rong Chen, Shanghai Jiaotong University
- Jens Teubner, TU Dortmund University
- Yinliang Yue, ICT, Chinese Academy of Sciences
- Mauricio Breternitz, AMD Research
- Seetharami Seelam, IBM
- Zhenyu Guo, MSRA
- Farhan Tauheed, EPFL
- Gansha Wu, Intel
- Bingsheng He, Nanyang Technological University
- Zhibin Yu, SIAT, Chinese Academy of Sciences
- Lei Wang, ICT, Chinese Academy of Sciences
- Yuanchun Zhou, CNIC, Chinese Academy of Sciences
- Tilmann Rabl, University of Toronto
- Weijia Xu, TACC, University of Texas at Austin
- Mingyu Chen, ICT, Chinese Academy of Sciences
- Jian Ouyang, Baidu
- Wentao Qu, Google, US
Paper submission
Online submission site: https://www.easychair.org/conferences/?conf=bpoe4
Papers must be submitted in PDF and be no more than 6 pages in standard two-column SIGPLAN conference format, including figures and tables but not including references. Submissions will be judged on the merit of the ideas rather than the length. Submissions must be made through the online submission site. Final papers and presentations will be accessible from the workshop website, but to facilitate resubmission to more formal venues, no archival proceedings will be published, and papers will not be sent to the ACM Digital Library. After the workshop, revised papers will be published by Springer LNCS (www.springer.com/lncs, indexed by EI).
Program
March 1, 2014
Morning
Session chair: Professor Jianfeng Zhan
Time | Topic | People | Affiliation |
9:00-9:05 | Opening remark[.PPT] | Professor Jianfeng Zhan | |
9:05-10:00 | Keynote presentation: Big Data Workloads: An Architect’s Perspective[.PPT] | Professor Lizy K John | University of Texas at Austin |
10:00-10:30 | Morning break | ||
10:30-11:25 | Keynote presentation: Accelerating Big Data Processing with RDMA-Enhanced Apache Hadoop[.PDF] | Professor Dhabaleswar K. (DK) Panda | The Ohio State University |
11:25-11:45 | On Big Data Benchmarking[PDF] | Rui Han and Xiaoyi Lu | |
11:45-12:05 | Performance Benefits of DataMPI: A Case Study with BigDataBench[PDF] | Liang Fan, Feng Chen, Lu Xiaoyi and Xu Zhiwei | |
Afternoon
Session chair: Dr. Chuliang Weng
Time | Topic | People | Affiliation |
14:00-14:55 | Keynote presentation: Resource Efficient Cloud Computing[PDF] | Professor Christos Kozyrakis | Stanford |
15:00-15:30 | Afternoon break | ||
15:30-16:25 | Keynote presentation: Power Technology For a Smarter Future | Dr. Jeff Stuecheli | IBM |
16:25-16:45 | Exploring Opportunities for Non-Volatile Memories in Big Data Applications[PDF] | Wei Wei, Dejun Jiang, Jin Xiong and Mingyu Chen | |
16:45-17:05 | Tuning Hadoop map slot value using CPU metric[PDF] | Kamal Kc and Vincent Freeh | |
17:05-17:25 | I/O Characterization of Big Data Workloads in Data Centers[PDF] | Fengfeng Pan, Yinliang Yue and Jin Xiong | |
17:25-17:45 | Characterizing Workload of Web Applications on Virtualized Servers[PDF] | Xiajun Wang, Song Huang, Song Fu and Krishna Kavi | |
17:45-18:00 | Benchmarking Trajectory Data for Trip Recommendation[PDF] | Kuien Liu, Yaguang Li, Shuo Shang and Kai Zheng | |
18:00-18:05 | Closing remark | Dr. Chuliang Weng | |
Keynote Speakers
1. Big Data Workloads: An Architect’s Perspective
Professor Lizy Kurian John
University of Texas at Austin
http://users.ece.utexas.edu/~ljohn/
Abstract:
Much of modern big data is unstructured, as opposed to traditional table-based structured data. While the traditional relational database may not disappear for years to come, it is clear that other computing paradigms for processing unstructured (and semi-structured) data will gain momentum in light of the nature of emerging big data. Applications that perform advanced analytics and machine learning on graphs will become more prevalent. Analysis of web-scale graphs from social networks will become important due to its commercial significance. An understanding of emerging big data computing workloads is required in order to drive hardware and software development. Several questions need to be answered in order to design computing systems that support these applications in a performance-, power-, and energy-efficient fashion. It would be interesting to analyze communication patterns in these applications and study their relevance for mechanisms supporting localized communications (fog computing as opposed to cloud). This talk will describe our ongoing research investigating these and similar questions. Efforts in workload characterization and big data performance evaluation will be described.
Biography:
Lizy Kurian John is the B. N. Gafford Professor of Electrical and Computer Engineering at UT Austin. She received her Ph.D. in Computer Engineering from the Pennsylvania State University. Her research interests include workload characterization, performance evaluation, benchmarking, and high performance processor and memory architectures for emerging workloads. She is a recipient of the NSF CAREER award, UT Austin Engineering Foundation Faculty Award, Halliburton, Brown and Root Engineering Foundation Young Faculty Award, University of Texas Alumni Association (Texas Exes) Teaching Award, The Pennsylvania State University Outstanding Engineering Alumnus, etc. She has coauthored a book on Digital Systems Design using VHDL (Thomson Publishers) and has edited 4 books including a book on Computer Performance Evaluation and Benchmarking. She holds 8 US patents and is a Fellow of IEEE.
2. Accelerating Big Data Processing with RDMA-Enhanced Apache Hadoop
Professor Dhabaleswar K. (DK) Panda
The Ohio State University
http://www.cse.ohio-state.edu/~panda/
Abstract:
The Hadoop framework has become one of the most popular open-source solutions for Big Data processing. Traditionally, Hadoop communication calls are implemented over sockets and do not deliver the best performance on modern clusters with high performance interconnects. This talk will examine opportunities and challenges in accelerating Hadoop with Remote DMA (RDMA) support, as available with InfiniBand, RoCE (RDMA over Converged Enhanced Ethernet) and other modern interconnects. The talk will start with an overview of the RDMA for Apache Hadoop project (http://hadoop-rdma.cse.ohio-state.edu). Then, high-performance designs using RDMA to accelerate the Hadoop framework on InfiniBand and RoCE clusters will be demonstrated. Specific designs and case studies to accelerate multiple components of Hadoop (such as HDFS, MapReduce, RPC and HBase) will be presented. In-depth performance results and their trends, using a range of low-level micro-benchmarks (OSU Hadoop Micro-Benchmarks) and higher-level benchmarks from the BigDataBench/PUMA/SWIM suites, will be presented.
Biography:
Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. His research interests include parallel computer architecture, high performance networking, InfiniBand, exascale computing, Big Data, programming models, GPUs and accelerators, high performance file systems and storage, virtualization, and cloud computing. He has published over 300 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, High-Speed Ethernet and RDMA over Converged Enhanced Ethernet (RoCE). The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X software libraries, developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 2,100 organizations worldwide (in 71 countries). This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 202,000 downloads of this software have taken place from the project’s website alone. This software package is also available with the software stacks of many network and server vendors, and Linux distributors. The new RDMA for Apache Hadoop package, consisting of acceleration for HDFS, MapReduce and RPC, is publicly available from http://hadoop-rdma.cse.ohio-state.edu. Dr. Panda’s research has been supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including Intel, Cisco, Cray, SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE Fellow and a member of ACM. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.
3. Resource Efficient Cloud Computing
Professor Christos Kozyrakis
Stanford University
http://csl.stanford.edu/~christos/
Abstract:
Cloud computing promises flexibility, high performance, and cost efficiency for both users and operators. Nevertheless, users frequently experience high variability in performance, and most cloud facilities operate at very low utilization, greatly reducing their cost effectiveness. The primary sources of low utilization and performance jitter are the poor understanding of the relationship between resource usage and performance, the interference that makes it difficult to share resources between workloads, and the hardware heterogeneity in modern datacenters. Moreover, neither user-level applications nor the system software stack is structured in a manner that helps alleviate these issues. This talk will discuss techniques to improve resource efficiency in cloud facilities. We will focus primarily on cluster management challenges such as resource provisioning and allocation, workload co-scheduling, and cluster-level energy management. We will also discuss briefly how resource efficiency interacts with benchmarking and workload evaluation.
Biography:
Christos Kozyrakis is an Associate Professor of Electrical Engineering & Computer Science at Stanford University. He works on architectures, runtime environments, and programming models for parallel computing systems. At Berkeley, he developed the IRAM architecture, a novel media-processor system that combined vector processing with embedded DRAM technology. At Stanford, he co-led the Transactional Coherence and Consistency (TCC) project, which developed hardware and software mechanisms for programming with transactional memory. He also led the Raksha project, which developed practical hardware support and security policies to deter high-level and low-level security attacks against deployed software. Dr. Kozyrakis is currently working on hardware and software techniques for resource efficient cloud computing. He is also a member of the Pervasive Parallelism Lab at Stanford, a multi-faculty effort to make parallel computing practical for the masses.
Christos received a BS degree from the University of Crete (Greece) and a PhD degree from the University of California at Berkeley (USA), both in Computer Science. He is the Willard R. and Inez Kerr Bell faculty scholar at Stanford and a senior member of the ACM and the IEEE. Christos has received the NSF Career Award, an IBM Faculty Award, the Okawa Foundation Research Grant, and a Noyce Family Faculty Scholarship.
4. Power Technology For a Smarter Future
Dr. Jeff Stuecheli
STSM POWER Systems Hardware Architect
http://www.linkedin.com/pub/jeff-stuecheli/2/664/a0a
Abstract:
TBA.
Biography:
Jeff is a computer architect on the POWER line of systems. His primary area of focus is the development of performance enhancements to the memory subsystem, including caches, prefetch, networks, memory, and IO structures in the system. Day to day, Jeff works closely with research peers, bringing new ideas and suggestions to the development organization. Thanks to his leadership in the design and development of POWER products, he has helped IBM clients around the world innovate in a myriad of commercial and scientific realms.
Previous events
Workshop | Dates | Location |
BPOE-1 | October 7, 2013 | IEEE BigData Conference, San Jose, CA |
BPOE-2 | October 31, 2013 | CCF HPC China, Guilin, China |
BPOE-3 | December 5, 2013 | CCF Big Data Technology Conference 2013, Beijing, China |
Next event
Workshop | Dates | Location |
BPOE-5 | September 5, 2014 | VLDB 2014, Hangzhou, Zhejiang Province, China |