Cymbalo

Many machine learning algorithms can be modeled as graphs naturally and solved by an iterative learning algorithms. For example, matrix factorization (MF) used in recommendation systems can be modeled as a user-item bipartite graph, of which each vertex is attached with a feature vector. The topic-modeling algorithm LDA can be represented as a document-word bipartite graph. Algorithms such as Logistic Regression (LR) and Support Vector Machine (SVM) can also be cast as a sample-feature graph model with the training sample data and features as the vertices. For ease of discussion, we name the graph systems designed for machine learning as ML-Graph. Compared to some regular graph algorithms, ML-Graph is much different on data structure and computational behaviors, since it aims at supporting both machine learning and graph algorithms, instead of just graph algorithms, like existing graph systems. The design of ML-Graphs should fully consider the inherent characteristics of machine learning algorithms, which are summarized as follows:

1.Heterogeneous Graph Structures: Graph analytics workloads such as PageRank often run on natural graphs and assume all the vertices to be homogeneous. ML-Graph, however, often consists of heterogeneous vertices, i.e., the user and item vertices in the user-item matrix of MF. The ML-Graph is thus separated into different vertex subsets. The heterogeneous properties of these graphs can be summarized into two aspects. First, the size of different subsets might be significantly skewed. For example, Wikipedia only has ten thousands of terms, but has more than four millions of articles. Second, the data types of vertices from different subsets might not be the same. Taking LR as an example, feature vertices maintain the weights of features, while the sample vertices need to maintain a sparse feature vector and a label value. Therefore, an efficient graph system should consider the heterogeneity of ML-Graphs.
2.Varied Computation Behaviors: The graph computation in different machine learning algorithms might involve varied propagation and update operations. For MF, each edge performs bidirectional propagation to send messages to its two neighboring vertices in each iteration. For LR, each feature vertex first propagates its value to all its related sample vertex. Each sample vertex then computes the gradients, and performs unidirectional propagation to send them to each feature. Moreover, there are also differentiated update behaviors between different types of vertices in one single algorithm, such as the sample and feature vertices in LR. Therefore, the computation model should be flexible to support various computation strategies.
3. Vector-centric Computation: Many machine learning algorithms can be transformed into vector operations, such as MF and LR algorithms. The programming model should take this into consider and provide vector-centric interfaces to simplify the programming of machine learning algorithms. Existing graph processing systems, however, support either vertex-centric or edge-centric programming model, but not both. When implementing a machine learning algorithm, users have to transform vector-based computations into graph computations on vertices or edges, which would be too complicated and time-consuming. Even TuX2, a latest distributed graph system designed for machine learning algorithms, provides MEGA programming model with only edge-centric interfaces for data exchange operations and vertex-centric interfaces for update operations. In contrast, a vector-centric programming model can free developers from low-level implementation details, and accelerate specific operations by leveraging the power of high-performance linear algebra libraries, such as openBLAS.

Considering the obvious difference between machine learning and classical graph computing, it takes delicate design to leverage the power of graph computing on machine learning problems. A new graph framework dedicated to machine learning is motivated.

Solution:

We present Cymbalo, a new distributed graph processing framework for machine learning algorithms, which can satisfy the three requirements. First, Cymbalo provides a heterogeneity-aware data model, which considers the various characteristics of ML-Graphs. This data model also considers a vector-aware graph layout to eliminate unnecessary memory footprint. Second, considering the diverse propagation behaviors of different machine learning algorithms, Cymbalo provides a hybrid computation model with differentiated computation strategies. Finally, Cymbalo provides a vector-centric programming model, which is more accurate and straightforward for machine learning algorithms

Cymbalo: An Efficient Graph Processing Framework for Machine Learning

Xinhui Tian?Biwei Xie and Jianfeng Zhan.The 16th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA 2018)

Abstract: Due to the growth of data scale, distributed machine learning has become more important than ever. Some recent work, like TuX2, show promising prospect in dealing with distributed machine learning by leveraging the power of graph computation, but still leave some key problems unsolved. In this paper, we propose Cymbalo, a new distributed graph processing framework for large-scale machine learning algorithms. To satisfy the specific characteristics of machine learning, Cymbalo employs a heterogeneity-aware data model, a hybrid computing model and a vector-aware programming model, to ensure small memory footprint, good computation efficiency and expressiveness. The experiment results show that Cymbalo outperforms Spark by 2.4X-3.2X, and PowerGraph by up to 5.8X. Moreover, Cymbalo can also outperform Angel, a recent parameter server system, by 1.6X-2.1X.

BenchCouncil: International Open Benchmark Council

Cymbalo