Fregel: a functional domain-specific language for vertex-centric large-scale graph processing

Published online by Cambridge University Press:  20 January 2022

HIDEYA IWASAKI
Affiliation:
The University of Electro-Communications, Tokyo, Japan (e-mail: iwasaki@cs.uec.ac.jp)
KENTO EMOTO
Affiliation:
Kyushu Institute of Technology, Fukuoka, Japan (e-mail: emoto@csn.kyutech.ac.jp)
AKIMASA MORIHATA
Affiliation:
The University of Tokyo, Tokyo, Japan (e-mail: morihata@graco.c.u-tokyo.ac.jp)
KIMINORI MATSUZAKI
Affiliation:
Kochi University of Technology, Kochi, Japan (e-mail: matsuzaki.kiminori@kochi-tech.ac.jp)
ZHENJIANG HU
Affiliation:
Peking University, Beijing, China (e-mail: huzj@pku.edu.cn)

Abstract

The vertex-centric programming model is now widely used for processing large graphs. User-defined vertex programs are executed in parallel over every vertex of a graph, but the imperative and explicit message-passing style of existing systems makes defining a vertex program unintuitive and difficult. This article presents Fregel, a purely functional domain-specific language for processing large graphs, and describes its model, design, and implementation. Fregel is a subset of Haskell, so Haskell tools can be used to test and debug Fregel programs. The vertex-centric computation is abstracted using compositional programming that uses second-order functions on graphs provided by Fregel. A Fregel program can be compiled into imperative programs for use in the Giraph and Pregel+ vertex-centric frameworks. Fregel’s functional nature without side effects enables various transformations and optimizations during the compilation process. Thus, the programmer is freed from the burden of program optimization, which must be done manually in existing imperative systems. Experimental results for typical examples demonstrated that the compiled code can be executed with reasonable and promising performance.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press
Fig. 1. Incomplete and naive Pregel-like code for all-reachability problem.

Fig. 2. First three supersteps of the naive program for all-reachability problem.

Fig. 3. Improved Pregel-like code for all-reachability problem.

Fig. 4. Pregel-like code for 100-reachability problem.

Fig. 5. Termination functions.

Fig. 6. Formulation of reachability problems in our model.

Fig. 7. Core part of Fregel syntax.

Fig. 8. Fregel programs for solving reachability problems.

Fig. 9. Fregel program for calculating diameter.

Fig. 10. Fregel program for solving reachability with ranking problem.

Fig. 11. Fregel program for solving strongly connected components problem.

Fig. 12. Haskell implementation of Fregel.

Fig. 13. Compilation flow of Fregel program.

Fig. 14. Pseudocode of normalized Fregel program for calculating diameter.

Fig. 15. Template of normalized Fregel program.

Fig. 16. Normalized Fregel program for scc.

Fig. 17. Simplified type definitions of FregelIR in Haskell.

Fig. 18. Simplified type definitions of FregelIR in Haskell, continued.

Fig. 19. Generating framework dependent code.

Fig. 20. Target program for optimization.

Table 1. Step functions for all-reachability and sssp problems

Fig. 21. sssp program for eliminating redundant communications.

Fig. 22. Optimizations in compilation flow of Fregel program.

Table 2. Execution times of sssp with 4–64 worker processes on Giraph (in seconds)

Table 3. Execution times of reAll with 4–64 worker processes on Giraph (in seconds)

Table 4. Execution times of re100 with 4–64 worker processes on Giraph (in seconds)

Table 5. Execution times of reRanking with 4–64 worker processes on Giraph (in seconds)

Table 6. Execution times of diameter with 4–64 worker processes on Giraph (in seconds)

Table 7. Execution times of scc with 4–64 worker processes on Giraph (in seconds)

Fig. 23. Computation times compared with that for handwc with four worker processes on Giraph.

Fig. 24. Computation times compared with that for handwc with 64 worker processes on Giraph.

Fig. 25. Parallel performance of sssp on Giraph.

Fig. 26. Parallel performance of reAll on Giraph.

Fig. 27. Parallel performance of re100 on Giraph.

Fig. 28. Parallel performance of reRanking on Giraph.

Fig. 29. Parallel performance of diameter on Giraph.

Fig. 30. Parallel performance of scc on Giraph.

Table 8. Memory footprint (bytes) of vertex’s data fields in Giraph programs, excluding fields defined in the base class of vertices

Table 9. Execution times of sssp with 4–64 worker processes on Pregel+ (in seconds)

Table 10. Execution times of reAll with 4–64 worker processes on Pregel+ (in seconds)

Table 11. Execution times of re100 with 4–64 worker processes on Pregel+ (in seconds)

Table 12. Execution times of reRanking with 4–64 worker processes on Pregel+ (in seconds)

Table 13. Execution times of diameter with 4–64 worker processes on Pregel+ (in seconds)

Table 14. Execution times of scc with 4–64 worker processes on Pregel+ (in seconds)

Fig. 31. Computation times compared with that for handwc with four worker processes on Pregel+.

Fig. 32. Computation times compared with that for handwc with 64 worker processes on Pregel+.

Fig. 33. Parallel performance of sssp on Pregel+.

Fig. 34. Parallel performance of reAll on Pregel+.

Fig. 35. Parallel performance of re100 on Pregel+.

Fig. 36. Parallel performance of reRanking on Pregel+.

Fig. 37. Parallel performance of diameter on Pregel+.

Fig. 38. Parallel performance of scc on Pregel+.

Table 15. Memory footprint (bytes) of vertex data fields for Pregel+ programs, excluding fields defined in base class of vertices

Table 16. Maximum memory consumption (MB) of a worker process for Pregel+ programs for ws20m2