Iterating on multiple collections in synchrony

Published online by Cambridge University Press:  05 July 2022

STEFANO PERNA
Affiliation:
Department of Computer Science, National University of Singapore, Singapore (e-mail: stefano.perna@ntu.edu.sg)
VAL TANNEN
Affiliation:
Department of Computer and Information Science, University of Pennsylvania, USA (e-mail: val@cis.upenn.edu)
LIMSOON WONG
Affiliation:
Department of Computer Science, National University of Singapore, Singapore (e-mail: wongls@comp.nus.edu.sg)

Abstract

Modern programming languages typically provide some form of comprehension syntax which renders programs manipulating collection types more readable and understandable. However, comprehension syntax corresponds to nested loops in general. There is no simple way of using it to express efficient general synchronized iterations on multiple ordered collections, such as linear-time algorithms for low-selectivity database joins. Synchrony fold is proposed here as a novel characterization of synchronized iteration. Central to this characterization is a monotonic isBefore predicate for relating the orderings on the two collections being iterated on and an antimonotonic canSee predicate for identifying matching pairs in the two collections to synchronize and act on. A restriction is then placed on Synchrony fold, cutting its extensional expressive power to match that of comprehension syntax, giving us Synchrony generator. Synchrony generator retains sufficient intensional expressive power for expressing efficient synchronized iteration on ordered collections. In particular, it is proved to be a natural generalization of the database merge join algorithm, extending the latter to more general database joins. Finally, Synchrony iterator is derived from Synchrony generator as a novel form of iterator. While Synchrony iterator has the same extensional and intensional expressive power as Synchrony generator, the former is better dovetailed with comprehension syntax. Thereby, algorithms requiring synchronized iterations on multiple ordered collections, including those for efficient general database joins, become expressible naturally in comprehension syntax.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Fig. 1. A motivating example. The functions ov1(xs, ys) and ov2(xs, ys) are equal on inputs xs and ys which are sorted lexicographically by their start and end points. While ov1(xs, ys) has quadratic time complexity $O(|\mathtt{xs}| \cdot |\mathtt{ys}|)$, ov2(xs, ys) has time complexity $O(|\mathtt{xs}| + k |\mathtt{ys}|)$ when each event in ys overlaps fewer than k events in xs.
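
The figure itself is not reproduced on this page. To give a rough idea of the contrast the caption describes, the following is a minimal Python sketch and not the authors' code. It assumes events are (start, end) tuples with start <= end, that both inputs are sorted lexicographically, and that "overlap" means the closed intervals intersect. The names overlaps, ov1_sketch and ov2_sketch are hypothetical.

```python
from collections import deque

def overlaps(y, x):
    """Closed intervals (start, end) overlap when neither ends before the other starts."""
    return x[0] <= y[1] and y[0] <= x[1]

def ov1_sketch(xs, ys):
    # Nested-loop comprehension: every x is compared with every y.
    return [(x, y) for x in xs for y in ys if overlaps(y, x)]

def ov2_sketch(xs, ys):
    # Synchronized pass; assumes xs and ys are sorted by (start, end).
    pairs = []
    pending = deque(ys)      # ys that may still overlap the current or a later x
    for x in xs:
        kept = []            # ys that overlap x and may overlap later xs as well
        while pending:
            y = pending[0]
            if overlaps(y, x):
                pairs.append((x, y))
                kept.append(pending.popleft())
            elif y < x:
                # y sorts before x and misses it, so y ends before x starts
                # and cannot overlap any later x: drop it for good.
                pending.popleft()
            else:
                # y starts after x ends, and so does every later y: go to next x.
                break
        pending.extendleft(reversed(kept))  # put kept ys back in original order
    return pairs

xs = [(1, 4), (3, 8), (9, 10)]
ys = [(0, 2), (5, 6), (7, 9)]
print(ov2_sketch(xs, ys) == ov1_sketch(xs, ys))
```

On sorted inputs the second function never restarts its scan of ys from the beginning; it revisits only those ys that overlapped the previous x. This is the intuition behind the better bound quoted in the caption, though the sketch makes no claim to match that bound exactly.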

Fig. 2. The Gaifman graph for an object $C = (c_1, c_2,\{ (c_3, c_4), (c_5, c_6) \}, \{ (c_3, c_7, c_8), (c_4, c_8, c_6) \})$. This object C has two relations or level-0 molecules, underlined respectively in red and green in the figure; and two level-0 atoms, underlined in blue in the figure. An edge connecting two nodes indicates the two nodes are in the same tuple. The color of an edge is not part of the definition of a Gaifman graph, but is used here to help visualize the relation or molecule containing the tuple contributing that edge.

Fig. 3. Visualization of monotonicity and antimonotonicity. Two collections xs and ys are sorted according to some orderings, as denoted by the two arrows. The isBefore predicate is represented by the relative horizontal positions of items $x_i$ and $y_j$; that is, if $y_j$ has a horizontal position to the left of $x_i$, then $y_j$ is before $x_i$. The canSee predicate is represented by the shaded green areas. (a) If $y_1$ is before $x_1$ and cannot see $x_1$, then $y_1$ is also before and cannot see any $x_2$ which comes after $x_1$. So, every $x_i$ that matches $y_1$ has been seen; it is safe to move forward to $y_2$. (b) If $y_1$ is not before $x_1$ and cannot see $x_1$, then any $y_2$ which comes after $y_1$ is also not before and cannot see $x_1$. So, every $y_j$ that matches $x_1$ has been seen; it is safe to move forward to $x_2$.
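
Conditions (a) and (b) in this caption can be checked by brute force on small inputs. The Python sketch below is only an illustration of the caption's wording, not the paper's formal definitions. The function name satisfies_fig3_conditions is hypothetical, "comes after" is taken to mean a later position in the list, and the example predicates (tuple ordering for isBefore, closed-interval overlap for canSee) are assumptions.

```python
def satisfies_fig3_conditions(xs, ys, is_before, can_see):
    """Brute-force check of conditions (a) and (b) over all pairs of items."""
    for j, y in enumerate(ys):
        for i, x in enumerate(xs):
            if can_see(y, x):
                continue  # the conditions only constrain pairs that do not match
            if is_before(y, x):
                # (a): y must stay before, and unable to see, every later x.
                if not all(is_before(y, x2) and not can_see(y, x2)
                           for x2 in xs[i + 1:]):
                    return False
            else:
                # (b): every later y must stay not-before, and unable to see, x.
                if not all(not is_before(y2, x) and not can_see(y2, x)
                           for y2 in ys[j + 1:]):
                    return False
    return True

# Assumed example: events are (start, end) tuples with start <= end.
def is_before(y, x):
    return y < x  # lexicographic order on (start, end)

def can_see(y, x):
    return x[0] <= y[1] and y[0] <= x[1]  # closed intervals intersect

xs = [(1, 4), (3, 8), (9, 10)]
ys = [(0, 2), (5, 6), (7, 9)]
print(satisfies_fig3_conditions(xs, ys, is_before, can_see))
print(satisfies_fig3_conditions(xs[::-1], ys, is_before, can_see))
```

The second call passes xs in reverse order, so the sortedness assumption is violated and the check fails. A check of this kind is far too slow for production use, but it can serve as a test oracle before predicates are handed to a synchronized iteration.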

Fig. 4. Definitions of syncFold and slowFold. These two programs compute the same results when bf is monotonic with respect to $(\mathtt{xs}, \mathtt{ys})$ and cs is antimonotonic with respect to bf. However, syncFold is more efficient than slowFold.

Fig. 5. Definitions of syncFoldGrp and slowFoldGrp. They compute the same results when bf is monotonic with respect to $(\mathtt{xs}, \mathtt{ys})$ and cs is antimonotonic with respect to bf. However, syncFoldGrp is more efficient than slowFoldGrp.

Fig. 6. Definitions of syncMap, syncFlatMap, syncGen, and syncGenGrp.

Fig. 7. A variation of the event-overlap example. ovWithId(xs, ys) computes the same function as the two SQL queries on inputs xs and ys which are sorted lexicographically by (id, start, end).

Fig. 8. The arranging-meeting example.

Fig. 9. Preliminary definition of EIterator, shown alongside the unfolded definition of syncGenGrp. The syncedWith method of the former is derived from the aux function of the latter.

Fig. 10. The arranging-meeting example expressed using Synchrony iterator.

Fig. 11. Revised definition of Synchrony iterator EIterator and its derivative EIteratorWithKey, whose isBefore predicate (bfk) and canSee predicate (csk) are defined using sorting keys (keya, keyb).

Fig. 12. The arranging-meeting example revisited. The program mtg4 is a rewrite of the program mtg3 from Figure 10 using the generator syntax suggested for Synchrony iterator.

Fig. 13. Performance of GMQL CLI and Synchrony emulation on simple region MAP. Time in seconds, average of 30 runs for SB and MB, and 5 runs for BB. Purple: GMQL CLI. Blue: Sequential Synchrony emulation. Green: Sample-parallel Synchrony emulation.

Fig. 14. Alternative attempts to define syncGenGrp. The function groups is only correct when cs is an equijoin predicate. The function groups2 is equivalent to syncGenGrp and has comparable efficiency.
