GPU Accelerated Compact-Table Propagation

Published online by Cambridge University Press:  28 August 2025

ENRICO SANTI
Affiliation:
DMIF, University of Udine, Udine, Italy (e-mails: santi.enrico@spes.uniud.it, agostino.dovier@uniud.it, andrea.formisano@uniud.it)
AGOSTINO DOVIER
Affiliation:
DMIF, University of Udine, Udine, Italy (e-mails: santi.enrico@spes.uniud.it, agostino.dovier@uniud.it, andrea.formisano@uniud.it)
ANDREA FORMISANO
Affiliation:
DMIF, University of Udine, Udine, Italy (e-mails: santi.enrico@spes.uniud.it, agostino.dovier@uniud.it, andrea.formisano@uniud.it)
FABIO TARDIVO
Affiliation:
Dept CS, New Mexico State University, Las Cruces, NM, USA (e-mail: ftardivo@nmsu.edu)

Abstract

Constraint Programming emerged within Logic Programming in the 1980s; nowadays all Prolog systems include modules for constraint programming on finite domains, delegating the solving to a constraint solver. This work focuses on a specific form of constraint, the so-called table constraint, which specifies conditions on the values of variables as an enumeration of alternative options. Since every condition on a set of finite domain variables can ultimately be expressed as a finite set of cases, Table can, in principle, simulate any other constraint. These characteristics make Table one of the most studied constraints, leading to a series of increasingly efficient propagation algorithms. Despite this, it is not uncommon to encounter real-world problems with hundreds or thousands of valid cases, too many to be handled effectively with standard CPU-based approaches. In this paper, we deal with the Compact-Table (CT) algorithm, the state-of-the-art propagation algorithm for Table. We describe how CT can be enhanced by exploiting the massive computational power offered by modern Graphics Processing Units (GPUs) to handle large Table constraints. In particular, we report on the design and implementation of GPU-accelerated CT, on its integration into an existing constraint solver, and on an experimental validation performed on a significant set of instances.
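
To make the idea of a table constraint concrete, the following is a minimal Python sketch, not the Compact-Table algorithm and not the GPU implementation described in the paper. It enumerates the allowed tuples of an arbitrary condition and then removes every value that has no supporting tuple. The names `to_table` and `filter_domains`, as well as the example condition $x_1 + x_2 = x_3$, are assumptions made here for illustration only.

```python
from itertools import product


def to_table(variables, domains, predicate):
    """Enumerate every assignment accepted by `predicate` as a list of tuples.

    Hypothetical helper: it illustrates that any condition on finite-domain
    variables can be written as a finite set of allowed cases.
    """
    candidates = product(*(sorted(domains[v]) for v in variables))
    return [t for t in candidates if predicate(*t)]


def filter_domains(variables, domains, table):
    """Naive filtering: keep a value only if some still-valid tuple uses it.

    This is a plain scan over the table, written for clarity rather than speed.
    """
    valid = [
        t for t in table
        if all(value in domains[v] for v, value in zip(variables, t))
    ]
    return {v: {t[i] for t in valid} for i, v in enumerate(variables)}


# Three variables with domain {1, 2, 3, 4} and the condition x1 + x2 = x3.
variables = ("x1", "x2", "x3")
domains = {v: {1, 2, 3, 4} for v in variables}
table = to_table(variables, domains, lambda a, b, c: a + b == c)
filtered = filter_domains(variables, domains, table)
```

In this toy instance the value 4 loses all its supports for `x1` and `x2`, and the value 1 loses all its supports for `x3`. The cost of the scan grows with the number of tuples, which is the regime the abstract identifies as problematic for CPU-based propagation.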

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Table 1. A table constraint $c$ with 5 tuples on the variables $x_1,x_2,x_3$ with domain $\{1,2,3,4\}$ (a), the corresponding static $\mathit{supports}$ matrix (b), and a possible value of currTable (c)

Algorithm 1: enforceGAC()

Algorithm 2: updateTable()

Algorithm 3: filterDomains()

Fig 1. Overview of the parallel reduction done on $\_supportsT\_dev$, where $\_s\_val=\{x_1,x_3,\ldots \}$. Different colors indicate which words are accessed by the threads in the block. Arrows depict the direction of the parallel reductions and $\lceil \frac {t}{32} \rceil$ is a shorthand for $\_currTableSize$ (see Section 2).

Fig 2. Overview of the parallel reduction done on $\mathit{\_tmpMasks}$ by $\mathit{reduce}()$. Different colors of the blocks indicate which words are accessed by the threads in the block. Arrows show the direction of the parallel reductions while $\lceil \frac {t}{32} \rceil$ is a shorthand for $\_currTableSize$ (see Section 2).

Fig 3. Overview of the data each block considers in the kernel $\mathit{filterDomainsGPU}()$.

Fig 4. Overview of the execution flow of $CT_{CU}^{u}$. Kernels are depicted by green activities.

Table 2. Overview of table constraint features in the first test set

Fig 5. Barplot comparing serial and CUDA solve times for the OR instances. The grids considered are square; the instance named $r\_C$ is an $OR$ instance with an $r\times r$ grid and profit bound $C$.

Fig 6. Barplot comparing serial and CUDA solve times for the first batch of instances. Test instances where all implementations timed out are omitted.

Fig 7. Barplot comparing serial and CUDA solve times for the second batch of instances. Test instances where all implementations timed out are omitted.

Fig 8. Series plots of the first 20 $CT$ and $CT^{uf}_{CU}$ propagation times for $\mathit{TEST\_B\_10}$ and $\mathit{TEST\_B\_9}$. The times are grouped for the update and filter procedures.

Fig 9. Series plot of the first 200 $CT$ and $CT^{uf}_{CU}$ propagation times for $\mathit{TEST\_EB\_19}$.

Table 3. Solving times (in seconds) for Gecode and $CT^{uf}_{CU}$

Table 4. MiniZinc solve times, number of propagations and speedup for $CT^{f}_{CU}$ and $CT^{uf}_{CU}$ on the first batch of tests. A barplot summing up the table is presented in Figure 6

Table 5. MiniZinc solve times, number of propagations and speedup for $CT^{f}_{CU}$ and $CT^{uf}_{CU}$ on the second batch of tests. A barplot summing up the table is presented in Figure 7