A Machine Learning-Based Approach for Solving Recurrence Relations and Its use in Cost Analysis of Logic Programs

Published online by Cambridge University Press:  21 November 2024

LOUIS RUSTENHOLZ
Affiliation:
Technical University of Madrid (UPM), Madrid, Spain; IMDEA Software Institute, Pozuelo de Alarcon, Madrid, Spain (e-mail: louis.rustenholz@imdea.org)
MAXIMILIANO KLEMEN
Affiliation:
Technical University of Madrid (UPM), Madrid, Spain; IMDEA Software Institute, Pozuelo de Alarcon, Madrid, Spain (e-mail: maximiliano.klemen@imdea.org)
MIGUEL Á. CARREIRA-PERPIÑÁN
Affiliation:
University of California, Merced, CA, USA (e-mail: mcarreira-perpinan@ucmerced.edu)
PEDRO LOPEZ-GARCIA
Affiliation:
Spanish Council for Scientific Research, Madrid, Spain; IMDEA Software Institute, Pozuelo de Alarcon, Madrid, Spain (e-mail: pedro.lopez@csic.es)

Abstract

Automatic static cost analysis infers information about the resources used by programs without actually running them with concrete data and presents such information as functions of input data sizes. Most of the analysis tools for logic programs (and many for other languages), such as CiaoPP, are based on setting up recurrence relations representing (bounds on) the computational cost of predicates and solving them to find closed-form functions. Such recurrence solving is a bottleneck in current tools: many of the recurrences that arise during the analysis cannot be solved with state-of-the-art solvers, including computer algebra systems (CASs), so that specific methods for different classes of recurrences need to be developed. We address this challenge by developing a novel, general approach for solving arbitrary, constrained recurrence relations that uses machine learning (sparse-linear and symbolic) regression techniques to guess a candidate closed-form function, and a combination of an SMT-solver and a CAS to check whether such a function is actually a solution of the recurrence. Our prototype implementation and its experimental evaluation within the context of the CiaoPP system show quite promising results. Overall, for the considered benchmark set, our approach outperforms state-of-the-art cost analyzers and recurrence solvers and can find closed-form solutions, in a reasonable time, for recurrences that cannot be solved by them.
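
The guess-and-check idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the example recurrence f(0) = 0, f(n) = f(n - 1) + n, the base functions 1, n, n^2, and all function names. Plain dense least squares (solved exactly over rationals) stands in for the sparse-linear regression, and the check merely tests the recurrence equations on a finite range, whereas the article's checker relies on an SMT-solver and a CAS.

```python
from fractions import Fraction

# Hypothetical example recurrence (not from the article):
#   f(0) = 0,  f(n) = f(n - 1) + n  for n > 0
def evaluate_recurrence(n):
    """Evaluate the recurrence directly to produce training data."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total

# Assumed set of base functions: 1, n, n^2.
BASE_FUNCTIONS = [lambda n: 1, lambda n: n, lambda n: n * n]

def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination over exact rationals (assumes a nonsingular matrix)."""
    size = len(rhs)
    rows = [list(map(Fraction, matrix[i])) + [Fraction(rhs[i])] for i in range(size)]
    for col in range(size):
        # Pick a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][size] for i in range(size)]

def guess(points):
    """Guess step: least-squares fit via the normal equations (A^T A) c = A^T y."""
    design = [[f(n) for f in BASE_FUNCTIONS] for n in points]
    targets = [evaluate_recurrence(n) for n in points]
    k = len(BASE_FUNCTIONS)
    gram = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    moment = [sum(row[i] * y for row, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(gram, moment)

def candidate(coeffs, n):
    """Evaluate the candidate closed form at n."""
    return sum(c * f(n) for c, f in zip(coeffs, BASE_FUNCTIONS))

def check(coeffs, upto):
    """Check step (finite testing only, not a proof): test both recurrence equations."""
    if candidate(coeffs, 0) != 0:
        return False
    return all(candidate(coeffs, n) == candidate(coeffs, n - 1) + n
               for n in range(1, upto + 1))

coefficients = guess(range(0, 8))
is_solution = check(coefficients, 50)
```

Using exact rationals keeps the fitted coefficients free of floating-point noise, so a candidate such as n/2 + n^2/2 can be compared against the recurrence with exact equality. A real checker would have to establish the equality for all n rather than for a sampled range.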

Information

Type
Original Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Fig 1. Architecture of our novel machine learning-based recurrence solver.

Algorithm 1. Candidate Solution Generation (Guesser).

Algorithm 2. Solution Checking (Checker).

Fig 2. A program with a nested recursion.

Table 1. Benchmarks

Table 2. Experimental evaluation and comparison

Fig 3. Comparison of solver tools by accuracy of the result.

Fig 4. Candidate functions for linear regression in dimension $\leq 2$. $\mathcal{F}_{\mathrm{small}}=S_{\mathrm{small}}$, $\mathcal{F}_{\mathrm{medium}}=\mathcal{F}_{\mathrm{small}}\cup S_{\mathrm{medium}}$, $\mathcal{F}_{\mathrm{large}}=\mathcal{F}_{\mathrm{medium}}\cup S_{\mathrm{large}}$. Lambdas omitted for conciseness.

Fig 5. Candidate functions for linear regression in dimension $\geq 3$. $\mathcal{F}_{s}$ is defined by combination of simpler base functions, as bounded products of applications of base functions to individual input variables. For example, with $m=3$, $\mathcal{F}_{\text{small}}=\{x,y,z,xy,xz,yz,xyz\}$.
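
As a rough illustration of the $m=3$ example in the caption above, the following sketch enumerates products of distinct input variables. It assumes the identity as the only base function and that each variable appears at most once per product, which is enough to reproduce the listed set; the general definition of $\mathcal{F}_{s}$ is not given here, and the function name is hypothetical.

```python
from itertools import combinations

def small_candidate_set(variables):
    """Names of all products of distinct variables, ordered by degree."""
    products = []
    for degree in range(1, len(variables) + 1):
        for subset in combinations(variables, degree):
            # Represent the product symbolically by concatenating variable names.
            products.append("".join(subset))
    return products

candidates = small_candidate_set(["x", "y", "z"])
```

With three variables this yields the seven products named in the caption, from the single variables up to the full product of all three.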

Fig 6. Prolog encoding of the enqdeq benchmarks, inspired by Section 2.1 of Hoffmann (2011) introducing amortized analysis, where a queue data structure is implemented as two lists that act as stacks. We encode the enqdeq problems for each tool following a best-effort approach. Recurrence equations are set up in terms of compositions of cost and size functions.

Table 3. Comparison of linear regression and symbolic linear equation solvers for coefficient search on the fib benchmark, with $22$ base functions and $n$ training points. Legend: underdetermined systems (und.), timeouts (T.O.), and out-of-memory errors (O.M.)