
Probabilistic Methods for Evaluating Human and LLMs During Design Problem-Solving

Published online by Cambridge University Press:  27 August 2025

Ryan Bruggeman* (Northeastern University, Boston, MA)
Estefania Ciliotta Chehade (Northeastern University, Boston, MA)
Tucker Marion (Northeastern University, Boston, MA)
Paolo Ciuccarelli (Northeastern University, Boston, MA)

Abstract:

We present a probabilistic method for assessing design reasoning in design problem settings, using soundness and completeness as metrics. Building on how inference mechanisms are employed during latent need elicitation from product reviews, we compare human-led and Large Language Model (LLM)-led approaches via protocols, workshops, and surveys. We demonstrate that human reasoning patterns tend to leverage user opinions, achieving deeper coverage of need potential, whereas LLMs often produce narrower, categorically constrained needs. These findings highlight the importance of balancing inference mechanisms to ensure both coherent reasoning steps and comprehensive exploration of the design space. By formally framing reasoning during design problem-solving, we offer a foundation for developing design-enabled AI and deepen our understanding of how complex reasoning unfolds in practice.

Information

Type
Article
Creative Commons
CC BY-NC-ND 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s) 2025

Table 1. Left: the average use of ASP, CAT, OP, and SEN by each study group, with the standard deviation (SD) of ${\mathscr S}$ for each participant in the group. Right: Kolmogorov-Smirnov (KS) test results comparing groups.
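The group comparison reported in the table can be sketched as a two-sample Kolmogorov-Smirnov test on a per-participant metric. This is a minimal illustration using `scipy.stats.ks_2samp`; the variable names and sample values are hypothetical, not the paper's data.

```python
# Hedged sketch: two-sample KS test comparing a per-participant metric
# (e.g. soundness) between two study groups. Values are illustrative.
from scipy.stats import ks_2samp

human_scores = [0.62, 0.71, 0.58, 0.69, 0.75, 0.64]  # hypothetical human group
llm_scores = [0.41, 0.45, 0.38, 0.50, 0.43, 0.47]    # hypothetical LLM group

# The KS statistic is the maximum distance between the two groups'
# empirical CDFs; a small p-value suggests the distributions differ.
stat, p_value = ks_2samp(human_scores, llm_scores)
print(f"KS statistic = {stat:.3f}, p = {p_value:.4f}")
```

With fully non-overlapping samples like these, the KS statistic reaches its maximum of 1.0, indicating the two empirical distributions are maximally separated.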


Figure 1. (A) Example of a visual map made by a workshop designer. (B) ECDF plot of annotations used per participant, by group. (C) Soundness measure for each participant, by group.


Table 2. Rank results from the survey. Variance is abbreviated as Var.