A data-driven method for automated data superposition with applications in soft matter science

Published online by Cambridge University Press:  25 May 2023

Kyle R. Lennon
Affiliation:
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Gareth H. McKinley*
Affiliation:
Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
James W. Swan
Affiliation:
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
*Corresponding author: Gareth H. McKinley; Email: gareth@mit.edu

Abstract

The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently through the application of one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, nonparametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability—specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
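The two-stage idea described in the abstract (Gaussian process regression, followed by maximum a posteriori estimation of a shift) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic tanh-shaped master curve, the fixed kernel hyperparameters, the one-directional comparison against a single reference data set, and the grid search are all assumptions made for illustration, and the names `rbf_kernel`, `gp_predict`, and `neg_log_posterior` are hypothetical. With a uniform prior over the shift, the MAP estimate coincides with the maximum-likelihood estimate.

```python
import numpy as np

def rbf_kernel(x1, x2, length, amplitude):
    """Squared-exponential covariance between two 1D coordinate arrays."""
    d = x1[:, None] - x2[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_test, length=1.0, amplitude=1.0, noise=0.02):
    """Posterior mean m and predictive standard deviation s of a zero-mean GP.

    Hyperparameters are fixed by assumption here rather than learned.
    """
    K = rbf_kernel(x_train, x_train, length, amplitude)
    K += noise**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train, length, amplitude)
    mean = K_s @ np.linalg.solve(K, y_train)
    # Diagonal of K_s K^{-1} K_s^T: variance explained by the training data.
    reduction = np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    var = amplitude**2 - reduction + noise**2
    return mean, np.sqrt(np.maximum(var, 1e-12))

def neg_log_posterior(log_a, x_ref, y_ref, x_new, y_new):
    """Negative log-likelihood of the shifted data under the reference GP.

    A uniform prior over log_a adds only a constant, so it is omitted.
    """
    m, s = gp_predict(x_ref, y_ref, x_new + log_a)
    return np.sum(0.5 * ((y_new - m) / s) ** 2 + np.log(s))

# Synthetic data (assumed): both sets sample tanh(u) in a logarithmic
# coordinate u, with the second set offset horizontally by true_log_a.
rng = np.random.default_rng(0)
true_log_a = 0.7
x_ref = np.linspace(-3.0, 3.0, 40)
y_ref = np.tanh(x_ref) + 0.02 * rng.standard_normal(x_ref.size)
x_new = np.linspace(-2.0, 1.5, 30)
y_new = np.tanh(x_new + true_log_a) + 0.02 * rng.standard_normal(x_new.size)

# Grid search over candidate horizontal shifts; the minimizer is the estimate.
candidates = np.linspace(-1.0, 1.5, 251)
objective = [neg_log_posterior(c, x_ref, y_ref, x_new, y_new) for c in candidates]
log_a_map = candidates[int(np.argmin(objective))]
```

Because the objective weights each residual by the predictive standard deviation, candidate shifts that push data outside the range of the reference set are penalized less sharply than mismatches inside it, which is one way a statistical model of the data can account for uncertainty during superposition.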

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Figure 1. Example of fictitious measurements of the concentration profile of a diffusing species from an instantaneous point source. (a) The concentration profile was measured instantaneously at four different times, with lighter-shaded curves representing later times. (b) A “master curve” constructed by rescaling the width and height of the concentration profiles by time-dependent “shift factors” $ a(t) $ and $ b(t) $, respectively. (c) The shift factors $ a(t) $ and $ b(t) $ plotted as a function of time. The earliest-time concentration profile is taken as the reference, so its shift factors are unity. The remaining shift factors exhibit the trends $ a(t)\sim {t}^{-1/2} $ and $ b(t)\sim {t}^{1/2} $.
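The scalings quoted in the Figure 1 caption can be checked numerically. The sketch below assumes a one-dimensional point source of unit mass with diffusivity D = 1, so that the profile is a Gaussian of width proportional to $ {t}^{1/2} $; these choices, the sample times, and the function names `concentration` and `shift_factors` are illustrative assumptions, not details taken from the article.

```python
import numpy as np

def concentration(x, t, D=1.0):
    """1D diffusion profile for a unit mass released at x = 0 and t = 0."""
    return np.exp(-x**2 / (4.0 * D * t)) / np.sqrt(4.0 * np.pi * D * t)

def shift_factors(t, t_ref):
    """Shift factors relative to the reference time t_ref."""
    a = (t / t_ref) ** -0.5  # horizontal: rescales the profile width
    b = (t / t_ref) ** 0.5   # vertical: rescales the profile height
    return a, b

t_ref = 1.0                              # earliest time is the reference
times = np.array([1.0, 2.0, 4.0, 8.0])   # assumed measurement times
x = np.linspace(-10.0, 10.0, 201)

# After shifting, every profile should coincide with the reference profile
# evaluated at the rescaled coordinate a * x.
max_mismatch = 0.0
for t in times:
    a, b = shift_factors(t, t_ref)
    shifted = b * concentration(x, t)
    reference = concentration(a * x, t_ref)
    max_mismatch = max(max_mismatch, float(np.max(np.abs(shifted - reference))))
```

For the reference time itself both factors equal one, consistent with the caption's statement that the reference shift factors are unity.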

Figure 2. Automated superposition of recoverable creep compliance data acquired for a polystyrene melt at different temperatures. (a) The recoverable creep compliance data in laboratory time (circles), for temperatures of $ T=97.0 $, $ 100.6 $, $ 101.8 $, $ 104.5 $, $ 106.7 $, $ 109.5 $, $ 114.5 $, $ 125.0 $, $ 133.8 $, and $ 144.9{}^{\circ}\mathrm{C} $, shown vertically in increasing order (digitized from Plazek, 1965). Solid lines and shaded regions show the posterior mean $ m(t) $ and uncertainty bounds corresponding to one standard deviation, $ m(t)\pm s(t) $, determined via Gaussian process regression. (b) Automatically constructed master curve using horizontal shifting, with a reference temperature of $ {T}_0=100{}^{\circ}\mathrm{C} $. (c) The recoverable creep compliance with added 20% relative Gaussian white noise, along with the associated posterior mean $ m(t) $ and one standard deviation bounds $ m(t)\pm s(t) $ determined by Gaussian process regression. (d) Automatically constructed master curve using horizontal shifting with 20% relative Gaussian white noise added to the raw data.

Figure 3. Automated superposition of mean squared displacements measured by particle tracking passive microrheology in a peptide hydrogel undergoing physical gelation (Larsen and Furst, 2008). (a) The mean squared displacement $ \left\langle \Delta {r}^2\left(\tau \right)\right\rangle $ as a function of lag-time $ \tau $ (circles). Data were obtained for cure times $ t\in \left[10,115\right] $ minutes at 5 min increments, though only data at 10 min increments are shown here, presented in vertically descending order. Solid lines and shaded regions show the posterior mean $ m\left(\tau \right) $ and uncertainty bounds corresponding to one standard deviation, $ m\left(\tau \right)\pm s\left(\tau \right) $, determined via Gaussian process regression. (b) Maximum a posteriori estimates of the horizontal (red circles) and vertical (blue triangles) shift factors. Lightly shaded points and lines represent values inferred under a uniform prior. Darker shaded points and lines represent values inferred with a Gaussian prior over $ a(t) $ with hyperparameters optimized pairwise using Monte Carlo cross-validation. The shifts are optimally partitioned into a pregel curve and postgel curve. (c) Pregel master curve obtained with an optimized Gaussian prior over the shift factors $ a(t) $. (d) Postgel master curve obtained with an optimized Gaussian prior over $ a(t) $.

Figure 4. Automated superposition of steady flow data for castor oil-in-water suspensions with varying oil volume fraction. (a) The steady shear stress $ \sigma $ measured over a range of steady shear rates $ \dot{\gamma} $ (circles), for oil volume fractions of $ \phi $ = 0.68, 0.70, 0.72, 0.74, 0.76, 0.78, and 0.80, shown vertically in increasing order (digitized from Dekker et al., 2018). Solid lines and shaded regions show the posterior mean $ m\left(\dot{\gamma}\right) $ and uncertainty bounds corresponding to one standard deviation, $ m\left(\dot{\gamma}\right)\pm s\left(\dot{\gamma}\right) $, determined via Gaussian process regression. (b) Automatically constructed master curve using subtraction of a state-independent purely viscous contribution to the stress with $ {\eta}_{bg}=4.5\times {10}^{-2} $ $ \mathrm{Pa}\cdot \mathrm{s} $, followed by horizontal and vertical shifting. The reference state is taken as $ \phi =0.68 $ with $ {\sigma}_y=4.7 $ Pa and $ {\dot{\gamma}}_c=4.7 $ $ {\mathrm{s}}^{-1} $. Inset shows the optimal value and estimated uncertainty of $ {\eta}_{bg} $ inferred by applying the method to an increasing number of flow curves from (a), averaged over all possible combinations.

Table 1. Manually and automatically inferred horizontal shift factors for the recoverable creep compliance of an entangled polystyrene melt (Plazek, 1965).

Figure 5. The critical shear rate and yield stress values for castor oil-in-water emulsions inferred by the automated algorithm with the maximum a posteriori estimate of $ {\eta}_{bg}=4.5\times {10}^{-2} $ $ \mathrm{Pa}\cdot \mathrm{s} $ (filled symbols), and with $ {\eta}_{bg}=3.7\times {10}^{-2} $ $ \mathrm{Pa}\cdot \mathrm{s} $ (unfilled symbols), as well as values fit via the three-component (TC) model (dashed lines) (Caggioni et al., 2020). Vertical bars depict one standard error in the estimates.

Figure 6. Automated superposition of stress relaxation data for a physically aging suspension of Laponite-RD clay with varying wait time $ {t}_w $ between mixing and preshearing, and imposition of a step strain with $ {\gamma}_0=0.03 $ (which is always in the linear viscoelastic range). (a) The relaxation modulus $ G\left(t-{t}_w;{t}_w\right) $ as a function of the time since the step strain, $ t-{t}_w $, is parametric in the wait time $ {t}_w $. Data (circles) are shown for $ {t}_w $ = 600 s, 1,200 s, 1,800 s, 2,400 s, and 3,600 s, presented vertically in increasing order (digitized from Gupta et al., 2012). Solid lines and shaded regions show the posterior mean $ m\left(t-{t}_w\right) $ and uncertainty bounds corresponding to one standard deviation, $ m\left(t-{t}_w\right)\pm s\left(t-{t}_w\right) $, determined via Gaussian process regression. (b) Automatically constructed master curve using transformation to the effective or material time interval $ \tilde{t}/{t}_{\mathrm{ref}}=\left[{\left(t/{t}_{\mathrm{ref}}\right)}^{1-\nu }-{\left({t}_w/{t}_{\mathrm{ref}}\right)}^{1-\nu}\right]/\left(1-\nu \right) $ with $ \nu =1.1 $, and subsequent vertical multiplicative shifting by a factor $ b\left({t}_w\right) $, with a reference wait time of $ {t}_{\mathrm{ref}} $ = 600 s. The vertical shift factors $ b\left({t}_w\right) $ are shown in the inset, with vertical bars representing one-standard-error uncertainty estimates.
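The material time transformation quoted in the Figure 6 caption can be evaluated directly. The sketch below implements that formula as written, with the caption's values $ \nu =1.1 $ and $ {t}_{\mathrm{ref}} $ = 600 s as defaults; the function name `material_time` and the sample times are hypothetical, and the singular case $ \nu =1 $ is deliberately not handled.

```python
import numpy as np

def material_time(t, t_w, nu=1.1, t_ref=600.0):
    """Effective (material) time interval, in the same units as t_ref.

    t is the total sample age and t_w the wait time, so t - t_w is the
    laboratory time since the step strain. Assumes nu != 1.
    """
    p = 1.0 - nu
    return t_ref * ((t / t_ref) ** p - (t_w / t_ref) ** p) / p

# One laboratory second after the step strain, for each wait time in Figure 6.
wait_times = np.array([600.0, 1200.0, 1800.0, 2400.0, 3600.0])
elapsed = material_time(wait_times + 1.0, wait_times)
```

In this form the material time advances at the laboratory rate when $ {t}_w={t}_{\mathrm{ref}} $ and progressively more slowly for longer wait times, since its derivative with respect to $ t $ is $ {\left(t/{t}_{\mathrm{ref}}\right)}^{-\nu } $.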

Figure 7. Forward predictions of the steady flow curve for a castor oil-in-water emulsion with oil volume fraction $ \phi =0.72 $. The value of $ {\eta}_{bg}=\left(4.5\pm 0.2\right)\times {10}^{-2} $ $ \mathrm{Pa}\cdot \mathrm{s} $ is inferred during construction of a master curve from the remaining data sets ($ \phi $ = 0.68, 0.70, 0.74, 0.76, 0.78, 0.80). The yield stress $ {\sigma}_y=9\pm 1 $ Pa and critical shear rate $ {\dot{\gamma}}_c=6.7\pm 0.1 $ $ {\mathrm{s}}^{-1} $ are estimated from Gaussian process models fit to the automatically inferred shift factors at the remaining states. The inferred master curve is fit to a Gaussian process model, whose predictions are shifted to the $ \phi =0.72 $ state using the predicted values of $ {\sigma}_y $, $ {\dot{\gamma}}_c $, and $ {\eta}_{bg} $. The mean predicted values are shown with a solid line, and single standard deviation uncertainty bounds are shown with a shaded region. Experimental data for $ \phi =0.72 $ from Dekker et al. (2018) are shown by the solid circles.

Table A1. The shift factors inferred by the closed-form shifting (CFS) and minimum arc length (min-$ \delta $) algorithms for time–temperature superposition of recoverable creep compliance data for a polystyrene melt (Plazek, 1965), both without and with 20% relative Gaussian white noise added to the raw compliance data.

Table A2. The relative deviation $ \mid \Delta {a}_T\mid /{a}_T^{(0)}=\mid {a}_T^{\left(\sigma \right)}-{a}_T^{(0)}\mid /{a}_T^{(0)} $ in the shift factors computed with 20% relative Gaussian white noise ($ {a}_T^{\left(\sigma \right)} $) and without noise ($ {a}_T^{(0)} $), relative to the noiseless values, for the closed-form shifting (CFS) and minimum arc length (min-$ \delta $) algorithms, compared to the shifts computed by the algorithm presented in this work.
