Improving parenting, child attachment, and externalizing behaviors: Meta-analysis of the first 25 randomized controlled trials on the effects of Video-feedback Intervention to promote Positive Parenting and Sensitive Discipline

Abstract Video-feedback Intervention to promote Positive Parenting and Sensitive Discipline (VIPP-SD) combines support of parental sensitive responsiveness with coaching parents in sensitive limit setting. Here, we present meta-analyses of 25 RCTs conducted with more than 2,000 parents and caregivers. Parents or children had various risks. We examined the effectiveness of VIPP-SD in promoting parental cognitions and behavior regarding sensitive parenting and limit setting, in promoting secure child–parent attachment, and in reducing externalizing child behavior. Web of Science, MEDLINE, PubMed, and recent reviews were searched for relevant trials (until May 10, 2021). Multilevel meta-analysis with META, METAFOR, and DMETAR in R took account of the 3-level structure of the datasets (studies, participants, measures). The meta-analyses showed substantial combined effect sizes for parenting behavior (r = .18) and attitudes (r = .16), and for child attachment security (r = .23), but not for child externalizing behavior (r = .07). In the subset of studies examining effects on both parenting and attachment, the association between effect sizes for parenting and for attachment amounted to r = .48. We consider the way in which VIPP-SD uses video-feedback to be an active intervention component. Whether VIPP-SD indeed stimulates secure attachment through enhanced positive parenting remains an outstanding question for further experimental study and individual participant data meta-analysis.

Theory-based parenting interventions are the litmus test of causality and of the translational value of theories about parenting. In the areas of social learning theory and of attachment theory, only a handful of parenting interventions have been developed and tested in more than a few RCTs. Prime examples in the social learning tradition are the Incredible Years program (Webster-Stratton, 2015), the Parent Management Training Oregon model (Fisher & Stoolmiller, 2008; Patterson et al., 2010), and parent-child interaction therapy (PCIT; Euser et al., 2015; Eyberg & Robinson, 1982). Originally, PCIT was strongly inspired by a social learning framework (Eyberg & Robinson, 1982), but it began to include components suggested by attachment theory, such as an emphasis on the child–parent attachment relationship (Allen et al., 2014). In the tradition of attachment theory, the attachment and biobehavioral catch-up program (Dozier & Bernard, 2017), infant-parent psychotherapy (Cicchetti et al., 2006), the circle of security (Cassidy et al., 2011; Dehghani et al., 2014), and the group attachment-based intervention (Steele et al., 2019) might be mentioned (see Steele & Steele, 2018, for other attachment-based interventions). Developed and tested over more than 30 years on more than 2,000 families in 25 RCTs, the Video-feedback Intervention to promote Positive Parenting and Sensitive Discipline (VIPP-SD; Juffer et al., 2008) is the product of integrating social learning and attachment theory (Bosmans et al., 2020, 2021; Juffer et al., 2017). VIPP-SD combines support of parental sensitive responsiveness (Ainsworth et al., 1974) with coaching parents to avoid coercive cycles (Patterson, 1982) and promote sensitive limit setting.
Here we present a series of meta-analyses of the RCTs conducted with parents and caregivers of a variety of typically and atypically developing children in a broad age range using the suite of video-feedback intervention programs that can be labeled VIPP-SD (Juffer et al., 2017). The aim is to take stock of the evidence about its effectiveness, and to explore questions that may lend themselves to hypothesis-driven work through individual participant data meta-analysis (IPD; see Verhage et al., 2018, 2020) and further experimental studies. Inspired by Ainsworth et al.'s (1974) concept of sensitive parenting, most variants of VIPP-SD aim at parental sensitivity. In usually four sessions, interveners focus on (1) learning how to distinguish children's attachment behaviors from exploratory behaviors, (2) increasing awareness of and attention to subtle child signals by speaking for the child, (3) highlighting sensitive interaction chains consisting of three phases: the child's signal, the parent's sensitive response, and the child's reaction to that response, and (4) sharing emotions and attuning affect. To promote gentle but firm control and limit setting, parents are additionally supported in setting limits for and resolving conflicts with their toddler or preschooler. Inspired by Patterson's (1982) theory of coercive cycles, VIPP-SD targets the discipline component in parallel to the sensitivity themes as follows: (1) distraction and induction as non-coercive responses to difficult child behavior or conflict-evoking situations, (2) positive reinforcement (praising the child for positive behavior and ignoring negative attention seeking), (3) using a sensitive interaction pause to deescalate conflicts or temper tantrums, and (4) showing empathy for the child while using consistent discipline strategies and clear limit setting (Juffer et al., 2017). In two booster sessions, all themes are repeated and integrated.
Meta-analytically, the modest number of intervention sessions ("less is more") and the use of video-feedback were supported by evidence that these intervention modalities were positively associated with effect sizes in a broad range of interventions on parental sensitivity (Bakermans-Kranenburg et al., 2003).
In the original version of VIPP-SD, each video-feedback session starts with videotaping standardized caregiver-child interactions. For example, for addressing parental limit setting, video footage is needed of situations that elicit child challenging behavior and parental disciplinary actions. Mealtime turns out to be useful for this purpose, as well as "don't touch" and clean-up tasks during play in which the child has to follow unwelcome directions from the parents. This video material is used in the next session, with the feedback being prepared by the intervener in-between the sessions. Sessions last about 1-1.5 hr, and are usually home visits; this facilitates the transfer of trained skills to daily life and underscores the basic VIPP-SD idea of the parent being in charge with a visiting intervener in an empowering role. Three phases can be distinguished during the intervention trajectory. In the first phase (sessions 1 and 2), the intervener builds a working alliance with the parent, and the focus of the video feedback is on child behavior and signals, and on strengths of the parent. In the second phase (sessions 3 and 4), the intervener works actively on improving parenting behaviors by commenting on moments of effective parenting behavior as well as on incidents of ineffective parenting behavior, discussing alternatives for these moments while showing empathy for the parent and watching again video fragments where the parent's strategy was adequate and effective. In the final phase (sessions 5 and 6) all feedback and information from the previous sessions is repeated. Interveners reinforce positive parent-child interactions and effective parenting strategies, and parents are explicitly acknowledged as the experts on their own child. The intervener supports the parents or caregivers in reflecting on their own interactive images mirrored in the video-records (Bakermans-Kranenburg et al., 2019).
The intervention is thus standardized with respect to structure and themes of the session, but also personalized through idiographic video footage. Because the VIPP-SD is protocoled as well as personalized, it has a degree of plasticity that may allow successful implementation beyond the population and setting for which it was originally developed (see Table 1).
Adaptations of the program in some of the studies pertained to the content of the videotaped parent–child interactions, such as a tea ceremony as cultural adaptation with Turkish minority families (Yagmur et al., 2014), or the number of sessions, for example one booster session less with the twin families (Euser et al., 2021), or separate, shorter sessions for collecting video material and providing video-feedback for parents with mild intellectual disabilities (Hodes et al., 2014). Furthermore, parents of infants in their first year of life may not yet require special attention for issues around limit setting (Juffer et al., 1997; Stein et al., 2006), whereas parents of children with neurodevelopmental problems due to (risk of) autism spectrum disorder (Green et al., 2017; Poslawsky et al., 2015) may benefit from an explicit focus on stimulating joint attention and reducing stereotypical behaviors along with promotion of sensitive parenting. The attachment video-feedback intervention program (Moss et al., 2018) shows considerable overlap with VIPP-SD, but its specific target of families with (high risk of) maltreatment required several adaptations (see also Juffer et al., 2017). Only slightly adapted versions of the program were implemented to enhance caregiver sensitivity in home-based and center group care (Groeneveld et al., 2011; Werner et al., 2018). The individualized approach of VIPP-SD implies that parents or caregivers provide their own baseline ("mirror images") for the intervention sessions by being videotaped with their own child, similar to the strange situation procedure, which creates an individualized baseline of child interactive behaviors in the first episodes (Ainsworth et al., 1978). Video-feedback may be an active intervention component that works across socioeconomic and cultural settings, age groups, and typical and atypical groups.
The potential generic impact of VIPP-SD across a variety of psychological problems may converge with evidence for the existence of a general psychopathology factor (Caspi et al., 2014;Neumann et al., 2020). One of the aims of the current meta-analytic synthesis is to compare the effectiveness across the various samples, although a conclusive test might have to wait for a potentially more powerful IPD meta-analysis.
The VIPP-SD program aims at enhancing positive parenting, with a special emphasis on sensitive responsiveness of the parent to the child's distress or support seeking signals, and on sensitive discipline or limit setting to avoid coercive cycles. The primary goal is to increase positive parental interactive behavior, because interactive behavior is the final common pathway through which the intervention may benefit the parent–child relationship and children's development. However, changes in parental cognitions about sensitivity and limit setting are important secondary goals, because cognitions such as parenting self-efficacy are theorized to be part of mechanisms of change in which performance feedback, modeling, and verbal persuasion lead to reciprocal changes in efficacy beliefs and performance of target behavior (Bandura, 1977; Schuengel & Oosterman, 2019).
In several RCTs parental psychopathology symptoms such as depression or anxiety also have been assessed and tested as outcomes of VIPP-SD but these are not part of the mechanism underlying VIPP-SD. Of course, one may have the hope that a focus on parents' competence during the intervention, and changes in parenting behavior and interactions with the child will also enhance feelings of self-efficacy and result in changes for the better in symptoms of psychopathology, but these effects would be considered side-effects. The VIPP-SD program supports parents in their role as attachment figures and disciplinarians, but the intervention was not developed to make parents happier or better functioning outside the parent-child relationship. Furthermore, the effects on child development are derived from attachment theory and coercion theory, and they pertain specifically to promoting child attachment security and decreasing child externalizing problem behavior. These child effects are supposed to result from the change in parenting behavior and hypothetically, the child effects should be mediated by enhanced parental sensitivity and sensitive limit setting. Again, in several studies a wider net has been thrown on child development, including assessments of internalizing problem behaviors or academic achievement, but these influences may also be regarded as side-effects. The current meta-analyses are therefore limited to the main goals of the VIPP-SD: promoting sensitive parenting and limit setting in parental cognitions and behavior, promoting secure child-parent attachment, and reducing externalizing child behavior. Four meta-analyses will be conducted to test four central hypotheses: (1) VIPP-SD enhances parental sensitive interactions and sensitive discipline; (2) VIPP-SD leads to parental cognitions favoring sensitive interactions and sensitive discipline; (3) VIPP-SD promotes child attachment security; and (4) VIPP-SD decreases child externalizing problem behaviors. 
Additionally, we examine whether studies reporting higher effect sizes for sensitive parenting also report higher effect sizes for attachment security.

Literature search
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were used in preparing this meta-analytic synthesis. Search terms for finding pertinent RCTs using a VIPP type of parenting intervention were the following: ("video-feedback" OR vipp*) AND intervention AND "randomized control* trial" for all databases in Web of Science: WOS, BCI, CCC, DRCI, DIIDW, KJD, MEDLINE, RSCI, SCIELO, ZOOREC. Date of search was May 10, 2021. PubMed was also searched with the following search terms: ((intervention*) AND randomized control* AND trial) AND (videofeedback OR VIPP*). Date of search was May 12, 2021. Three recent reviews (Bergstrom et al., 2020; Juffer et al., 2017; O'Hara et al., 2019) and the Handbook of Attachment-Based Interventions (Steele & Steele, 2018) were also searched for relevant studies. See Figure 1 for a flowchart.

Coding system
In keeping with the aims of VIPP-SD only effects on parenting (observed sensitive behavior, observed parental limit setting, and parental cognitions) and on specific dimensions of child development (attachment and externalizing behavior) were included in the meta-analysis. Thus, parental mood or psychological problems, and the child domains of cognitive development (e.g., Dubois-Comtois et al., 2017), executive functions, or internalizing problems were not taken into account.
The coding system covered the type of measure used for intervention outcomes, whether the measure was observational, the reliability of the measure, and the age of the child at the start of the intervention and at posttest. Type of sample (typical or atypical/clinical), type of control condition (care as usual vs. phone calls), participating parent (mothers, fathers), socioeconomic status of the families, ethnicity, and country were coded. For quality rating of the studies the coding system included how randomization was implemented, whether fidelity checks were used, level of adherence to treatment, percentage of attrition, whether intention-to-treat analyses were performed, the number of intervention sessions, whether the study was preregistered, and whether the developers of the VIPP-SD were involved in the study. An overall quality rating was computed based on the following six indicators: intention-to-treat; fidelity checks; blind coding; attrition; preregistration; and whether the developers were part of the research team (involvement being coded as risk of bias). Average intercoder reliability of the risk of bias indicators was r = .79. Because of the potentially crucial role of involvement of developers of interventions in trials (Munder et al., 2011) we also separately tested its association with study effect sizes. As the number of studies was limited, only a few moderators could be examined in the various meta-analyses (ideally, no more than one continuous moderator is tested per ten effect sizes, and for a categorical moderator each category should contain at least four effect sizes; Schwarzer et al., 2015). We tested the moderating role of typical versus atypical/clinical samples, type of control condition, age of the child at the start of the study, type of outcome variable (i.e., sensitive parenting vs. parental discipline), established attachment measures such as the SSP and AQS (Waters et al., 2021) versus dyadic proxies such as the child scales of the EAS (Biringen et al., 2014), and the potential influence of study quality. Extraction of effect sizes was done based on consensus between MJB-K and MHvIJ. See Table 1 for an overview of study characteristics and risks of bias.

Meta-analytic procedures
For each RCT the pertinent effect sizes closest to the raw data were extracted, preferably unadjusted estimates, and transformed to the correlation coefficient r and Fisher's Zr with its variance. The effect sizes were positive when they provided support for the hypotheses of a positive influence on parental sensitivity or sensitive discipline, a positive effect on attachment security, or a decrease of externalizing behavior. Negative effect sizes reflected effects in the direction opposite to the hypotheses. The implication of extracting all effect sizes for each study was that more than one effect size for the same construct (e.g., parenting) within the same sample was available. Multilevel meta-analysis (Assink & Wibbelink, 2016) took account of the 3-level structure of the datasets (participants, measures, samples). This three-level structure was analyzed with META, METAFOR, and DMETAR in R using the random-effects model (Harrer et al., 2019). The Knapp-Hartung adjustment for confidence intervals (CIs) was applied to reduce the risk of false positives, and the restricted maximum likelihood (REML) method was used to take account of the between-study heterogeneity (Harrer et al., 2021). Besides the overall pooled effect size we also computed the 95% CI to estimate the precision of the pooled effect size (Borenstein, 2019).
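As a minimal illustration of the transformation step described above, the following Python sketch converts a standardized mean difference to r, converts r to Fisher's Z with its large-sample variance, and back-transforms Z to r. The function names are hypothetical, and the formulas are common textbook approximations; the source does not state which conversion formulas were applied to each study, so this should not be read as the authors' procedure.

```python
import math


def d_to_r(d, n1, n2):
    """Convert a standardized mean difference d to a correlation r.

    Assumption: the common approximation r = d / sqrt(d^2 + a),
    with a = (n1 + n2)^2 / (n1 * n2) correcting for unequal group sizes.
    """
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d * d + a)


def r_to_fisher_z(r):
    """Fisher's Z transformation of a correlation (inverse hyperbolic tangent)."""
    return math.atanh(r)


def fisher_z_variance(n):
    """Large-sample sampling variance of Fisher's Z for a sample of size n."""
    return 1.0 / (n - 3)


def fisher_z_to_r(z):
    """Back-transform Fisher's Z to the correlation metric."""
    return math.tanh(z)


if __name__ == "__main__":
    z = r_to_fisher_z(0.18)
    print(round(z, 4), round(fisher_z_to_r(z), 2), fisher_z_variance(103))
```

Pooling is typically done on the Z scale because its variance depends only on sample size; the pooled estimate is then back-transformed to r for reporting.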
Publication bias and biases resulting from the (improper) use of too many researcher degrees of freedom such as p-hacking were estimated using funnel plots, Egger's tests (Egger et al., 1997), the trim-and-fill approach (Duval & Tweedie, 2000), and p-curve analysis (Simonsohn et al., 2014). The meta-analytic study builds on a previous one (Juffer et al., 2017). Since then, the number of trials has more than doubled, enabling the testing of additional moderators. The current update and extension has not been preregistered. To optimize transparency and facilitate replicability, papers included in the meta-analyses, datasets, and R codes of the meta-analyses can be found in the publicly available Supplemental materials stored at the OSF website

Quality of the studies
See Table 1 for an overview of the risks of bias in the 25 studies. Fidelity was checked in the large majority of the studies (84%) as it is part of the guidelines in the VIPP-SD protocol. Blind coding of videotaped pre- and posttest assessments was done in all RCTs. More than one third of the studies, mostly older papers, did not use intention-to-treat analyses (40%), although this is highly recommended in CONSORT and other guidelines for the statistical analysis of RCTs. Attrition might have played a role in almost half of the trials, which might make generalizability problematic and, without intention-to-treat analyses, may also jeopardize the internal validity of the studies. Preregistration is important from the perspective of reproducibility as it limits, although it does not eradicate, researcher degrees of freedom during data analysis and (selective) reporting (National Academies of Sciences, Engineering, and Medicine, 2019; Stoll et al., 2020), and 32% of the studies did use preregistration in a trial register. Again, more recent studies used preregistration more often. Note, however, that preregistrations differ in their levels of specification of, for example, the analytic approach, and some preregistrations leave room for (maybe too) many researcher degrees of freedom. Involvement of the VIPP-SD developers was noted in 48% of the studies and was counted as a risk of bias. Overall risk of bias was estimated to be low if fewer than two potential biases were rated high, whereas the risk was considered high if more than three potential biases were rated high. Two studies (8%) were considered to be at high risk of bias, nine studies (36%) were at low risk, and 14 studies (56%) raised some concerns as to risk of bias. In each of the four outcome domains, we tested whether effect size of the studies was associated with risk of bias and, separately, with developer involvement because of its potentially crucial role in the trial.
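The overall rating rule stated above can be written out as a small sketch. The function name and the string labels are hypothetical; the thresholds (fewer than two high ratings is low risk, more than three is high risk, anything in between raises some concerns) and the total of six indicators are taken from the text.

```python
def overall_risk_of_bias(n_high, n_indicators=6):
    """Classify a study from its count of indicators rated as high risk.

    Thresholds follow the rule described in the text:
    fewer than two high ratings -> low, more than three -> high,
    otherwise (two or three) -> some concerns.
    """
    if not 0 <= n_high <= n_indicators:
        raise ValueError("n_high must lie between 0 and the number of indicators")
    if n_high < 2:
        return "low"
    if n_high > 3:
        return "high"
    return "some concerns"


if __name__ == "__main__":
    for count in range(7):
        print(count, overall_risk_of_bias(count))
    # Reported distribution over the 25 trials, as percentages.
    print([round(100 * k / 25) for k in (2, 9, 14)])
```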

VIPP-SD effects on sensitive parenting and discipline
The structure of the dataset consisted of three levels: (1) participants, (2) measures within studies (e.g., sensitivity and sensitive discipline as outcome variables, or sensitivity at posttest and at follow-up assessments), and (3) variation between studies. A multilevel meta-analysis was performed because it takes this structure of potential dependence into account. No outlying effect sizes (deviating more than 3.29 SD from the mean) for parental sensitivity and discipline in k = 24 studies with 63 effect sizes (N = 1,905) were identified. The effect sizes combined within studies are presented in a forest plot, see Figure 2.
The correlation r and its 95% CI, and the weight of each study have been included. The random model using a REML method and a Knapp–Hartung approach showed a pooled effect size for parental sensitivity and discipline of r = 0.18 (95% CI 0.12, 0.23; p < .0001). Overall heterogeneity amounted to I² = 54.7%. The model without the within-study level showed an equivalent fit compared to the full model (LRT = 2.29, p = .13), but the level representing the variation between studies could not be omitted (LRT = 12.15, p < .001).
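The likelihood-ratio comparisons above can be checked by hand under one assumption that the text does not state explicitly: that each LRT statistic is referred to a chi-square distribution with 1 degree of freedom (one variance component constrained to zero). Under that assumption the p-value has a closed form via the complementary error function. The function name is hypothetical.

```python
import math


def lrt_p_value_df1(lrt):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For X ~ chi-square(1), P(X > x) = erfc(sqrt(x / 2)).
    Assumption: the model comparison differs by exactly one parameter.
    """
    if lrt < 0:
        raise ValueError("a likelihood-ratio statistic cannot be negative")
    return math.erfc(math.sqrt(lrt / 2.0))


if __name__ == "__main__":
    for stat in (2.29, 12.15):
        print(stat, lrt_p_value_df1(stat))
```

With this rule, LRT = 2.29 gives a p-value that rounds to .13 and LRT = 12.15 gives a p-value below .001, in line with the values reported for the parenting models.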
A contour-enhanced funnel plot (Harrer et al., 2021) showed some asymmetry, and Egger's linear regression test of funnel plot asymmetry (Egger et al., 1997) confirmed a potential publication bias (t(22) = 3.47, p = .002). Duval and Tweedie's (2000) trim-and-fill compensation for potential publication bias resulted in 8 added effect sizes to reach symmetry and a corrected combined effect size of r = .11 (95% CI 0.04, 0.18) that remained significant (p = .003). The p-curve analysis (Simonsohn et al., 2014) included 8 significant study effects, of which 5 effect sizes were significant at a p < .025 level. The right-skewness test was significant (zHalf = −1.87, p = .031), and the flatness test was not significant (zHalf = 2.73, p = .997). Thus, evidential value was present, and the flatness test gave no indication that the evidential value was absent or inadequate. Overall risk of bias of the studies was not associated with effectiveness of the intervention on sensitive parenting and discipline, F(1, 22) = 1.30, p = .27.
We tested whether the effect sizes of studies on typical samples versus atypical/clinical samples were significantly different, but the multilevel meta-analysis with this moderator was not significant, F(1, 61) = 0.65, p = .42. Nor did age of the child moderate the effectiveness of the intervention, F(1, 61) = 0.66, p = .42. The contrast between sensitivity and discipline outcomes was not statistically significant either, F(1, 61) = 0.49, p = .49. No statistical differences were found between effect sizes in studies with and without involvement of developers in the trial, F(1, 61) = 2.15, p = 0.15, or with variation in treatment of the control group, F(1, 61) = 3.16, p = 0.08.

VIPP-SD effects on parenting attitudes
The pooled effect size for parental attitudes about sensitive parenting and sensitive discipline was based on 13 effect sizes in 9 studies (N = 961) and was r = 0.16 (95% CI 0.09, 0.23; p < .001), see Figure 3.
The two-level models showed a good fit to the data compared to the three-level model, with either level 2 restricted to zero (LRT = 0.272, p = .60) or level 3 restricted to zero (LRT = 0.0007, p = .98). Overall heterogeneity amounted to I² = 23.4%.
Testing for publication biases, we found that the funnel plot showed no asymmetry, and Egger's test showed no publication bias (t(7) = 1.48, p = .18). The p-curve analysis included four significant effects, of which three effect sizes were significant at a p < .025 level. The flatness test was not significant (zHalf = 1.41, p = .92), but the right-skewness test was not significant either (zHalf = 0.72, p = .77). This implies that there was no indication of p-hacking, but according to the right-skewness test the evidential value was insufficient as there were only a few studies with very small p-values. More risk of bias was associated with larger effect sizes on parental attitudes, F(1, 7) = 9.72, p = .017. In the multilevel meta-analysis, the effect sizes of studies in typical samples and atypical/clinical samples were not significantly different, F(1, 11) = 0.27, p = .61. Involvement of developers in the trial did not make a statistically significant difference for study effect size, F(1, 11) = 0.01, p = .91, nor did treatment of the control group, F(1, 11) = 3.51, p = .09.

VIPP-SD effects on child attachment
No outlying effect sizes for child attachment of the pertinent k = 11 studies with 16 effect sizes (N = 788) were identified. The effect sizes combined within studies are presented in a forest plot, see Figure 4.
The overall combined effect size amounted to r = .23 (95% CI 0.11, 0.34; p = .001). The two-level model showed a good fit to the data compared to the three-level model with either level 3 restricted to zero (LRT = 3.13, p = .08) or level 2 restricted to zero (LRT = 0.04, p = .84). Overall heterogeneity amounted to I² = 63.1%.
The contour-enhanced funnel plot (Harrer et al., 2021) showed no asymmetry or p-hacking. Egger's test (Egger et al., 1997) did not show a potential publication bias (t(9) = 0.81, p = .44). Duval and Tweedie's (2000) trim-and-fill test did not show publication bias either, and no trim-and-fill compensation for potential publication bias was indicated. The p-curve analysis (Simonsohn et al., 2014) included 5 significant study effects, of which 4 effect sizes were significant at a p < .025 level. The right-skewness test was significant (zHalf = −2.24, p = .013), whereas the flatness test was not significant (zHalf = 3.03, p > .99). Thus, evidential value was present, and there was no indication that the evidential value was absent or inadequate. There was no reason to suspect p-hacking. Risk of bias of the studies was not associated with intervention effect size, F(1, 9) = 1.30, p = .27.
We tested whether the effect sizes of studies on typical samples versus studies on atypical/clinical samples were significantly different, but the multilevel meta-analysis with this moderator did not show a significant F-test: F(1, 14) = 1.47, p = .24. The contrast between studies using the standard attachment measures SSP or AQS versus other dyadic measures was not significant either, F(1, 14) = 0.84, p = .38. However, age was a significant moderator, revealing stronger effects on attachment in studies with older children, F(1, 14) = 7.48, p = .016. No statistical difference was found between study effect sizes for involvement of developers in the trial, F(1, 14) = 3.20, p = .10, or for treatment of the control group, F(1, 14) = 0.49, p = .49.

VIPP-SD effects on child externalizing behavior
The combined effect size for child externalizing behavior was not statistically significant (r = .07). The two-level model showed a good fit to the data compared to the three-level model with level 3 restricted to zero (LRT = 1.60, p = .21) or level 2 restricted to zero (LRT = 0.00, p = .99). Overall heterogeneity amounted to I² = 63.83%. Because the combined effect size was not significant, we did not test for publication bias or p-hacking.
Risk of bias was not associated with size of the effects of VIPP-SD on child externalizing, F(1, 7) = 0.05, p = .83. We tested whether the effect sizes of studies on typical samples versus studies on atypical/clinical samples were different, but the multilevel meta-analysis with this moderator did not show a significant F-test: F(1, 11) = 0.39, p = .54. Age was a significant moderator of the effect size, F(1, 11) = 9.50, p = .01. The intervention was more effective in reducing externalizing child behavior in studies with younger children. Again, no statistical difference was found between study effect sizes for involvement of developers in the trial, F(1, 11) = 0.44, p = .52, nor for treatment of the control group, F(1, 11) = 0.002, p = .96. It should be noted that the moderator tests were based on small numbers of study outcomes.

Figure 1. Flowchart of the study selection process. Recoverable box labels: records excluded (n = 10 protocols, n = 19 reviews or meta-analyses); full-text articles assessed for eligibility (n = 24); full-text articles excluded (n = 22, no RCT); 4 studies added from reviews (n = 28); full-text articles excluded (n = 3, no extractable data); studies included in quantitative synthesis (n = 25).

Parental sensitivity and child attachment
In eleven studies both parental sensitivity and discipline and attachment were assessed. The association between the eleven effect sizes for parenting (combined within study when more than one parenting assessment was available) and the effect sizes for child attachment amounted to r = .48 (p = .14). When the effect sizes for specifically parental sensitivity were selected, the association with effect sizes for attachment was r = .50 (k = 11, p = .11). One study (Kalinauskiene et al., 2009) emerged as an outlier in the scatterplot, with a strong positive effect of VIPP-SD on parenting (Fisher's Zr = 0.39) and a negligible effect on attachment security (Fisher's Zr = 0.003). In general, however, stronger effect sizes for parenting tended to be accompanied by stronger effect sizes for attachment.
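To see why a correlation of r = .48 across only eleven studies does not reach significance, the sketch below computes the usual t statistic for a Pearson correlation and a two-sided p-value. It assumes an unweighted correlation tested with k − 2 degrees of freedom, which the text does not specify; the function names are hypothetical, and the p-value is obtained by numerically integrating the Student t density so that no external library is needed.

```python
import math


def t_statistic_for_r(r, k):
    """t statistic and degrees of freedom for a Pearson r based on k pairs."""
    df = k - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


def two_sided_p_value(t, df, steps=20000):
    """Two-sided p-value for Student's t via Simpson's rule on the density.

    Integrates the t density from 0 to |t|; `steps` must be even.
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += pdf(i * h) * (4 if i % 2 else 2)
    area = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * area)


if __name__ == "__main__":
    t, df = t_statistic_for_r(0.48, 11)
    print(round(t, 3), df, round(two_sided_p_value(t, df), 3))
```

Under these assumptions r = .48 with k = 11 yields t of about 1.64 on 9 degrees of freedom and a two-sided p of about .135, close to the reported p = .14.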

Discussion
The meta-analyses of 25 randomized controlled trials testing interventions with the VIPP-SD method in more than 2,000 families showed substantial combined effect sizes on parenting behavior (r = .18) and attitudes (r = .16), and on child attachment security (r = .23), but not on child externalizing behavior problems (r = .07). The primary aim of the VIPP-SD intervention is the promotion of parental sensitivity and limit setting, and reaching this aim is a robust and replicated result. This effect did not statistically depend on (a-)typical status of the parents or children involved, which supports the potential generic impact of VIPP-SD across a variety of psychological problems. The VIPP-SD also was remarkably effective in enhancing child attachment security, which is assumed to be the consequence of higher levels of parental sensitivity and sensitive discipline. Within the small set of available studies the association between effect sizes for attachment and parenting was substantial (r = .48) but not statistically significant, which makes the hypothesis that VIPP-SD-enhanced sensitivity and sensitive discipline mediate the increase in child attachment security a candidate for further study, in particular for a more powerful IPD meta-analysis (see, e.g., Verhage et al., 2020). It should however be noted that even an RCT may not guarantee causal mediation free from confounder biases (Hamaker et al., 2020). It is somewhat disappointing that despite the effects on parental sensitivity and sensitive discipline, we did not find a meta-analytic effect on child externalizing behavior problems. One study showed a delayed effect of VIPP-SD on externalizing problems (Van Zeijl et al., 2006), which suggests that such effects may need more time to be established. Effects on child behavior are more distal to the intervention than effects on parenting behavior.
Moreover, some interventions did not include the discipline variant, in particular in studies of families with infants in their first year of life or with children struggling with neurodevelopmental risks. In one study (Van der Asdonk et al., 2020) the intervention even seemed to increase externalizing behaviors, but the authors noted that in this sample of maltreating mothers in residential treatment child behavior problems might have been overreported by the mothers to legitimize their harsh parenting (see also Reid et al., 1987). It is promising that the largest RCT with the lowest risk of bias to date targeting families with children at risk for externalizing behavior problems showed a robust effect size for independently rated externalizing behaviors (r = .15; O'Farrelly et al., 2021).

Limitations
Before continuing with a discussion of the implications, we address some of the limitations of the current meta-analytic synthesis. The quality of the studies included in the meta-analyses varied. On an aggregated rating of six quality indicators (intent-to-treat analysis, fidelity, blindness, attrition, preregistration, and participation of developers) three studies showed high risk of bias, whereas nine studies were considered at low risk. Risk of bias was, however, not associated with study effect sizes. For effects on parental sensitivity and limit setting, publication bias seemed possible: Egger's test (Egger et al., 1997) was significant, and after Duval and Tweedie's (2000) trim-and-fill compensation for potential publication bias the combined effect size shrank but remained significant. Evidential value for the meta-analyses was established with the p-curve approach (Simonsohn et al., 2014), and no evidence for p-hacking could be detected for any of the outcomes. However, the statistical power to find an effect size of r = .16 (or d = 0.32, for parenting) was sufficient in only seven out of the 25 studies. A repeated measures ANOVA with a time by group interaction in a pretest-posttest control group design, with alpha = .05 and power = .80, requires a minimum sample size of N = 80 (G*Power 3.1; https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower).
Other limitations of the meta-analyses pertain to the relatively modest number of studies involved, with accompanying limited possibilities of moderator analyses. Despite inclusion of randomized trials using only interventions of the suite of VIPP-SD parenting programs, the heterogeneity of study outcomes was substantial. The focus on core outcome variables related to parenting and child attachment and externalizing behaviors in four separate meta-analyses decreased heterogeneity somewhat, but the use of random effects models was still required. The inclusion of only randomized controlled trials may have had a mitigating effect on the estimated pooled effect sizes, as a previous meta-analysis showed inflated effect sizes of non-RCTs (Bakermans-Kranenburg et al., 2003). This should be kept in mind when comparing the current pooled effect sizes with the results of a meta-analysis of the Circle of Security interventions (Yaholkoski et al., 2016) that included mostly quasi-experiments without proper control groups or randomization. Since meta-analyzing RCTs enhances the validity of the causal conclusions to be derived, we think there are good reasons to limit the inclusion to RCTs.
An important issue to be mentioned is that two of the developers of VIPP-SD (MJB-K and MHvIJ) made a major contribution to the current meta-analyses and they also coauthored somewhat less than half of the trials. It should be noted that the developers never received financial compensation for the VIPP-SD program. Nevertheless, it cannot be ruled out that their contribution to the current meta-analyses might have led to (implicit) biases around the many choices meta-analysts must make. Parallel to misuse of researcher degrees of freedom (Simmons et al., 2011), we propose that meta-analyst degrees of freedom also exist. This is the reason why we have advocated the independent replication of meta-analyses (Van IJzendoorn, 1994). Here we took two actions against misuse of meta-analyst degrees of freedom. First, independent RCTs (no developers involved) were assigned a higher quality rating, and we also separately tested the influence of involvement of the developers in the trials on the effect sizes. Second, we documented each step of the meta-analyses on OSF in the service of transparency and reproducibility, see https://osf.io/4x2m7/. The absence of formal pre-registration of the meta-analytic synthesis might be partly compensated by the achieved transparency and the publicly available "raw data" in the published papers. We welcome others to examine whether they can reproduce our findings.

How strong are the VIPP-SD effects?
A previous meta-analysis on the effects of video-feedback on sensitivity, attachment security, parental stress, and anxiety (O'Hara et al., 2019) included 22 randomized and quasi-randomized trials with N = 1,889 families. The mixed nature of the set of included studies, with both RCTs and quasi-experiments without proper randomization, marks one of the differences with the current study. Furthermore, of the 22 studies included in the O'Hara et al. (2019) meta-analysis, ten trials were studies using VIPP-SD. The other trials were based on various other theoretical models, for example, Video Interaction Guidance (VIG, e.g., Barlow et al., 2016), which is rooted in communication theory and not in social learning or attachment theory. In this mix of studies, they found a combined effect size of d = 0.34 (95% CI: 0.20-0.49) for parental sensitivity, which is convergent with our finding of a combined effect size of r = .18 (comparable to d = 0.37). In the four included studies on attachment security, however, the authors found little evidence for an impact of the interventions. In contrast, in our meta-analysis of eleven RCTs with VIPP-SD effects on attachment security we found a combined effect size of r = .23 (comparable to d = 0.47).
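The correspondence between r and d used in this comparison can be checked with a short sketch. It assumes the standard conversion for two groups of equal size, d = 2r / sqrt(1 - r^2), and its inverse; the text does not state which conversion was applied, so this formula and the function names are assumptions made for illustration.

```python
import math

def r_to_d(r):
    """Convert a correlation r to Cohen's d, assuming two equal-sized groups."""
    return 2.0 * r / math.sqrt(1.0 - r * r)

def d_to_r(d):
    """Convert Cohen's d back to r under the same equal-group-size assumption."""
    return d / math.sqrt(d * d + 4.0)

if __name__ == "__main__":
    # Combined effect sizes mentioned in the text
    for label, r in [("parenting behavior", 0.18),
                     ("parenting attitudes", 0.16),
                     ("attachment security", 0.23)]:
        print(f"{label}: r = {r:.2f} corresponds to d = {r_to_d(r):.2f}")
```

Under this assumption, r = .18 and r = .23 round to d = 0.37 and d = 0.47, in line with the values given above.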
Should the combined effect size for parental sensitivity and sensitive discipline be considered small, moderate, or strong? The conventional Cohen's d criteria for small (d = 0.20), moderate (d = 0.50), and strong (d = 0.80) effect sizes are often used incorrectly, in particular when the question of translational value is addressed (Cuijpers et al., 2014; Götz et al., 2021; Kraft, 2020). Cohen (1988) argued that effect sizes should be evaluated in the specific context and domain under investigation. Based on meta-analyses in the field of attachment more valid benchmarks might be the following: small: r = .10, medium: r = .20, and large: r = .30, but even these benchmarks may put the bar higher than necessary because they are mostly based on correlational studies instead of intervention experiments with immediate translational potential (Götz et al., 2021; Kraft, 2020). Comparison with RCTs in other, related areas might serve as a better yardstick. For example, the overall combined effect size of parent training programs in the USA reported by HOMVEE is d = 0.10 (Sama-Miller et al., 2020). This estimate converges with the recent Michalopoulos et al. (2019) report on four popular home-visiting programs stimulating positive parenting that found a combined effect size of d = 0.11. In educational intervention research, Kraft (2020) argued that effect sizes d lower than 0.05 should be considered small, 0.05-0.20 medium, and 0.20 or more should be called large. Thus, it is critical to weigh the effect size of d = 0.37 found in the current meta-analysis for VIPP-SD effects on parenting against similar interventions in similar domains instead of using abstract yardsticks (see also Funder & Ozer, 2019).
In common language an effect size of d = 0.37 would mean that providing the intervention to five families leads to one family experiencing a substantial improvement in parenting as compared to no intervention, the so-called number needed to treat (computed in dmetar with the Kraemer & Kupfer method, see Harrer et al., 2021, chapter 17.8). It is important to keep in mind that seemingly small effects of interventions that are rolled out in large populations, for example in a universal prevention program for parents visiting a well-baby clinic, would have large impact population-wide.
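As a rough check on the number needed to treat, the sketch below applies the Kraemer and Kupfer formula, NNT = 1 / (2 * Phi(d / sqrt(2)) - 1), where Phi is the standard normal cumulative distribution function. This is a stand-alone illustration with a hypothetical function name, not the dmetar code used for the reported analysis.

```python
import math
from statistics import NormalDist

def nnt_kraemer_kupfer(d):
    """Number needed to treat from Cohen's d (Kraemer & Kupfer formula).

    Phi(d / sqrt(2)) is the probability that a randomly chosen treated
    participant has a better outcome than a randomly chosen control.
    """
    if d <= 0:
        raise ValueError("d must be positive for a finite, positive NNT")
    prob_superiority = NormalDist().cdf(d / math.sqrt(2.0))
    return 1.0 / (2.0 * prob_superiority - 1.0)

if __name__ == "__main__":
    nnt = nnt_kraemer_kupfer(0.37)
    # NNT is conventionally rounded up to the next whole family
    print(f"d = 0.37: NNT = {nnt:.2f}, rounded up to {math.ceil(nnt)}")
```

For d = 0.37 the formula yields a value just below five, which rounds up to the five families mentioned above.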

Future Directions
From a global perspective, large parts of the world are missing in the overview of countries with a VIPP-SD trial (see Table 1). In particular the LMICs or non-WEIRD countries (Henrich et al., 2010) are largely absent with the notable exceptions of Turkey and Colombia, but work is in progress in China, Vietnam, and Zambia. The development of an online version of training and of the intervention protocol accelerated by the COVID-19 pandemic (Virtual VIPP; Stevens et al., 2020), awaiting firm evidence of its effectiveness, might make VIPP-SD programs more readily accessible in Asian and African countries. Second, the use of VIPP-SD to support decision-making about out-of-home placement of children at high risk for maltreatment is being examined (Forslund et al., 2021). The idea is that a brief, evidence-based intervention can elicit a positive response in a maltreating parent, enhancing mutual confidence in working towards family preservation or restoration (Cyr et al., 2020; Van der Asdonk et al., 2020). Such brief interventions also make it easier to try out alternatives if a parent does not immediately respond to one such intervention. A third development is a focus on fathers. In most studies to date, mothers have participated as the primary caregivers of the children. Although fathers are increasingly considered important parental figures shaping their children's development in substantial ways, and attachment relationships with fathers and mothers jointly affect developmental outcomes (Dagan et al., 2021; Van IJzendoorn et al., 1992), fathers have not been targeted in most RCTs. The VIPP-SD protocol suggests involvement of partners of the primary caregivers in the final two booster sessions to make partners aware of the intervention process and content, and to solidify the intervention effects through the understanding and support that mothers might receive from their partner.
In a pilot study focusing only on fathers the feasibility was rated high, but the pilot was too small (N = 5) for effectiveness evaluation (Lawrence et al., 2013). In the large Healthy Start, Happy Start trial the involvement of fathers seemed not to add to the effectiveness of the intervention (O'Farrelly et al., 2021). Another study showed the feasibility of prenatal video-feedback using ultrasound (VIPP-PRE) with fathers (Alyousefi-van Dijk et al., 2021). Last, it is somewhat remarkable that RCTs with VIPP-SD have not been conducted in the USA, and this may change in the future. Explanations for the absence of USA-based RCTs with VIPP-SD in the current set of studies may be that funding for long-term trials might have been difficult to obtain, as the intervention had to compete with other social learning or attachment-based parenting interventions originating from the USA, and that in the USA group-based programs might be more acceptable.
The combined effect sizes for VIPP-SD do not account for differential susceptibility of parents and children to changes in the environment due to the intervention. In the current meta-analyses, we focused on main effects of the interventions, and we limited moderator analyses to a minimum to preserve statistical power. Moreover, in most published reports of the RCTs with VIPP-SD, moderator analyses with markers of differential susceptibility (temperamental reactivity; stress reactivity; neural connectivity; and dopamine-system related genotypes; Crone et al., 2020; Ellis et al., 2011) have not been performed (but see Bakermans-Kranenburg et al., 2008; Barone, Ozturk, & Lionetti, 2019), making a meta-analytic synthesis impossible. Having said that, it is promising that a previous meta-analysis on a large number of RCTs in various domains of development showed impressive differences in intervention effects between more versus less susceptible participants depending on their genotype as a marker of differential susceptibility. In a single RCT the overall effect size of an intervention might be an underestimation of its effectiveness for the a priori designated sub-group of more susceptible parents or children (e.g., Cassidy et al., 2011; Klein Velderman et al., 2006a), and this differential effect might be exacerbated in a meta-analytic synthesis of such studies. A coordinated effort across several study sites to conduct RCTs with a similar design and assessing one or more markers of differential susceptibility might solve the problem of low statistical power observed in most studies, not only for finding main effects of the VIPP-SD but also for moderator effects. Moderator effects indicating differential susceptibility require even more power than main intervention effects, but they are crucially important to answer the question "what works for whom?"
An IPD meta-analysis (see, e.g., Verhage et al., 2018, 2020) would also enable examining the role of such susceptibility markers with more statistical power than in single RCTs.

In search of the mechanisms of change
We consider the way in which VIPP-SD implements video feedback a specific active component of the intervention. The persuasiveness of video material cannot be overestimated. Parents are given the opportunity to see their own child from a bystander perspective. They need not act in the moment but are given the time to observe their child and reflect on what they see. The intervener is present to help them observe and reflect, to pause the video on a still of a specific facial expression, or to repeat a significant fragment. Video fragments of the child's behavior, emotional expressions, and body language paired with the provision of "subtitles" as suggested and elicited by the intervener may stimulate parents to take the child's perspective. This in turn may lead to a more accurate perception of the child's needs (Juffer et al., 2017). Figure 6 shows the specific active components of VIPP-SD, and the hypothetical mediators that these may initiate or support in the parent and that may result in more sensitive interaction and limit setting. First, repeating a video fragment provides the opportunity to recognize child signals that went unnoticed in live interaction, because the signals were subtle, or because the parent was focusing on something else. Recognizing child signals is a prerequisite for sensitivity and can be actively trained with video feedback. Second, speaking for the child stimulates the parent to take the child's perspective and gives the child's perspective an explicit role in the interaction. Third, highlighting chains of sensitive interaction (i.e., a child signal, parent's sensitive response, and the child's reaction to that response) makes the parent aware that feedback is given in the interaction itself. VIPP-SD interveners aim to make themselves superfluous: Enabling the parent to see the child's feedback is expected to start a positive feedback loop where they need no third person to comment on the interaction.
This is one of the reasons why we think that even a short-term intervention may be effective in changing parent−child interactions in the long term. Fourth, a focus on child efforts to make contact or to comply with difficult tasks or demands on the videotape, as is done in the sensitive discipline part, may stimulate parental empathy, which paves the way for an understanding, sensitive response to a child who tries hard but only partly succeeds. Fifth, watching video fragments of insensitive or inadequate parenting and discussing alternatives for such parenting behaviors is expected to help parents reframe their thoughts and beliefs and recognize their own role in the interaction with their child (Juffer & Steele, 2014). This may lay the foundation for an increased capacity to reflect on the self and the other, in the sense of Reflective Functioning (Fonagy et al., 1991). Importantly, such "corrective discussions" are embedded in an atmosphere of recognizing the parent's competence and expertise regarding their own child, which is established by, sixth, showing and repeating positive fragments. Thus, even an interaction-focused brief intervention like VIPP-SD assumes multiple processes that need to be considered hypotheses for further experimental work. Such experimental studies may help to distill and match the intervention components so as to optimize the mechanism of change for a larger proportion of families. However, increasing access to interventions, enhancing the skills of interveners, and increasing retention of families in intervention are similarly important for further boosting the program's effectiveness.
In conclusion, the combination of attachment-based and social learning guidelines in Video-feedback Interventions to promote Positive Parenting and Sensitive Discipline is shown to be effective in enhancing parenting and attachment. The intervention is effective in a broad spectrum of families ranging from poverty-stricken to struggling with parental psychopathology or child neurodevelopmental issues. The COVID-19 pandemic accelerated the development of online training and implementation of a Virtual VIPP version, and stringent effectiveness tests of Virtual VIPP in RCTs are still outstanding. More work inspired by differential susceptibility theory is badly needed to examine temperamental, genetic, neural, and stress-related moderators and to find out what works best for whom. Because VIPP-SD is a personalized intervention starting with videotaped recordings of the unique interactions between the specific parent and their own child, it may have a role as a universal preventive intervention, as a diagnostic tool (Cyr et al., 2020), or as a generic adjunct to a more specialized parent or child psychiatric treatment.