
Convergent and Discriminant Validity Evidence of the Methodological Quality Scale for Studies Based on Observational Methodology (MQSOM)

Published online by Cambridge University Press:  27 March 2026

Daniel López-Arenas
Affiliation: Universidad de Sevilla, Spain
Susana Sanduvete-Chaves*
Affiliation: Universidad de Sevilla, Spain
José Mena-Raposo
Affiliation: Universidad de Sevilla, Spain
Salvador Chacón-Moscoso
Affiliation: Universidad de Sevilla, Spain; Universidad Autónoma de Chile, Chile
*Corresponding author: Susana Sanduvete-Chaves; Email: sussancha@us.es

Abstract

The use of observational methodology has become increasingly common in psychological research, highlighting the need for tools that ensure methodological rigor. This study presents evidence of convergent/discriminant validity for the Methodological Quality Scale for Studies Based on Observational Methodology (MQSOM). A multitrait-multimethod (MTMM) analysis with Spearman’s correlations was used to examine the relationship between MQSOM dimensions and those of three instruments: the Methodological Rigor in Mixed Methods (MRMM), the Guidelines for Reporting Evaluations Based on Observational Methodology (GREOM), and the Mixed Methods Appraisal Tool (MMAT). Ninety-six articles were coded using MQSOM and the instruments for comparison. The MQSOM’s design converged with the MRMM’s mixed-methods design (ρ = .217, p = .034), GREOM’s design (ρ = .217, p = .034), and MMAT’s qualitative (QUAL) component (ρ = .212, p = .038). The MQSOM’s measurement and analysis aligned with MRMM’s data analysis (ρ = .611, p < .001), GREOM’s data quality control (ρ = .423, p < .001) and results (ρ = .328, p = .001), and MMAT’s quantitative (QUANT) (ρ = .214, p = .037) and mixed-methods (ρ = .643, p < .001) components. MQSOM’s design exhibited discriminant validity from MRMM’s data collection (ρ = .025, p = .807) and data analysis (ρ = −.051, p = .620), GREOM’s data quality control (ρ = .025, p = .812) and results (ρ = −.032, p = .759), and MMAT’s QUANT component (ρ = −.035, p = .733). This study reinforces the validity of MQSOM as a useful methodological quality scale.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Universidad Complutense de Madrid and Colegio Oficial de la Psicología de Madrid

The use of observational methodology in study designs allows the spontaneous behaviors of humans in natural contexts to be recorded and quantified (Anguera et al., 2020). Such methods are frequently employed in psychological research as they require minimal intervention and are independent of standardized measurement tools, thus enabling the collection of unbiased information (Anguera et al., 2018).

The observational methodology described above differs from that used in the observational studies typically found in health research. These studies are classified by quantitative (QUANT) design type, including cohort, case–control, and cross-sectional studies. In these studies, empirical groups are compared to identify cause–effect relationships when the feasibility of randomization and experimental control is limited (Cochran & Chambers, 1965). In contrast, observational methodology is considered a mixed-methods approach (Anguera et al., 2017), given that it transforms and connects both qualitative (QUAL) and QUANT elements (Cresswell & Plano-Clark, 2017). Specifically, research designs based on observational methodology can be organized in a QUAL–QUANT–QUAL sequence. The initial QUAL phase involves the study design, the choice of the observation instrument, and the systematic recording of QUAL data. The subsequent QUANT phase focuses on transforming the QUAL data into a code matrix and conducting data quality control procedures prior to data analysis. Finally, the last QUAL phase involves interpreting the results within the context of the initial problem and previous literature (Anguera et al., 2020).

Over the past decade, the increase in studies based on observational methodology has prompted numerous researchers to direct their attention to the quality concerns associated with this mixed-methods approach (Chacón-Moscoso et al., 2013, 2019; Portell et al., 2015). This problem could be addressed by ensuring homogeneity and transparency both in studies that rely on observational methodology and in the criteria established for assessing their quality (Chacón-Moscoso et al., 2013).

In order to achieve this, the Methodological Quality Scale for Studies Based on Observational Methodology (MQSOM) was developed (Chacón-Moscoso et al., 2019; Sanduvete-Chaves et al., 2025), drawing on the Guidelines for Reporting Evaluations Based on Observational Methodology (GREOM; Portell et al., 2015). The MQSOM comprises 11 items that assess a second-order dimension of global methodological quality through two first-order dimensions: the quality of design and the quality of measurement and analysis. It specifies the minimum methodological aspects necessary to improve the quality of studies based on observational methodology. Adequate reliability and validity evidence based on its internal structure was previously obtained.

The main objective of this study was to obtain empirical evidence of the convergent and discriminant validity of the MQSOM when compared with other instruments that measure methodological quality in mixed-methods studies.

Method

Participants (Units of Analysis)

The 96 papers examined for the validation of the MQSOM were obtained from among 650 studies found in an exhaustive search completed for a previous investigation (Sanduvete-Chaves et al., 2025). All met the following inclusion criteria: (a) the observational methodology was used; (b) the work was original; (c) the publication presented the usual sections of a research article (introduction, method, results, and discussion); and (d) it was written in English or Spanish. Stratified random sampling was employed to guarantee a homogeneous distribution of the units of analysis across the range of MQSOM scores. The required sample size was calculated using G*Power for 90% power and a 5% Type I error rate to detect a standardized effect size of 0.4 or larger, resulting in a sample size of 96 papers (Footnote 1).

Instruments

The MQSOM (Sanduvete-Chaves et al., 2025) was used for this study. The scale has adequate psychometric properties (root-mean-square error of approximation [RMSEA] = .000; non-normed fit index [NNFI] = 1; goodness-of-fit index [GFI] = .98; adjusted goodness-of-fit index [AGFI] = .97) and comprises a second-order methodological quality dimension (ω = .87; D = .55) that contains two first-order dimensions: quality of design (6 items; ω = .90; D = .46; intraclass correlation coefficient [ICC] = .933–.967) and quality of measurement and analysis (5 items; ω = .68; D = .67; ICC = .797–.988). Each item is scored from 0 (lowest methodological quality) to 1 (highest methodological quality).

For the comparison with the MQSOM, three instruments were selected: (a) the Methodological Rigor in Mixed Methods (MRMM) checklist (Harrison et al., 2020), (b) the GREOM (Portell et al., 2015), and (c) the Mixed Methods Appraisal Tool (MMAT) checklist (Hong et al., 2018). Table 1 presents an overview of the instruments chosen for the comparison.

Table 1. Overview of the instruments for comparison

Note. FWCI = field-weighted citation impact.

Procedure

A total of nine instruments were considered for comparison with the MQSOM. The selected instruments are either guidelines or checklists from the social and health sciences. The selection criteria were (a) the type of studies the instrument evaluates (tools that assess studies based on observational methodology and mixed-methods studies were considered); (b) whether the instrument addresses the quality of reporting, design, measurement, and analysis; and (c) the reliability and/or validity evidence reported. Table 2 presents the characteristics of the instruments considered.

Table 2. Characteristics of the instruments considered for comparison to the MQSOM

Note. EGPQRS = Evolving Guidelines for Publication of Qualitative Research Studies in Psychology and Related Fields; GRMS = Guidelines for Reporting Momentary Studies; STROBE = Strengthening the Reporting of Observational Studies in Epidemiology; COREQ = COnsolidated criteria for REporting Qualitative Research; GRAMMS = Good Reporting of a Mixed Methods Study; GCRMR = Guidelines for Conducting and Reporting Mixed Research in the Field of Counselling and Beyond; GREOM = Guidelines for Reporting Evaluations Based on Observational Methodology; SRQR = Standards for Reporting Qualitative Research; MRMM = Methodological Rigor in Mixed Methods; OMS = suitability for assessing studies based on observational methodology; MMS = suitability for assessing mixed-methods studies; QR = suitability for assessing quality of reporting; QD = suitability for assessing quality of design; QM = suitability for assessing quality of measurement; QA = suitability for assessing quality of analysis.

Based on the established criteria, the GREOM was chosen because it was the only tool specific to studies based on observational methodology. Additionally, the MRMM and the MMAT were selected because they were the only tools that assessed the quality of design, measurement, and analysis.

For the data extraction, Daniel López-Arenas (DLA) and José Mena-Raposo (JMR) were trained to apply the MQSOM, MRMM, GREOM, and MMAT. First, each item and its response options were explained. Salvador Chacón-Moscoso (SCM) mediated if the explanations differed. Then, both coders applied the instruments independently to two papers presenting studies based on observational methodology. Thereafter, DLA and JMR reviewed the coding together with SCM, who intervened in the case of discrepancies. Once the training was complete, DLA and JMR were assigned the same 24 randomly selected papers (25%) to code in order to evaluate the intercoder reliability. After achieving an adequate level of agreement (>.7), DLA applied the instruments to the full sample. Finally, 9 months after the initial coding, DLA re-coded 24 randomly selected papers (25%) to assess the intracoder reliability. The data extraction database is available as Supplementary Material.
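The agreement check described above can be sketched in a few lines. This is an illustration, not the authors' code: it assumes the two coders' scores are arranged as a papers × coders matrix and uses a two-way random-effects, absolute-agreement, single-measures ICC (often labeled ICC(2,1)); the article does not state which ICC variant was configured in JASP, so this specific form is an assumption, and the example scores are invented.

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """Two-way random-effects, absolute-agreement, single-measures ICC.

    `ratings` is an (n_subjects x k_raters) matrix of scores.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)

    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)  # between-papers mean square
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)  # between-coders mean square
    sse = ((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                       # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Two hypothetical coders scoring five papers on a 0-1 quality item:
scores = np.array([[0.8, 0.8],
                   [0.5, 0.6],
                   [0.2, 0.2],
                   [1.0, 0.9],
                   [0.6, 0.6]])
print(round(icc_2_1(scores), 3))
```

Values above the .70 threshold used in the study would indicate that the two coders can be treated as interchangeable before a single coder scores the full sample.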

Data Analysis

Using Jeffreys’s Amazing Statistics Program (JASP) 0.19.3, the ICC was calculated to evaluate and compare both the inter- and intracoder concordance of the instruments. Values exceeding .70 were considered adequate (Portney & Watkins, 2000). Additionally, descriptive statistics were calculated for each dimension. Subsequently, McDonald’s omega (ω), derived through principal factor analysis, and Cronbach’s alpha (α) were both employed to assess the reliability based on the internal consistency of each instrument. Results higher than .80 were considered robust reliability evidence, and .65–.80, acceptable (Kalkbrenner, 2023). For item discrimination, the average item discrimination index was computed. Results were considered excellent for values higher than .40, adequate for .20–.30, and inadequate for <.20 (Holgado-Tello et al., 2015).
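The internal-consistency and discrimination computations can be illustrated with a short sketch. This is not the authors' code: Cronbach's alpha is computed from its standard variance formula, and the discrimination index is taken here to be the corrected item-total correlation, a common choice; the exact index prescribed by Holgado-Tello et al. (2015) may differ, and the item scores below are invented.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_papers x n_items) matrix of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def item_discrimination(items: np.ndarray) -> np.ndarray:
    """Corrected item-total correlation: each item vs. the sum of the others."""
    k = items.shape[1]
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(k)])

# Hypothetical 0-1 item scores for six papers on a three-item dimension:
items = np.array([[1.0, 0.5, 1.0],
                  [0.5, 0.5, 0.5],
                  [0.0, 0.0, 0.5],
                  [1.0, 1.0, 1.0],
                  [0.5, 0.0, 0.0],
                  [0.0, 0.5, 0.0]])
print(round(cronbach_alpha(items), 2))
print(item_discrimination(items).round(2))
```

Averaging the per-item discrimination values gives the dimension-level index reported in Table 4.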

Finally, a multitrait-multimethod (MTMM) analysis was conducted to provide evidence of the convergent and discriminant validity of the MQSOM (Campbell & Fiske, 1959). This approach is based on the idea that a measurement scale represents the combination of a specific trait and a particular measurement method. The correlation between scores obtained from two measurement instruments may therefore arise from two sources: (a) the fact that they assess the same underlying construct and (b) the similarity of the measurement methods used. To minimize the influence of the method effect, the MTMM approach involves assessing two or more distinct traits using two or more different measurement methods. When the same trait is measured by different methods, strong correlations are expected, providing evidence of convergent validity. Conversely, when distinct traits are compared, only weak correlations are expected; these constitute discriminant validity evidence. If different traits show unexpectedly high correlations, this may indicate that methodological similarities are introducing bias, thereby threatening validity. Therefore, a correlation matrix was constructed between the MQSOM dimensions and those of the instruments for comparison, using Spearman’s correlation coefficient (ρ) due to the non-normal distribution of the scores. Correlation values over .4 were considered high, over .25 moderate, and over .1 low (Cohen, 1988). For convergent validity, high correlations were expected between dimensions that address similar constructs; conversely, low correlations between dimensions that address different constructs were interpreted as evidence of discriminant validity (Campbell & Fiske, 1959).
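Computationally, the MTMM matrix reduces to pairwise Spearman coefficients between dimension scores. A minimal sketch with SciPy follows; the dimension names and per-paper values are illustrative stand-ins, not the study's data:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-paper dimension scores: rows = papers, columns = dimensions.
scores = {
    "MQSOM_design": [0.5, 0.8, 0.3, 1.0, 0.7, 0.2, 0.6, 0.9],
    "MQSOM_M&A":    [0.6, 0.7, 0.4, 0.9, 0.8, 0.3, 0.5, 0.8],
    "GREOM_design": [0.5, 1.0, 0.0, 1.0, 0.5, 0.0, 0.5, 1.0],
    "MMAT_QUANT":   [0.7, 0.4, 0.9, 0.3, 0.6, 0.8, 0.5, 0.4],
}
names = list(scores)
data = np.column_stack([scores[n] for n in names])

# With a 2-D array, spearmanr treats columns as variables and returns
# full (k x k) matrices of rho and p-values in one call.
rho, p = spearmanr(data)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: rho = {rho[i, j]:.2f}, p = {p[i, j]:.3f}")
```

Applied to the 96 coded papers and the full set of dimensions, the same call would yield a matrix like that reported in Table 5, with off-block entries read for discriminant validity and same-construct entries for convergent validity.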

Results

Inter- and Intracoder Reliability and Descriptive Statistics of the Items

Table 3 presents both inter- and intracoder reliability and descriptive statistics. ICC coefficients were adequate, ranging from .7 to 1. Regarding the descriptive statistics, the median in the MQSOM was .67 for the design dimension and .7 for the measurement and analysis dimension; the means were .62 for design and .65 for measurement and analysis; and the standard deviations were 0.28 for design and 0.18 for measurement and analysis. Regarding the MRMM, the median was 1 or .5 for all dimensions except mixed-methods design type, for which it was 0. The means ranged from .43 to .90, and the standard deviations were between 0.17 and 0.46.

Table 3. Descriptive statistics and inter–intracoder reliability of the instruments for comparison

Note. Avg. = average of intraclass correlation coefficients; Min. = minimum; Max. = maximum; M = mean; Mdn = median; SD = standard deviation; S = skewness; K = kurtosis; KS = Kolmogorov–Smirnov normality test; MQSOM design = quality of design; MQSOM M&A = quality of measurement and analysis; MRMM MM design = mixed-methods design type; MRMM writing = elements of writing; GREOM intervention = intervention and expected outcomes.

a ICC is computed for a single item, and there is no minimum or maximum.

ICC and KS obtained p < .05.

Regarding the GREOM, the median was .5 or higher for 66.7% of the dimensions, with a median of 0 only for design. The means were between .43 and .68, and the standard deviations ranged between 0.21 and 0.46. Regarding the MMAT, the median was .6 for the QUAL and mixed-methods components and .7 for the QUANT component. The means were between .56 and .65, and the standard deviations ranged between 0.18 and 0.23. A normal distribution was not found for any of the items of the different instruments.

Reliability Based on Internal Consistency and Discrimination

Table 4 presents the results obtained for the reliability coefficients based on the internal consistency and discrimination indexes of each dimension of the instruments under consideration.

Table 4. Reliability and discrimination of the dimensions

Note. CI = confidence interval; LL = lower limit; UL = upper limit; Avg. = average of intraclass correlation coefficients; Min. = minimum; Max. = maximum; MQSOM Design = quality of design; MQSOM M&A = quality of measurement and analysis; MRMM MM design = mixed-methods design type; MRMM writing = elements of writing; GREOM intervention = intervention and expected outcomes.

a Reliability coefficients are not computed since the dimension is comprised of a single item.

b The discrimination indexes displayed are not an average but unitary since they are formed by a single item.

The reliability based on the internal consistency of the instruments was adequate only for the MQSOM’s design and the MMAT QUAL and mixed-methods components. Discrimination was considered excellent for all dimensions except MQSOM’s measurement and analysis, and the MRMM’s aims and purpose and data integration, which were considered adequate, and the MRMM’s data collection and data analysis, which were deemed poor.

MTMM Matrix

Table 5 presents the MTMM matrix with Spearman’s correlation coefficients of the MQSOM dimensions with the dimensions of MRMM, GREOM, and MMAT.

Table 5. Multitrait-multimethod matrix

Note. MQSOM design = quality of design; MRMM5 MM design type = mixed-methods design type.

*p < .05; **p < .01.

The results showed significant correlations of MQSOM with 63.33% of the instrument dimensions (19 correlations), with magnitudes ranging from .21 to .64. Additionally, a significant moderate factor correlation was identified within MQSOM (ρ = .364, p < .001).

Regarding the MRMM checklist, the MQSOM dimension of design exhibited moderate correlations with the dimensions of aims and purpose (ρ = .23, p = .027) and mixed-methods design type (ρ = .22, p = .034) and a strong correlation with elements of writing (ρ = .47, p < .001). Conversely, it showed low and nonsignificant correlations with the dimensions of data collection (ρ = .03, p = .807) and data analysis (ρ = −.05, p = .620). Regarding the GREOM, the MQSOM design dimension exhibited a moderate correlation with the design dimension (ρ = .22, p = .034) and a moderate-to-strong correlation with the intervention and expected outcome dimension (ρ = .34, p = .001). Conversely, it exhibited low and nonsignificant correlations with samples (ρ = −.03, p = .750), data quality control (ρ = .03, p = .812), and results (ρ = −.03, p = .759). Regarding the MMAT checklist, the MQSOM dimension design exhibited a moderate correlation with the QUAL component (ρ = .21, p = .038). Conversely, it exhibited a low and nonsignificant correlation with the QUANT component (ρ = −.04, p = .733).

Regarding the MRMM checklist, the MQSOM dimension measurement and analysis exhibited a moderate correlation with the aims and purpose dimension (ρ = .21, p = .041) and strong correlations with the dimensions data integration (ρ = .48, p < .001), mixed-methods design type (ρ = .45, p < .001), elements of writing (ρ = .42, p < .001), and data analysis (ρ = .61, p < .001). Conversely, it showed a low-to-moderate, nonsignificant correlation with the data collection dimension (ρ = .17, p = .105). Regarding the GREOM, the measurement and analysis dimension exhibited strong correlations with the dimensions design (ρ = .45, p < .001), data quality control (ρ = .42, p < .001), results (ρ = .33, p = .001), intervention and expected outcomes (ρ = .55, p < .001), and instruments (ρ = .61, p < .001). Conversely, it showed a low and nonsignificant correlation with the dimension samples (ρ = .11, p = .276). Regarding the MMAT checklist, the measurement and analysis dimension exhibited a moderate correlation with the QUANT component (ρ = .21, p = .037) and strong correlations with both the QUAL (ρ = .64, p < .001) and mixed-methods (ρ = .64, p < .001) components.

Discussion

The findings provide evidence of adequate inter- and intracoder reliability for the MQSOM and the instruments for comparison. In contrast, both McDonald’s omega and Cronbach’s alpha values indicate a lack of internal consistency for the MQSOM’s measurement and analysis dimension, all the GREOM dimensions, and the MMAT’s QUANT component. Regarding discrimination, the global average discrimination index was deemed acceptable for the MQSOM and all the dimensions of the instruments for comparison, except for the MRMM’s data collection and data analysis. This suggests that these dimensions (which comprise a single item each) may be ineffective in distinguishing between high- and low-quality studies.

The MTMM analysis provides substantial evidence of the convergent and discriminant validity of the MQSOM. Specifically, the MQSOM dimensions demonstrated significant, moderate-to-strong correlations with dimensions addressing similar constructs across all instruments for comparison. Conversely, the design quality dimension demonstrated low, nonsignificant correlations with the dimensions that assess different constructs (Campbell & Fiske, 1959).

Specifically, the MQSOM’s design demonstrates convergent validity evidence with the same constructs from the instruments for comparison, e.g., the MRMM’s and GREOM’s design dimensions. Additionally, it demonstrates convergent validity evidence with several analogous constructs. These include the MRMM’s aims and purpose dimension, which involves justifying the use of mixed methods, and its elements of writing dimension, which encompasses identifying the study as mixed methods. The MQSOM’s design also shows convergence with the GREOM’s intervention and expected outcomes, which requires justification for employing an observational methodology, and with the MMAT’s QUAL component. This latter convergence is consistent with the latent construct of design quality addressed by the MQSOM, since the initial QUAL phase begins with a justification for the use of observational methodology (Anguera et al., 2020).

The MQSOM’s measurement and analysis dimension demonstrates convergent validity evidence with the same constructs from the instruments for comparison, such as the MRMM’s data analysis and the GREOM’s instruments, data quality control, and results. Additionally, convergent validity evidence was exhibited with similar constructs, such as the MRMM’s data integration and the MMAT’s QUAL, QUANT, and mixed-methods components. This finding is congruent with the latent construct quality of measurement and analysis assessed by the MQSOM. In the context of observational methodology, measurement entails the coding of spontaneous behavior to yield a matrix of QUAL data; the integration of data strands is achieved when these QUAL data are transformed into QUANT parameters for analysis, positioning observational methodology as a mixed-methods approach (Anguera et al., 2020).

Regarding discriminant validity evidence, the MQSOM’s design obtained low and nonsignificant correlations with MRMM’s data collection, data integration, and data analysis, GREOM’s samples, instruments, data quality control, and results, and the MMAT’s QUANT and mixed-methods components. Additionally, the MQSOM’s measurement and analysis obtained discriminant evidence with GREOM’s samples.

On the other hand, the MQSOM’s measurement and analysis dimension yielded some unexpected results. First, significant correlations were exhibited with both the MRMM’s aims and purpose (although low to moderate) and elements of writing (strong) dimensions. This can be attributed to the pivotal role of the discussion section in the observational methodology process, given that the final step is the QUAL interpretation of QUANT results. The discussion section is also critical in the coding criteria of the MRMM items, which assess the value of mixed-methods research for aims and purpose, and the incorporation of references to mixed-methods studies for elements of writing.

Additionally, the MQSOM’s measurement and analysis dimension exhibited significant, strong correlations with the design dimensions of both MRMM and GREOM. This can be attributed to the inherent overlap between research design and research implementation, whereby the quality of the design can influence the quality of subsequent measurement and analysis. Finally, there was a nonsignificant correlation between the MQSOM’s measurement and analysis and the analogous construct, the MRMM’s data collection. This could be attributed to the fact that the MQSOM is centered on methodological quality (e.g., the adequacy of the instrument), while the MRMM focuses on the quality of the reporting (e.g., if the instrument used is specified).

Study Limitations and Further Development

Although this study has contributed to the validity evidence of the MQSOM, several limitations should be considered. First, the internal consistency of the MQSOM’s measurement and analysis dimension was below optimal levels. This finding, unexpected given the results obtained in the original validation of the scale (Sanduvete-Chaves et al., 2025) and in other applications with larger samples, indicates a potential need for further research with a larger and more diverse sample of studies based on observational methodology to ensure reliable measurement of the latent construct. In the present study, we decided to maintain the dimension in its original version instead of introducing modifications to obtain a higher reliability coefficient, given that the dimension is considered well represented: Each item refers to a different aspect (observation instrument, software, type of parameter, reliability, and type of data analysis).

Additionally, the instruments available for comparison in the mixed methods, such as guidelines and checklists, do not always provide comprehensive psychometric evidence, particularly regarding reliability and construct validity. Furthermore, they focus mainly on the quality of the reporting, while the MQSOM is mainly centered on measuring methodological quality. Consequently, while the evidence supports convergent and discriminant validity evidence, these findings should be interpreted within the context of the differences in the constructs that the instruments compared are measuring. Further research is necessary to compare the MQSOM with other scales that may be developed in the coming years to assess the methodological quality in mixed-methods studies.

Furthermore, the preliminary evidence of convergent and discriminant validity based on correlation analyses should be complemented with additional statistical techniques, such as confirmatory factor analysis, structural equation modeling, or network analysis, to provide a deeper understanding of the latent dimensions in which the MQSOM and the comparison instruments appear (or fail) to converge (Marsh & Grayson, 1995). Future studies should also examine the factor invariance of the MQSOM across diverse areas of applied research grounded in observational methodology.

From a practical standpoint, the MQSOM scale can be applied across a wide range of research contexts, such as studies in sports—including soccer (López-Arenas et al., in press), tennis, and basketball. These applications will contribute to synthesizing the accumulated empirical evidence from various lines of research based on observational methodology. Consequently, distinct methodological quality profiles will be available for each research area, facilitating the comparison and extrapolation of high-quality observational research across disciplines.

Conclusions

In summary, this study contributes to an understanding of the psychometric properties of the MQSOM, providing convergent and discriminant validity evidence. The findings indicate high and significant correlations with equal or similar constructs measured with the MQSOM and the instruments for comparison and low and nonsignificant correlations in relation to different constructs. This reinforces the validity evidence of the MQSOM as a useful methodological quality scale.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/SJP.2026.10021.

Data availability statement

The datasets presented in this study can be found as supplementary material and in the online repository Open Science Framework (https://osf.io/3zbqy/).

Acknowledgements

We would like to thank Wendy Gosselin (American Translation Association member #275293) for reviewing the English version of this text for publication.

Author contribution

S.C.M. and S.S.C. conceptualized the study. D.L.A. and J.M.R. carried out the investigation and curated the data. D.L.A. performed the formal analysis and wrote the original draft. S.C.M. and S.S.C. supervised the work. S.S.C. and S.C.M. wrote, reviewed, and edited the manuscript.

Funding statement

This work was supported by ANID, Government of Chile, the National Scientific and Technological Development Fund (FONDECYT) project 1250316 (S.C.M. and S.S.C.); the Julio Olea Grant for Young Researchers of the Spanish Association of Behavioral Science Methodology (AEMCCO; D.L.A.); and the Spanish Ministry of Science, Innovation and Universities (MICIU/AEI/10.13039/501100011033), research project PID2020-115486GB-I00 (all authors). Funding for open access charge: Universidad de Sevilla/CBUA.

Competing interests

None declared.

Footnotes

1 The list of articles included is available at https://osf.io/3zbqy/files/osfstorage.

References

Anguera, M. T., Blanco-Villaseñor, A., Losada, J. L., & Portell, M. (2018). Guidelines for designing and conducting a study that applies observational methodology. Anuario de Psicología, 48(1), 917. https://doi.org/10.1016/j.anpsic.2018.02.001CrossRefGoogle Scholar
Anguera, M. T., Blanco-Villaseñor, A., Losada, J. L., & Sánchez-Algarra, P. (2020). Integración de elementos cualitativos y cuantitativos en metodología observacional [Integration of qualitative and quantitative elements in observational methodology]. Ámbitos: Revista Internacional de Comunicación, 49, 4970. https://doi.org/10.12795/Ambitos.2020.i49.04Google Scholar
Anguera, M. T., Camerino, O., Castañer, M., Sánchez-Algarra, P., & Onwuegbuzie, A. (2017). The specificity of observational studies in physical activity and sports sciences: Moving forward in mixed methods research and proposals for achieving quantitative and qualitative symmetry. Frontiers in Psychology, 9(2196). https://doi.org/10.3389/fpsyg.2017.02196Google Scholar
Booth, A., Hannes, K., Harden, A., Noyes, J., Harris, J., & Tong, A. (2014). COREQ (consolidated criteria for reporting qualitative studies). In Moher, D., Altman, D. G., Schulz, K. F., Simera, I., & Wager, E. (Eds.), Guidelines for reporting health research: A user’s manual (pp. 214–226). John Wiley & Sons. https://doi.org/10.1002/9781118715598.ch21
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105. https://doi.org/10.1037/h0046016
Chacón-Moscoso, S., Anguera, M. T., Sanduvete-Chaves, S., Losada, J. L., & Portell, M. (2019). Methodological quality checklist for studies based on observational methodology (MQCOM). Psicothema, 31(4), 458–464. https://doi.org/10.7334/psicothema2019.116
Chacón-Moscoso, S., Sanduvete-Chaves, S., Portell, M., & Anguera, M. T. (2013). Reporting a program evaluation: Needs, program plan, intervention, and decisions. International Journal of Clinical and Health Psychology, 13(1), 58–66. https://doi.org/10.1016/S1697-2600(13)70008-5
Cochran, W. G., & Chambers, S. P. (1965). The planning of observational studies of human populations. Journal of the Royal Statistical Society. Series A (General), 128(2), 234–266. https://doi.org/10.2307/2344179
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Creswell, J. W., & Plano Clark, V. L. (2017). Designing and conducting mixed methods research. Sage.
Elliott, R., Fischer, C. T., & Rennie, D. L. (1999). Evolving guidelines for publication of qualitative research studies in psychology and related fields. British Journal of Clinical Psychology, 38, 215–229. https://doi.org/10.1348/014466599162782
Harrison, R., Reilly, T., & Creswell, J. (2020). Methodological rigor in mixed methods: An application in management studies. Journal of Mixed Methods Research, 14(4), 473–495. https://doi.org/10.1177/1558689819900585
Holgado-Tello, F. P. (2015). Análisis de los ítems [Analysis of the items]. In Barbero-García, M. I., Vila-Abad, E., & Holgado-Tello, F. P. (Eds.), Psicometría [Psychometrics] (pp. 407–468). Sanz y Torres.
Hong, Q. N., Fàbregues, S., Bartlett, G., Boardman, F., Cargo, M., Dagenais, P., Gagnon, M. P., Griffiths, F., Nicolau, B., O’Cathain, A., Rousseau, M. C., & Vedel, I. (2018). The Mixed Methods Appraisal Tool (MMAT) version 2018 for information professionals and researchers. Education for Information, 34(4), 285–291. https://doi.org/10.3233/EFI-180221
Hong, Q. N., Pluye, P., Fàbregues, S., Bartlett, G., Boardman, F., Cargo, M., Dagenais, P., Gagnon, M. P., Griffiths, F., Nicolau, B., O’Cathain, A., Rousseau, M. C., & Vedel, I. (2019). Improving the content validity of the mixed methods appraisal tool: A modified e-Delphi study. Journal of Clinical Epidemiology, 111, 49–59. https://doi.org/10.1016/j.jclinepi.2019.03.008
Kalkbrenner, M. T. (2023). Alpha, omega, and H internal consistency reliability estimates: Reviewing these options and when to use them. Counseling Outcome Research and Evaluation, 14(1), 77–88. https://doi.org/10.1080/21501378.2021.1940118
Leech, N., & Onwuegbuzie, A. (2010). Guidelines for conducting and reporting mixed research in the field of counseling and beyond. Journal of Counseling & Development, 88(1), 61–69. https://doi.org/10.1002/j.1556-6678.2010.tb00151.x
Marsh, H. W., & Grayson, D. (1995). Latent variable models of multitrait-multimethod data. In Hoyle, R. H. (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 177–198). Sage.
O’Brien, B. C., Harris, I. B., Beckman, T. J., Reed, D. A., & Cook, D. A. (2014). Standards for reporting qualitative research: A synthesis of recommendations. Academic Medicine, 89, 1245–1251. https://doi.org/10.1097/ACM.0000000000000388
O’Cathain, A., Murphy, E., & Nicholl, J. (2008). The quality of mixed methods studies in health services research. Journal of Health Services Research & Policy, 13(2), 92–98. https://doi.org/10.1258/jhsrp.2007.007074
Portell, M., Anguera, M. T., Chacón-Moscoso, S., & Sanduvete-Chaves, S. (2015). Guidelines for reporting evaluations based on observational methodology. Psicothema, 27(3), 283–289. https://doi.org/10.7334/psicothema2014.276
Portney, L., & Watkins, M. (2000). Foundations of clinical research: Applications to practice. Prentice Hall.
Sanduvete-Chaves, S., López-Arenas, D., Anguera, M. T., & Chacón-Moscoso, S. (2025). A scale for evaluating the methodological quality of studies based on observational methodology. Psicothema, 37(1), 1–10. https://doi.org/10.70478/psicothema.2025.37.01
Stone, A. A., & Shiffman, S. (2002). Capturing momentary, self-report data: A proposal for reporting guidelines. Annals of Behavioral Medicine, 24(3), 236–243. https://doi.org/10.1207/S15324796ABM2403_09
von Elm, E., Altman, D. G., Egger, M., Pocock, S. J., Gøtzsche, P. C., & Vandenbroucke, J. P. (2007). The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: Guidelines for reporting observational studies. Annals of Internal Medicine, 147(8), 573–577. https://doi.org/10.7326/0003-4819-147-8-200710160-00010
Table 1. Overview of the instruments for comparison

Table 2. Characteristics of the instruments considered for comparison to the MQSOM

Table 3. Descriptive statistics and inter- and intracoder reliability of the instruments for comparison

Table 4. Reliability and discrimination of the dimensions

Table 5. Multitrait-multimethod matrix
Supplementary material: López-Arenas et al. supplementary material (File, 39.4 KB)