References
Achenbach System of Empirically Based Assessment (ASEBA). (n.d.). The ASEBA approach. https://aseba.org/.
Aiken, L. S., West, S. G., & Millsap, R. E. (2008). Doctoral training in statistics, measurement, and methodology in psychology: Replication and extension. American Psychologist, 63, 32–50. https://doi.org/10.1037/0003-066X.63.1.32.
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. American Educational Research Association. www.testingstandards.net/open-access-files.html.
Babcock, B., & Hodge, K. J. (2020). Rasch versus classical equating in the context of small sample sizes. Educational and Psychological Measurement, 80, 499–521. https://doi.org/10.1177/0013164419878483.
Bansal, P. S., Babinski, D. E., Waxmonsky, J. G., & Waschbusch, D. A. (2022). Psychometric properties of parent ratings on the Inventory of Callous–Unemotional Traits in a nationally representative sample of 5- to 12-year-olds. Assessment, 29, 242–256. https://doi.org/10.1177/1073191120964562.
Bauer, D. J. (2017). A more general model for testing measurement invariance and differential item functioning. Psychological Methods, 22, 507–526. https://doi.org/10.1037/met0000077.
Belzak, W. C. M., & Bauer, D. J. (2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychological Methods, 25, 673–690. https://doi.org/10.1037/met0000253.
Boulkedid, R., Abdoul, H., Loustau, M., Sibony, O., & Alberti, C. (2011). Using and reporting the Delphi method for selecting healthcare quality indicators: A systematic review. PLoS ONE, 6(6), e20476. https://doi.org/10.1371/journal.pone.0020476.
Bratt, C., Abrams, D., Swift, H. J., Vauclair, C. M., & Marques, S. (2018). Perceived age discrimination across age in Europe: From an ageing society to a society for all ages. Developmental Psychology, 54, 167–180. https://doi.org/10.1037/dev0000398.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multi-model inference. Springer-Verlag.
Byrne, B. M., Oakland, T., Leong, F. T. L. et al. (2009). A critical analysis of cross-cultural research and testing practices: Implications for improved education and training in psychology. Training and Education in Professional Psychology, 3, 94–105. https://doi.org/10.1037/a0014516.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105(3), 456–466. https://doi.org/10.1037/0033-2909.105.3.456.
Camilli, G. (2006). Test fairness. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 221–256). Praeger.
Chilisa, B. (2020). Indigenous research methodologies. Sage.
Cleveland, H. H., Wiebe, R. P., van den Oord, E. J. C. G., & Rowe, D. C. (2000). Behavior problems among children from different family structures: The influence of genetic self-selection. Child Development, 71, 733–751. https://doi.org/10.1111/1467-8624.00182.
Covarrubias, A., & Vélez, V. (2013). Critical race quantitative intersectionality: An antiracist research paradigm that refuses to “let the numbers speak for themselves.” In Lynn, M. & Dixson, A. D. (Eds.), Handbook of critical race theory in education (pp. 270–285). Routledge.
Crenshaw, K. (1989). Demarginalizing the intersection of race and sex: A Black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum, 1989(1), Article 8, 139–167.
Crowder, M. K., Gordon, R. A., Brown, R. D., Davidson, L. A., & Domitrovich, C. E. (2019). Linking social and emotional learning standards to the WCSD Social-Emotional Competency Assessment: A Rasch approach. School Psychology Quarterly, 34, 281–295. https://doi.org/10.1037/spq0000308.
Davidov, E., Meuleman, B., Cieciuch, J., Schmidt, P., & Billiet, J. (2014). Measurement equivalence in cross-national research. Annual Review of Sociology, 40, 55–75. https://doi.org/10.1146/annurev-soc-071913-043137.
De Bondt, N., & Van Petegem, P. (2015). Psychometric evaluation of the Overexcitability Questionnaire-Two applying Bayesian structural equation modeling (BSEM) and multiple-group BSEM-based alignment with approximate measurement invariance. Frontiers in Psychology, 6, 1963. https://doi.org/10.3389/fpsyg.2015.01963.
Duckworth, A. L., & Yeager, D. S. (2015). Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44, 237–251. https://doi.org/10.3102/0013189X15584327.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.
Evers, A., Muñiz, J., Hagemeister, C. et al. (2013). Assessing the quality of tests: Revision of the European Federation of Psychologists’ Associations (EFPA) review model. Psicothema, 25, 283–291.
Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8, 370–378. https://doi.org/10.1177/1948550617693063.
Golinski, C., & Cribbie, R. A. (2009). The expanding role of quantitative methodologists in advancing psychology. Canadian Psychology, 50, 83–90. https://doi.org/10.1037/a0015180.
Gordon, R. A. (2015). Measuring constructs in family science: How can IRT improve precision and validity? Journal of Marriage and Family, 77, 147–176. https://doi.org/10.1111/jomf.12157.
Gordon, R. A., Crowder, M. K., Aloe, A. M., Davidson, L. A., & Domitrovich, C. E. (2022). Student self-ratings of social-emotional competencies: Dimensional structure and outcome associations of the WCSD-SECA among Hispanic and non-Hispanic White boys and girls in elementary through high school. Journal of School Psychology, 93, 41–62. https://doi.org/10.1016/j.jsp.2022.05.002.
Gordon, R. A., & Davidson, L. A. (2022). Cross-cutting issues for measuring SECs in context: General opportunities and challenges with an illustration of the Washoe County School District Social-Emotional Competency Assessment (WCSD-SECAs). In Jones, S., Lesaux, N., & Barnes, S. (Eds.), Measuring non-cognitive skills in school settings (pp. 225–251). Guilford Press.
Guttmannova, K., Szanyi, J. M., & Cali, P. W. (2008). Internalizing and externalizing behavior problem scores: Cross-ethnic and longitudinal measurement invariance of the Behavior Problem Index. Educational and Psychological Measurement, 68, 676–694. https://doi.org/10.1177/0013164407310127.
Han, K., Colarelli, S. M., & Weed, N. C. (2019). Methodological and statistical advances in the consideration of cultural diversity in assessment: A critical review of group classification and measurement invariance testing. Psychological Assessment, 31, 1481–1496. https://doi.org/10.1037/pas0000731.
Hauser, R. M., & Goldberger, A. S. (1971). The treatment of unobservable variables in path analysis. Sociological Methodology, 3, 81–117. https://doi.org/10.2307/270819.
Hussey, I., & Hughes, S. (2020). Hidden invalidity among 15 commonly used measures in social and personality psychology. Advances in Methods and Practices in Psychological Science, 3, 166–184. https://doi.org/10.1177/2515245919882903.
Johnson, J. L., & Geisinger, K. F. (2022). Fairness in educational and psychological testing: Examining theoretical, research, practice, and policy implications of the 2014 standards. American Educational Research Association. https://doi.org/10.3102/9780935302967_1.
Kim, E. S., Cao, C., Wang, Y., & Nguyen, D. T. (2017). Measurement invariance testing with many groups: A comparison of five approaches. Structural Equation Modeling, 24, 524–544. https://doi.org/10.1080/10705511.2017.1304822.
King, K. M., Pullman, M. D., Lyon, A. R., Dorsey, S., & Lewis, C. C. (2019). Using implementation science to close the gap between the optimal and typical practice of quantitative methods in clinical science. Journal of Abnormal Psychology, 128, 547–562. https://doi.org/10.1037/abn0000417.
Lai, M. H. C., Liu, Y., & Tse, W. W. (2022). Adjusting for partial invariance in latent parameter estimation: Comparing forward specification search and approximate invariance methods. Behavior Research Methods, 54, 414–434. https://doi.org/10.3758/s13428-021-01560-2.
Lane, S., Raymond, M. R., & Haladyna, T. M. (2016). Handbook of test development. Routledge.
Lansford, J. E., Rothenberg, W. A., Riley, J. et al. (2021). Longitudinal trajectories of four domains of parenting in relation to adolescent age and puberty in nine countries. Child Development, 92, e493–e512. https://doi.org/10.1111/cdev.13526.
Lee, J., & Wong, K. K. (2022). Centering whole-child development in global education reform: International perspectives on agendas for educational equity and quality. Routledge. https://doi.org/10.4324/9781003202714.
Lemann, N. (2000). The big test: The secret history of American meritocracy. Farrar, Straus, and Giroux.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1–55.
Liu, Y., Millsap, R. E., West, S. G. et al. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22, 486–506. https://doi.org/10.1037/met0000075.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Sage.
Long, J. S., & Freese, J. (2014). Regression models for categorical dependent variables using Stata (3rd ed.). Stata Press.
Luong, R., & Flake, J. K. (2022). Measurement invariance testing using confirmatory factor analysis and alignment optimization: A tutorial for transparent analysis planning and reporting. Psychological Methods. https://doi.org/10.1037/met0000441.
Marsh, H. W., Guo, J., Parker, P. D. et al. (2018). What to do when scalar invariance fails: The extended alignment method for multi-group factor analysis comparison of latent means across many groups. Psychological Methods, 23, 524–545. https://doi.org/10.1037/met0000113.
McLeod, J. D., Kruttschnitt, C., & Dornfeld, M. (1994). Does parenting explain the effects of structural conditions on children’s antisocial behavior? A comparison of Blacks and Whites. Social Forces, 73, 575–604. https://doi.org/10.2307/2579822.
McLoyd, V., & Smith, J. (2002). Physical discipline and behavior problems in African American, European American, and Hispanic children: Emotional support as a moderator. Journal of Marriage and Family, 64, 40–53. https://doi.org/10.1111/j.1741-3737.2002.00040.x.
Meade, A. W. (2010). A taxonomy of effect size measures for the differential functioning of items and scales. The Journal of Applied Psychology, 95(4), 728–743. https://doi.org/10.1037/a0018966.
Meade, A. W., & Bauer, D. J. (2007). Power and precision in confirmatory factor analytic tests of measurement invariance. Structural Equation Modeling, 14, 611–635. https://doi.org/10.1080/10705510701575461.
Meitinger, K., Davidov, E., Schmidt, P., & Braun, M. (2020). Measurement invariance: Testing for it and explaining why it is absent. Survey Research Methods, 14, 345–349.
Morrell, L., Collier, T., Black, P., & Wilson, M. (2017). A construct-modeling approach to develop a learning progression of how students understand the structure of matter. Journal of Research in Science Teaching, 54, 1024–1048. https://doi.org/10.1002/tea.21397.
Muthén, B., & Asparouhov, T. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17, 313–335. https://doi.org/10.1037/a0026802.
Muthén, B., & Asparouhov, T. (2013). BSEM measurement invariance analysis. Mplus Web Notes: No. 17. www.statmodel.com.
Muthén, B., & Asparouhov, T. (2018). Recent methods for the study of measurement invariance with many groups: Alignment and random effects. Sociological Methods & Research, 47(4), 637–664. https://doi.org/10.1177/0049124117701488.
Oakland, T., Douglas, S., & Kane, H. (2016). Top ten standardized tests used internationally with children and youth by school psychologists in 64 countries: A 24-year follow-up study. Journal of Psychoeducational Assessment, 34, 166–176. https://doi.org/10.1177/0734282915595303.
Parcel, T. L., & Menaghan, E. G. (1988). Measuring behavioral problems in a large cross-sectional survey: Reliability and validity for children of the NLS youth. Unpublished manuscript. Columbus, OH: Center for Human Resource Research, Ohio State University.
Pokropek, A., & Pokropek, E. (2022). Deep neural networks for detecting statistical model misspecifications: The case of measurement invariance. Structural Equation Modeling, 29, 394–411. https://doi.org/10.1080/10705511.2021.2010083.
Pokropek, A., Schmidt, P., & Davidov, E. (2020). Choosing priors in Bayesian measurement invariance modeling: A Monte Carlo simulation study. Structural Equation Modeling, 27, 750–764. https://doi.org/10.1080/10705511.2019.1703708.
Raju, N., Fortmann-Johnson, K. A., Kim, W. et al. (2009). The item parameter replication method for detecting differential item functioning in the polytomous DFIT framework. Applied Psychological Measurement, 33, 133–147. https://doi.org/10.1177/0146621608319514.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197–207. https://doi.org/10.1177/014662169001400208.
Rescorla, L. A., Adams, A., Ivanova, M. Y., & International ASEBA Consortium. (2020). The CBCL/1½–5’s DSM‑ASD scale: Confirmatory factor analyses across 24 societies. Journal of Autism and Developmental Disorders, 50, 3326–3340. https://doi.org/10.1007/s10803-019-04189-5.
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17, 354–373. https://doi.org/10.1037/a0029315.
Rimfeld, K., Malanchini, M., Hannigan, L. J. et al. (2019). Teacher assessments during compulsory education are as reliable, stable and heritable as standardized test scores. Journal of Child Psychology and Psychiatry, 60, 1278–1288. https://doi.org/10.1111/jcpp.13070.
Rotberg, I. C. (Ed.). (2010). Balancing change and tradition in global education reform (2nd ed.). Rowman & Littlefield.
Rothstein, M. G., & Goffin, R. D. (2006). The use of personality measures in personnel selection: What does current research support? Human Resource Management Review, 16, 155–180. https://doi.org/10.1016/j.hrmr.2006.03.004.
Royston, P., Altman, D. G., & Sauerbrei, W. (2005). Dichotomizing continuous predictors in multiple regression: A bad idea. Statistics in Medicine, 25, 127–141. https://doi.org/10.1002/sim.2331.
Sablan, J. R. (2019). Can you really measure that? Combining critical race theory and quantitative methods. American Educational Research Journal, 56, 178–203. https://doi.org/10.3102/0002831218798325.
Samejima, F. (2010). The general graded response model. In Nering, M. L. & Ostini, R. (Eds.), Handbook of polytomous item response theory models (pp. 77–107). Routledge.
Seddig, D., & Lomazzi, V. (2019). Using cultural and structural indicators to explain measurement noninvariance in gender role attitudes with multilevel structural equation modeling. Social Science Research, 84, 102328. https://doi.org/10.1016/j.ssresearch.2019.102328.
Sestir, M. A., Kennedy, L. A., Peszka, J. J., & Bartley, J. G. (2021). New statistics, old schools: An overview of current introductory undergraduate and graduate statistics pedagogy practices. Teaching of Psychology. https://doi.org/10.1177/00986283211030616.
Sharpe, D. (2013). Why the resistance to statistical innovations? Bridging the communication gap. Psychological Methods, 18, 572–582. https://doi.org/10.1037/a0034177.
Shute, R. H., & Slee, P. T. (2015). Child development: Theories and critical perspectives. Routledge.
Sirganci, G., Uyumaz, G., & Yandi, A. (2020). Measurement invariance testing with alignment method: Many groups comparison. International Journal of Assessment Tools in Education, 7, 657–673. https://doi.org/10.21449/ijate.714218.
Spencer, M. S., Fitch, D., Grogan-Kaylor, A., & McBeath, B. (2005). The equivalence of the Behavior Problem Index across U.S. ethnic groups. Journal of Cross-Cultural Psychology, 36(5), 573–589. https://doi.org/10.1177/0022022105278
Sprague, J. (2016). Feminist methodologies for critical researchers: Bridging differences. Rowman & Littlefield.
Strobl, C., Kopf, J., Kohler, L., von Oertzen, T., & Zeileis, A. (2021). Anchor point selection: Scale alignment based on an inequality criterion. Applied Psychological Measurement, 45, 214–230. https://doi.org/10.1177/0146621621990743.
Studts, C. R., Polaha, J., & van Zyl, M. A. (2017). Identifying unbiased items for screening preschoolers for disruptive behavior problems. Journal of Pediatric Psychology, 42, 476–486. https://doi.org/10.1093/jpepsy/jsw090.
Svetina, D., Rutkowski, L., & Rutkowski, D. (2020). Multiple-group invariance with categorical outcomes using updated guidelines: An illustration using Mplus and the lavaan/semTools packages. Structural Equation Modeling, 27, 111–130. https://doi.org/10.1080/10705511.2019.1602776.
West, S. G., Taylor, A. B., & Wu, W. (2012). Model fit and model selection in structural equation modeling. In Hoyle, R. H. (Ed.), Handbook of structural equation modeling (pp. 209–231). Guilford Press.
Winter, S. D., & Depaoli, S. (2020). An illustration of Bayesian approximate measurement invariance with longitudinal data and a small sample size. International Journal of Behavioral Development, 44, 371–382. https://doi.org/10.1177/0165025419880610.
Wolfe, E. W., & Smith, E. V. (2007a). Instrument development tools and activities for measure validation using Rasch models: Part I – Instrument development tools. Journal of Applied Measurement, 8, 97–123.
Wolfe, E. W., & Smith, E. V. (2007b). Instrument development tools and activities for measure validation using Rasch models: Part II – Validation activities. Journal of Applied Measurement, 8, 204–234.
Young, M. (2021, June 28). Down with meritocracy. The Guardian: Politics.
Zill, N. (1990). Behavior problems index based on parent report. Unpublished memo. Bethesda, MD: Child Trends.
Zlatkin-Troitschanskaia, O., Toepper, M., Pant, H. A., Lautenbach, C., & Kuhn, C. (Eds.). (2018). Assessment of learning outcomes in higher education: Cross-national comparisons and perspectives. Springer. https://doi.org/10.1007/978-3-319-74338-7.