References
Achenbach System of Empirically Based Assessment (ASEBA). (n.d.). The ASEBA approach. https://aseba.org/.
Aiken, L. S., West, S. G., & Millsap, R. E. (2008). Doctoral training in statistics, measurement, and methodology in psychology: Replication and extension. American Psychologist, 63, 32–50. https://doi.org/10.1037/0003-066X.63.1.32.
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. American Educational Research Association. www.testingstandards.net/open-access-files.html.
Babcock, B., & Hodge, K. J. (2020). Rasch versus classical equating in the context of small sample sizes. Educational and Psychological Measurement, 80, 499–521. https://doi.org/10.1177/0013164419878483.
Bansal, P. S., Babinski, D. E., Waxmonsky, J. G., & Waschbusch, D. A. (2022). Psychometric properties of parent ratings on the Inventory of Callous–Unemotional Traits in a nationally representative sample of 5- to 12-year-olds. Assessment, 29, 242–256. https://doi.org/10.1177/1073191120964562.
Bauer, D. J. (2017). A more general model for testing measurement invariance and differential item functioning. Psychological Methods, 22, 507–526. https://doi.org/10.1037/met0000077.
Belzak, W. C. M., & Bauer, D. J. (2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychological Methods, 25, 673–690. https://doi.org/10.1037/met0000253.
Boulkedid, R., Abdoul, H., Loustau, M., Sibony, O., & Alberti, C. (2011). Using and reporting the Delphi method for selecting healthcare quality indicators: A systematic review. PLoS ONE, 6(6), e20476. https://doi.org/10.1371/journal.pone.0020476.
Bratt, C., Abrams, D., Swift, H. J., Vauclair, C. M., & Marques, S. (2018). Perceived age discrimination across age in Europe: From an ageing society to a society for all ages. Developmental Psychology, 54, 167–180. https://doi.org/10.1037/dev0000398.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multi-model inference. Springer-Verlag.
Byrne, B. M., Oakland, T., Leong, F. T. L. et al. (2009). A critical analysis of cross-cultural research and testing practices: Implications for improved education and training in psychology. Training and Education in Professional Psychology, 3, 94–105. https://doi.org/10.1037/a0014516.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105(3), 456–466. https://doi.org/10.1037/0033-2909.105.3.456.
Camilli, G. (2006). Test fairness. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 221–256). Praeger.
Chilisa, B. (2020). Indigenous research methodologies. Sage.
Cleveland, H. H., Wiebe, R. P., van den Oord, E. J. C. G., & Rowe, D. C. (2000). Behavior problems among children from different family structures: The influence of genetic self-selection. Child Development, 71, 733–751. https://doi.org/10.1111/1467-8624.00182.
Covarrubias, A., & Vélez, V. (2013). Critical race quantitative intersectionality: An antiracist research paradigm that refuses to “let the numbers speak for themselves.” In Lynn, M. & Dixson, A. D. (Eds.), Handbook of critical race theory in education (pp. 270–285). Routledge.
Crenshaw, K. (1989). Demarginalizing the intersection of race and sex: A Black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum, 1989(1), Article 8, 139–167.
Crowder, M. K., Gordon, R. A., Brown, R. D., Davidson, L. A., & Domitrovich, C. E. (2019). Linking social and emotional learning standards to the WCSD Social-Emotional Competency Assessment: A Rasch approach. School Psychology Quarterly, 34, 281–295. https://doi.org/10.1037/spq0000308.
Davidov, E., Meuleman, B., Cieciuch, J., Schmidt, P., & Billiet, J. (2014). Measurement equivalence in cross-national research. Annual Review of Sociology, 40, 55–75. https://doi.org/10.1146/annurev-soc-071913-043137.
De Bondt, N., & Van Petegem, P. (2015). Psychometric evaluation of the Overexcitability Questionnaire-Two applying Bayesian structural equation modeling (BSEM) and multiple-group BSEM-based alignment with approximate measurement invariance. Frontiers in Psychology, 6, 1963. https://doi.org/10.3389/fpsyg.2015.01963.
Duckworth, A. L., & Yeager, D. S. (2015). Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44, 237–251. https://doi.org/10.3102/0013189X15584327.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.
Evers, A., Muñiz, J., Hagemeister, C. et al. (2013). Assessing the quality of tests: Revision of the European Federation of Psychologists’ Associations (EFPA) review model. Psicothema, 25, 283–291.
Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8, 370–378. https://doi.org/10.1177/1948550617693063.
Golinski, C., & Cribbie, R. A. (2009). The expanding role of quantitative methodologists in advancing psychology. Canadian Psychology, 50, 83–90. https://doi.org/10.1037/a0015180.
Gordon, R. A. (2015). Measuring constructs in family science: How can IRT improve precision and validity? Journal of Marriage and Family, 77, 147–176. https://doi.org/10.1111/jomf.12157.
Gordon, R. A., Crowder, M. K., Aloe, A. M., Davidson, L. A., & Domitrovich, C. E. (2022). Student self-ratings of social-emotional competencies: Dimensional structure and outcome associations of the WCSD-SECA among Hispanic and non-Hispanic White boys and girls in elementary through high school. Journal of School Psychology, 93, 41–62. https://doi.org/10.1016/j.jsp.2022.05.002.
Gordon, R. A., & Davidson, L. A. (2022). Cross-cutting issues for measuring SECs in context: General opportunities and challenges with an illustration of the Washoe County School District Social-Emotional Competency Assessment (WCSD-SECAs). In Jones, S., Lesaux, N., & Barnes, S. (Eds.), Measuring non-cognitive skills in school settings (pp. 225–251). Guilford Press.
Guttmannova, K., Szanyi, J. M., & Cali, P. W. (2008). Internalizing and externalizing behavior problem scores: Cross-ethnic and longitudinal measurement invariance of the Behavior Problem Index. Educational and Psychological Measurement, 68, 676–694. https://doi.org/10.1177/0013164407310127.
Han, K., Colarelli, S. M., & Weed, N. C. (2019). Methodological and statistical advances in the consideration of cultural diversity in assessment: A critical review of group classification and measurement invariance testing. Psychological Assessment, 31, 1481–1496. https://doi.org/10.1037/pas0000731.
Hauser, R. M., & Goldberger, A. S. (1971). The treatment of unobservable variables in path analysis. Sociological Methodology, 3, 81–117. https://doi.org/10.2307/270819.
Hussey, I., & Hughes, S. (2020). Hidden invalidity among 15 commonly used measures in social and personality psychology. Advances in Methods and Practices in Psychological Science, 3, 166–184. https://doi.org/10.1177/2515245919882903.
Johnson, J. L., & Geisinger, K. F. (2022). Fairness in educational and psychological testing: Examining theoretical, research, practice, and policy implications of the 2014 standards. American Educational Research Association. https://doi.org/10.3102/9780935302967_1.
Kim, E. S., Cao, C., Wang, Y., & Nguyen, D. T. (2017). Measurement invariance testing with many groups: A comparison of five approaches. Structural Equation Modeling, 24, 524–544. https://doi.org/10.1080/10705511.2017.1304822.
King, K. M., Pullman, M. D., Lyon, A. R., Dorsey, S., & Lewis, C. C. (2019). Using implementation science to close the gap between the optimal and typical practice of quantitative methods in clinical science. Journal of Abnormal Psychology, 128, 547–562. https://doi.org/10.1037/abn0000417.
Lai, M. H. C., Liu, Y., & Tse, W. W. (2022). Adjusting for partial invariance in latent parameter estimation: Comparing forward specification search and approximate invariance methods. Behavior Research Methods, 54, 414–434. https://doi.org/10.3758/s13428-021-01560-2.
Lane, S., Raymond, M. R., & Haladyna, T. M. (2016). Handbook of test development. Routledge.
Lansford, J. E., Rothenberg, W. A., Riley, J. et al. (2021). Longitudinal trajectories of four domains of parenting in relation to adolescent age and puberty in nine countries. Child Development, 92, e493–e512. https://doi.org/10.1111/cdev.13526.
Lee, J., & Wong, K. K. (2022). Centering whole-child development in global education reform: International perspectives on agendas for educational equity and quality. Routledge. https://doi.org/10.4324/9781003202714.
Lemann, N. (2000). The big test: The secret history of American meritocracy. Farrar, Straus, and Giroux.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1–55.
Liu, Y., Millsap, R. E., West, S. G. et al. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22, 486–506. https://doi.org/10.1037/met0000075.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Sage.
Long, J. S., & Freese, J. (2014). Regression models for categorical dependent variables using Stata (3rd ed.). Stata Press.
Luong, R., & Flake, J. K. (2022). Measurement invariance testing using confirmatory factor analysis and alignment optimization: A tutorial for transparent analysis planning and reporting. Psychological Methods. https://doi.org/10.1037/met0000441.
Marsh, H. W., Guo, J., Parker, P. D. et al. (2018). What to do when scalar invariance fails: The extended alignment method for multi-group factor analysis comparison of latent means across many groups. Psychological Methods, 23, 524–545. https://doi.org/10.1037/met0000113.
McLeod, J. D., Kruttschnitt, C., & Dornfeld, M. (1994). Does parenting explain the effects of structural conditions on children’s antisocial behavior? A comparison of Blacks and Whites. Social Forces, 73, 575–604. https://doi.org/10.2307/2579822.
McLoyd, V., & Smith, J. (2002). Physical discipline and behavior problems in African American, European American, and Hispanic children: Emotional support as a moderator. Journal of Marriage and Family, 64, 40–53. https://doi.org/10.1111/j.1741-3737.2002.00040.x.
Meade, A. W. (2010). A taxonomy of effect size measures for the differential functioning of items and scales. The Journal of Applied Psychology, 95(4), 728–743. https://doi.org/10.1037/a0018966.
Meade, A. W., & Bauer, D. J. (2007). Power and precision in confirmatory factor analytic tests of measurement invariance. Structural Equation Modeling, 14, 611–635. https://doi.org/10.1080/10705510701575461.
Meitinger, K., Davidov, E., Schmidt, P., & Braun, M. (2020). Measurement invariance: Testing for it and explaining why it is absent. Survey Research Methods, 14, 345–349.
Morrell, L., Collier, T., Black, P., & Wilson, M. (2017). A construct-modeling approach to develop a learning progression of how students understand the structure of matter. Journal of Research in Science Teaching, 54, 1024–1048. https://doi.org/10.1002/tea.21397.
Muthén, B., & Asparouhov, T. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17, 313–335. https://doi.org/10.1037/a0026802.
Muthén, B., & Asparouhov, T. (2013). BSEM measurement invariance analysis. Mplus Web Notes: No. 17. www.statmodel.com.
Muthén, B., & Asparouhov, T. (2018). Recent methods for the study of measurement invariance with many groups: Alignment and random effects. Sociological Methods & Research, 47(4), 637–664. https://doi.org/10.1177/0049124117701488.
Oakland, T., Douglas, S., & Kane, H. (2016). Top ten standardized tests used internationally with children and youth by school psychologists in 64 countries: A 24-year follow-up study. Journal of Psychoeducational Assessment, 34, 166–176. https://doi.org/10.1177/0734282915595303.
Parcel, T. L., & Menaghan, E. G. (1988). Measuring behavioral problems in a large cross-sectional survey: Reliability and validity for children of the NLS youth. Unpublished manuscript. Columbus, OH: Center for Human Resource Research, Ohio State University.
Pokropek, A., & Pokropek, E. (2022). Deep neural networks for detecting statistical model misspecifications: The case of measurement invariance. Structural Equation Modeling, 29, 394–411. https://doi.org/10.1080/10705511.2021.2010083.
Pokropek, A., Schmidt, P., & Davidov, E. (2020). Choosing priors in Bayesian measurement invariance modeling: A Monte Carlo simulation study. Structural Equation Modeling, 27, 750–764. https://doi.org/10.1080/10705511.2019.1703708.
Raju, N., Fortmann-Johnson, K. A., Kim, W. et al. (2009). The item parameter replication method for detecting differential item functioning in the polytomous DFIT framework. Applied Psychological Measurement, 33, 133–147. https://doi.org/10.1177/0146621608319514.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197–207. https://doi.org/10.1177/014662169001400208.
Rescorla, L. A., Adams, A., Ivanova, M. Y., & International ASEBA Consortium. (2020). The CBCL/1½–5’s DSM‑ASD scale: Confirmatory factor analyses across 24 societies. Journal of Autism and Developmental Disorders, 50, 3326–3340. https://doi.org/10.1007/s10803-019-04189-5.
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17, 354–373. https://doi.org/10.1037/a0029315.
Rimfeld, K., Malanchini, M., Hannigan, L. J. et al. (2019). Teacher assessments during compulsory education are as reliable, stable and heritable as standardized test scores. Journal of Child Psychology and Psychiatry, 60, 1278–1288. https://doi.org/10.1111/jcpp.13070.
Rotberg, I. C. (Ed.). (2010). Balancing change and tradition in global education reform (2nd ed.). Rowman & Littlefield.
Rothstein, M. G., & Goffin, R. D. (2006). The use of personality measures in personnel selection: What does current research support? Human Resource Management Review, 16, 155–180. https://doi.org/10.1016/j.hrmr.2006.03.004.
Royston, P., Altman, D. G., & Sauerbrei, W. (2005). Dichotomizing continuous predictors in multiple regression: A bad idea. Statistics in Medicine, 25, 127–141. https://doi.org/10.1002/sim.2331.
Sablan, J. R. (2019). Can you really measure that? Combining critical race theory and quantitative methods. American Educational Research Journal, 56, 178–203. https://doi.org/10.3102/0002831218798325.
Samejima, F. (2010). The general graded response model. In Nering, M. L. & Ostini, R. (Eds.), Handbook of polytomous item response theory models (pp. 77–107). Routledge.
Seddig, D., & Lomazzi, V. (2019). Using cultural and structural indicators to explain measurement noninvariance in gender role attitudes with multilevel structural equation modeling. Social Science Research, 84, 102328. https://doi.org/10.1016/j.ssresearch.2019.102328.
Sestir, M. A., Kennedy, L. A., Peszka, J. J., & Bartley, J. G. (2021). New statistics, old schools: An overview of current introductory undergraduate and graduate statistics pedagogy practices. Teaching of Psychology. https://doi.org/10.1177/00986283211030616.
Sharpe, D. (2013). Why the resistance to statistical innovations? Bridging the communication gap. Psychological Methods, 18, 572–582. https://doi.org/10.1037/a0034177.
Shute, R. H., & Slee, P. T. (2015). Child development: Theories and critical perspectives. Routledge.
Sirganci, G., Uyumaz, G., & Yandi, A. (2020). Measurement invariance testing with alignment method: Many groups comparison. International Journal of Assessment Tools in Education, 7, 657–673. https://doi.org/10.21449/ijate.714218.
Spencer, M. S., Fitch, D., Grogan-Kaylor, A., & McBeath, B. (2005). The equivalence of the Behavior Problem Index across U.S. ethnic groups. Journal of Cross-Cultural Psychology, 36(5), 573–589. https://doi.org/10.1177/0022022105278
Sprague, J. (2016). Feminist methodologies for critical researchers: Bridging differences. Rowman & Littlefield.
Strobl, C., Kopf, J., Kohler, L., von Oertzen, T., & Zeileis, A. (2021). Anchor point selection: Scale alignment based on an inequality criterion. Applied Psychological Measurement, 45, 214–230. https://doi.org/10.1177/0146621621990743.
Studts, C. R., Polaha, J., & van Zyl, M. A. (2017). Identifying unbiased items for screening preschoolers for disruptive behavior problems. Journal of Pediatric Psychology, 42, 476–486. https://doi.org/10.1093/jpepsy/jsw090.
Svetina, D., Rutkowski, L., & Rutkowski, D. (2020). Multiple-group invariance with categorical outcomes using updated guidelines: An illustration using Mplus and the lavaan/semTools packages. Structural Equation Modeling, 27, 111–130. https://doi.org/10.1080/10705511.2019.1602776.
West, S. G., Taylor, A. B., & Wu, W. (2012). Model fit and model selection in structural equation modeling. In Hoyle, R. H. (Ed.), Handbook of structural equation modeling (pp. 209–231). Guilford Press.
Winter, S. D., & Depaoli, S. (2020). An illustration of Bayesian approximate measurement invariance with longitudinal data and a small sample size. International Journal of Behavioral Development, 44, 371–382. https://doi.org/10.1177/0165025419880610.
Wolfe, E. W., & Smith, E. V. (2007a). Instrument development tools and activities for measure validation using Rasch models: Part I – Instrument development tools. Journal of Applied Measurement, 8, 97–123.
Wolfe, E. W., & Smith, E. V. (2007b). Instrument development tools and activities for measure validation using Rasch models: Part II – Validation activities. Journal of Applied Measurement, 8, 204–234.
Young, M. (2021, June 28). Down with meritocracy. The Guardian: Politics.
Zill, N. (1990). Behavior problems index based on parent report. Unpublished memo. Bethesda, MD: Child Trends.
Zlatkin-Troitschanskaia, O., Toepper, M., Pant, H. A., Lautenbach, C., & Kuhn, C. (Eds.). (2018). Assessment of learning outcomes in higher education: Cross-national comparisons and perspectives. Springer. https://doi.org/10.1007/978-3-319-74338-7.