
Part II - Investigating Score Interpretations

Published online by Cambridge University Press: 14 January 2021

Carol A. Chapelle, Iowa State University
Erik Voss, Teachers College, Columbia University
Type: Chapter
Information: Validity Argument in Language Testing: Case Studies of Validation Research, pp. 71–232
Publisher: Cambridge University Press
Print publication year: 2021


