Standard Setting in Academic Writing Assessment through Objective Standard Setting Method
Keywords: Academic writing assessment, Many-facet Rasch measurement, Standard setting, OSS
Performance standards have important consequences for all stakeholders in the assessment of L2 academic writing. These standards not only describe levels of writing performance but also provide a basis for evaluative decisions about that performance. Such a high-stakes role calls for greater objectivity in the standard-setting procedure. Accordingly, this study examines the usefulness of the Objective Standard Setting (OSS) method for specifying proficiency levels in L2 academic writing. Following a descriptive research design, the sample comprised examinees and raters who were student teachers at the university level. An essay task and an analytic writing scoring rubric served as the data collection tools, and the data were analyzed with the OSS method and two-step cluster analysis. The OSS results, based on the many-facet Rasch measurement model (MFRM), outline the distribution of the scoring criteria across the proficiency levels, and the main OSS findings were corroborated by the two-step cluster analysis. In short, the OSS method can help stakeholders make objective judgments about examinees' target performance.
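The core OSS idea can be illustrated with a minimal sketch. Assuming (hypothetically) that scoring criteria have already been calibrated on a common logit scale via many-facet Rasch measurement and that judges have assigned each criterion to a proficiency level, a provisional cut score for each level can be taken as the mean logit difficulty of the criteria placed at that level. All criterion names, difficulties, and level labels below are invented for illustration; they are not the study's data.

```python
# Hypothetical sketch of the Objective Standard Setting (OSS) logic:
# criteria calibrated on a logit scale (e.g., via MFRM) are grouped by
# the proficiency level judges assign them to, and each level's
# provisional cut score is the mean difficulty of its criteria.
from statistics import mean

# Invented criterion calibrations (logits) and level assignments.
criteria = {
    "task achievement": (-0.80, "basic"),
    "coherence":        (-0.25, "basic"),
    "vocabulary range": ( 0.30, "intermediate"),
    "grammar accuracy": ( 0.55, "intermediate"),
    "register":         ( 1.10, "advanced"),
    "argumentation":    ( 1.60, "advanced"),
}

def oss_cut_scores(criteria):
    """Group criteria by assigned level and average their difficulties."""
    by_level = {}
    for name, (difficulty, level) in criteria.items():
        by_level.setdefault(level, []).append(difficulty)
    return {level: mean(vals) for level, vals in by_level.items()}

cuts = oss_cut_scores(criteria)
for level in ("basic", "intermediate", "advanced"):
    print(f"{level}: cut = {cuts[level]:+.2f} logits")
```

In practice the cut scores would also be adjusted for measurement error and translated from logits to expected raw scores, steps this sketch omits.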
Copyright (c) 2022 International Journal of Assessment Tools in Education
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.