Pinpointing Analytic Rating Criteria for EFL Writing Assessment from Raters’ Perspectives

doi:10.3969/j.issn.1674-8921.2022.04.013

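Abstract

Abstract (Chinese version):

In recent years, the rating criteria used in large-scale foreign language writing tests have attracted wide attention, and many researchers agree that rating criteria represent the construct a writing test actually measures. Against this background, this study takes the writing component of the College English Test Band 4 (the CET-4 writing test) as an example and explores which rating criteria suit it. Drawing on a theoretical review and an analysis of the literature, the paper first compiles a preliminary set of rating criteria that may apply to the CET-4 writing test, and then uses a mixed-methods design, with a questionnaire and interviews, to survey raters' opinions of these criteria. The results show that, apart from "task fulfillment", the remaining nine criteria are all fairly useful in rating CET-4 writing. These criteria are also largely covered by the current construct framework of the CET-4 writing test, which indicates that they meet the test's theoretical construct requirements. Theoretically and methodologically, the study offers some implications for defining the construct of large-scale foreign language writing tests and for examining the validity of rating scales.

Keywords (Chinese version): foreign language writing assessment, rating criteria, construct, CET-4 writing test, mixed-methods approach, validity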

Abstract:

In recent years, the rating criteria adopted in large-scale EFL writing assessments have received increasing research attention, owing to a widespread consensus that rating criteria represent the de facto construct of a writing assessment. This study was therefore conducted to pinpoint the most useful rating criteria for the writing component of the College English Test Band 4 (CET-4 writing). Using a mixed-methods approach, the study investigated how CET-4 raters perceived the usefulness of a set of rating criteria derived from a theoretical and literature review. The results showed that all the rating criteria except one, task fulfillment, were perceived to be useful. Given that the remaining criteria were relevant to the construct components of CET-4 writing, we can have confidence that they represent the construct of CET-4 writing. The study also found that the proficiency level of CET-4 writing performances significantly affected raters’ perceptions of the usefulness of the rating criteria. This, to some extent, could pose a challenge to the validity of the holistic scoring approach adopted by CET-4 writing. In conclusion, the study offers some theoretical and methodological implications for future research in delineating the construct components of large-scale EFL assessments, as well as in examining the validity of the rating scales those assessments currently use.

Keywords: EFL writing assessments, rating criteria, construct, CET-4 writing, mixed-methods approach, test validity

Chinese Library Classification (CLC) number: H319

ZOU Shaoyan (邹绍艳), FAN Jingsong (范劲松). Pinpointing Analytic Rating Criteria for EFL Writing Assessment from Raters’ Perspectives[J]. Contemporary Foreign Languages Studies (当代外语研究), 2022, 22(4): 133-143.

Figures/Tables: 2 (Table 1, Table 2)

Table 1

Rating criterion	Low-level CET-4 essays, M (SD)	Mid-level CET-4 essays, M (SD)	High-level CET-4 essays, M (SD)	Overall, M (SD)
Vocabulary range	2.13 (1.05)	2.80 (0.81)	3.79 (0.46)	2.90 (1.06)
Syntactic complexity	2.06 (1.06)	2.79 (0.77)	3.79 (0.42)	2.87 (1.07)
Language accuracy	2.63 (1.08)	3.12 (0.68)	3.89 (0.32)	3.21 (0.92)
Language appropriateness	2.35 (1.02)	2.87 (0.78)	3.71 (0.52)	2.97 (0.98)
Content and ideas	2.49 (1.00)	2.98 (0.69)	3.79 (0.48)	3.08 (0.92)
Cohesion and coherence	2.34 (1.00)	2.90 (0.70)	3.78 (0.44)	3.00 (0.95)
Text organization	2.32 (0.98)	2.92 (0.78)	3.70 (0.51)	2.97 (0.96)
Task fulfillment	2.67 (1.02)	3.06 (0.73)	3.80 (0.48)	3.17 (0.91)
Writing conventions	2.63 (1.01)	3.06 (0.73)	3.74 (0.45)	3.14 (0.89)
Essay length	2.42 (1.07)	3.01 (0.73)	3.74 (0.56)	3.05 (0.98)

Table 2

Rating criterion	Low-level essays	Mid-level essays	High-level essays	ANOVA results
	M (SD)	M (SD)	M (SD)	Wilks’ Lambda	F	Sig.	Partial eta squared
Vocabulary range	2.13 (1.05)	2.80 (0.81)	3.79 (0.46)	0.274	228	0.00	0.726
Syntactic complexity	2.07 (1.07)	2.78 (0.77)	3.79 (0.42)	0.238	275.18	0.00	0.762
Language accuracy	2.63 (1.08)	3.12 (0.68)	3.89 (0.32)	0.380	140.3	0.00	0.620
Language appropriateness	2.34 (1.02)	2.88 (0.78)	3.71 (0.52)	0.342	164.2	0.00	0.658
Content and ideas	2.49 (1.00)	2.99 (0.69)	3.79 (0.48)	0.336	168.8	0.00	0.664
Cohesion and coherence	2.33 (0.99)	2.90 (0.70)	3.78 (0.44)	0.291	209.6	0.00	0.709
Text organization	2.31 (0.98)	2.92 (0.78)	3.71 (0.49)	0.352	157.1	0.00	0.648
Task fulfillment	2.68 (1.03)	3.06 (0.74)	3.80 (0.48)	0.448	105.2	0.00	0.552
Writing conventions	2.62 (1.01)	3.05 (0.72)	3.74 (0.45)	0.449	105.4	0.00	0.551
Essay length	2.42 (1.07)	3.01 (0.72)	3.75 (0.55)	0.378	140.9	0.000	0.622
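
The effect sizes in the last table above can be cross-checked against the Wilks' Lambda column. The sketch below assumes the values come from a multivariate test of a single within-subject factor (essay proficiency level), in which case partial eta squared equals 1 minus Wilks' Lambda. That assumption, the function names, and the rounding tolerance are illustrative and are not stated in the source; only the numbers are copied from the table.

```python
# Cross-check: partial eta squared vs. Wilks' Lambda for each rating criterion.
# Assumption (not stated in the source): one within-subject factor tested
# multivariately, so partial eta squared = 1 - Wilks' Lambda.

# (Wilks' Lambda, reported partial eta squared), copied from the table above.
TABLE2 = {
    "Vocabulary range": (0.274, 0.726),
    "Syntactic complexity": (0.238, 0.762),
    "Language accuracy": (0.380, 0.620),
    "Language appropriateness": (0.342, 0.658),
    "Content and ideas": (0.336, 0.664),
    "Cohesion and coherence": (0.291, 0.709),
    "Text organization": (0.352, 0.648),
    "Task fulfillment": (0.448, 0.552),
    "Writing conventions": (0.449, 0.551),
    "Essay length": (0.378, 0.622),
}


def partial_eta_squared(wilks_lambda: float) -> float:
    """Effect size implied by Wilks' Lambda under the stated assumption."""
    return 1.0 - wilks_lambda


def find_mismatches(rows: dict, tol: float = 0.0015) -> dict:
    """Return criteria whose reported effect size differs from 1 - Lambda
    by more than `tol` (a hypothetical allowance for three-decimal rounding)."""
    mismatches = {}
    for name, (lam, reported) in rows.items():
        derived = partial_eta_squared(lam)
        if abs(derived - reported) > tol:
            mismatches[name] = (round(derived, 3), reported)
    return mismatches


if __name__ == "__main__":
    for name, (lam, reported) in TABLE2.items():
        print(f"{name}: derived={partial_eta_squared(lam):.3f}, reported={reported:.3f}")
    print("Mismatches:", find_mismatches(TABLE2))
```

An empty mismatch dictionary would suggest the two columns are internally consistent under this assumption; it says nothing about the F statistics, which also depend on the sample size.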