An Empirical Study on AI-Driven Evaluation of Student Translations Using Large Language Models
ZHANG Jing, PENG Sirui
Contemporary Foreign Languages Studies, 2025, 25(5): 85-96. DOI: 10.3969/j.issn.1674-8921.2025.05.009
Abstract

This study investigates the application of large language models (LLMs) in translation teaching, focusing on their effectiveness and limitations in assessing student translations. Using established standards for human translation quality evaluation, a two-tier analytical framework combining quantitative and qualitative analyses was developed, incorporating human scores, LLM-generated scores, and evaluative comments. Quantitative results indicated that the LLM performed reliably in structural dimensions of Chinese-to-English tasks, but a marked decline was observed in semantic and cultural dimensions in English-to-Chinese tasks, exposing weaknesses in deep semantic understanding and cultural adaptation. Qualitative analysis further revealed issues such as templated feedback, misattribution of errors, and rejection of creative translation, thereby corroborating the quantitative findings. Based on these results, the study proposes a human-AI collaborative pathway for teaching practice, highlighting the role of LLMs as auxiliary tools for structural checking while reserving semantic and cultural evaluation for teachers. The findings provide theoretical grounding and pedagogical implications for the integration of intelligent assessment in translation education, and suggest a shift in LLMs’ role from instrumental support to cognitive collaboration.


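| Dimension | Direction | ICC (absolute agreement) | 95% CI | p | r (trend association) | p |
|---|---|---|---|---|---|---|
| 1. Naturalness and clarity | EN→ZH | 0.695ᶜ | [0.44, 0.83] | <0.01** | 0.590** | <0.001 |
| | ZH→EN | 0.224ᶜ | [-0.17, 0.52] | <0.01** | 0.341** | 0.005 |
| 2. Grammar and vocabulary | EN→ZH | 0.537ᶜ | [0.25, 0.71] | <0.01** | 0.512** | <0.001 |
| | ZH→EN | 0.204ᶜ | [-0.16, 0.47] | 0.1 | 0.199 | 0.106 |
| 3. Intent and context | EN→ZH | 0.255ᶜ | [-0.18, 0.56] | <0.01** | 0.399** | 0.001 |
| | ZH→EN | -0.019ᶜ | [-0.12, 0.11] | 0.63 | -0.046 | 0.713 |
| 4. Content and information | EN→ZH | 0.632ᶜ | [0.40, 0.77] | <0.01** | 0.498** | <0.001 |
| | ZH→EN | 0.323ᶜ | [-0.16, 0.61] | <0.01** | 0.348** | 0.004 |
| 5. Culture and terminology | EN→ZH | 0.802ᶜ | [0.67, 0.88] | <0.01** | 0.692** | <0.001 |
| | ZH→EN | 0.598ᶜ | [0.35, 0.75] | <0.01** | 0.434** | <0.001 |
| Total score | EN→ZH | 0.811ᶜ | [0.69, 0.88] | <0.01** | 0.733** | <0.001 |
| | ZH→EN | 0.257ᶜ | [-0.16, 0.54] | 0.01* | 0.339** | 0.005 |

Table 3 Comparison of human and LLM scoring agreement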
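Table 3 reports two different statistics for each dimension: an ICC for absolute agreement and a correlation r for trend association. The Python sketch below uses made-up scores, not the study's data, to show why the two can diverge. The excerpt does not state which ICC form was used, so the sketch assumes the two-way, absolute-agreement, single-measure ICC(A,1) of McGraw and Wong and treats r as Pearson's correlation; the function names `icc_a1` and `pearson_r` are illustrative only.

```python
import math
from typing import Sequence


def icc_a1(human: Sequence[float], llm: Sequence[float]) -> float:
    """ICC(A,1): two-way model, absolute agreement, single measures.

    Assumption: each translation is a row and the two raters (human, LLM)
    are the columns; this is one common ICC form, not necessarily the study's.
    """
    if len(human) != len(llm) or len(human) < 2:
        raise ValueError("need two equal-length score lists with at least 2 items")
    rows = list(zip(human, llm))
    n, k = len(rows), 2
    grand = sum(h + m for h, m in rows) / (n * k)
    row_means = [(h + m) / k for h, m in rows]
    col_means = [sum(human) / n, sum(llm) / n]

    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_rows = k * sum((rm - grand) ** 2 for rm in row_means)  # differences between translations
    ss_cols = n * sum((cm - grand) ** 2 for cm in col_means)  # systematic offset between raters
    ss_err = ss_total - ss_rows - ss_cols                      # residual disagreement

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Absolute agreement counts the rater offset (ms_cols) as disagreement;
    # a consistency ICC(C,1) would drop the last denominator term.
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: sensitive to trend only, not to a constant offset."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: the LLM always gives exactly 2 points more than the human rater.
human = [6, 7, 8, 9, 10]
llm = [8, 9, 10, 11, 12]
print(round(icc_a1(human, llm), 3), round(pearson_r(human, llm), 3))  # 0.556 1.0
```

With a constant two-point offset, r is 1.0 while ICC(A,1) drops to about 0.56, because absolute agreement treats the systematic gap between the raters' means as disagreement. This is one way a dimension can show a moderate r alongside a lower ICC.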