An Empirical Study on AI-Driven Evaluation of Student Translations Using Large Language Models
ZHANG Jing, PENG Sirui
Contemporary Foreign Languages Studies    2025, 25 (5): 85-96.   DOI: 10.3969/j.issn.1674-8921.2025.05.009
Abstract

This study investigates the application of large language models (LLMs) in translation teaching, focusing on their effectiveness and limitations in assessing student translations. Using established standards for human translation quality evaluation, a two-tier analytical framework combining quantitative and qualitative analyses was developed, incorporating human scores, LLM-generated scores, and evaluative comments. Quantitative results indicated that the LLM performed reliably in structural dimensions of Chinese-to-English tasks, but a marked decline was observed in semantic and cultural dimensions in English-to-Chinese tasks, exposing weaknesses in deep semantic understanding and cultural adaptation. Qualitative analysis further revealed issues such as templated feedback, misattribution of errors, and rejection of creative translation, thereby corroborating the quantitative findings. Based on these results, the study proposes a human-AI collaborative pathway for teaching practice, highlighting the role of LLMs as auxiliary tools for structural checking while reserving semantic and cultural evaluation for teachers. The findings provide theoretical grounding and pedagogical implications for the integration of intelligent assessment in translation education, and suggest a shift in LLMs’ role from instrumental support to cognitive collaboration.


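Module | Component | Example content
Identity definition | Role positioning | As a professional translation quality assessment system, you must have a dual professional background in linguistics and translation studies and carry out multidimensional quality evaluation strictly according to the established standards
Task description | Evaluation dimensions | Naturalness and clarity, accuracy of cultural terms, grammatical correctness, fidelity to intent, content completeness (upload the specific indicator descriptions for each dimension)
Task description | Scoring rules | Each dimension is scored on a 10-point scale (0 to 10); each 2-point gap represents a significant difference in level; the total score is a weighted sum
Task description | Procedure | 1. Locate cultural terms → compare against the glossary → count errors; 2. Grammar check → classify and weight errors; 3. Intent analysis → semantic similarity calculation; 4. Information completeness → bidirectional matching verification (upload the glossary and the reference translation)
Capability requirements | Core skills | · Intercultural communication competence · Recognition of grammatical error patterns · Terminology database retrieval
Capability requirements | Technical tools | · Pre-trained language model (fluency analysis) · Syntactic parser (complexity detection) · Word-embedding model (lexical appropriateness) · Named entity recognition (factual accuracy)
Operating restrictions | Constraints | · A single error is not penalized more than once · Cultural terms are judged against the provided glossary as a baseline, but are not limited to it · The rationale for each deduction and suggestions for improvement must be output · No creative interpretation of any kind is permitted
Table 1 Four-element zero-shot prompt template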
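
As a rough illustration of how the template in Table 1 could be put to work, the following Python sketch assembles a set of prompt modules into a single zero-shot prompt and computes the weighted total described under the scoring rules. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `PromptModule`, `build_prompt`, and `weighted_total` are hypothetical, the bracketed section labels are an arbitrary layout choice, and the equal weights in the usage example are placeholders, since the article page does not give the actual weights.

```python
from dataclasses import dataclass

# The five evaluation dimensions named in Table 1 (English identifiers are ours).
DIMENSIONS = (
    "naturalness_and_clarity",
    "cultural_term_accuracy",
    "grammatical_correctness",
    "fidelity_to_intent",
    "content_completeness",
)


@dataclass(frozen=True)
class PromptModule:
    """One row of the template: module, component, and its content."""
    module: str
    component: str
    content: str


def build_prompt(modules, source_text, student_translation):
    """Join the template modules and the texts to assess into one prompt string."""
    parts = [f"[{m.module} / {m.component}]\n{m.content}" for m in modules]
    parts.append(f"[Source text]\n{source_text}")
    parts.append(f"[Student translation]\n{student_translation}")
    return "\n\n".join(parts)


def weighted_total(scores, weights):
    """Weighted sum of per-dimension scores, each expected on the 0 to 10 scale."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"score for {name} is outside the 0 to 10 range")
    return sum(scores[name] * weights[name] for name in scores)


if __name__ == "__main__":
    modules = [
        PromptModule(
            "Identity definition",
            "Role positioning",
            "You are a professional translation quality assessment system.",
        ),
        PromptModule(
            "Operating restrictions",
            "Constraints",
            "Do not penalize a single error more than once.",
        ),
    ]
    print(build_prompt(modules, "source sentence", "student rendering"))

    # Placeholder weights: equal weighting is an assumption, not from the article.
    weights = {name: 0.2 for name in DIMENSIONS}
    scores = dict(zip(DIMENSIONS, [8, 6, 9, 7, 10]))
    print(round(weighted_total(scores, weights), 2))
```

Keeping the modules as data rather than as one hard-coded string makes it easy to swap a single row of the template (for example, the constraints) without touching the rest, and the range check in `weighted_total` catches model outputs that fall outside the 0 to 10 scale before they reach the total.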