当代外语研究 (Contemporary Foreign Languages Studies) ›› 2026, Vol. 26 ›› Issue (1): 94-112. doi: 10.3969/j.issn.1674-8921.2026.01.008

• Linguistics •

当“特征”不再是符号:大语言模型引发的语言学变革

刘海涛

  1. Fudan University, Shanghai 200433
  • 出版日期:2026-02-28 发布日期:2026-03-31
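  • About the author:

    LIU Haitao is a Senior Professor of Humanities and Social Sciences at Fudan University. Main research interests: data-driven linguistics, digital humanities, and language planning.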

  • Funding:
    *Research output of the Fudan University “Laboratory of Digital Humanities and Quantitative Linguistics”

When “Features” Cease to Be Symbols: The Linguistic Transformation Driven by Large Language Models

LIU Haitao

  • Online: 2026-02-28 Published: 2026-03-31

摘要:

本文从语言学理论中的“特征”切入,探讨大语言模型兴起背景下语言学研究范式的根本转变。传统特征合一语法依赖人工定义的离散符号系统,试图以规则演绎方式刻画语言能力;而大语言模型则通过海量文本的统计学习,隐式构建出高维、连续、上下文敏感的向量表征,实现了对语言系统的概率化建模。这种从“规则制定”到“规律发现”的转变,不仅挑战了经典语言学的认识论基础,更揭示了语言本质上是一个动态的概率系统。文章认为,面对人工智能带来的认知冲击,语言学应主动转向“数据驱动”的新范式,在解释模型所捕获的统计规律的同时,重新确立自身作为人机语言理解之间桥梁的学科使命,为中国语言学在数智时代的自主创新贡献力量。

关键词: 大语言模型, 特征表征, 语言能力, 数据驱动范式, 语言学转向

Abstract:

This paper takes the linguistic concept of “features” as its starting point to examine the fundamental paradigm shift in linguistic research brought about by the rise of large language models (LLMs). Traditional feature-unification grammars rely on manually defined, discrete symbolic systems and aim to characterize language competence through rule-based deduction. LLMs, in contrast, implicitly build high-dimensional, continuous, context-sensitive vector representations through statistical learning over massive text corpora, thereby modeling the linguistic system probabilistically. This transition from “rule-making” to “pattern discovery” not only challenges the epistemological foundations of classical linguistics but also reveals that language is, in essence, a dynamic probabilistic system. The paper contends that linguistics, confronted with the cognitive impact of artificial intelligence, should proactively embrace a new “data-driven” paradigm: while explaining the statistical patterns that the models capture, the field should re-establish its disciplinary mission as a bridge between human and machine language understanding, thereby contributing to the independent innovation of Chinese linguistics in the era of digital intelligence.

Key words: large language models, feature representation, language competence, data-driven paradigm, linguistic transformation
