When “Features” Cease to Be Symbols: The Linguistic Transformation Driven by Large Language Models

doi:10.3969/j.issn.1674-8921.2026.01.008

Abstract

This paper takes the linguistic concept of “features” as its starting point to examine the fundamental paradigm shift in linguistic research brought about by the rise of large language models (LLMs). Traditional feature unification grammar relies on manually defined, discrete symbolic systems and seeks to characterize language competence through rule-based deduction. LLMs, by contrast, implicitly build high-dimensional, continuous, context-sensitive vector representations through statistical learning from massive text corpora, and thereby model linguistic systems probabilistically. This shift from “rule-making” to “pattern discovery” not only challenges the epistemological foundations of classical linguistics but also reveals that language is, in essence, a dynamic probabilistic system. The paper argues that, confronted with the cognitive impact of artificial intelligence, linguistics should proactively embrace a new “data-driven” paradigm. While explaining the statistical regularities that these models capture, the field should re-establish its disciplinary mission as a bridge between human and machine language understanding, and so contribute to the independent innovation of Chinese linguistics in the era of digital intelligence.

Keywords: large language models, feature representation, language competence, data-driven paradigm, linguistic transformation
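To make the “rule-based deduction” side of this contrast concrete, the following is a minimal Python sketch of unification over feature structures. It assumes that feature structures are nested dictionaries whose leaf values are atomic strings; the function `unify` and the toy agreement entries are hypothetical illustrations rather than the author's formalism, and the structure sharing (reentrancy) used by full unification grammars such as HPSG is omitted.

```python
from typing import Optional, Union

# Hypothetical alias: a feature structure is either an atom or a nested dict.
FS = Union[str, dict]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the most general structure compatible with both a and b,
    or None if they clash (unification failure)."""
    # Two atoms unify only if they are the same symbol.
    if isinstance(a, str) and isinstance(b, str):
        return a if a == b else None
    # An atom never unifies with a complex structure.
    if isinstance(a, str) or isinstance(b, str):
        return None
    result = dict(a)
    for feature, value in b.items():
        if feature in result:
            merged = unify(result[feature], value)
            if merged is None:
                return None  # clash on a shared feature
            result[feature] = merged
        else:
            result[feature] = value  # feature present only in b: copy it over
    return result


# Hypothetical, simplified lexical entries for subject-verb agreement.
she = {"cat": "NP", "agr": {"num": "sg", "per": "3"}}
runs = {"agr": {"num": "sg", "per": "3"}}
run = {"agr": {"num": "pl"}}

print(unify(she["agr"], runs["agr"]))  # {'num': 'sg', 'per': '3'}
print(unify(she["agr"], run["agr"]))   # None
```

The outcome is strictly binary: the symbols either match and merge, or the derivation fails. This is the deterministic, low-dimensional character that Table 1 attributes to feature unification grammar.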


LIU Haitao. When “Features” Cease to Be Symbols: The Linguistic Transformation Driven by Large Language Models[J]. Contemporary Foreign Languages Studies, 2026, 26(1): 94-112.

Figures and tables: 4

Figure 1

Figure 2

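Table 1

Comparison of feature unification grammar and large language models

	Feature unification grammar	Large language models
Nature and form	Discrete, symbolic	Continuous, numerical vectors (embeddings)
Feature dimensionality	Low (typically < 10)	High (typically > 1,000)
Feature interaction	Deterministic rules (unification)	Probabilistic association (weight adjustment)
How features arise	Manual feature engineering. Experts design and select features on the basis of linguistic knowledge, then write rules to extract them.	Automatic learning. Through pre-training on large amounts of text, the model itself learns how to represent linguistic elements as vectors; no manual design is required.
Meaning and interpretability	High. Every feature has an explicit linguistic meaning.	Low. A single dimension (one number in the vector) usually has no clear meaning; semantic information is distributed across the whole vector.
Context dependence	Largely context-independent. For example, the word-form feature of “apple” is simply “apple”, whether the word refers to the fruit or the company; additional features must be designed to capture context.	Highly context-dependent. The same word form receives different vector representations in different contexts. For example, “apple” in “apple pie” and in “Apple Inc.” is assigned clearly different vectors that reflect its contextual meaning.
Source of knowledge	Linguists' prior knowledge, dictionaries, and rule bases.	Statistical patterns in the training data. The model implicitly learns grammar, common-sense knowledge, and even reasoning ability from massive amounts of text.
Granularity and levels	Features are usually hierarchical and modular: morphological features first, then syntactic features, and finally semantic features.	End-to-end. The model learns features at all levels simultaneously (from characters, morphology, and syntax to semantics and pragmatics), and all of this information is mixed in the final vector representation.
Transferability and generalization	Poor. Features carefully designed for one task are hard to reuse directly for another.	Very strong. Pre-trained general-purpose representations serve as a powerful feature base and can readily be transferred to a wide range of downstream tasks through fine-tuning or few-shot learning.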
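The “context dependence” row can be illustrated with a toy computation. The sketch below is a deliberate simplification, not how any particular LLM is built: it applies one round of scaled dot-product self-attention, without learned query, key, or value projections, to invented three-dimensional embeddings (the names `EMB`, `self_attention`, and the vectors themselves are hypothetical). The static vector for “apple” is identical in both phrases, yet its contextualized vector is pulled toward “pie” in one case and toward “inc” in the other.

```python
import math

# Hypothetical 3-dimensional static embeddings; real models learn vectors
# with hundreds or thousands of dimensions.
EMB = {
    "apple": [1.0, 1.0, 0.0],
    "pie": [2.0, 0.0, 0.0],  # points along an invented "food" direction
    "inc": [0.0, 2.0, 0.0],  # points along an invented "company" direction
}


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """One round of scaled dot-product self-attention with Q = K = V = embeddings."""
    vecs = [EMB[t] for t in tokens]
    d = len(vecs[0])
    outputs = []
    for q in vecs:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vecs]
        weights = softmax(scores)
        # Each output is a weighted average of all vectors in the sequence.
        outputs.append([sum(w * v[j] for w, v in zip(weights, vecs)) for j in range(d)])
    return outputs


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


apple_in_pie = self_attention(["apple", "pie"])[0]
apple_in_inc = self_attention(["apple", "inc"])[0]
print(apple_in_pie)  # [1.5, 0.5, 0.0]
print(apple_in_inc)  # [0.5, 1.5, 0.0]
print(round(cosine(apple_in_pie, apple_in_inc), 3))  # 0.6
```

A static lookup would give “apple” a cosine similarity of 1 with itself across the two phrases; after a single attention step the two contextual vectors have diverged, which is the property the right-hand column of Table 1 describes.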

Table 2

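	Symbolic view of language	Probabilistic view of language
Basic unit	Discrete symbols	Continuous vectors
Nature of rules	Deterministic rules	Probabilistic associations
Linguistic knowledge	Explicit rule system	Implicit probability distribution
Language acquisition	Parameter setting	Probabilistic statistical learning
Semantic representation	Logical form	Vector space
Treatment of variation	Exceptions to rules	Probabilistic gradients
Basis of universality	Innate grammar-acquisition mechanism	Regularities and patterns in data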
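The contrast between deterministic rules and probabilistic associations, and between “exceptions to rules” and “probabilistic gradients”, can be sketched as follows. The toy corpus, the agreement rule `rule_grammatical`, and the add-one-smoothed bigram model are all hypothetical choices made for illustration: the rule can only return True or False, whereas the bigram model assigns every word sequence a graded, non-zero score estimated from frequency.

```python
import math
from collections import Counter

# Hypothetical toy corpus with sentence-boundary markers.
CORPUS = [
    "<s> the dog runs </s>",
    "<s> the dogs run </s>",
    "<s> a dog runs </s>",
    "<s> the dog runs fast </s>",
    "<s> the dogs run fast </s>",
]

sentences = [s.split() for s in CORPUS]
# How often each word occurs as the left member of a bigram.
context_counts = Counter(w for sent in sentences for w in sent[:-1])
bigram_counts = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
# Number of predictable word types (everything except the start marker).
V = len({w for sent in sentences for w in sent[1:]})


def bigram_prob(prev, word):
    # Add-one (Laplace) smoothing: unseen bigrams get a small non-zero probability.
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + V)


def log_prob(sentence):
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(bigram_prob(p, w)) for p, w in zip(words, words[1:]))


def rule_grammatical(sentence):
    """Hypothetical categorical rule: 'dog' takes 'runs', 'dogs' takes 'run'."""
    words = sentence.split()
    for noun, verb in zip(words, words[1:]):
        if (noun, verb) in {("dog", "run"), ("dogs", "runs")}:
            return False
    return True


for s in ["the dog runs", "the dogs run", "the dog run"]:
    print(f"{s!r}: rule={rule_grammatical(s)}, log P={log_prob(s):.2f}")
```

Smoothing is what produces the gradient: “the dog run” is rejected outright by the rule, but the model gives it a lower, still finite, log-probability than “the dog runs”, so variation appears as a matter of degree rather than as an exception.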