Towards a Part-of-Speech (PoS) Gram Approach to Corpus-based Phraseology

  • LIN Ling
  • LIU Ming

Published online: 2025-08-26

Abstract

Phraseology is one of the primary concerns of corpus linguistics. This study first reviews previous corpus-linguistic research on phraseology in detail and then proposes a new approach to corpus-based phraseology: the Part-of-Speech (PoS) gram procedure. After briefly introducing its definition and key features, the study discusses the implications of the approach for four areas: the extraction and analysis of phraseologies, the function of phraseologies, textual studies, and language teaching and learning. It argues that, although the approach has long been overlooked in both international and domestic linguistic research, it can offer phraseology studies new perspectives, tools, and findings; it is therefore a useful supplement to existing approaches and merits further exploration and application.

Cite this article

LIN Ling, LIU Ming. Towards a Part-of-Speech (PoS) Gram Approach to Corpus-based Phraseology[J]. Contemporary Foreign Languages Studies, 2025, 25(4): 15-27. DOI: 10.3969/j.issn.1674-8921.2025.04.002
