Authors’ names:
Danil I. Galyuchenko – Moscow State Institute of International Relations of the Ministry of Foreign Affairs of the Russian Federation, Moscow, Russia
Abstract:
Computational linguistics is a field of research concerned with the automatic processing of natural language texts. It has become highly relevant in recent years due to the development of new technologies such as generative pre-trained transformer (GPT) models. These models can handle long-range dependencies in text, which makes them promising for finding collocations, that is, semantically related word combinations. The aim of the study is to compare the performance of two collocation retrieval methods: statistical natural language processing (statistical NLP) and GPT-4 Turbo. For this purpose, a program using the pointwise mutual information (PMI) measure of statistical dependency was developed, and its results were compared with those of the GPT model. The research material was Article 5 of the European Convention on Human Rights. For the automation of language research, evaluating the features and capabilities of these two approaches to automated collocation search gives a better understanding of how they differ and helps select the optimal method for a specific research project. The study describes the differences in how the GPT-based method and the statistical method analyze and interpret natural language texts, and compares the results obtained. Both methods have their advantages and disadvantages, and the choice between them depends on the specific task and the available resources. In some cases, combining these methods can lead to better results in natural language text processing.
Section | LANGUAGE AND CULTURE |
DOI | 10.47388/2072-3490/lunn2024-68-4-24-40 |
Key words | NLP; GPT; statistical methods; collocations; linguistics; computational linguistics |