Semantic Classification of Terms by Large Language Models: An Experimental Study in the Subject Area “Digital Law”


Author’s name:

Danil I. Galyuchenko, Odintsovo Branch of the Moscow State Institute of International Relations (University) of the Ministry of Foreign Affairs of the Russian Federation, Odintsovo, Russia

Abstract:

Creating terminological databases and keeping them current in highly dynamic, semantically unstable, and interdisciplinary domains is a significant challenge for modern terminologists and lexicographers, because classical methods of collecting, describing, and processing terms no longer meet present-day needs.
Evaluating how generative pre-trained transformer models can be applied to automating linguistic research will make work with such terms more efficient and manageable.
The study aims to assess how effectively a generative pre-trained transformer (GPT), with the DeepSeek language model as an example, handles the automatic extraction and semantic classification of terms from texts in the field of digital law, and to identify the prospects and possible lines of development for this research direction.
The study hypothesizes that generative models, due to the specific features of their training, will demonstrate high recall in extracting candidate terms but will encounter difficulties at the stage of semantic classification.
To this end, an experimental methodology for extracting and classifying terms with GPT was developed; a corpus of texts and a reference list of terms belonging to the semantic field of “digital law” were compiled; an experiment was conducted and its results were analyzed, with the effectiveness of term classification evaluated by the calculated precision and recall metrics; and directions for further development of the methodology were proposed.

This study confirms the proposed hypothesis and demonstrates the high potential of standard language models for terminological work, a potential that can be fully realized through further fine-tuning of the model and training it for specific tasks.
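
The precision and recall metrics mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the model's output and the reference list are both plain lists of term strings compared by exact match after case and whitespace normalization; the abstract does not state the study's exact matching procedure. The function name `precision_recall` and the sample terms are hypothetical and are not taken from the study's data.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted terms against a reference list."""

    def normalize(term):
        # Lowercase and collapse whitespace so trivially different spellings match.
        return " ".join(term.lower().split())

    extracted_set = {normalize(t) for t in extracted}
    reference_set = {normalize(t) for t in reference}

    # Terms that the model returned and that also appear in the reference list.
    true_positives = len(extracted_set & reference_set)

    # Precision: share of returned terms that are correct.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    # Recall: share of reference terms that were found.
    recall = true_positives / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical example data, for illustration only.
reference_terms = ["smart contract", "digital asset", "personal data", "electronic signature"]
model_terms = ["Smart contract", "digital asset", "blockchain", "personal data", "token"]

precision, recall = precision_recall(model_terms, reference_terms)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this made-up example, three of the five returned terms are in the reference list and three of the four reference terms are found. For semantic classification, the same calculation could be repeated per class, treating each class label as its own reference set.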

Section	LANGUAGE AND CULTURE
DOI:	10.47388/2072-3490/lunn2026-73-1-29-44
Key words	NLP; GPT; linguistics; computational linguistics; lexicography; computer lexicography; terminology; digital law
