
System for Training-based Expansion of Tools for Proper Name Mentions Recognition Based on Active Learning


jamnicki/bachelor_thesis_project


System for Training-based Expansion of Tools for Proper Name Mentions Recognition Based on Active Learning

Abstract

The quality of tools for recognizing proper names in text depends on the domain and coverage of their training data. Obtaining a model with satisfactory performance requires a corpus with a large number of samples, which translates into time spent on data annotation by skilled users such as machine learning engineers, data analysts, annotators, and linguists. The goal of this work is to build a system that supports the creation of proper name recognition tools: using active learning, the models are trained on a progressively larger set of annotated data, which improves their quality. The project also aims to speed up dataset creation and the building of machine learning models for natural language. Besides the promise of active learning as a constantly developing method, the Author was motivated by the prospect of reducing the working time of the people involved in creating models, and by how few examples the literature offers of active learning applied to proper name recognition, as opposed to classification tasks. The thesis covers the theoretical basis, a review of available solutions and tools, a description of the implementation, static and dynamic analysis, a summary of the results, and the future of the developed system. The developed system met the stated objectives and requirements.
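As a rough illustration of the active learning idea mentioned in the abstract, the sketch below shows one selection step of a pool-based loop: the sentences the current model is least sure about are sent to the annotator first. This is not the project's implementation. The least-confidence strategy, the function names, and the `toy_scorer` stand-in for a real proper name recognition model are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a scorer maps a sentence to one confidence value
# in [0, 1] per token (the probability of the label the model predicted).
TokenConfidences = Callable[[str], Sequence[float]]


def least_confidence(token_probs: Sequence[float]) -> float:
    """Sentence-level uncertainty: 1 minus the lowest token confidence."""
    if not token_probs:
        return 0.0
    return 1.0 - min(token_probs)


def select_for_annotation(
    pool: List[str], scorer: TokenConfidences, batch_size: int
) -> List[str]:
    """Return the `batch_size` sentences the model is least sure about."""
    ranked = sorted(
        pool, key=lambda s: least_confidence(scorer(s)), reverse=True
    )
    return ranked[:batch_size]


def toy_scorer(sentence: str) -> List[float]:
    # Stand-in for a real model (assumption): capitalized tokens are
    # treated as harder, so they get a lower confidence.
    return [0.55 if tok[:1].isupper() else 0.95 for tok in sentence.split()]


pool = [
    "the weather was fine yesterday",
    "Maria moved to Wroclaw in spring",
    "he bought two apples",
]
picked = select_for_annotation(pool, toy_scorer, batch_size=1)
print(picked)
```

In a full loop, the picked sentences would be annotated, added to the training set, and the model retrained before the next selection round. With the toy scorer, the sentence containing capitalized tokens is the one selected.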

Key illustrations

TODO: translate to English