Semi-supervised LDA topic modelling and analysis of AHRC research grant application abstracts

kuslitsanna/AHRC_awards

AHRC_awards

About this Data Project

The AHRC Grants Topic Browser is built on the results of a topic analysis of research grant applications that were awarded funding by the Arts and Humanities Research Council (AHRC) between 2013 and 2023.

Topics

The 32 topics identified here were generated heuristically, using a combination of unsupervised and semi-supervised machine learning techniques (LDA and seeded LDA). The goal was to arrive at a classification of the documents in the corpus that is both statistically robust and intuitively meaningful to a human observer. I labelled the resulting topics based on my interpretation of two model outputs: the documents identified as most strongly associated with each topic (Most Relevant Projects), and the cluster of terms identified as having the highest probability of appearing in the associated documents (Most Frequent Words). The topic labels are necessarily imperfect. In choosing them, my aim was to find broad concepts that best capture the semantic overlap within each category.
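To make the two labelling inputs concrete, here is a minimal Python sketch under stated assumptions; it is not the analysis code, which used the R packages credited in this README. It implements one common formulation of seeded LDA, a collapsed Gibbs sampler in which each seed word receives an extra pseudo-count (`seed_weight`) in the topic-word prior of its assigned topic, and two helpers that read off the Most Frequent Words (highest topic-word probability in `phi`) and the Most Relevant Projects (highest document-topic share in `theta`). This is not necessarily how the seededlda package works internally, and all names here (`seeded_lda`, `seed_weight`, `top_terms`, `top_docs`) are hypothetical.

```python
import random


def seeded_lda(docs, n_topics, seeds, alpha=0.1, beta=0.01,
               seed_weight=1.0, iters=200, rng_seed=0):
    """Hypothetical seeded LDA via collapsed Gibbs sampling (illustrative only).

    docs: list of token lists; seeds: {topic_index: [seed words]}.
    Seed words get `seed_weight` added to their prior in the seeded topic.
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics

    # Topic-word prior: symmetric beta, boosted for seed words in their topic.
    prior = [[beta] * V for _ in range(K)]
    for k, words in seeds.items():
        for w in words:
            if w in w_id:  # seed words absent from the corpus are ignored
                prior[k][w_id[w]] += seed_weight
    prior_sum = [sum(row) for row in prior]

    # Count tables: doc-topic, topic-word, topic totals; random initial topics.
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the current assignment before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # p(z = t) is proportional to (n_dt + alpha) * (n_tv + prior_tv) / (n_t + sum_v prior_tv)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + prior[t][v])
                           / (nk[t] + prior_sum[t]) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior estimates: phi (topic-word) and theta (document-topic).
    phi = [[(nkw[k][v] + prior[k][v]) / (nk[k] + prior_sum[k])
            for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha)
              for k in range(K)] for d, doc in enumerate(docs)]
    return vocab, phi, theta


def top_terms(phi, vocab, k, n=10):
    """Words with the highest probability under topic k ("Most Frequent Words")."""
    order = sorted(range(len(vocab)), key=lambda v: -phi[k][v])
    return [vocab[v] for v in order[:n]]


def top_docs(theta, k, n=10):
    """Indices of documents with the largest share of topic k ("Most Relevant Projects")."""
    return sorted(range(len(theta)), key=lambda d: -theta[d][k])[:n]
```

With `seeds={}` the prior is symmetric and the sampler reduces to plain LDA, so the same function can stand in for both the unsupervised and the seeded runs; the seeds only nudge chosen words towards a topic rather than fixing them there.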

Data Source

The data analysed here is drawn from publicly available information provided by UK Research and Innovation (UKRI) through its Gateway to Research (GtR) portal. The analysis focused on research grant applications, excluding studentships, fellowships, and training grants awarded by the AHRC. In total, 2,270 applications were analysed.
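The selection criteria described here (AHRC funding, awards made between 2013 and 2023, research grants only) can be expressed as a simple filter. The following is a minimal Python sketch assuming a hypothetical `Award` record with `funder`, `category` and `start_year` fields. The real GtR export uses its own field names and category labels, and treating the start year as the award year is also an assumption.

```python
from dataclasses import dataclass


# Hypothetical record type: field names are illustrative, not the GtR schema.
@dataclass
class Award:
    funder: str
    category: str
    start_year: int
    abstract: str


# Hypothetical labels for the grant types excluded from the analysis.
EXCLUDED_CATEGORIES = {"Studentship", "Fellowship", "Training Grant"}


def select_research_grants(awards, first_year=2013, last_year=2023):
    """Keep AHRC awards within the year range whose category is not excluded."""
    return [
        a for a in awards
        if a.funder == "AHRC"
        and a.category not in EXCLUDED_CATEGORIES
        and first_year <= a.start_year <= last_year
    ]
```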

Credits

Author: Anna Kuslits

Acknowledgments: The analysis was performed using the quanteda and seededlda R packages, developed by Kenneth Benoit and Kohei Watanabe at the LSE Data Science Institute. In visualising the results and designing the dashboard, I drew inspiration from Mining the Dispatch, created by Robert K. Nelson and the Digital Scholarship Lab at the University of Richmond.
