Expert Systems

This repository contains the Python code and data to reproduce the results presented in the paper: A. Occhipinti*, L. Rogers*, C. Angione, "A pipeline and comparative study of 12 machine learning models for text classification", Expert Systems with Applications, 201 (2022): 117193

How to run

The following steps are required to run the code:

  1. Python 3.6.x is required; the code explicitly checks the version before it continues.
  2. A Jupyter notebook server is required.
  3. The Enron spam corpus is the dataset used for this paper; the tar/zip archives containing the spam emails are included.
    • Antivirus applications may flag some emails as malicious, a virus, or a scam. This is expected; restore any quarantined files where necessary.
  4. Ensure all pip dependencies listed in requirements.txt are installed.
  5. Run through the steps laid out in the notebook.
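
The setup steps above can be sketched in Python as follows. This is a minimal illustration, not the notebook's actual code: the function names `is_supported_python` and `extract_archives`, the `*.tar.gz` file pattern, and the directory arguments are all assumptions made for the example, so adjust them to match the archives shipped in the repository.

```python
import sys
import tarfile
from pathlib import Path


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.6.x (hypothetical helper)."""
    return tuple(version_info[:2]) == (3, 6)


def extract_archives(archive_dir, output_dir, pattern="*.tar.gz"):
    """Extract every matching tar archive in archive_dir into output_dir.

    The pattern and directory layout are assumptions; the repository's
    archive names may differ.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob(pattern)):
        # Only extract archives you trust, such as the ones shipped here.
        with tarfile.open(str(archive)) as tar:
            tar.extractall(str(output_dir))
        extracted.append(archive.name)
    return extracted
```

A typical use would be to call `is_supported_python()` first and stop if it returns `False`, then call `extract_archives` with the folder holding the corpus archives and a target folder of your choice before opening the notebook.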