💡 ReTrack



ReTrack is an automation that tracks the publications of top economics journals using the IDEAS/RePEc database.

🧐 About

This program uses the BeautifulSoup and Requests modules to scrape the RePEc website for the top journals and download the metadata of their most recent releases. It then stores this data in a .json file that other automations can use. The program is designed to be run monthly to keep the data up to date. As of this writing, IDEAS updates its database on the 2nd day of every month.
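
The scrape-then-store flow described above can be sketched roughly as follows. This is only an illustration, not the code in get_articles.py: it parses an invented HTML snippet instead of fetching a live page, it uses the standard library's html.parser in place of BeautifulSoup so that it runs with no extra installs, and the record fields (title, url) are hypothetical.

```python
import json
from html.parser import HTMLParser

# Invented HTML standing in for a journal page; the real IDEAS markup may differ.
SAMPLE_HTML = """
<ul>
  <li><a href="/a/xyz/journl/v1y2023i1p1-20.html">First paper</a></li>
  <li><a href="/a/xyz/journl/v1y2023i1p21-40.html">Second paper</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect a {title, url} record for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._href = None  # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only text that sits inside an open <a href=...> becomes a record.
        if self._href is not None and data.strip():
            self.records.append({"title": data.strip(), "url": self._href})

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def extract_articles(html):
    """Return the list of article records found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.records


articles = extract_articles(SAMPLE_HTML)
# Serialize the records the way a .json output file would hold them.
articles_json = json.dumps(articles, indent=2)
print(articles_json)
```

In the real program the HTML would come from a Requests call and the JSON string would be written to the output file.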

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

🚜 Prerequisites

Start by cloning this repository to your local machine:

git clone https://github.com/joseparreiras/retrack
cd retrack

To run the program, you first need to make sure your system satisfies the module requirements. This can be done using the following command:

pip install -r requirements.txt

The modules that are not pre-installed will be installed automatically.

🎈 Usage

The documentation for the main program can be accessed by running the help command on the terminal:

python get_articles.py -h

This prints the following output:

usage: get_articles.py [-h] [--input INPUT] [--list] [--range] [--output OUTPUT] [--n_months N_MONTHS] [--n_volumes N_VOLUMES] rankings [rankings ...]

positional arguments:
  rankings              journal rankings

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT, -i INPUT
                        path to excel input file
  --list, -l            get a list of journals
  --range, -r           get a range of journals
  --output OUTPUT, -o OUTPUT
                        path to output file
  --n_months N_MONTHS, -m N_MONTHS
                        number of months to get
  --n_volumes N_VOLUMES, -v N_VOLUMES
                        number of volumes to get
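
For readers who want to see how such an interface is typically declared, here is a minimal argparse sketch that reproduces the options listed above. It is a reconstruction from the help text, not the project's actual source. The defaults are the ones this README lists under Other Arguments, and parsing the rankings as integers is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the help output shown in this README."""
    parser = argparse.ArgumentParser(prog="get_articles.py")
    # Assumption: rankings are integers; the help text only says "journal rankings".
    parser.add_argument("rankings", nargs="+", type=int, help="journal rankings")
    parser.add_argument("--input", "-i", default="data/journals.xlsx",
                        help="path to excel input file")
    parser.add_argument("--list", "-l", action="store_true",
                        help="get a list of journals")
    parser.add_argument("--range", "-r", action="store_true",
                        help="get a range of journals")
    parser.add_argument("--output", "-o", default="data/articles.json",
                        help="path to output file")
    parser.add_argument("--n_months", "-m", type=int, default=1,
                        help="number of months to get")
    parser.add_argument("--n_volumes", "-v", type=int, default=3,
                        help="number of volumes to get")
    return parser


# Example: ranks 1 to 10 as a range, covering the last 12 months.
args = build_parser().parse_args(["1", "10", "-r", "-m", "12"])
print(args)
```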

The file journals.xlsx in the data folder contains the list of the top 500 journals according to the RePEc ranking. This ranking is used to select the journals that will be downloaded. When the program is run, it automatically gets the top 500 journals and stores their articles in the articles.json file in the data folder. The program can be run using the following command on the terminal:

python get_articles.py data/journals.xlsx

✅ Selecting Journals

Journals are selected by passing the rankings argument to the command above, in one of the following ways:

  1. Selecting a range of journals by their RePEc rank:

Passing two arguments along with the option --range (or -r) selects the journals ranked from the first argument to the second:

python get_articles.py start_rank end_rank -r

Passing a single argument along with --range (or -r) selects the journals from rank 1 to end_rank:

python get_articles.py end_rank -r

  2. Selecting a list of journals by their RePEc rank:

Passing several arguments along with the option --list (or -l) selects the journals with exactly those ranks. The ranks must be separated by spaces, and the --list flag, which must come last, indicates that they are to be interpreted as a list:

python get_articles.py rank1 rank2 rank3 ... -l

The --list option cannot be used together with the --range option, and it is the default when neither option is given. The above command is therefore equivalent to:

python get_articles.py rank1 rank2 rank3 ...
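
The selection rules above can be summarized in a short sketch. The function name select_ranks is hypothetical, and treating the range as inclusive of both ends is an assumption. The actual logic in get_articles.py may differ.

```python
def select_ranks(rankings, use_range=False):
    """Expand CLI rank arguments into the list of ranks to fetch.

    List mode (the default) returns the ranks as given.
    Range mode accepts one rank (1..end) or two ranks (start..end).
    """
    if not use_range:
        return list(rankings)
    if len(rankings) == 1:
        start, end = 1, rankings[0]
    elif len(rankings) == 2:
        start, end = rankings
    else:
        raise ValueError("range mode takes one or two ranks")
    if start > end:
        raise ValueError("start rank must not exceed end rank")
    # Assumption: both endpoints are included.
    return list(range(start, end + 1))


print(select_ranks([3, 7], use_range=True))  # range with two arguments
print(select_ranks([3], use_range=True))     # range with one argument
print(select_ranks([2, 5, 9]))               # list mode
```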

❓ Other Arguments

The program also takes the following optional arguments:

  • --input or -i: This argument is used to specify the path to the source Excel file. The default value is data/journals.xlsx.
  • --output or -o: This argument is used to specify the path to the output JSON file. The default value is data/articles.json.
  • --n_months or -m: This argument is used to specify the number of months to get. The default value is 1. Setting it to -1 will get all the articles.
  • --n_volumes or -v: This argument is used to specify the number of volumes to get. The default value is 3. Setting it to -1 will get all the volumes.

These arguments can be used in any combination. For example, to get the articles from the last 12 months considering the last 6 volumes of each journal and store them in the "data/foo.json" file, run:

python get_articles.py -o data/foo.json -m 12 -v 6
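
One way to picture how the -m and -v limits could work, including the -1 "get everything" convention, is the sketch below. The helper names are hypothetical, and so is the way months are counted (whole calendar months, with the current month as month one). The program's real cutoff rule is not documented here and may differ.

```python
from datetime import date


def take_latest(items, n):
    """Keep the first n items (assumed newest first); n == -1 keeps all."""
    if n == -1:
        return list(items)
    if n < 0:
        raise ValueError("n must be -1 or a non-negative integer")
    return list(items[:n])


def within_last_months(published, today, n_months):
    """True if `published` falls within the last n_months calendar months.

    n_months == -1 disables the filter. Assumption: the current month
    counts as the first month.
    """
    if n_months == -1:
        return True
    age = (today.year - published.year) * 12 + (today.month - published.month)
    return 0 <= age < n_months


volumes = ["vol 30", "vol 29", "vol 28", "vol 27"]
print(take_latest(volumes, 3))
print(within_last_months(date(2023, 1, 15), date(2023, 3, 1), 3))
```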

The default input file is journals.xlsx, which contains the top 500 journals according to the RePEc ranking. This file is generated by the top_journals.py program, which can also be used to get the top N journals by running:

python top_journals.py N

🤖 Automation

I use this program to automatically get the latest issues of my chosen top journals and add them to my task manager, Things. This is done with Things' new Apple Shortcuts feature, which I used to create this shortcut. This tutorial works on macOS only. To replicate it, first create an automation that runs this program every month: open the Automator app and create a new service, then add a Run Shell Script action and paste the following code:

cd /path_to_repo/retrack
python get_articles.py other_arguments
shortcuts run "ReTrack" -i data/articles.json

Save this into your Automator iCloud folder. Then, open the Calendar app and create a new event and schedule it to repeat as you like. Finally, click Alert > Custom, select Open File, Other and find the Automator file you just created. This will run the program every time the event is triggered.

If you don't use Things, there is a version of this shortcut that exports the articles to a Markdown file. It can be found here. The Markdown version can also be created with the markdown_export.py file. To do this, change the Automator file to:

cd /path_to_repo/retrack
python get_articles.py other_arguments
python markdown_export.py -i data/articles.json -o out/output_file_name.md

For more information on how to use the markdown export, run:

python markdown_export.py -h
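
To give a rough idea of what a JSON-to-Markdown export involves, here is a minimal sketch. It does not reproduce markdown_export.py. The record fields (journal, title, url) and the output layout are assumptions made for illustration, and the real articles.json schema may be different.

```python
def articles_to_markdown(articles):
    """Render article records as Markdown, one section per journal.

    Assumes each record is a dict with hypothetical keys
    "journal", "title" and "url".
    """
    lines = []
    current = None
    # sorted() is stable, so articles keep their order within a journal.
    for article in sorted(articles, key=lambda a: a["journal"]):
        if article["journal"] != current:
            if lines:
                lines.append("")  # blank line between sections
            current = article["journal"]
            lines.append(f"## {current}")
            lines.append("")
        lines.append(f"- [{article['title']}]({article['url']})")
    return "\n".join(lines) + "\n"


# Invented sample records, standing in for the contents of articles.json.
sample = [
    {"journal": "Journal B", "title": "Paper one", "url": "https://example.org/1"},
    {"journal": "Journal A", "title": "Paper two", "url": "https://example.org/2"},
]
markdown = articles_to_markdown(sample)
print(markdown)
```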

⛏️ Built Using

✍️ Authors