Note: This project is still underway, with more features coming soon!
Welcome to Lugia, your new favorite companion for diving into the depths of Google Scholar! With Lugia, you can effortlessly extract article information from Google Scholar profiles using just a simple ID and date range. Whether you're a researcher, a student, or just curious about someone's academic contributions, Lugia has got you covered.
- Easy Peasy Scraping: Just provide a Google Scholar ID, and Lugia will fetch article titles, authors, publication dates, and citation counts for you.
- Date Range Filtering: Specify a start year and end year to filter articles within a specific time frame.
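The date range filter can be pictured with a minimal sketch. It is not Lugia's actual implementation: the `filter_by_year` helper, the dictionary layout of an article, the inclusive bounds, and the choice to drop articles without a year are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def filter_by_year(
    articles: List[Dict[str, Any]],
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Keep articles whose year lies in [start, end] (both bounds inclusive).

    Hypothetical helper: a missing bound means "no limit on that side",
    and articles with no known year are dropped.
    """
    kept = []
    for article in articles:
        year = article.get("year")
        if year is None:
            continue  # no publication year available, so it cannot be ranged
        if start is not None and year < start:
            continue
        if end is not None and year > end:
            continue
        kept.append(article)
    return kept


# Made-up sample data, not real Google Scholar output.
sample = [
    {"title": "Paper A", "year": 2018},
    {"title": "Paper B", "year": 2021},
    {"title": "Paper C", "year": None},
]
recent = filter_by_year(sample, start=2020, end=2023)
print([a["title"] for a in recent])  # prints ['Paper B']
```

Treating both bounds as inclusive matches the usual reading of "from 2020 to 2023", but check Lugia's output if the edge years matter to you.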
- Python 3.7+
- selenium
- beautifulsoup4
- pandas
- tabulate
- (optional) dotenv, for environment variable management
- Clone the repository:
git clone https://github.com/yourusername/lugia.git
cd lugia
- Install Pipenv if you haven't already:
pip install pipenv
- Install dependencies using Pipenv:
pipenv install
- Activate the virtual environment:
pipenv shell
Run the scraper with a specific Google Scholar ID and date range:
Usage: lugia.py [OPTIONS]

  Lugia: Scraping Author's Publication Data from Google Scholar

Options:
  --id TEXT          Scrape data based on Google Scholar ID
  --file TEXT        Scrape data based on a list of Google Scholar ID from
                     .txt File
  --start INTEGER    Scrape year start
  --end INTEGER      Scrape year end
  --verbose          Will print verbose messages.
  --export-dir TEXT  Export directory
  --count            Only count the number of articles
  --headless         Run the scraper in headless mode
  --help             Show this message and exit.
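The value passed to --id is the `user` query parameter of a Google Scholar profile URL. The sketch below shows one way to pull that ID out of a URL and to read several IDs from text. Both helpers are hypothetical and not part of Lugia, and the one-ID-per-line layout assumed for the --file input is a guess, since the expected .txt format is not documented here.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def extract_scholar_id(profile_url: str) -> Optional[str]:
    """Return the `user` query parameter of a profile URL, or None if absent."""
    query = parse_qs(urlparse(profile_url).query)
    values = query.get("user")
    return values[0] if values else None


def parse_id_list(text: str) -> List[str]:
    """Split text into IDs, assuming one ID per line and skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# EXAMPLE_ID is a placeholder, not a real Google Scholar profile.
url = "https://scholar.google.com/citations?user=EXAMPLE_ID&hl=en"
print(extract_scholar_id(url))            # prints EXAMPLE_ID
print(parse_id_list("AAA\n\n  BBB  \n"))  # prints ['AAA', 'BBB']
```

With an ID in hand, an invocation built from the options above might look like `python lugia.py --id EXAMPLE_ID --start 2020 --end 2023`, assuming the script is run directly with Python from the repository root.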
Right now, Lugia can only print the articles as a table in your terminal. Improvements are coming soon!
Contributions are welcome! Please fork the repository and create a pull request with your changes. Ensure that your code follows the existing style and includes tests for any new functionality.
For any questions or suggestions, please open an issue on GitHub or contact me.