Lugia: The Ultimate Google Scholar Article Scraper

Note: This project is still a work in progress; more features are coming soon!

Project Overview

Welcome to Lugia, your new favorite companion for diving into the depths of Google Scholar! Give Lugia a Google Scholar profile ID and a range of years, and it extracts the articles on that profile published within that range. Whether you're a researcher, a student, or just curious about someone's academic contributions, Lugia has you covered.

Features

Easy Peasy Scraping: Just provide a Google Scholar ID, and Lugia will fetch article titles, authors, publication dates, and citation counts for you.
Date Range Filtering: Specify a start year and an end year to keep only the articles published within that time frame.
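Conceptually, the date-range filter is a year comparison applied to each scraped article. The sketch below is hypothetical and not taken from Lugia's code: the `Article` record, the `filter_by_year` name, and the choices to treat both years as inclusive and to drop entries that have no year are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Article:
    # Hypothetical record shape mirroring the fields listed under Features.
    title: str
    authors: str
    year: Optional[int]  # assumption: some Scholar entries may lack a year
    citations: int


def filter_by_year(articles: Iterable[Article], start: int, end: int) -> List[Article]:
    """Keep articles with start <= year <= end (assumed inclusive on both ends)."""
    if start > end:
        raise ValueError(f"start ({start}) must not be after end ({end})")
    # Assumption: articles without a year are excluded rather than kept.
    return [a for a in articles if a.year is not None and start <= a.year <= end]


if __name__ == "__main__":
    sample = [
        Article("Paper A", "X. Author", 2019, 12),
        Article("Paper B", "Y. Author", 2021, 3),
        Article("Paper C", "Z. Author", None, 0),
    ]
    selected = filter_by_year(sample, 2020, 2024)
    print(len(selected))  # prints 1
```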

Requirements

Python 3.7+
selenium
beautifulsoup4
pandas
tabulate
python-dotenv (optional; provides the dotenv module) for managing environment variables

Installation

Clone the repository:

```
git clone https://github.com/hippocampa/lugia.git
cd lugia
```

Install Pipenv if you haven't already:
```
pip install pipenv
```
Install dependencies using Pipenv:
```
pipenv install
```
Activate the virtual environment:
```
pipenv shell
```
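Once inside the Pipenv shell, a quick sanity check can confirm the Python version and that the packages listed under Requirements are importable. `check_environment` and `missing_packages` are hypothetical helpers, not part of Lugia; the module names are the standard import names (`bs4` for beautifulsoup4, `dotenv` for python-dotenv). The check does not cover the web browser that Selenium drives.

```python
import importlib.util
import sys
from typing import Dict, List

# Distribution name -> import name for the packages under Requirements.
REQUIRED: Dict[str, str] = {
    "selenium": "selenium",
    "beautifulsoup4": "bs4",
    "pandas": "pandas",
    "tabulate": "tabulate",
}
OPTIONAL: Dict[str, str] = {"python-dotenv": "dotenv"}


def missing_packages(packages: Dict[str, str]) -> List[str]:
    """Return the distribution names whose top-level module cannot be found."""
    return [
        dist
        for dist, module in packages.items()
        if importlib.util.find_spec(module) is None  # looks up without importing
    ]


def check_environment() -> bool:
    """Print any problems; return True only if Python and required packages are OK."""
    ok = sys.version_info >= (3, 7)
    if not ok:
        print("Python 3.7+ is required, found", sys.version.split()[0])
    missing = missing_packages(REQUIRED)
    if missing:
        ok = False
        print("Missing required packages:", ", ".join(missing))
    # Optional packages only produce a notice, never a failure.
    for dist in missing_packages(OPTIONAL):
        print("Optional package not installed:", dist)
    return ok


if __name__ == "__main__":
    print("Environment OK" if check_environment() else "Environment incomplete")
```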

Usage

Run the scraper with a specific Google Scholar ID and date range. The full list of options, as printed by `python lugia.py --help`:

```
Usage: lugia.py [OPTIONS]

  Lugia: Scraping Author's Publication Data from Google Scholar

Options:
  --id TEXT          Scrape data based on Google Scholar ID
  --file TEXT        Scrape data based on a list of Google Scholar ID from
                     .txt File
  --start INTEGER    Scrape year start
  --end INTEGER      Scrape year end
  --verbose          Will print verbose messages.
  --export-dir TEXT  Export directory
  --count            Only count the number of articles
  --headless         Run the scraper in headless mode
  --help             Show this message and exit.
```
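A Google Scholar ID is the value of the `user` parameter in a profile URL such as `https://scholar.google.com/citations?user=ABCDEFGHIJKL&hl=en` (a made-up ID). The help text does not spell out the layout of the `--file` list, so the sketch below assumes one ID per line. `id_from_profile_url` and `write_id_file` are hypothetical helpers for preparing input, not part of Lugia.

```python
from pathlib import Path
from typing import Iterable
from urllib.parse import parse_qs, urlparse


def id_from_profile_url(url: str) -> str:
    """Return the value of the `user` query parameter of a Scholar profile URL."""
    values = parse_qs(urlparse(url).query).get("user")
    if not values:
        raise ValueError(f"no user= parameter in {url!r}")
    return values[0]


def write_id_file(profile_urls: Iterable[str], path: str) -> Path:
    """Write one Scholar ID per line (assumed --file layout), dropping duplicates."""
    # dict.fromkeys keeps first-seen order while removing repeats.
    ids = list(dict.fromkeys(id_from_profile_url(u) for u in profile_urls))
    out = Path(path)
    out.write_text("\n".join(ids) + "\n", encoding="utf-8")
    return out


if __name__ == "__main__":
    url = "https://scholar.google.com/citations?user=ABCDEFGHIJKL&hl=en"
    print(id_from_profile_url(url))  # prints ABCDEFGHIJKL
```

With such a file in place, an invocation built only from the options above could look like `python lugia.py --file ids.txt --start 2020 --end 2024 --headless`.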

Output

Right now, Lugia can only print the articles as a table in your terminal. Improvements are coming soon!

Contributing

Contributions are welcome! Please fork the repository and create a pull request with your changes. Ensure that your code follows the existing style and includes tests for any new functionality.

Contact

For any questions or suggestions, please open an issue on the hippocampa/lugia GitHub repository or contact me.
