Skip to content

Customize your preferences for job searching, and only get the job information fits your need from Indeed

Notifications You must be signed in to change notification settings

catehuang/indeed_scraping

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

66 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

indeed_scraping

Purpose

  • Get job information you are looking for by filtering in required keywords and/or out unrelated ones
  • Save your valuable time (up to 95%) from scanning all job titles and descriptions from Indeed
  • Easy to use and flexible. Only need to fill in your preferences, run the program, open your browser, and then apply jobs
  • It utilizes Python, Selenium and Pandas to scrape useful data from the website, and uses Flask, JavaScript and Ajax to provide a user-friendly website for users to read matched job information

How to use this application

  1. You will need to download the Chrome Drive corresponding to your Chrome version, install python packages, and create a file named .env to write down your preferences.

    • If your browser needs to be updated, you might have some unexpected errors occurred when you run the program
  2. Create a .env file which should contain the following information

    • Values are case insensitive
  # the path of chrome drive
  CHROME_DRIVER_PATH=
  
  # what position are you looking for
  # this value will feed to the search keyword on Indeed (better to use boolean search - the combination of AND, OR, and NOT; however, it might not work)
  # exmaple: software developer NOT (manager OR lead OR senior)
  WHAT=
  
  # what are the keywords you want to see them on the title? keywords should be separated by a space
  # example: A B C => collect this job information, if the job title contains A or B or C, otherwise ignore it
  # if you are looking for intern or co-op, you can try "intern co-op student"
  RULES_INCLUDED=

  # what are the keywords you don't want them show on the title of the position? keywords should be separated by a space
  # example: A B C => ignore this job information, if the job title contains A or B or C, otherwise collect it
  # if you are looking for entry level job, you definitely don't want to see "manager lead senior sr\. intermediate staff" on the job title
  RULES_EXCLUDED=

  # where the location of the job
  # this value will feed to the search keyword on Indeed
  WHERE=
  
  # the range of the posted date between 1/3/7/14 days
  # this value will feed to the search keyword on Indeed
  WITHIN_DAYS=
  
  # are you looking for remote jobs? 0 or 1
  # this value will feed to the search keyword on Indeed
  IS_REMOTE=
  
  # filter out the job if the content contains the keyword (be general)
  # example: A B C => ignore this job information, if the job description contains A or B or C
  FILTER_OUT_JOB_DESC=
  
  # filter out the job if it mentions/requires n+ years' of experience; n is the value you set in the below
  FILTER_OUT_BY_MIN_REQ_YEARS=
  1. Execute main.py to start the program. And what do you expect for after you start running the program?

    • If the program found the number of jobs on Indeed was zero or greater 1500, it will exit or prompt a message from the console to ask you if you want to continue scraping more than 1500 jobs
    • You might need to add more time for scraping if your computer's performance or network speed is not good enough
      • Symptoms: you can see messages in the terminal telling you the program skipped cases due to exceptions and complaining about no such elements
      • Solution: searching the function time.sleep() in scrap_data.py, and add more minutes
    • Check the messages and make sure the filtering patterns fit your need
  2. After the scraping is done, you will see the message from the terminal

  3. Open the browser and enter "localhost:5000". You will see all the collected jobs on the web page

About

Customize your preferences for job searching, and only get the job information fits your need from Indeed

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published