jkerai1/GeneratePasswordListFromWebsite

GeneratePasswordListFromWebsite (Scrape Websites For Top Keywords)

Scrape a website for its top keywords, use word association (word2vec) to generate related keywords, then normalize the result.
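The scraping step can be sketched roughly as follows. This is a minimal illustration, not the code in WebPageScraper.py: it swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra packages, and the names `TextExtractor`, `top_keywords`, `STOP_WORDS`, the 4-letter minimum and the sample HTML are all assumptions made for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list; a real run would want a fuller one.
STOP_WORDS = {"the", "and", "for", "with", "this", "that", "from", "are", "you"}


class TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def top_keywords(html, limit=30, min_length=4):
    """Return up to `limit` of the most frequent words, lowercased."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Lowercasing and keeping only letters is a simple form of normalization.
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(
        w for w in words if len(w) >= min_length and w not in STOP_WORDS
    )
    # most_common returns fewer than `limit` items if the page is short.
    return [word for word, _ in counts.most_common(limit)]


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Contoso Bank</h1>"
    "<p>Contoso offers online banking. Banking with Contoso is simple.</p>"
    "</body></html>"
)
print(top_keywords(sample))
```

Because `Counter.most_common` simply returns what it has, a short page produces a shorter list rather than an error.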

The output .txt file containing the keywords is created in the same directory as the script.

This project is intended to scrape websites for keywords to use in banned password lists, although it could have other uses.

Recall that the banned password calculation in Entra is complex. Each banned password found in a user's password, including a fuzzy match, is given 1 point.
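To make the scoring concrete, here is a simplified sketch of the calculation as described in the Microsoft documentation linked under See More: the password is lowercased, common character substitutions are undone, each banned term found counts as 1 point, each remaining character counts as 1 point, and at least 5 points are required. The sketch deliberately omits fuzzy (edit distance) matching, and the function names are assumptions made for illustration rather than anything Entra exposes.

```python
# Character substitutions applied during normalization (per the Entra docs).
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "l", "$": "s", "@": "a"})

MINIMUM_POINTS = 5  # passwords scoring below this are rejected


def normalize(password):
    """Lowercase the password and undo common character substitutions."""
    return password.lower().translate(SUBSTITUTIONS)


def score(password, banned_terms):
    """Score a password: 1 point per banned term, 1 per leftover character.

    Simplification: only exact substring matches are detected here,
    whereas the real service also applies fuzzy matching.
    """
    text = normalize(password)
    # Try longer terms first so "contoso" wins over a shorter overlap.
    terms = sorted({normalize(t) for t in banned_terms if t}, key=len, reverse=True)
    points = 0
    position = 0
    while position < len(text):
        match = next((t for t in terms if text.startswith(t, position)), None)
        position += len(match) if match else 1
        points += 1
    return points


banned = ["contoso", "blank"]
password = "ContoS0Bl@nkf9!"
print(score(password, banned), score(password, banned) >= MINIMUM_POINTS)
```

With the banned terms "contoso" and "blank", the password above normalizes to "contosoblankf9!" and scores 2 points for the banned terms plus 3 for the remaining characters, which just meets the 5-point minimum.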

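Also recall that banned passwords are NOT applied retroactively. Existing passwords are stored only as hashes, so they cannot be checked against a new banned list; the list is evaluated only when a password is set or changed.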

Requirements

pip install scipy==1.12
pip install gensim  
pip install beautifulsoup4

Google's Word Vectors Model

The model must be placed in the path './model/'
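Loading the model and asking it for associated words might look roughly like this. It is a sketch under assumptions: that the downloaded model is a binary word2vec file with a `.bin` extension inside `./model/`, and that helper names such as `find_model_file` and `related_words` are hypothetical and not taken from WebPageScraper.py. The gensim calls used (`KeyedVectors.load_word2vec_format`, `most_similar`, `key_to_index`) are part of gensim 4.

```python
from pathlib import Path


def find_model_file(model_dir="./model/"):
    """Return the first .bin file in model_dir, or raise if none exists."""
    candidates = sorted(Path(model_dir).glob("*.bin"))
    if not candidates:
        raise FileNotFoundError(f"No .bin word2vec model found in {model_dir}")
    return candidates[0]


def related_words(word, model_path, topn=5):
    """Return up to `topn` lowercased words the model associates with `word`."""
    # Imported lazily so the path check above works without gensim installed.
    from gensim.models import KeyedVectors

    # Assumption: the file is in the binary word2vec format.
    vectors = KeyedVectors.load_word2vec_format(str(model_path), binary=True)
    if word not in vectors.key_to_index:
        return []  # the model has no vector for this word
    return [w.lower() for w, _ in vectors.most_similar(word, topn=topn)]
```

A call such as `related_words("bank", find_model_file())` would then return nearby words from the model. Loading a large model is slow and memory-hungry, so a real script would load it once and reuse it for every keyword.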

Example usage

After configuring the URLs and downloading the word2vec model, run WebPageScraper.py.

In the example run, example.com does not yield 30 words, but results are still returned.

See More

https://github.com/jkerai1/AzurePasswordProtectionCalculator
https://learn.microsoft.com/en-us/entra/identity/authentication/concept-password-ban-bad