Skip to content

Latest commit

 

History

History
86 lines (57 loc) · 3.18 KB

README.rst

File metadata and controls

86 lines (57 loc) · 3.18 KB
Build Status Documentation Status Coverage Status PyPI package Conda Download stats

Website sitemap parser for Python 3.5+.

Features

Installation

pip install ultimate-sitemap-parser

or using Anaconda:

conda install -c conda-forge ultimate-sitemap-parser

Usage

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage('https://www.nytimes.com/')
print(tree)

sitemap_tree_for_homepage() will return a tree of AbstractSitemap subclass objects that represent the sitemap hierarchy found on the website; see a reference of AbstractSitemap subclasses.

If you'd like to just list all the pages found in all of the sitemaps within the website, consider using all_pages() method:

# all_pages() returns an Iterator
for page in tree.all_pages():
    print(page)

all_pages() method will return an iterator yielding SitemapPage objects; see a reference of SitemapPage.