brihuang99/telegram-article-scraper-bot


Telegram Article Scraper Bot

Code that lets a Telegram bot scrape article links from a website using BeautifulSoup4 and send them to your chat.

Features:
  • 🎨 Customizable - You can try almost any publication site, although some block web scraping.
  • ⚡ Quick & efficient - The scraper pulls article links quickly and sends them directly to your chat.
  • 📰 Article previews - Telegram does a great job displaying a preview of the article link in the chat.
Screenshots:

screenshot1

screenshot2

Instructions

  1. Use a device that can run 24/7, such as a Raspberry Pi.
  2. Install Python and a Telegram bot library on that device. The examples below use the Update and ContextTypes names from python-telegram-bot, plus requests and BeautifulSoup4.
  3. Use Telegram's BotFather to create a new bot, then put the API token in your Python file.
  4. Find the publication site(s) you want to pull articles from and identify the class name of the element that holds each article link. A browser's inspect feature is the easiest way to find it.
  5. Edit the Python file so the scraping function uses your website URL, the article class name, and the link's href attribute.
  6. Run the bot and check that it returns the article URLs. (Use the 'test.py' file if you want to test the web scraper on its own before adding anything to the Telegram bot code.)
  7. 📚 👓 Enjoy!
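
Steps 4 to 6 can be tried offline before touching the bot. The following is a minimal sketch, not the repository's test.py: the function name `extract_links`, the `SAMPLE_HTML` page, and the `article-link` class in it are assumptions made for illustration. It shows how a class name found in the browser inspector maps onto a BeautifulSoup `find_all` call.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; a real site uses its own markup and class names.
SAMPLE_HTML = """
<div class="listing">
  <a class="article-link" href="https://example.com/news/first">First</a>
  <a class="article-link" href="https://example.com/news/second">Second</a>
  <a class="sidebar-link" href="https://example.com/about">About</a>
  <a class="article-link">No href here</a>
</div>
"""


def extract_links(html, css_class, limit=10):
    """Return up to `limit` href values from <a> tags that carry `css_class`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", class_=css_class):
        href = anchor.get("href")
        if href:  # skip anchors that have no href attribute
            links.append(href)
        if len(links) >= limit:
            break
    return links


if __name__ == "__main__":
    for url in extract_links(SAMPLE_HTML, "article-link"):
        print(url)
```

Once the function returns the links you expect for the sample, the same class name and limit can be moved into the bot handler, with the HTML coming from `requests.get(...).text` instead of the sample string.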

Examples:

PC Gamer
import requests
from bs4 import BeautifulSoup
from telegram import Update
from telegram.ext import ContextTypes

async def pcgamer(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await context.bot.send_message(chat_id=update.effective_chat.id, text="Let's go!")

    # Send a browser-like User-Agent; some sites may reject the default one.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}
    web = requests.get("https://www.pcgamer.com/news/", headers=headers)

    soup = BeautifulSoup(web.text, "html.parser")
    counter = 0

    for link in soup.find_all('a', class_="article-link"):
        await context.bot.send_message(chat_id=update.effective_chat.id, text=link['href'])  # send the href to the chat
        counter += 1

        if counter >= 10:  # stop after 10 articles
            break
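
The loop above caps the number of links with a manual counter. As an alternative sketch, `itertools.islice` from the standard library can apply the same cap without a counter variable. The helper name `first_n` is an assumption for illustration and is not part of the repository.

```python
from itertools import islice


def first_n(items, limit=10):
    """Return at most `limit` items from any iterable, in their original order."""
    return list(islice(items, limit))


# In a handler, the loop could then iterate over the capped result, e.g.:
#   for link in first_n(soup.find_all('a', class_="article-link"), 10):
#       ...send link['href'] to the chat...
```

This keeps the limit in one place, so changing how many articles are sent only requires changing a single argument.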
Reuters World News
async def reuters(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await context.bot.send_message(chat_id=update.effective_chat.id, text="Here are the current events")

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}
    web = requests.get("https://www.reuters.com/world/", headers=headers)

    soup = BeautifulSoup(web.text, "html.parser")
    site = "https://www.reuters.com"  # the hrefs on this page are relative, so prepend the site
    counter = 0

    for link in soup.find_all('a', class_="text__text__1FZLe text__dark-grey__3Ml43 text__inherit-font__1Y8w3 text__inherit-size__1DZJi link__underline_on_hover__2zGL4 media-story-card__heading__eqhp9"):
        await context.bot.send_message(chat_id=update.effective_chat.id, text=site + link['href'])  # append the href to https://www.reuters.com
        counter += 1

        if counter >= 10:  # stop after 10 articles
            break
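
The Reuters example builds each full URL by string concatenation, which works while every href starts with a slash. A slightly more defensive sketch uses `urllib.parse.urljoin` from the standard library, which also leaves already-absolute links unchanged. The helper name `absolute_url` and the story paths in the comments are assumptions for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.reuters.com"


def absolute_url(href, base=BASE_URL):
    """Resolve a possibly relative href against the site's base URL."""
    return urljoin(base, href)


# Relative href (hypothetical path): the base URL is prepended.
#   absolute_url("/world/example-story/")
# Absolute href: returned unchanged instead of being glued onto the base.
#   absolute_url("https://example.com/other-story")
```

In the handler, `text=site + link['href']` could then become `text=absolute_url(link['href'])`.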

Questions?

Feel free to post in the Discussions or Issues tabs.
