amitgupta4407/All_About_PDF

AllAboutPDF

AllAboutPDF is a web-based application for working with PDF files. With this app, you can perform a variety of PDF-related tasks, such as reading metadata and extracting images, text, and annotations. One feature that sets AllAboutPDF apart from other online PDF apps is ChatPDF. It lets users interact with their PDF files in natural language using OpenAI and LangChain, so they can quickly find the information they need and complete tasks more efficiently.

Live Project Link

The live version of the app is hosted on Streamlit Sharing and can be accessed at the following URL:

Features

  • Extract text from a PDF file
  • Extract images from a PDF file
  • Extract metadata from a PDF file
  • Encrypt a PDF file with a password
  • Chat with a PDF file using OpenAI and LangChain
  • Chat with multiple text-based files (PDF, TXT, DOC, Excel, CSV, SQL): https://allaboutpdf-multiple-filequery-feature.streamlit.app/

Overview

AllAboutPDF is built with Python and the Streamlit framework. The app uses the PyPDF2 library to perform PDF-related tasks, such as parsing PDFs and extracting relevant information from them. The app also uses the OpenAI and LangChain APIs to power the "ChatPDF" feature.

When a user uploads a PDF file to the app, the app performs the requested task (e.g. encrypting the file) and then generates a new PDF file that the user can download.
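As a rough illustration of the kind of PyPDF2 calls involved, here is a minimal sketch. It is not taken from the app's source: the function names `extract_text_and_metadata` and `encrypt_pdf` are hypothetical, and the code assumes a recent PyPDF2 release that provides `PdfReader` and `PdfWriter`. Both functions work on raw bytes, which is roughly what an upload widget hands over.

```python
from io import BytesIO


def extract_text_and_metadata(pdf_bytes: bytes) -> tuple[str, dict]:
    """Return the concatenated page text and the document metadata."""
    # Imported here so the module still loads where PyPDF2 is not installed.
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2

    reader = PdfReader(BytesIO(pdf_bytes))
    # extract_text() can yield nothing for image-only or blank pages.
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    # reader.metadata is None when the file has no document-info dictionary.
    metadata = {key: str(value) for key, value in (reader.metadata or {}).items()}
    return text, metadata


def encrypt_pdf(pdf_bytes: bytes, password: str) -> bytes:
    """Copy every page into a new PDF protected by `password`."""
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(BytesIO(pdf_bytes))
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.encrypt(password)

    # Write to memory so the result can be offered as a download.
    output = BytesIO()
    writer.write(output)
    return output.getvalue()
```

The bytes returned by `encrypt_pdf` are the "new PDF file" in the flow described above. In a Streamlit app they could be passed to a download widget, though how AllAboutPDF wires this up is not shown here.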

Installation

To install the app, clone this repository and install the requirements:

pip install -r requirements.txt

Usage

  • To use the main application, run the app.py file with the Streamlit CLI (after installing Streamlit):
streamlit run app.py
  • To use the test-feature application, run the FileQueryHub.py file with the Streamlit CLI (after installing Streamlit):
streamlit run FileQueryHub.py

Motivation

The motivation behind AllAboutPDF was to create a simple, user-friendly tool for working with PDF files. While there are many PDF-related tools available online, many of them are complex and difficult to use. AllAboutPDF aims to provide an easy-to-use alternative that anyone can use, regardless of technical expertise, and to make data extraction a piece of cake.

Problem Solved

PDF files are a ubiquitous file format used for sharing documents across platforms and devices. However, working with PDF files can often be a tedious and time-consuming process. AllAboutPDF aims to solve this problem by providing a simple, user-friendly tool for working with PDF files.

Tech Stack

AllAboutPDF is built using the following technologies:

  • Python
  • Streamlit
  • PyPDF2
  • OpenAI
  • LangChain

Challenges Faced

  • Selecting the most suitable libraries for the project. We settled on Python, Streamlit, PyPDF2, and LangChain.
  • Developing a unique feature that distinguishes AllAboutPDF from other online PDF apps. Our ChatPDF feature allows users to interact with their PDF files using OpenAI and LangChain's natural language processing technology.
  • Optimizing the cost of preparing the knowledge base for ChatPDF by choosing a suitable chunk size and a suitable ratio of chunk size to overlap size.
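To make the chunk size and overlap trade-off concrete, here is a minimal, library-free sketch of fixed-size chunking with overlap. It is an illustration only: the function name `split_into_chunks` is hypothetical, it counts characters rather than tokens, and the app itself may rely on a LangChain text splitter instead.

```python
def split_into_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split `text` into windows of `chunk_size` characters.

    Consecutive windows share `overlap` characters, so context that
    straddles a boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij"
    # More overlap means a smaller step, hence more chunks to embed.
    print(split_into_chunks(sample, chunk_size=4, overlap=1))
    print(len(split_into_chunks(sample, chunk_size=4, overlap=3)))
```

Every chunk has to be embedded when the knowledge base is built, so a larger overlap relative to the chunk size means more chunks and a higher preparation cost, while too little overlap risks cutting relevant context at chunk boundaries.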

Future Plans

We have several future plans for AllAboutPDF, including:

  • Merge multiple PDF files into a single file
  • Split a PDF file into multiple files
  • Compress a PDF file to reduce its size
  • Convert a PDF file to a different file format (e.g. JPEG, PNG, DOCX)
  • Add more PDF-related features, such as OCR (Optical Character Recognition) and watermarking
  • Add support for more file formats (e.g. Word documents, Excel spreadsheets)

If you have any feedback or suggestions for how we can improve AllAboutPDF, please don't hesitate to get in touch!


Links

 Ask_Book_Questions_Workflow_Ext
