This Python script provides a command-line interface for common PDF operations using the Xpdf tools. It can extract text, extract images, convert a PDF to PostScript, and convert each page of a PDF to a PNG image.
Before using this script, ensure you have the following installed:
- Python 3.6 or above
- Xpdf command-line tools:
  - pdftotext
  - pdfimages
  - pdftops
  - pdftopng
You can download Xpdf tools from the official Xpdf website.
- Clone this repository or download the X-PDF-Script.py file.
- Ensure that the Xpdf tools are in your system's PATH.
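The PATH requirement can be checked before running the script. The following is a minimal sketch, not part of X-PDF-Script.py itself; the helper name `find_missing_tools` is hypothetical, and it relies only on the standard library's `shutil.which`.

```python
import shutil

# The four Xpdf tools listed in the prerequisites above.
REQUIRED_TOOLS = ("pdftotext", "pdfimages", "pdftops", "pdftopng")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools that cannot be found on PATH.

    `which` is injectable so the check can be tested without Xpdf installed.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All Xpdf tools were found.")
```

If any tool is reported missing, add the folder containing the Xpdf binaries to PATH and run the check again.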
To run the script, open a terminal or command prompt, navigate to the directory containing the script, and run:
X-PDF-Script.py
Or, on Windows, run the batch file:
X-PDF-Script.bat
The script will present a menu with the following options:
- Extract Text
- Extract Images
- Convert to PostScript
- Convert PDF to PNG
- Exit
Follow the on-screen prompts to select an operation and provide the necessary input.
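A menu like this is often implemented as a lookup from the typed choice to an action. The sketch below is illustrative only: numbers 1 to 4 follow the option numbers used in this README, while the number 5 for Exit, the action names, and `resolve_choice` are assumptions rather than the script's actual code.

```python
# Choices 1-4 match the option numbers in this README.
# "5" for Exit and all action names are assumptions for illustration.
MENU = {
    "1": "extract_text",
    "2": "extract_images",
    "3": "convert_to_postscript",
    "4": "convert_to_png",
    "5": "exit",
}


def resolve_choice(typed):
    """Map what the user typed to an action name, or None if it is invalid."""
    return MENU.get(typed.strip())
```

Returning `None` for unknown input lets the caller print a message and show the menu again instead of raising an exception.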
This option extracts text from a PDF file and saves it as a .txt file.
- Select option 1 from the menu.
- Enter the path to the input PDF file when prompted.
- Enter the path where you want to save the extracted text file.
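Under the hood, this operation corresponds to a `pdftotext` call with an input and an output path. A minimal sketch of how such a call could be made from Python follows; the function names are hypothetical and the script's own implementation may differ.

```python
import subprocess


def build_pdftotext_command(pdf_path, txt_path):
    """Build the argument list for: pdftotext <PDF-file> <text-file>."""
    # A list (rather than one shell string) keeps paths with spaces intact
    # and avoids passing user input through a shell.
    return ["pdftotext", pdf_path, txt_path]


def extract_text(pdf_path, txt_path):
    """Run pdftotext; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
```

For example, `extract_text("report.pdf", "report.txt")` would write the text of `report.pdf` to `report.txt`, assuming pdftotext is on PATH.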
This option extracts images from a PDF file and saves them in a specified folder.
- Select option 2 from the menu.
- Enter the path to the input PDF file when prompted.
- Enter the path to the folder where you want to save the extracted images.
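The Xpdf `pdfimages` tool takes an "image root" rather than a folder: it writes files named after that root with a number appended. One way to map the folder prompt onto that argument is sketched below; the function names and the `image` prefix are assumptions, not taken from the script.

```python
import os
import subprocess


def build_pdfimages_command(pdf_path, output_dir, prefix="image"):
    """Build the argument list for: pdfimages <PDF-file> <image-root>."""
    # pdfimages appends a number and an extension to the image root,
    # so the root is the output folder joined with a filename prefix.
    image_root = os.path.join(output_dir, prefix)
    return ["pdfimages", pdf_path, image_root]


def extract_images(pdf_path, output_dir, prefix="image"):
    """Run pdfimages; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_pdfimages_command(pdf_path, output_dir, prefix), check=True)
```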
This option converts a PDF file to PostScript format.
- Select option 3 from the menu.
- Enter the path to the input PDF file when prompted.
- Enter the path where you want to save the PostScript file.
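As an illustration of this step, the sketch below builds a `pdftops` call and, when no output path is given, falls back to the input name with a `.ps` extension. The fallback and the function names are assumptions for the example; the script itself prompts for the output path.

```python
import subprocess
from pathlib import Path


def default_ps_path(pdf_path):
    """Derive an output name by swapping the extension for .ps."""
    return str(Path(pdf_path).with_suffix(".ps"))


def build_pdftops_command(pdf_path, ps_path=None):
    """Build the argument list for: pdftops <PDF-file> <PS-file>."""
    return ["pdftops", pdf_path, ps_path or default_ps_path(pdf_path)]


def convert_to_postscript(pdf_path, ps_path=None):
    """Run pdftops; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_pdftops_command(pdf_path, ps_path), check=True)
```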
This option converts each page of a PDF file to a separate PNG image.
- Select option 4 from the menu.
- Enter the path to the input PDF file when prompted.
- Enter the path to the folder where you want to save the PNG files.
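Like `pdfimages`, the Xpdf `pdftopng` tool takes a filename root and writes one numbered PNG per page. The sketch below shows one possible way to build that call, including the tool's `-r` resolution option; the function names, the `page` prefix, and the choice to expose the resolution are assumptions, not the script's documented behavior.

```python
import os
import subprocess


def build_pdftopng_command(pdf_path, output_dir, prefix="page", dpi=150):
    """Build the argument list for: pdftopng -r <dpi> <PDF-file> <PNG-root>."""
    # pdftopng appends a page number and ".png" to the PNG root.
    png_root = os.path.join(output_dir, prefix)
    return ["pdftopng", "-r", str(dpi), pdf_path, png_root]


def convert_to_png(pdf_path, output_dir, prefix="page", dpi=150):
    """Run pdftopng; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_pdftopng_command(pdf_path, output_dir, prefix, dpi), check=True)
```

A higher `dpi` value produces larger, sharper images at the cost of file size and conversion time.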
The script includes error handling for common issues such as:
- Missing input files
- Invalid file extensions
- Missing output directories
If an error occurs, the script will display an error message and return to the main menu.
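The three checks listed above could be written as small functions that return a message instead of raising, which makes it easy to print the error and return to the menu. This is a sketch under that assumption; the function names and message wording are hypothetical.

```python
import os


def validate_input_pdf(path):
    """Return an error message, or None if the path looks like a usable PDF."""
    # Check the extension first, then whether the file actually exists.
    if not path.lower().endswith(".pdf"):
        return "Invalid file extension (expected .pdf): " + path
    if not os.path.isfile(path):
        return "Input file not found: " + path
    return None


def validate_output_dir(path):
    """Return an error message, or None if the output folder exists."""
    if not os.path.isdir(path):
        return "Output directory not found: " + path
    return None
```

A caller would run these before invoking any Xpdf tool and skip the operation when a message is returned.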
- The script sanitizes filenames to prevent command injection.
- It validates input files and paths before processing.
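Filename sanitization is commonly done with an allow-list of safe characters. The sketch below shows that general technique and is not the script's actual routine; `sanitize_filename`, the chosen character set, and the `unnamed` fallback are all assumptions. It is meant for a bare filename, not a full path, since path separators are replaced too.

```python
import re


def sanitize_filename(name):
    """Keep letters, digits, dot, underscore, and hyphen; replace the rest."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Strip leading hyphens so the name cannot be mistaken for an option.
    cleaned = cleaned.lstrip("-")
    return cleaned or "unnamed"


print(sanitize_filename("report; rm -rf.pdf"))  # prints report__rm_-rf.pdf
```

Passing arguments to the Xpdf tools as a list, without a shell, is a complementary safeguard, because shell metacharacters in a path are then never interpreted.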