PdfParser is a standalone PHP library that provides tools for extracting data from PDF files. It loads and parses objects and headers, extracts metadata, and extracts text from pages in order. It supports compressed PDFs, the Mac OS Roman character encoding, and hex and octal encoding in text sections, and it is compliant with PSR-0 (autoloading) and PSR-1 (coding style). Secured documents are not currently supported.
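To illustrate what hex and octal encoding in text sections means, here is a minimal sketch in plain PHP. It is not PdfParser's own code: the function names `decodeHexString` and `decodeOctalEscapes` are hypothetical, and the sketch deliberately ignores other PDF escape sequences (such as an escaped backslash) that a real parser has to handle.

```php
<?php
// Hypothetical helpers for illustration only; not part of PdfParser's API.

/**
 * Decode the body of a PDF hex string, e.g. "48656C6C6F" from <48656C6C6F>.
 */
function decodeHexString(string $hex): string
{
    // Whitespace between hex digits is ignored.
    $hex = preg_replace('/\s+/', '', $hex);
    if ($hex === '') {
        return '';
    }
    // An odd number of digits is treated as if a trailing 0 were present.
    if (strlen($hex) % 2 !== 0) {
        $hex .= '0';
    }
    $out = '';
    foreach (str_split($hex, 2) as $pair) {
        $out .= chr(hexdec($pair));
    }
    return $out;
}

/**
 * Replace octal escapes (\ddd, one to three octal digits) with their bytes.
 * Simplification: other escapes such as \\ or \( are left untouched.
 */
function decodeOctalEscapes(string $text): string
{
    return preg_replace_callback(
        '/\\\\([0-7]{1,3})/',
        function (array $m): string {
            return chr(octdec($m[1]) & 0xFF);
        },
        $text
    );
}
```

For example, `decodeHexString('48656C6C6F')` yields `Hello`, and `decodeOctalEscapes('\110\151')` (a single-quoted PHP string, so the backslashes are literal) yields `Hi`.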
|Tags||PHP class libraries lib Library PDF PDF file manipulation file conversion Extract Extract Text|
|Operating Systems||Not Applicable|
Proud to be referenced in the Softpedia Linux software database: http://linux.softpedia.com/get/Printing/PdfParser-103281.shtml
Release Notes: This release fixes some bugs in parsing (font, secured files, etc.). The TCPDF dependency needs to be updated.
Release Notes: This release fixes XObject text extraction and adds a text fallback for missing fonts.
Release Notes: The project has changed licensing from the GPLv2 to the GPLv3 to match TCPDF requirements.
Release Notes: This release updates the parser to support content array objects outside the header (a rewrite of the method Page::getText and a hotfix).
Release Notes: This release adds support for specific date formats and escaped spaces.