The Multivalent PDF Tools is a suite of tools for manipulating PDF documents. It includes tools for compressing, uncompressing (for hand editing), obtaining metadata, splitting and merging, encrypting and decrypting, validating, performing imposition (aka n-up), making page images, extracting text, and full-text indexing (with Lucene). The compress tool shrinks the PDF 1.5 Reference from 13.5MB to 8MB in PDF 1.5/Acrobat 6 format and down to 5.1MB in a new proposed "Compact" format.
isbnsearch provides a simple method for retrieving information about any book using only an ISBN or EAN barcode. It is intended to assist online libraries, user groups, and individual users, and is designed to act as a distributed ISBN database query system. Users can choose to view the summary information (author, title, publisher, date, edition, subject, ISBN) as HTML, XML, or a pre-formatted SQL statement.
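
Because a lookup may start from either an ISBN or an EAN barcode, a client would typically normalize both to one form before querying. The following is a minimal sketch of that normalization using the standard ISBN-10 and EAN-13 check-digit rules; the function names are hypothetical and this is not isbnsearch's own code or interface.

```python
def _clean(code: str) -> str:
    """Strip hyphens and spaces and uppercase a trailing 'x' check digit."""
    return code.replace("-", "").replace(" ", "").upper()


def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10 using its weighted mod-11 checksum."""
    s = _clean(isbn)
    if len(s) != 10 or not s.isascii():
        return False
    if not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    # Weights run 10, 9, ..., 1; a final 'X' stands for the value 10.
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def ean13_check_digit(first12: str) -> str:
    """Compute the EAN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_ean13(isbn: str) -> str:
    """Convert a valid ISBN-10 to its 13-digit EAN form (978 prefix)."""
    s = _clean(isbn)
    if not isbn10_is_valid(s):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    body = "978" + s[:9]  # drop the old check digit, add the book prefix
    return body + ean13_check_digit(body)
```

With this in place, a scanned barcode and a typed ISBN-10 for the same book reduce to the same 13-digit key.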
Estraier is a full-text search system for personal use. Its principal purpose is to provide full-text search for a Web site. It functions similarly to Google, but for a personal Web site or sites in an intranet. It has fast searching, conspicuous results, relational document search, the ability to handle Japanese text, and support for handling a large number of documents. Installation is easy.
DataparkSearch is a Web search engine tool. It supports http, https, ftp, nntp, and news URLs, and htdb virtual URLs for indexing SQL databases. It has built-in support for the text/html, text/xml, text/plain, audio/mpeg (MP3), and image/gif MIME types, and supports external parsers for other document types. It can index multilingual sites using content negotiation, and can search all word forms using ispell affixes and dictionaries. Other features include stopword and synonym lists, a boolean query language, sorting of results by relevancy, popularity rank, last modified time, or importance (the product of the relevancy and popularity ranks), support for various character sets, and phrase segmentation for the Chinese, Japanese, Korean, and Thai languages. It also offers accent-insensitive search, mod_dpsearch for Apache, and support for internationalized domain names.
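
The importance ordering described above, where a result's score is its relevancy multiplied by its popularity rank, can be illustrated with a short sketch. The `Hit` class, its field names, and the sample values are assumptions made for illustration; they are not DataparkSearch data structures or output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical search result with two independent scores."""
    url: str
    relevancy: float   # how well the document matches the query
    popularity: float  # popularity rank of the document

    @property
    def importance(self) -> float:
        # Importance is the product of the relevancy and popularity scores.
        return self.relevancy * self.popularity


def sort_by_importance(hits: list[Hit]) -> list[Hit]:
    """Return the hits ordered from most to least important."""
    return sorted(hits, key=lambda h: h.importance, reverse=True)


hits = [
    Hit("http://example.org/a", relevancy=0.9, popularity=0.1),
    Hit("http://example.org/b", relevancy=0.5, popularity=0.5),
    Hit("http://example.org/c", relevancy=0.2, popularity=0.9),
]
ranked = sort_by_importance(hits)
```

Multiplying the two scores means a result must do reasonably well on both to rank highly: here the page with middling relevancy and popularity outranks pages that are strong on only one.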
Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, it offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (such as creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.
The Revisionist is a tool for extracting and indexing hidden metadata (such as deleted or modified text) from large collections of MS Word files. It can operate on whole Web sites or on SMB or NFS directories. It is handy for pen-testing, or it can be used just to spot embarrassing secrets.
POPsearch is a desktop search engine that is designed to help you easily find information on your computer. With features that other search engines don't have, it lets you index your entire collection of email messages and files. As information is indexed, it is immediately available for analysis from any Web browser. When POPsearch is configured correctly, you can also access your data remotely with RSS feeds, email feeds, or from any computer that has a Web browser.