Emdros is a corpus query system for storing and searching linguistically annotated text. It is very generic, supporting almost any kind of annotation from almost any linguistic theory. All linguistic levels of analysis are supported, including phonology, morphology, the lexical level, syntax, and discourse. The core libraries act as a middleware layer between a client and an underlying SQL database. MySQL, PostgreSQL, and SQLite (2 and 3) are supported.
Recoll is a personal full-text desktop search tool based on Xapian. It is easy to use and administer, feature-rich, and has a Qt-based GUI. Text, HTML, PDF, PostScript, MS Word, OpenOffice, WordPerfect, KWord, AbiWord, maildir, and mailbox mail folder formats are supported, along with their compressed versions and quite a few others. Powerful query facilities are provided. Multiple character sets are supported, and internal processing and storage use Unicode UTF-8. Stemming is performed at query time, so the stemming language can be switched after indexing.
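To illustrate why query-time stemming lets the stemming language change without reindexing, here is a minimal sketch. It is not Recoll's or Xapian's code: `toy_stem`, `build_index`, and `search` are hypothetical names, and the suffix-stripping stemmer is a deliberately crude stand-in for a real one. The index stores unstemmed terms, and the stemmer is applied only when a query is run.

```python
from collections import defaultdict


def toy_stem(word, suffixes=("ing", "ed", "es", "s")):
    """Crude suffix stripper, used only for illustration (assumption)."""
    for suffix in suffixes:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each unstemmed, lowercased term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query_term, stem=toy_stem):
    """Expand the query to every indexed term that shares its stem."""
    target = stem(query_term.lower())
    hits = set()
    for term, doc_ids in index.items():
        if stem(term) == target:
            hits |= doc_ids
    return hits


docs = {1: "indexing files", 2: "indexed file", 3: "search"}
index = build_index(docs)
print(search(index, "index"))                    # stemmed match
print(search(index, "index", stem=lambda w: w))  # exact match only
```

Because the stemmer is just an argument to `search`, swapping it changes the results without touching the stored index.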
Java Search Engine is a server-side search engine for Web sites, written entirely in Java. It features HTML and PDF indexing, a built-in Web crawler, support for international encodings, word and phrase search, and results returned as quotations with highlighted words (like Google). It is available as an EJB, JSP, servlet, or Java API library. For non-Java environments, it is available as an XML server with XSLT support.
Amberfish is a general purpose text/XML retrieval utility. It features indexing of both free text and nested fields, built-in support for XML documents, structured queries allowing generalized field/tag paths, hierarchical result sets, automatic searching across multiple databases, efficient indexing, and relatively low memory requirements.
Sikher is a desktop program designed to archive, search, and display the Sikh scriptures using advanced functions. It allows the common person to read and understand the messages contained in the Sikh scriptures through translations and transliterations in different languages, thereby breaking the language and geographical barriers between Gurbani (Sikh Scriptures) and the world. Sikher is a robust, future-proof, cross-platform application which may be used by developers to create similar internationalized and localized search applications.
Foxtrot is full-text indexing software for PDF, OpenOffice.org 1 and 2, MS Word, and XLS files. The package provides two different frontends: a Google-like searching tool implemented with Perl-Gtk and a PHP-based Web interface. The backend scans directories asynchronously, converts files to text, and indexes them in a MySQL database.
VectorSpace Database (VSDB) provides multi-dimensional similarity search capability in a robust server package. It is a server that allows any socket-capable programming language to post and search vectorspaces of multi-dimensional data. Data can be of any base datatype (e.g. text, objects, dating profiles, sessions, ecommerce orders, etc.). VSDB also offers a clustering capability that can display groupings of data based on common dimensions. A built-in thesaurus feature can help bridge multiple similar dimensions in search or clustering.
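As a rough illustration of similarity search in a vector space, the sketch below ranks texts by cosine similarity over sparse word-count vectors. It makes no claim about how VSDB works internally or about its socket protocol: the names `to_vector`, `cosine_similarity`, and `search` are hypothetical, and cosine similarity is an assumed choice of measure.

```python
import math
from collections import Counter


def to_vector(text):
    """Turn text into a sparse vector with one dimension per distinct word."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_n=3):
    """Return the top_n (score, document) pairs, most similar first."""
    query_vector = to_vector(query)
    scored = [(cosine_similarity(query_vector, to_vector(doc)), doc)
              for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


documents = ["green pear", "red apple pie", "blue sky"]
for score, doc in search("red apple", documents):
    print(round(score, 3), doc)
```

The same ranking applies to any data that can be mapped to named dimensions, not only text.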
WAscii is a Web frontend for displaying an AsciiDoc documentation repository. It allows you to search and browse your documentation files and automatically converts AsciiDoc to HTML, PDF, and ODF documents. It is intended to work directly from a Subversion repository containing your AsciiDoc files.