Solr-Connector-Files crawls and indexes directories and files from your filesystem (anything that can be mounted on Linux) into Apache Solr. It extracts file contents with Apache Tika, which pulls metadata and text from many document and file formats. It also integrates optical character recognition (OCR) for images, photos, and PDFs using Tesseract OCR.
GlyphViewer uses OCR and online machine translation services to translate the text in your images, including ancient scripts, into many modern languages. This can improve the SEO rating of your images and lets your users read your online content in their native language.
Character Recognition is an Android app that lets the user take a photo (or pick an existing image file on the device) and then runs the Tesseract OCR engine on it to extract the text. It currently supports English only; support for other languages is planned.
getxbook is a collection of tools for downloading books from websites. It includes tools for Google Books' "book preview", Amazon's "look inside the book", and Barnes and Noble's "book viewer". There is an optional GUI written in Tcl/Tk, plus shell scripts that use OCR to turn the downloaded books into plain text, searchable PDFs, or DjVu files.
Aspose.OCR for .NET is a character recognition component that lets developers add OCR functionality to ASP.NET web applications, web services, and other .NET applications. It provides a simple set of classes for controlling character recognition tasks and supports BMP and TIFF images.
MALODOS helps you scan, store, and easily retrieve all your personal documents. Its storage format is open and documented, so your document archive remains accessible even without MALODOS. The documents themselves are stored as standard PDF files, while their metadata (such as title, tags, and description) is stored in a separate SQLite database in an open format. MALODOS can also manage existing files in PDF, JPEG, TIFF, and other formats, so you can keep using documents you have already scanned. You can connect any external OCR program to enable full-text search.