The Okapi project’s main purpose is to provide a set of building blocks for creating larger open source localization and translation tools. Many Okapi components are also generic enough to be of interest to the text mining, natural language processing, and text retrieval communities. Okapi’s many text filters (HTML, Properties, XML with ITS XPath-based rules, OpenXML, ODF, Regex, etc.) provide a straightforward way to access the text of multiple document formats. Its document events and pipeline can be integrated with other frameworks such as UIMA, LingPipe, OpenPipeline, OpenNLP, GATE, and Lucene. The advantage of Okapi’s text filters is that they not only extract text but also preserve all non-textual formatting. It is possible to decompose a document into events, process them via the pipeline, and then rebuild the input document without loss. Structural information can be added to Okapi document events so that tables, lists, links, titles, etc. are grouped together and treated as a unit. This is useful when context based on a “universal” document structure is needed. The Okapi event model supports user-configurable annotations, similar to UIMA but simpler and more restricted in scope. Users can annotate spans of text or add new resources such as translation memory matches, terminology, token types, or part-of-speech information.
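The decompose, process, and rebuild cycle described above can be illustrated with a minimal Python sketch. It does not use Okapi's actual API (Okapi is a Java framework); the `Event` class, the `to_events`, `annotate_word_count`, and `rebuild` functions, and the toy tag-splitting regex are all hypothetical names chosen for illustration only.

```python
import re
from dataclasses import dataclass, field

# Hypothetical event type: not Okapi's real class, just an illustration.
@dataclass
class Event:
    kind: str                      # "text" or "skeleton" (non-textual markup)
    content: str
    annotations: dict = field(default_factory=dict)

# Toy "filter" rule: anything that looks like <...> is treated as markup.
TAG = re.compile(r"(<[^>]+>)")

def to_events(document):
    """Decompose a document into text events and skeleton events."""
    events = []
    for piece in TAG.split(document):
        if not piece:
            continue  # re.split yields empty strings at the edges
        kind = "skeleton" if TAG.fullmatch(piece) else "text"
        events.append(Event(kind, piece))
    return events

def annotate_word_count(events):
    """Example pipeline step: attach an annotation to each text event."""
    for event in events:
        if event.kind == "text":
            event.annotations["words"] = len(event.content.split())
    return events

def rebuild(events):
    """Reassemble the original document from its events."""
    return "".join(event.content for event in events)

doc = "<p>Hello <b>big</b> world</p>"
events = annotate_word_count(to_events(doc))
print(rebuild(events) == doc)
```

Because the skeleton events carry the markup untouched, concatenating all events reproduces the input exactly, while the pipeline step only adds annotations to the text events.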
Expose is a PHP template engine. It supports server-side and client-side caching, a plugin system (to simplify common tasks like inserting a date picker), and internationalization (to write templates in multiple languages using external translation files). Unlike most template engines, Expose's template script language is based on PHP itself, which means you don't have to learn a new syntax. You can use most of the PHP language elements and functions in the way you already know.
Virtaal is a tool for computer-aided translation that offers a simple, user-friendly interface. It includes powerful features such as translation memory, terminology management, and placeable handling. Virtaal can edit Gettext PO, XLIFF, and various other localization formats.
BabelKit is an interface to a universal multilingual database code table. It takes all of the programming work out of maintaining multiple database code definition sets in multiple languages. The code administration and translation page lets developers define new virtual code tables, new languages, enter all codes and their descriptions, and then translate them into all languages of interest. Perl and PHP classes retrieve the code descriptions and automatically generate HTML code selection elements in the user's language. This makes internationalization and localization of Web sites and database interfaces much easier.
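The idea of a multilingual code table that drives HTML selection elements can be sketched as follows. This is not BabelKit's API (BabelKit provides Perl and PHP classes); the in-memory `CODE_TABLE`, the `describe` and `select_element` functions, and the fallback to a default language for untranslated codes are assumptions made for illustration.

```python
from html import escape

# Hypothetical code table: (code set, language) -> {code: description}.
CODE_TABLE = {
    ("status", "en"): {"A": "Active", "I": "Inactive"},
    ("status", "fr"): {"A": "Actif"},  # "I" is not translated yet
}

def describe(code_set, code, lang, default_lang="en"):
    """Return the description in `lang`, falling back to the default language."""
    translated = CODE_TABLE.get((code_set, lang), {})
    if code in translated:
        return translated[code]
    return CODE_TABLE[(code_set, default_lang)][code]

def select_element(code_set, lang, selected=None, default_lang="en"):
    """Build an HTML <select> listing every code of the set in `lang`."""
    options = []
    for code in CODE_TABLE[(code_set, default_lang)]:
        label = describe(code_set, code, lang, default_lang)
        attr = " selected" if code == selected else ""
        options.append(
            f'<option value="{escape(code)}"{attr}>{escape(label)}</option>'
        )
    return f'<select name="{escape(code_set)}">' + "".join(options) + "</select>"

print(select_element("status", "fr", selected="A"))
```

Keeping the codes themselves language-neutral and storing only the descriptions per language is what lets the same stored value be displayed in any language of interest.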
FDCC (Formal Definitions of Cultural Conventions) or locale files define the conventions used by your language and country to write items such as dates and times, days of the week, months, and numbers. The Vim locale file highlighter highlights ISO TR 14652-style locale input files for easier editing.
PrEd is a Java-based graphical utility to find and edit Java property files in JAR, WAR, and other kinds of ZIP archives. It is an appropriate tool for customizing Java applications which use XML and Property files for their configuration. It provides an interface to edit the property values of property files and to edit the values of XML text nodes and attributes. No knowledge of XML is needed to edit XML files. The program can update JAR and WAR files without having to extract them.
OmegaT is a translation memory application intended for professional translators. It does not translate for you (software that does this is called "machine translation"). It features fuzzy matching, match propagation, simultaneous processing of multiple-file projects, simultaneous use of multiple translation memories, and external glossaries. Supported document formats include plain text, HTML, and OpenOffice.org/StarOffice. It supports Unicode (UTF-8), so it can be used with non-Latin alphabets. It is compatible with other translation memory applications through TMX Level 1.
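Fuzzy matching against a translation memory can be sketched in a few lines of Python. This is only an illustration of the general idea, not OmegaT's own matching algorithm: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the `fuzzy_matches` function, the dictionary-based memory, and the 0.7 threshold are assumptions.

```python
from difflib import SequenceMatcher

def fuzzy_matches(segment, memory, threshold=0.7):
    """Return (score, source, target) tuples for memory entries that
    resemble `segment`, best match first.

    `memory` maps previously translated source segments to their targets.
    """
    scored = []
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and approaches 0.0
        # as the strings share less content.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    return sorted(scored, reverse=True)

memory = {
    "Open the file": "Ouvrir le fichier",
    "Close the window": "Fermer la fenêtre",
}

for score, source, target in fuzzy_matches("Open the files", memory):
    print(f"{score:.2f}  {source!r} -> {target!r}")
```

A translator would typically see such partial matches as suggestions to edit, rather than as finished translations.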
Xutf8 is a set of locale-independent X11 functions. It includes functions to draw UTF-8 text in both the left-to-right and right-to-left directions. It has an input function that converts single-byte and double-byte strings to UTF-8. A function to create a specific fontset is also included.
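The conversion of single-byte and double-byte strings to UTF-8 can be illustrated with Python's built-in codecs. This is not Xutf8's C interface; the `to_utf8` function and the choice of ISO 8859-1 and Shift_JIS as example source encodings are assumptions made for the sketch.

```python
def to_utf8(raw, source_encoding):
    """Re-encode bytes from a legacy encoding into UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

# Single-byte example: ISO 8859-1 stores "é" as one byte (0xE9).
latin1_bytes = b"caf\xe9"
print(to_utf8(latin1_bytes, "iso-8859-1"))

# Double-byte example: Shift_JIS stores each of these characters in two bytes.
sjis_bytes = "日本".encode("shift_jis")
print(to_utf8(sjis_bytes, "shift_jis").decode("utf-8"))
```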