The Okapi project’s main purpose is to provide a set of building blocks for creating larger open source localization and translation tools. However, many Okapi components are generic enough to interest the text mining, natural language processing, and text retrieval communities as well. Okapi’s many text filters (HTML, Properties, XML with ITS XPath-based rules, OpenXML, ODF, Regex, etc.) offer a straightforward way to access the text of multiple document formats. Its document events and pipeline can be adapted to integrate with other frameworks such as UIMA, LingPipe, OpenPipeline, OpenNLP, GATE, and Lucene. The advantage of Okapi’s text filters is that they not only extract the text but also preserve all non-textual formatting: a document can be decomposed into events, processed through the pipeline, and then rebuilt without loss. Structural information can be added to Okapi document events so that tables, lists, links, titles, etc. are grouped together and treated as a unit. This is useful when context based on a “universal” document structure is needed. The Okapi event model supports user-configurable annotations, similar to UIMA’s but simpler and more restricted in scope. Users can annotate spans of text or add new resources such as translation memory matches, terminology, token types, or part-of-speech information.
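To make the extract, process, and rebuild cycle concrete, here is a minimal Python sketch of the idea. It does not use Okapi (a Java library) or its API: the `Event` class, the `TEXT`/`SKELETON` kinds, and the functions `extract`, `run_pipeline`, `uppercase_text`, and `merge` are hypothetical names, and the regex-based tag matching is a deliberate simplification of what a real HTML filter does. What it shows is that when markup travels through the pipeline as untouched events alongside the text, concatenating the events rebuilds the original document exactly.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Illustrative only: "TEXT" and "SKELETON" are made-up event kinds, not Okapi classes.
@dataclass(frozen=True)
class Event:
    kind: str     # "TEXT" = extractable text, "SKELETON" = markup kept verbatim
    content: str

TAG = re.compile(r"<[^>]*>")  # naive tag matcher; a real HTML filter needs a parser

def extract(doc: str) -> List[Event]:
    """Decompose a document into alternating skeleton and text events."""
    events: List[Event] = []
    pos = 0
    for m in TAG.finditer(doc):
        if m.start() > pos:
            events.append(Event("TEXT", doc[pos:m.start()]))
        events.append(Event("SKELETON", m.group()))
        pos = m.end()
    if pos < len(doc):
        events.append(Event("TEXT", doc[pos:]))
    return events

Step = Callable[[Event], Event]

def run_pipeline(events: List[Event], steps: List[Step]) -> List[Event]:
    """Pass every event through each step in order."""
    out = []
    for event in events:
        for step in steps:
            event = step(event)
        out.append(event)
    return out

def uppercase_text(event: Event) -> Event:
    """Toy step: transforms text events and leaves skeleton events untouched."""
    if event.kind == "TEXT":
        return Event("TEXT", event.content.upper())
    return event

def merge(events: List[Event]) -> str:
    """Rebuild the document by concatenating event contents in order."""
    return "".join(event.content for event in events)

if __name__ == "__main__":
    doc = '<p class="intro">Hello <b>world</b></p>'
    events = extract(doc)
    assert merge(events) == doc  # lossless round trip when no step changes anything
    print(merge(run_pipeline(events, [uppercase_text])))
    # prints <p class="intro">HELLO <b>WORLD</b></p>
```

Keeping markup as first-class events, rather than discarding it, is what makes the rebuild step trivial: the merger never has to reconstruct formatting it did not see.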
TongueTied is a web-based application that helps create keywords, with support for multi-language or multi-region resources. A key feature of TongueTied is that it can export static resources from the application and import translations from resource files back into it. The following formats are currently supported for both export and import: Java Properties, .NET Resources (.resx), CSV, and Excel. TongueTied provides an optional workflow around each keyword to track changes to a translation and ensure its validity. Operators can query a translation if they believe it to be incorrect.
HogTrans provides an automatic word translation engine built on statistics gathered from existing translations of free software. In effect, it provides an automatically generated dictionary with multiple translations, and example usages for each. HogTrans can import translations from standard GNU gettext .mo files.
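HogTrans’s own import code is not described here, but the .mo format it reads is documented in the GNU gettext manual: a header of seven 32-bit words (magic number, revision, string count, offsets of the original and translated string tables, and hash table size and offset), followed by tables of (length, offset) pairs that point at NUL-terminated strings. The Python sketch below builds and reads such a file. `write_mo` and `read_mo` are hypothetical helpers; the strings are assumed to be UTF-8 (a real file declares its charset in the header entry), the optional hash table is omitted, and plural entries are not split into their forms. Python’s standard `gettext.GNUTranslations` class parses the same format for production use.

```python
import struct
from typing import Dict

LE_MAGIC = 0x950412DE  # magic number of a GNU .mo file, read little-endian
BE_MAGIC = 0xDE120495  # the same magic number as seen in a big-endian file

def write_mo(catalog: Dict[str, str]) -> bytes:
    """Build a minimal little-endian .mo file (no hash table), UTF-8 assumed."""
    keys = sorted(catalog)  # gettext expects original strings in sorted order
    n = len(keys)
    orig_off = 28                  # the header is seven 32-bit words
    trans_off = orig_off + 8 * n   # each table entry is (length, offset)
    data_off = trans_off + 8 * n
    data = bytearray()
    orig_tab, trans_tab = [], []
    for table, strings in ((orig_tab, keys), (trans_tab, [catalog[k] for k in keys])):
        for s in strings:
            raw = s.encode("utf-8")
            table.append((len(raw), data_off + len(data)))
            data += raw + b"\0"  # strings are NUL-terminated; length excludes the NUL
    out = bytearray(struct.pack("<7I", LE_MAGIC, 0, n, orig_off, trans_off, 0, data_off))
    for length, offset in orig_tab + trans_tab:
        out += struct.pack("<2I", length, offset)
    return bytes(out + data)

def read_mo(blob: bytes) -> Dict[str, str]:
    """Return the msgid -> msgstr pairs of a .mo file (UTF-8 assumed)."""
    (magic,) = struct.unpack("<I", blob[:4])
    if magic == LE_MAGIC:
        order = "<"
    elif magic == BE_MAGIC:
        order = ">"
    else:
        raise ValueError("not a GNU .mo file")
    _revision, n, orig_off, trans_off = struct.unpack(order + "4I", blob[4:20])
    catalog: Dict[str, str] = {}
    for i in range(n):
        id_len, id_off = struct.unpack_from(order + "2I", blob, orig_off + 8 * i)
        str_len, str_off = struct.unpack_from(order + "2I", blob, trans_off + 8 * i)
        # The header entry (empty msgid) comes back under key "";
        # plural entries keep their NUL-separated forms unsplit in this sketch.
        msgid = blob[id_off:id_off + id_len].decode("utf-8")
        catalog[msgid] = blob[str_off:str_off + str_len].decode("utf-8")
    return catalog

if __name__ == "__main__":
    blob = write_mo({"Open": "Ouvrir", "Save as…": "Enregistrer sous…"})
    print(read_mo(blob))
```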
Flexible Localization is a .NET/Mono library for string-based user interface localization. It offers a hierarchical structure (which can be partially loaded) for organizing strings, as well as support for several independently localized modules. Localization files are validated against a localization declaration, which defines the strings that are expected to appear in them. The strings themselves can be parameterized and evaluated based on expressions; that is, a localization can return different strings depending on parameter values.
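The combination of a declaration, hierarchical keys, and expression-selected variants can be sketched in a few lines of Python. This is a hypothetical model, not Flexible Localization’s .NET API or file format: `localize`, `validate`, `DECLARATION`, `EN`, and the (condition, template) variant structure are illustrative assumptions, and dotted keys such as `files.deleted` stand in for the hierarchy. `validate` reports strings that are missing or undeclared and templates that reference parameters the declaration does not allow; `localize` returns the first variant whose condition holds for the given parameters.

```python
from string import Formatter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical model, not Flexible Localization's API or file format:
# each key maps to (condition, template) variants; the first matching condition wins.
Params = Dict[str, object]
Variant = Tuple[Callable[[Params], bool], str]

def localize(strings: Dict[str, List[Variant]], key: str, **params: object) -> str:
    """Select the first variant whose condition holds, then fill in parameters."""
    for condition, template in strings[key]:
        if condition(params):
            return template.format(**params)
    raise KeyError(f"no variant of {key!r} matches {params}")

def validate(declaration: Dict[str, Set[str]], strings: Dict[str, List[Variant]]) -> List[str]:
    """Check a localization against a declaration of expected keys and parameters."""
    errors = [f"missing string: {k}" for k in declaration.keys() - strings.keys()]
    errors += [f"undeclared string: {k}" for k in strings.keys() - declaration.keys()]
    for key, variants in strings.items():
        allowed = declaration.get(key, set())
        for _, template in variants:
            # Formatter().parse yields (literal, field_name, spec, conversion) tuples.
            for _, field, _, _ in Formatter().parse(template):
                if field is not None and field not in allowed:
                    errors.append(f"{key}: unknown parameter {{{field}}}")
    return sorted(errors)

# Dotted keys stand in for the hierarchical organization of strings.
DECLARATION = {"files.deleted": {"count"}, "app.title": set()}

EN = {
    "app.title": [(lambda p: True, "File Manager")],
    "files.deleted": [
        (lambda p: p["count"] == 0, "No files were deleted."),
        (lambda p: p["count"] == 1, "One file was deleted."),
        (lambda p: True, "{count} files were deleted."),
    ],
}

if __name__ == "__main__":
    assert validate(DECLARATION, EN) == []
    for n in (0, 1, 5):
        print(localize(EN, "files.deleted", count=n))
```

Because the first matching condition wins, a catch-all variant such as `lambda p: True` has to come last; putting it first would shadow the more specific variants.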