foma is a compiler, programming language, and C library for constructing finite-state automata and transducers. It has specific support for many natural language processing applications, such as building morphological analyzers. Although NLP is probably its main use, foma is generic enough to serve many other purposes. It comes with an xfst-compatible interface and regular expression language. The library contains efficient implementations of all classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, and boolean operations. More advanced construction methods are also available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc.
|Tags||Regex, NLP, FSM, Compilers|
|Operating Systems||Linux (32 and 64 bit), OS X, Windows, Solaris|
|Implementation||C, Flex, Bison|
Release Notes: Minimum edit distance functionality has been added to the API, so that one can search for the closest approximate match in an automaton. The transducer/automaton apply code is new and faster, and the separate flookup utility can optionally index arcs. This release adds the rewrite-rule formalism "transducers with backreferences" (e.g., T -> || L _ R, where T is a transducer). The flookup utility now has an option to run as a UDP server (-S). More low-level functions have been added to support faster construction of complex automata. There are minor bug fixes in the apply code and fixes for some rare memory leaks throughout.
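
To illustrate what a minimum edit distance search over an automaton involves, here is a minimal Python sketch. It is not foma's API or its implementation, which these notes do not describe. The function `closest_match`, the dictionary encoding of the automaton, and the three-word example are all assumptions made for illustration. The sketch runs a uniform-cost search over pairs of (automaton state, input position), charging 1 for each insertion, deletion, or substitution.

```python
import heapq


def closest_match(transitions, start, finals, word):
    """Return (cost, match) for the accepted string closest to `word`.

    `transitions` maps state -> {symbol: next_state} (a deterministic
    automaton). This encoding is hypothetical, not foma's data structure.
    Returns None if the automaton accepts no string at all.
    """
    # Heap entries: (edits so far, string built so far, state, input position).
    heap = [(0, "", start, 0)]
    seen = set()
    while heap:
        cost, out, state, i = heapq.heappop(heap)
        if (state, i) in seen:
            continue  # already reached this pair at an equal or lower cost
        seen.add((state, i))
        if i == len(word) and state in finals:
            return cost, out
        if i < len(word):
            # Deletion: drop one input symbol, stay in the same state.
            heapq.heappush(heap, (cost + 1, out, state, i + 1))
        for sym, nxt in transitions.get(state, {}).items():
            # Insertion: follow an arc without consuming input.
            heapq.heappush(heap, (cost + 1, out + sym, nxt, i))
            if i < len(word):
                # Exact match costs 0, substitution costs 1.
                step = 0 if sym == word[i] else 1
                heapq.heappush(heap, (cost + step, out + sym, nxt, i + 1))
    return None


# Hypothetical automaton accepting exactly "cat", "cart", and "dog".
TRANSITIONS = {
    0: {"c": 1, "d": 6},
    1: {"a": 2},
    2: {"t": 3, "r": 4},
    4: {"t": 5},
    6: {"o": 7},
    7: {"g": 8},
}
FINALS = {3, 5, 8}

print(closest_match(TRANSITIONS, 0, FINALS, "cut"))    # (1, 'cat')
print(closest_match(TRANSITIONS, 0, FINALS, "cartt"))  # (1, 'cart')
```

Because every edit has a non-negative cost, the first accepting pair popped from the heap is guaranteed to be a minimum-cost match. When several accepted strings are equally close, this sketch returns one of them.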