ExternalSort is a class that can sort large files, similar to the Unix sort command. To stay within the configured PHP memory limit, it reads the file to be sorted in small buckets, each of which is stored in a temporary file. The buckets are sorted individually and then merged to produce the final sorted output. The class also provides a command line interface, so it can be executed as a command from a shell.
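The bucket-and-merge approach described for ExternalSort can be sketched in a few lines. This is an illustrative Python sketch, not the PHP class itself: the function name `external_sort` and the `lines_per_bucket` parameter are assumptions made for the example, and the bucket size is counted in lines rather than derived from a memory limit.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(in_path, out_path, lines_per_bucket=100_000):
    """Sort the lines of in_path into out_path using temporary bucket files."""
    bucket_paths = []
    try:
        # Phase 1: read a bounded number of lines, sort them in memory,
        # and write each sorted bucket to its own temporary file.
        with open(in_path, "r", encoding="utf-8") as src:
            while True:
                bucket = list(islice(src, lines_per_bucket))
                if not bucket:
                    break
                # Make sure the last line of the input ends with a newline,
                # so merged lines never run together.
                if not bucket[-1].endswith("\n"):
                    bucket[-1] += "\n"
                bucket.sort()
                fd, path = tempfile.mkstemp(suffix=".bucket")
                with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                    tmp.writelines(bucket)
                bucket_paths.append(path)

        # Phase 2: k-way merge of the sorted buckets. heapq.merge is lazy,
        # so only one line per bucket is held in memory at a time.
        files = [open(p, "r", encoding="utf-8") for p in bucket_paths]
        try:
            with open(out_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary bucket files whether or not sorting succeeded.
        for p in bucket_paths:
            os.remove(p)
```

A smaller `lines_per_bucket` lowers peak memory use at the cost of more temporary files to merge.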
UverseWiki is a modular open source PHP framework designed for text processing. Unlike most existing solutions, it is not based on regular expressions; instead, it uses a recursive descent parser to build a document object model (DOM). Once parsing has finished and the DOM has been produced, the original source is discarded and all operations are performed on the document tree: nodes can be altered, serialized, or rendered into a particular format (such as HTML or RTF). The wiki syntax is language-neutral and the processing itself is carried out in UTF-8.
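To illustrate the parse-then-render pipeline that UverseWiki's description outlines, here is a minimal Python sketch of a recursive descent parser that builds a node tree and then renders it to HTML. The two-marker syntax (`*bold*`, `_italic_`) and every name in the block (`Node`, `parse`, `render_html`) are hypothetical choices for this example and say nothing about UverseWiki's actual syntax or API.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    kind: str                      # "doc", "bold", "italic", or "text"
    text: str = ""
    children: list = field(default_factory=list)


MARKERS = {"*": "bold", "_": "italic"}   # hypothetical toy syntax
TAGS = {"bold": "b", "italic": "i"}


def parse(source):
    """Build a document tree from source by recursive descent."""
    pos = 0

    def parse_inline(closer):
        nonlocal pos
        nodes = []
        while pos < len(source) and source[pos] != closer:
            ch = source[pos]
            if ch in MARKERS:
                pos += 1                      # consume the opening marker
                children = parse_inline(ch)   # recurse until the same marker
                pos += 1                      # consume the closing marker
                nodes.append(Node(MARKERS[ch], children=children))
            else:
                start = pos
                while pos < len(source) and source[pos] not in MARKERS:
                    pos += 1
                nodes.append(Node("text", text=source[start:pos]))
        return nodes

    return Node("doc", children=parse_inline(None))


def render_html(node):
    """Render the tree only; the original source is no longer needed."""
    if node.kind == "text":
        return escape(node.text)
    inner = "".join(render_html(child) for child in node.children)
    tag = TAGS.get(node.kind)
    return f"<{tag}>{inner}</{tag}>" if tag else inner
```

Because rendering works on the tree alone, a second function such as a plain-text or RTF renderer could walk the same `Node` objects without re-parsing the source.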
ogEditor is a Web-based WYSIWYG HTML editor with a built-in file manager. It features a Tag Selector which lets you view and edit a tag's attributes and internal styles while working in the Design view of an HTML page. Tag Selector displays the entire chain of tags which apply to the current selection or to the cursor position. When any of the tags is selected, its corresponding element will be highlighted in the Design view, and the selected element's attributes and internal styles are also displayed and can be edited in the Property editor window.
FuzzyIndex indexes text for performing fuzzy searches using PHP and SQLite. It can process a list of text strings and build a database which indexes snippets of those strings and the locations where they appear. The class can also search for given keywords and returns the locations of the indexed strings where the best-matching text appears. It uses SQLite to store the indexed text database, but the class can be extended to use a different database type. It uses certain heuristics to extract the snippets from the indexed text. These heuristics are implemented as separate classes which can be used interchangeably.
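The indexing idea behind FuzzyIndex can be sketched as follows. This Python example uses the standard `sqlite3` module and character trigrams as the snippet heuristic; trigrams are an assumption chosen for illustration, since the description does not say which heuristics the class uses. The names `trigrams`, `build_index`, and `search` are likewise hypothetical. The extractor is passed in as a parameter to mirror the interchangeable heuristics mentioned above.

```python
import sqlite3


def trigrams(text):
    """One possible snippet heuristic: overlapping 3-character windows."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_index(strings, extract=trigrams):
    """Store each snippet together with the position of its source string."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE snippets (gram TEXT, doc_id INTEGER)")
    db.execute("CREATE INDEX idx_gram ON snippets (gram)")
    for doc_id, text in enumerate(strings):
        db.executemany(
            "INSERT INTO snippets VALUES (?, ?)",
            [(gram, doc_id) for gram in extract(text)],
        )
    db.commit()
    return db


def search(db, keyword, extract=trigrams, limit=3):
    """Return (doc_id, hits) pairs, best match first."""
    grams = sorted(extract(keyword))
    placeholders = ",".join("?" * len(grams))
    query = (
        "SELECT doc_id, COUNT(*) AS hits FROM snippets "
        f"WHERE gram IN ({placeholders}) "
        "GROUP BY doc_id ORDER BY hits DESC, doc_id LIMIT ?"
    )
    return db.execute(query, (*grams, limit)).fetchall()
```

A misspelled keyword still shares most of its snippets with the intended string, so a call such as `search(db, "serching")` ranks a string containing "searching" above unrelated entries.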
libunibreak is an implementation of the line breaking and word breaking algorithms described in Unicode Standard Annex 14 and Unicode Standard Annex 29. It is a superset of, and supersedes, liblinebreak. It is designed to be used in a generic text renderer; FBReader is one real-world example.
documentr is a Web-based tool for editing and presenting software documentation. It allows you to easily maintain documentation for multiple products and product branches. Edits can easily be copied between branches, with merge conflicts being handled gracefully. It uses Markdown as its markup language, along with some extensions, and has a role-based permission system.
Template Data Interface (TDI, /ʹtedɪ/) is a markup templating system written in Python with (optional but recommended) speedup code written in C. Unlike most templating systems, TDI does not invent its own language to provide functionality. Instead, you simply mark the nodes you want to manipulate within the template document. The template is parsed, and the marked nodes are presented to your Python code, where they can be modified in any way you want.
Oxygen XML Developer is an Oxygen distribution specially tuned for XML development, providing XML editing, XML conversion, XML Schema development, XSLT/XQuery/XPath execution and debugging, SOAP and WSDL testing, native XML and relational database support, and XML instance generation.
Sanzang is a compact and simple cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), and it is well suited to working with ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Any user can develop his or her own translation rules; the rules are simply stored in a text file and applied at runtime.
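A rule table stored as plain text and applied at runtime, as described for Sanzang, might work roughly like the Python sketch below. The tab-separated rule format, the `#` comment lines, the longest-match-first strategy, and the names `load_rules` and `translate` are all assumptions made for this example; Sanzang's real table format and matching rules may differ.

```python
def load_rules(table_text):
    """Parse 'source<TAB>target' lines; blank lines and # comments are skipped."""
    rules = []
    for line in table_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if "\t" not in line:
            continue  # ignore malformed lines in this sketch
        source, target = line.split("\t", 1)
        rules.append((source, target))
    # Try longer source terms first so they win over their own prefixes.
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return rules


def translate(text, rules):
    """Replace each matched source term; copy unmatched characters through."""
    out = []
    i = 0
    while i < len(text):
        for source, target in rules:
            if text.startswith(source, i):
                out.append(target)
                i += len(source)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical rule table, as it might appear in a user's text file.
RULES = "# source<TAB>target\n三藏\t[Tripitaka]\n法師\t[master]\n法\t[dharma]\n"
```

Because the rules live in ordinary text, a user can add or correct an entry and see the effect on the next run without touching the program.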