uni2ascii and ascii2uni convert in both directions between UTF-8 Unicode and more than thirty 7-bit ASCII equivalents. These include RFC 2396 URI format, RFC 2045 Quoted-Printable format, and the representations used in HTML, SGML, XML, OOXML, the Unicode standard, Rich Text Format, POSIX portable charmaps, POSIX locale specifications, and Apache log files. They can also convert to and from the escapes used for Unicode in languages such as Ada, C, Common Lisp, Java, Pascal, Perl, PostScript, Python, Scheme, and Tcl.
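As a rough illustration of what such a round trip involves, the sketch below converts text to and from one of the HTML representations mentioned above, the hexadecimal numeric character reference. It is not the implementation used by uni2ascii or ascii2uni; the function names `to_hex_ncr` and `from_hex_ncr` are hypothetical and chosen only for this example.

```python
import re

# Matches HTML hexadecimal numeric character references such as &#x00E9;
_HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def to_hex_ncr(text: str) -> str:
    """Replace each non-ASCII character with a reference like &#x00E9;.

    Hypothetical helper for illustration; not part of uni2ascii.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in text
    )


def from_hex_ncr(text: str) -> str:
    """Replace each hexadecimal reference with the character it names.

    Hypothetical helper for illustration; not part of ascii2uni.
    Assumes every reference names a valid Unicode code point.
    """
    return _HEX_NCR.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    original = "caf\u00e9 \u20ac"          # "café €"
    encoded = to_hex_ncr(original)
    print(encoded)                         # pure 7-bit ASCII
    print(from_hex_ncr(encoded) == original)
```

Each of the other formats follows the same pattern with a different escape syntax, which is why a single pair of tools can cover so many of them.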
Tags: Text Processing Markup General Software Development HTML/XHTML SGML Internationalization Linguistic
Release Notes: A bug in uni2ascii that made the substitution count too high in certain cases was fixed. The code was patched to handle the lack of getline on NetBSD. The semantics of the pure option were clarified for characters in the ASCII range other than space and newline. A bug in which this was not implemented correctly for UTF-8 types was fixed.
Release Notes: This release adds U+0085, U+00B7, U+2022, and U+2028 to the characters converted to the nearest ASCII equivalent when this option is invoked.
Release Notes: The Q format (HTML character entities) works again in ascii2uni.
Release Notes: endian.h was renamed to avoid conflict with the external file of the same name.
Release Notes: This release fixes several small bugs, including one that interfered with the use of the Q format (generate character entities if possible) in uni2ascii.