NekoHTML is a simple HTML scanner and tag balancer that enables Java application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make when writing HTML documents. NekoHTML is written using the Xerces Native Interface (XNI), which is the foundation of the Xerces2 implementation. This lets application programmers use the NekoHTML parser with existing XNI tools without modifying or rewriting code.
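To illustrate what "standard XML interfaces" means in practice, here is a minimal sketch of DOM traversal code that depends only on `org.w3c.dom`. The class and method names (`DomWalkSketch`, `elementNames`, `parseWellFormed`) are hypothetical. To stay runnable without extra jars, the demo builds its `Document` with the JDK's own XML parser from well-formed markup. With NekoHTML on the classpath, the `Document` would instead come from its DOM parser, and the traversal code should not need to change.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomWalkSketch {
    // Collects element names in document order using only org.w3c.dom calls,
    // so it works on a Document from any DOM-producing parser.
    static List<String> elementNames(Node node) {
        List<String> names = new ArrayList<>();
        collect(node, names);
        return names;
    }

    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    // Stand-in for the HTML parser (assumption for this demo only): the JDK
    // XML parser, which accepts well-formed markup but not sloppy HTML.
    static Document parseWellFormed(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWellFormed("<html><body><p>Hello</p><p>World</p></body></html>");
        System.out.println(elementNames(doc));
    }
}
```

The stand-in parser rejects input such as unclosed paragraph tags, which is the kind of markup a tag balancer is meant to repair before the DOM is built.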
Tags: Text Processing, Markup, HTML/XHTML, XML
Release Notes: A charset regression was fixed.
Release Notes: The license was changed to Apache 2.0 and the version number was boosted to reflect the maturity of the project. Project files were reorganized to decouple them from the rest of the CyberNeko Tools for XNI. xercesMinimal.jar and source were updated so that NekoHTML compiles using Xerces-J 2.9.1. The default behavior was changed to not normalize attribute values and a new feature was added to allow users to turn on normalization. The build was modified to target compilation for Java 1.3. Suggested paragraph tag balancing was adjusted and various reported bugs were fixed.
Release Notes: A feature was added that allows the scanner to fix character entity references for Microsoft Windows characters. The nekohtmlXni.jar file is no longer built by default. Tag balancing was changed to allow headers inside links. Fixes were made to the handling of the blockquote tag, a tag-balancing bug for unknown elements, the mapping of the encoding name in meta tags, various namespace binding bugs, and a no-such-method exception raised when using the augmentations feature with older versions of Xerces2.
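As background on the Windows character reference fix, here is a conceptual sketch of the problem such a feature addresses. Numeric references in the 128 to 159 range are often written with Windows-1252 code points and need remapping to their Unicode equivalents. This is not NekoHTML's implementation: the class name `WindowsRefSketch`, the method `fixReferences`, and the deliberately partial mapping table are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WindowsRefSketch {
    // Partial Windows-1252 to Unicode table (illustrative subset only).
    private static final Map<Integer, Integer> CP1252 = new HashMap<>();
    static {
        CP1252.put(0x80, 0x20AC); // euro sign
        CP1252.put(0x85, 0x2026); // horizontal ellipsis
        CP1252.put(0x91, 0x2018); // left single quotation mark
        CP1252.put(0x92, 0x2019); // right single quotation mark
        CP1252.put(0x93, 0x201C); // left double quotation mark
        CP1252.put(0x94, 0x201D); // right double quotation mark
        CP1252.put(0x96, 0x2013); // en dash
        CP1252.put(0x97, 0x2014); // em dash
        CP1252.put(0x99, 0x2122); // trade mark sign
    }

    // Matches decimal numeric character references such as &#150;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // Rewrites references found in the table; all others pass through unchanged.
    static String fixReferences(String html) {
        Matcher m = DECIMAL_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer mapped = CP1252.get(Integer.parseInt(m.group(1)));
            String replacement = (mapped == null) ? m.group() : "&#" + mapped + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixReferences("1990 &#150; 2000 &#147;quoted&#148;"));
    }
}
```

A real scanner would apply this kind of remapping while tokenizing rather than as a regex pass over the whole document, and would cover the full code page.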
Release Notes: This release added features for stripping CDATA delimiters from script and style tags, included augmentations, bug fixes, and performance enhancements, and fixed some tag-balancing issues.
Release Notes: This version implements scanning of the XML declaration, fixes a script tag scanning bug, and adds a version class and manifest entries for querying product information.