HTML Parser is a Java library for parsing HTML in either a linear or a nested fashion. It is used primarily for transformation and extraction, and features filters, visitors, custom tags, and easy-to-use JavaBeans. It is a fast, robust, and well-tested package.
Tags: Internet, Web, Dynamic Content, Software Development, Libraries, Java Libraries, Text Processing, Markup, HTML/XHTML
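The visitor style of processing mentioned in the description can be illustrated with a small, self-contained sketch. Everything below (MiniNode, MiniTag, MiniText, MiniVisitor, TextCollector, TagCounter) is a hypothetical model invented for illustration; it is not the HTML Parser class hierarchy, and its method names are assumptions. It shows the general idea: each node dispatches to a callback, and a visitor overrides only the callbacks it cares about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained node model for illustration only;
// it is NOT the HTML Parser class hierarchy.
abstract class MiniNode {
    abstract void accept(MiniVisitor visitor);
}

// An element such as <p> or <b>, holding child nodes.
class MiniTag extends MiniNode {
    final String name;
    final List<MiniNode> children = new ArrayList<>();

    MiniTag(String name, MiniNode... kids) {
        this.name = name;
        children.addAll(Arrays.asList(kids));
    }

    @Override
    void accept(MiniVisitor visitor) {
        visitor.visitTag(this);            // opening tag first
        for (MiniNode child : children) {
            child.accept(visitor);         // then the contents, in document order
        }
        visitor.visitEndTag(this);         // closing tag last
    }
}

// A run of character data between tags.
class MiniText extends MiniNode {
    final String text;

    MiniText(String text) { this.text = text; }

    @Override
    void accept(MiniVisitor visitor) { visitor.visitText(this); }
}

// Visitor callbacks; the defaults do nothing, so subclasses override only what they need.
abstract class MiniVisitor {
    void visitTag(MiniTag tag) {}
    void visitEndTag(MiniTag tag) {}
    void visitText(MiniText text) {}
}

// Extraction example: gather all text while ignoring the markup.
class TextCollector extends MiniVisitor {
    final StringBuilder out = new StringBuilder();

    @Override
    void visitText(MiniText text) { out.append(text.text); }
}

// Second example: count opening tags by overriding only visitTag.
class TagCounter extends MiniVisitor {
    int count;

    @Override
    void visitTag(MiniTag tag) { count++; }
}
```

For a tree built as new MiniTag("p", new MiniText("Hello, "), new MiniTag("b", new MiniText("world"))), a TextCollector ends up holding "Hello, world" and a TagCounter counts 2 opening tags. Keeping the traversal inside the nodes means each new extraction task is just another small visitor subclass.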
Release Notes: The license has been changed to the Common Public License (CPL). Maven 2 is now used as the build environment, and Subversion is used for the source repository. A new Web site was created. The input "<<tag>" is now correctly parsed as text. A method to render the start of a tag as HTML was added. CssSelectorNodeFilter does not accept [attr|=val].
Release Notes: Support was added for commonly requested composite tags. Several enhancements were made to the filtering functionality, and the HTTP connection processing subsystem was extended. Other user-requested features and bug fixes were also included.
Release Notes: This is the first candidate for the final 1.6 release. All outstanding bugs have been fixed. A new XorFilter rounds out the logical node filters.
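An exclusive-or filter keeps a node when exactly one of two conditions holds, complementing AND, OR, and NOT. The sketch below is a self-contained illustration of that logic only. SimpleNode, SimpleFilter, and SimpleXorFilter are hypothetical names, and no assumption is made here about the constructor or method signatures of the library's own XorFilter.

```java
// Hypothetical stand-in for a parsed node, for illustration only;
// it is NOT the HTML Parser Node type.
final class SimpleNode {
    final String tagName;   // upper-case tag name, e.g. "A" or "IMG"
    final boolean hasHref;  // whether the tag carries an href attribute

    SimpleNode(String tagName, boolean hasHref) {
        this.tagName = tagName;
        this.hasHref = hasHref;
    }
}

// Minimal filter contract: decide whether a node is kept.
interface SimpleFilter {
    boolean accept(SimpleNode node);
}

// Exclusive-or of two filters: keeps a node when exactly one of them accepts it.
final class SimpleXorFilter implements SimpleFilter {
    private final SimpleFilter left;
    private final SimpleFilter right;

    SimpleXorFilter(SimpleFilter left, SimpleFilter right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean accept(SimpleNode node) {
        // Java's ^ on booleans is true when the operands differ.
        return left.accept(node) ^ right.accept(node);
    }
}
```

Because SimpleFilter has a single method, conditions can be written as lambdas. Combining "is an A tag" with "has an href" through the XOR filter keeps anchors that lack an href and non-anchor tags (such as LINK) that carry one, which is a quick way to spot unusual markup.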
Release Notes: NodeTreeWalker, a utility class that traverses a tree of Node objects in either depth-first or breadth-first order, has been added. Several other bug fixes and patches have been incorporated.
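The two traversal orders offered by a tree walker can be sketched with an explicit stack (depth-first) and a queue (breadth-first). This is a self-contained illustration, not NodeTreeWalker's implementation or API; TreeNode and TreeWalkerSketch are hypothetical names, and the depth-first variant shown is pre-order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical tree node for illustration only; NOT the HTML Parser Node type.
final class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, TreeNode... kids) {
        this.name = name;
        children.addAll(Arrays.asList(kids));
    }
}

final class TreeWalkerSketch {
    // Pre-order depth-first: a node is visited before its children, left to right.
    static List<String> depthFirst(TreeNode root) {
        List<String> order = new ArrayList<>();
        Deque<TreeNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            TreeNode node = stack.pop();
            order.add(node.name);
            // Push children in reverse so the leftmost child is popped first.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(node.children.get(i));
            }
        }
        return order;
    }

    // Breadth-first: every node at depth d is visited before any node at depth d + 1.
    static List<String> breadthFirst(TreeNode root) {
        List<String> order = new ArrayList<>();
        Deque<TreeNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            TreeNode node = queue.poll();
            order.add(node.name);
            queue.addAll(node.children);
        }
        return order;
    }
}
```

For a tree html(head(title), body(p, div(span))), depthFirst yields html, head, title, body, p, div, span, while breadthFirst yields html, head, body, title, p, div, span. Using explicit collections instead of recursion avoids stack overflows on very deeply nested documents.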
Release Notes: Support has been added for commonly requested composite tags: P, H1-H6, and the definition list tags (DL, DT, DD). The Node interface has been augmented with methods to get the first and last child and the previous and next sibling, making it easier to traverse the HTML document.
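Child and sibling accessors make it possible to walk a node's children without indexing into a list. The self-contained sketch below assumes a parent pointer on each node. NavNode and its method names are hypothetical, chosen to mirror the wording of the release note; they are not taken from the library's Node interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node with a parent link, for illustration only;
// it is NOT the HTML Parser Node implementation.
final class NavNode {
    final String name;
    NavNode parent;
    final List<NavNode> children = new ArrayList<>();

    NavNode(String name) { this.name = name; }

    // Appends a child, records this node as its parent, and returns this for chaining.
    NavNode add(NavNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    NavNode getFirstChild() { return children.isEmpty() ? null : children.get(0); }

    NavNode getLastChild() { return children.isEmpty() ? null : children.get(children.size() - 1); }

    // Returns null for a root node or for the first child.
    NavNode getPreviousSibling() {
        if (parent == null) return null;
        int i = parent.children.indexOf(this);  // identity comparison (equals is not overridden)
        return i > 0 ? parent.children.get(i - 1) : null;
    }

    // Returns null for a root node or for the last child.
    NavNode getNextSibling() {
        if (parent == null) return null;
        int i = parent.children.indexOf(this);
        return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
    }

    // Lists all children using only first-child and next-sibling links.
    static List<String> childNames(NavNode parent) {
        List<String> names = new ArrayList<>();
        for (NavNode c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(c.name);
        }
        return names;
    }
}
```

For a DL element with children DT and DD, getFirstChild returns the DT, getNextSibling on the DT returns the DD, and childNames lists DT, DD. The indexOf lookup makes each sibling step linear in the number of siblings; a real implementation might store sibling links directly instead.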