libstudxml is a streaming XML pull parser and streaming XML serializer implementation for modern, standard C++. It has an API which the authors believe should have already been in Boost or even in the C++ standard library. It is compact, external dependency-free, and reasonably efficient. The XML parser is a conforming, non-validating XML 1.0 implementation which is based on tested and proven code.
The Link Grammar Parser (link-grammar) is a syntactic parser of English, Russian, Arabic, and Persian (and other languages as well), based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labelled links connecting pairs of words. The parser also produces a "constituent" (Penn tree-bank style phrase tree) representation of a sentence (showing noun phrases, verb phrases, etc.). The RelEx extension provides dependency-parse output.
gradle-sablecc-plugin is a gradle plugin which creates parsers using SableCC. SableCC supports automatic CST-to-AST transformation, emits all the visitor patterns and analysis helpers you will likely ever need, and is LR, not LL(k). Many example grammars are available for modern languages; the author of this plugin has written dozens.
jsoup is a Java library for working with real-world HTML. It can parse HTML from a URL, file, or string. It can find and extract data, using DOM traversal or CSS selectors. The HTML elements, attributes, and text can be manipulated. It can clean user-submitted content against a safe white-list. jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag-soup; jsoup will create a sensible parse tree.
lihata is a compact textual language which can represent a tree of lists, hashes, and tables. The syntax tries to be minimal and flexible to allow formatting a lihata file to fit the context it represents. The source release contains an event and DoM parser and helper functions for maintaining lihata trees. lihata is a convenient language for both simple and complex configuration files and text representation of data files.