webStraktor is a programmable World Wide Web data extraction client. It features a scripting language to facilitate the collection, extraction, and storage of information available on the Web, including images. The scripting language uses elements of regular expression and XPath syntax. The standard webStraktor output format is XML-based and can be encoded in ASCII, UTF-8, or ISO-8859-1 (Latin-1). It adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through proxy servers. It also provides exhaustive logging and tracing information.
Libpsl is a C library that provides functions to check domains against the Mozilla Public Suffix List (PSL). It is useful for cookie domain verification, certificate domain verification, highlighting parts of a domain name, and more. Every Web client that handles cookies (e.g. browsers) should use the PSL data to protect user privacy.
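
To illustrate the kind of check a public suffix list enables, here is a minimal Python sketch of cookie domain verification. It does not use libpsl or reproduce its API: the function names and the four-entry suffix set are assumptions made up for this illustration, and the real list's wildcard and exception rules are ignored.

```python
# Toy subset of the Public Suffix List (hypothetical, for illustration only).
# The real list has thousands of rules, including wildcard and exception
# rules that this sketch does not handle.
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk"}


def is_public_suffix(domain):
    """Return True if the domain itself is a listed public suffix."""
    return domain.lower().strip(".") in PUBLIC_SUFFIXES


def registrable_domain(host):
    """Return the longest matching public suffix plus one label, or None."""
    labels = host.lower().strip(".").split(".")
    # Scan from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # A host that is itself a public suffix has no registrable domain.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


def cookie_domain_acceptable(host, cookie_domain):
    """Reject cookies scoped to a public suffix or to an unrelated domain."""
    host = host.lower().strip(".")
    cookie_domain = cookie_domain.lower().strip(".")
    if is_public_suffix(cookie_domain):
        return False
    return host == cookie_domain or host.endswith("." + cookie_domain)
```

With this toy list, a cookie set by `www.example.co.uk` for the domain `co.uk` is rejected, because accepting it would let the cookie be sent to every site registered under `co.uk`. The same cookie scoped to `example.co.uk` is accepted.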