webStraktor is a programmable World Wide Web data extraction client. It features a scripting language to facilitate the collection, extraction, and storage of information available on the Web, including images. The scripting language uses elements of regular expression and XPath syntax. The standard webStraktor output format is XML based, either in ASCII, UTF-8, or ISO-8859-1 (Latin1). It adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through proxy servers. Exhaustive logging and tracing information are provided.
|Tags||Java HttpClient spider bot crawler Swing http XML|
|Operating Systems||Windows (32 and 64bit)|
Release Notes: Initial Freecode announcement.