2131 projects tagged "Text Processing"

poppler (updated 25 May 2014)

Pop 838.47
Vit 100.05

Poppler is a PDF rendering library derived from xpdf. It has been enhanced to utilize modern libraries, and new features have been added. It also provides basic command line utilities.

XMLTV (updated 25 May 2014)

Pop 589.47
Vit 116.84

XMLTV is a set of programs to obtain and process TV (tvguide) listings and manage your TV viewing. It stores the listings in an XML-based format and most of the programs are filters which read and/or write XML. It includes tools to obtain, sort, grep, print, and munge listings, and two end-user programs to plan a week's TV viewing.

Asymptote (updated 24 May 2014)

Pop 1,375.08
Vit 150.50

Asymptote is a powerful descriptive 2D and 3D vector graphics language for technical drawing, inspired by MetaPost but with an improved C++-like syntax. It provides for figures the same high-quality level of typesetting that LaTeX does for scientific text. Asymptote is a programming language as opposed to just a graphics program. It can exploit the best features of script (command-driven) and graphical user interface (GUI) methods. High-level graphics commands are implemented in the language itself, allowing them to be easily tailored to specific applications.

GNU Parallel (updated 22 May 2014)

Pop 698.07
Vit 67.63

GNU parallel is a shell tool for executing jobs in parallel locally or using remote computers. A job is typically a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. If you use xargs today you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
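The idea of running one job per input line, several at a time, while keeping each job's output intact can be sketched in Python. This is only an analogy under stated assumptions, not GNU parallel itself: `run_job` and `run_all` are hypothetical names, and the job (a child Python process that upper-cases its argument) is a stand-in for a real command.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(line: str) -> str:
    """Run one job for one input line and return its captured output."""
    # Hypothetical job: a child Python process that upper-cases its argument.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", line],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def run_all(lines, max_workers=4):
    """Run jobs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs, so the
        # collected output is never interleaved between jobs.
        return list(pool.map(run_job, lines))


if __name__ == "__main__":
    for output in run_all(["alpha", "beta", "gamma"]):
        sys.stdout.write(output)
```

Because each job's output is captured whole before being written, the combined result can be piped into another program just as sequential output could.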

KBibTeX (updated 19 May 2014)

Pop 204.04
Vit 39.29

KBibTeX is a BibTeX editor for KDE to edit bibliographies used with LaTeX. Features include comfortable input masks, starting Web queries (e.g. Google or PubMed), and exporting to PDF, PostScript, RTF, and XML/HTML. Because KBibTeX uses KDE's KParts technology, it can be embedded into Kile or Konqueror.

Mavscript (updated 17 May 2014)

Pop 169.88
Vit 32.96

Mavscript allows the user to do calculations in a text document. Plain text and OpenOffice Writer files (odt) are supported. The calculation is done by the algebra system Yacas or by the Java interpreter BeanShell.

Docx to Text Converter (docx2txt) (updated 16 May 2014)

Pop 301.89
Vit 24.26

docx2txt is a tool that attempts to generate equivalent text files from (even corrupted) Microsoft .docx documents, preserving some formatting and document information (which MS text conversion drops) along with appropriate character conversions for a good (ASCII) text experience. It is a platform-independent solution consisting of (core) Perl and (wrapper) Unix/Windows shell scripts, plus a configuration file that controls the appearance of the output text to a fair extent. It depends on a command-line unzipping program (like unzip, 7z, pkzipc, or wzunzip) that can silently extract single files from zip archives to console/standard output/pipe. It can conveniently be used to build a Web-based docx document conversion service. Some Makefiles and Windows batch files are provided for easy installation of the scripts. With unzippers like CakeCmd that can deal with corrupt Zip archives, this tool can in many cases extract text from corrupt docx documents that the MS word processor fails to even open.
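The general approach described above (pull a single file out of the zip archive, then turn its XML into text) can be illustrated with a minimal Python sketch. This is not docx2txt's own code, which is Perl and relies on an external unzipper; the sketch assumes the standard `zipfile` module instead, and `docx_to_text` and `make_sample_docx` are hypothetical helper names. It handles only paragraph breaks and XML entities, none of the formatting docx2txt preserves.

```python
import io
import re
import zipfile
from xml.sax.saxutils import unescape


def docx_to_text(docx_bytes: bytes) -> str:
    """Extract plain text from the main document part of a .docx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        # A .docx file is a zip archive; the body lives in word/document.xml.
        xml = archive.read("word/document.xml").decode("utf-8")
    xml = xml.replace("</w:p>", "\n")      # end of paragraph -> newline
    text = re.sub(r"<[^>]+>", "", xml)     # drop all remaining tags
    return unescape(text)                  # &amp; -> &, &lt; -> <, &gt; -> >


def make_sample_docx() -> bytes:
    """Build a tiny in-memory archive with only the part this sketch reads."""
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>R&amp;D notes</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


if __name__ == "__main__":
    print(docx_to_text(make_sample_docx()), end="")
```

The sample archive is deliberately incomplete (a real .docx contains further parts), which is enough to exercise the single-file extraction step.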

OOoPy (updated 16 May 2014; no download link)

Pop 218.65
Vit 29.54

OOoPy is a Python library for modifying OpenOffice.org documents. It provides a set of transformations on the OOo XML format using the ElementTree XML Library. Transformations included are a mail merge application and the concatenation of documents with formatting intact. The framework supports easy creation of new transformations.

Emdros (updated 13 May 2014; no download link)

Pop 392.99
Vit 91.38

Emdros is a corpus query system for storing and searching linguistically annotated text. It is very generic, supporting almost any kind of annotation from almost any linguistic theory. All linguistic levels of analysis are supported, including phonology, morphology, the lexical level, syntax, and discourse. The core libraries act as a middleware layer between a client and an underlying SQL database. MySQL, PostgreSQL, and SQLite (2 and 3) are supported.

yawl (updated 06 May 2014; no website link)

Pop 112.19
Vit 3.49

This is a comprehensive "word game" word list for UNIX/Linux. It is a superset of the author's ENABLE list, the "OSW", and various lists researched by the author's colleague, Alan Beale. At 264,093 words, it is the largest list of its kind, suitable for use in all manner of crossword-type board games and word construction games, as well as for a spell checker dictionary. The YAWL package now includes two anagramming utilities (supplied as source code, handled by the included Makefile). There is also a shell script that extends the UNIX "strings" system command. This is the word list package recommended for the author's Quackey word game.
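One common way to find anagrams in a word list is to index every word by its sorted letters. The Python sketch below shows that technique only; it is not the code of the utilities shipped with the package, and `build_index`, `anagrams`, and the short sample list are hypothetical.

```python
from collections import defaultdict


def build_index(words):
    """Map each sorted-letter signature to the words that share it."""
    index = defaultdict(list)
    for word in words:
        signature = "".join(sorted(word.lower()))
        index[signature].append(word)
    return index


def anagrams(index, letters):
    """Return all indexed words that use exactly the given letters."""
    signature = "".join(sorted(letters.lower()))
    return sorted(index.get(signature, []))


if __name__ == "__main__":
    # Hypothetical sample; a real run would read the word list from a file.
    sample = ["listen", "silent", "enlist", "tinsel", "stone", "notes"]
    print(anagrams(build_index(sample), "inlets"))
```

Building the index costs one pass over the list, after which each lookup is a single dictionary access.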

