KaHIP (Karlsruhe High Quality Partitioning) is a family of graph partitioning programs that tackle the balanced graph partitioning problem. It focuses on solution quality and implements flow-based methods, more-localized local searches, and several parallel and sequential meta-heuristics.

wgms3d is a full-vectorial electromagnetic waveguide mode solver. It computes the modes of dielectric waveguides at a specified wavelength using a second-order finite-difference method. The waveguide cross section may consist of several adjacent regions of constant refractive index (i.e., step-index profiles). Dielectric interfaces do not have to be aligned with the discretization grid; they may be arbitrarily slanted or curved. The entire waveguide may be curved along the propagation direction. Leakage and curvature losses can be computed using Perfectly Matched Layers as absorbing boundaries.

Harry is a small tool for comparing strings and measuring their similarity. It implements several common distance and kernel functions for strings, as well as some exotic similarity measures. For example, Harry supports the Levenshtein (edit) distance, the Jaro-Winkler distance, and the compression distance. Harry is parallelized with OpenMP, so its performance scales linearly with the number of available CPU cores. Efficient implementations and effective caching further speed up the comparison of strings.
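As an illustration of one of the measures named above, the Levenshtein (edit) distance can be computed with a small dynamic program. The sketch below is plain Python and is not Harry's implementation or interface; the function name `levenshtein` and the unit costs for insertion, deletion, and substitution are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b with unit costs (illustrative sketch)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.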

PEDSIM is a microscopic pedestrian crowd simulation system. The PEDSIM library allows you to use pedestrian dynamics in your own software. Based on pure C++/STL without additional packages, it runs on virtually every operating system. The PEDSIM Demo Application (Qt) gives you a quick overview of the capabilities, and is a starting point for your own experiments. PEDSIM is suitable for use in crowd simulations (e.g. indoor evacuation simulation, large-scale outdoor simulations), where one is interested in output like pedestrian density or evacuation time. Because the quality of each individual agent's trajectory is high, PEDSIM can also be used for creating massive pedestrian crowds in movies. Since libpedsim is easy to use and extend, it is a good starting point for science projects.

The Graphical Models Toolkit (GMTK) is a toolkit for rapidly prototyping statistical models using dynamic graphical models (DGMs) and dynamic Bayesian networks (DBNs). It can be used for speech and language processing, bioinformatics, activity recognition, and any time series application. It features exact and approximate inference; many built-in factors, including dense, sparse, and deterministic conditional probability tables; native support for ARPA backoff-based factors and factored language models; parameter sharing; gamma and beta distributions; dense and sparse Gaussian factors; heterogeneous mixtures; deep neural network factors; time-inhomogeneous trellis factors; arbitrary-order embedded Markov chains; a GUI graph viewer; and much more.

The ExaScale IO (ESIO) library provides simple, high throughput input and output of structured data sets using parallel HDF5. It is designed to support reading and writing of turbulence simulation restart files, but it may be useful in other contexts. The library is written in C99 and may be used by C89 or C++ applications. A Fortran API built atop the F2003 standard ISO_C_BINDING is also available.

Sally is a tool for mapping a set of strings to a set of vectors. This mapping is referred to as embedding and allows techniques of machine learning and data mining to be applied for the analysis of string data. It can be used with data such as text documents, DNA sequences, or log files. The vector space model or bag-of-words model is used. Strings are characterized by a set of features, where each feature is associated with one dimension of the vector space. Occurrences of the features in each string are counted. Alternatively, binary or TF-IDF values can be computed. Vectors can be output in plain text, LibSVM, or Matlab format.
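The embedding described above can be sketched in a few lines of Python. This is not Sally's code or command-line interface: the function name `embed`, the choice of whitespace-delimited words as features, and the dense list-of-lists output are assumptions made purely for illustration.

```python
from collections import Counter


def embed(strings, binary=False):
    """Map each string to a vector of feature counts (or 0/1 flags if binary)."""
    # Assumption: features are whitespace-delimited words.
    counts = [Counter(s.split()) for s in strings]
    # One dimension of the vector space per distinct feature, in sorted order.
    vocabulary = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        row = [c[feature] for feature in vocabulary]
        if binary:
            row = [1 if value else 0 for value in row]
        vectors.append(row)
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = embed(["a b a", "b c"])
    print(vocab)
    for v in vecs:
        print(v)
```

A practical tool would typically store the vectors sparsely, since most strings contain only a small fraction of all features.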

Salad (short for Letter Salad) is an efficient and flexible implementation of the well-known anomaly detection method Anagram by Wang et al. (RAID 2006). Salad is based on n-gram models, that is, data is represented as all of its substrings of length n. During training, these n-grams are stored in a Bloom filter. This enables the detector to represent a large number of n-grams in little memory while still allowing efficient access to the data. Salad extends Anagram by allowing various n-gram types, a 2-class version of the detector for classification, and various model analysis modes.
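The idea of storing n-grams in a Bloom filter can be sketched as follows. This is an illustrative Python sketch, not Salad's implementation: the class and function names, the filter size, the number of hash functions, the use of salted BLAKE2b as the hash family, and the scoring rule (fraction of n-grams not seen during training) are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions (illustrative sketch)."""

    def __init__(self, bits=1 << 16, hashes=3):
        self.bits = bits          # assumed to be a multiple of 8
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item: bytes):
        # Derive independent hash values by salting BLAKE2b with a seed byte.
        for seed in range(self.hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(digest, "big") % self.bits

    def add(self, item: bytes):
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes):
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def ngrams(data: bytes, n: int):
    """All substrings of length n, in order of occurrence."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def train(bloom: BloomFilter, samples, n: int):
    for sample in samples:
        for gram in ngrams(sample, n):
            bloom.add(gram)


def score(bloom: BloomFilter, data: bytes, n: int) -> float:
    """Fraction of n-grams in data that were not seen during training."""
    grams = ngrams(data, n)
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in bloom)
    return unseen / len(grams)


if __name__ == "__main__":
    bloom = BloomFilter()
    train(bloom, [b"GET /index.html HTTP/1.1"], n=3)
    print(score(bloom, b"GET /index.html HTTP/1.1", n=3))
    print(score(bloom, b"\x90\x90\x90\x90\x90\x90", n=3))
```

A Bloom filter never reports a stored n-gram as missing, so data seen in training always scores 0.0; it can report false positives, which may slightly lower the score of unseen data.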

Thinknowlogy is grammar-based software, designed to utilize the Natural Laws of Intelligence in grammar, in order to create intelligence through natural language in software. This is demonstrated by programming in natural language, reasoning in natural language and drawing conclusions (more detailed than scientific solutions), making assumptions (with self-adjusting level of uncertainty), asking questions (about gaps in the knowledge), and detecting conflicts in the knowledge. It builds semantics autonomously (with no vocabularies or word lists) and detects some cases of semantic ambiguity. It is multi-grammar, proving that Natural Laws of Intelligence are universal.

TooN is a very efficient numerics library for C++. The main focus of the library is efficient and safe handling of large numbers of small vectors and matrices, with as much compile-time checking as possible. The library also works with large vectors and matrices and integrates easily with existing code. In addition to elementary vector and matrix operations, the library also provides linear solvers, matrix decompositions, optimization, and wrappers around LAPACK.

A Java component for manipulating PowerPoint presentations.