Lucie is a cluster installation and configuration tool. It enables parallel network installation of large numbers of nodes from a single administration server. The Lucie installer partitions the hard disks and installs the Linux kernel and the required software packages. The Lucie configurator then generates system and software configurations. Lucie is designed to be scalable and efficient, so a complete Linux cluster can be built from scratch in a short amount of time. Moreover, the whole installation process is designed to be fully automated.
BorderFlow implements a general-purpose graph clustering algorithm. It maximizes the inner-to-outer flow ratio from the border of each cluster to the rest of the graph. The main advantage of the algorithm is that it does not need parametrization to compute results of high accuracy.
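The quantity being maximized can be illustrated with a minimal sketch. It assumes one plausible reading of the description above: the border of a cluster is the set of its nodes that have at least one edge leaving the cluster, and the ratio compares the edge weight flowing from the border back into the cluster with the weight flowing from the border to the outside. The function name, the dict-of-dicts graph layout, and the sample graph are hypothetical and are not taken from BorderFlow itself.

```python
def border_flow_ratio(graph, cluster):
    """Inner-to-outer flow ratio of `cluster` (illustrative only).

    graph: dict mapping node -> {neighbor: edge weight}; an undirected
    graph is stored with each edge listed in both directions.
    """
    cluster = set(cluster)
    # Border: cluster nodes with at least one neighbor outside the cluster.
    border = {u for u in cluster if any(v not in cluster for v in graph[u])}
    # Weight flowing from the border into the cluster ...
    inner = sum(w for u in border for v, w in graph[u].items() if v in cluster)
    # ... and from the border to the rest of the graph.
    outer = sum(w for u in border for v, w in graph[u].items() if v not in cluster)
    if outer == 0:
        return float("inf")  # nothing leaves the cluster
    return inner / outer


# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
GRAPH = {}
for u, v in EDGES:
    GRAPH.setdefault(u, {})[v] = 1.0
    GRAPH.setdefault(v, {})[u] = 1.0

whole_triangle = border_flow_ratio(GRAPH, {"a", "b", "c"})
partial_triangle = border_flow_ratio(GRAPH, {"a", "b"})
```

Under these assumptions the complete triangle scores higher than a fragment of it, which is the kind of preference a ratio-maximizing clustering would exploit when growing a cluster from a seed node.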
ClodHopper is a Java library for high-performance clustering of numerical data. It contains clustering implementations such as K-Means, K-Means++, X-Means, G-Means, Fuzzy C-Means, Jarvis-Patrick, and various forms of hierarchical clustering. ClodHopper's clustering implementations take advantage of the host system's concurrent processing ability to speed up clustering. The data structures are also very lean to conserve memory. ClodHopper is very extensible. If you are developing a new clustering algorithm, you may save yourself an enormous amount of work by extending a ClodHopper base class.
CloudVPN is a secure decentralized mesh networking tool. Applications can use it as a mesh transport layer for packet routing, which makes it easy to create mesh Ethernet VPNs, secured audio/video broadcasting, communication channels, etc. It can create secured networks with unusual topologies, so it is easy to build connection schemes with clustered/decentralized servers, topologies with better throughput, ring-like topologies for failover, long lines that pass through many routes, or tree topologies that optimize inter-server bandwidth needs.
jmemcached is a fast, network-available cache daemon. It is protocol-compatible with memcached, but written in Java and suitable for applications with portability concerns, where Java is the preferred solution, or for using the memcached protocol in embedded applications with alternate storage engines. Existing memcached clients work unmodified. It can run as a standalone daemon or be embedded inside an existing Java application.
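Protocol compatibility means that a client only needs to speak the memcached text protocol. The sketch below builds a `set` and a `get` request and parses a `get` reply using the standard memcached text format. The helper names are hypothetical, and the host and port in `roundtrip` (the usual memcached default, 11211) are assumptions about the deployment rather than documented jmemcached settings.

```python
import socket


def build_set(key, value, flags=0, exptime=0):
    """Encode 'set <key> <flags> <exptime> <bytes>' followed by the data block."""
    return b"set %s %d %d %d\r\n%s\r\n" % (
        key.encode(), flags, exptime, len(value), value)


def build_get(key):
    """Encode a single-key 'get' request."""
    return b"get " + key.encode() + b"\r\n"


def parse_get_response(data):
    """Parse 'VALUE <key> <flags> <bytes>' blocks up to 'END' into a dict."""
    items = {}
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        header = data[pos:eol].split()
        if header == [b"END"]:
            return items
        if len(header) < 4 or header[0] != b"VALUE":
            raise ValueError("unexpected reply line: %r" % data[pos:eol])
        key, size = header[1].decode(), int(header[3])
        start = eol + 2                      # first byte of the data block
        items[key] = data[start:start + size]
        pos = start + size + 2               # skip the data and its trailing CRLF


def roundtrip(host="localhost", port=11211):
    """Store and fetch one key; host and port are assumed, not prescribed.

    For brevity this reads each reply with a single recv() call, which a
    robust client would replace with a loop that reads until the terminator.
    """
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(build_set("greeting", b"hello"))
        conn.recv(1024)                      # a successful store replies STORED
        conn.sendall(build_get("greeting"))
        return parse_get_response(conn.recv(4096))
```

Calling `roundtrip()` requires a memcached-compatible server to be listening at the assumed address; the encoding and parsing helpers can be exercised without one.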
MLPACK is a C++ machine learning library with an emphasis on scalability, speed, and ease of use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. It contains algorithms such as k-means, Gaussian mixture models, hidden Markov models, density estimation trees, kernel PCA, locality-sensitive hashing, sparse coding, linear regression, and least-angle regression.
StarCluster is a utility for creating traditional computing clusters, of the kind used in research labs or for general distributed computing applications, on Amazon's Elastic Compute Cloud (EC2). It uses a simple configuration file provided by the user to request cloud resources from Amazon and to automatically configure them with a queuing system, an NFS-shared /home directory, passwordless SSH, OpenMPI, and ~140GB of scratch disk space. It consists of a Python library and a simple command-line interface to the library. For end users, the command-line interface provides simple, intuitive options for getting started with distributed computing on EC2 (e.g. starting/stopping clusters, managing AMIs, etc.). For developers, the library wraps the EC2 API to provide a simplified interface for launching/terminating nodes, executing commands on the nodes, copying files to/from the nodes, etc.