Wisecracker is a high performance distributed cryptanalysis framework that leverages GPUs and multiple CPUs. It allows security researchers to write their own cryptanalysis tools that can distribute brute-force cryptanalysis work across multiple systems with multiple multi-core processors and GPUs. Security researchers can also use the sample tools provided out-of-the-box. The differentiating aspect of Wisecracker is that it uses OpenCL and MPI together to distribute the work across multiple systems, each having multiple CPUs and/or GPUs.
PyParticles is a particle simulation toolbox entirely written in Python. It simulates a particle-by-particle model with the most popular integrations methods, including Euler, Runge Kutta, and Midpoint. It represents the results on an OpenGL or Matplotlib plot, and offers an easy-to-use API.
GPUMarkerTracker is a tracking software library for AR (augmented reality) markers. It utilizes GPGPU for fast and accurate tracking. It is intended for detecting markers from an HD resolution image so that small markers placed far from the camera can be detected. Each marker has a 12-bit payload with 9-bit CRC. The library does not produce false-positive detection errors of markers, practically.
Portable Computing Language (pocl) aims to become an efficient implementation of the OpenCL standard. In addition to producing an easily-portable Open Source implementation, another major goal of the project is improving performance portability of OpenCL programs with compiler optimizations, reducing the need for target-dependent manual optimizations. At the core of pocl is a set of LLVM passes used to statically parallelize multiple work items with the kernel compiler, even in the presence of work group barriers. This enables parallelization of the fine-grained static concurrency in the work groups in multiple ways (SIMD, VLIW, superscalar, etc.). The code base is modularized to allow easy adding of new "device drivers" in the host-device layer. A generic multithreaded "target driver" is included. It allows running OpenCL applications on a host which supports the pthread library with multithreading at the work group granularity.
CLOGS is a library for higher-level operations on top of the OpenCL C++ API. It is designed to integrate with other OpenCL code, including synchronization using OpenCL events. Currently only two operations are supported: radix sorting and exclusive scan. Radix sort supports all the unsigned integral types as keys, and all the built-in scalar and vector types suitable for storage in buffers as values. Scan supports all the integral types. It also supports vector types, which allows limited multi-scan capabilities.
Moscrack is a WPA cracker for use on clusters. It supports MOSIX, SSH, and RSH connectivity and works by reading a word list from STDIN or a file, breaking it into chunks, and passing those chunks off to separate processes that run in parallel. The parallel processes are then executed on different nodes in your cluster. All results are checked and recorded on your master node. Logging and error handling are taken care of. It is capable of running reliably for long periods of time, without the risk of losing data or having to restart. Moscrack uses aircrack-ng by default. Pyrit for WPA cracking and Dehasher for Unix password hashes are supported via plugins.
Libfairydust is a small wrapper library intended for use with GPU clusters that 'hijacks' CUDA and OpenCL calls. It can be used to 're-route' calls to a certain GPU, so a process requesting GPU#0 might end up running on GPU#4 without knowing (or caring) about it. This works completely transparently and does not need any sort of 'cooperation' from the application, changes to code, or relinking.
FastFlow is a pattern-based programming framework targeting streaming applications. It implements pipeline, farm, divide and conquer, and their composition, as well as generic streaming networks. It is specifically designed to support the development and the seamless porting of existing applications on multi-core, GPGPUs, and clusters of them. The layered template-based C++ design ensures flexibility and extendibility. Its lock-free/fence-free run-time support minimizes cache invalidation traffic and enforces the development of high-performance (high-throughput, low-latency) scalable applications. It has been proven comparable or faster than TBB, OpenMP, and Cilk on several micro-benchmarcks and real-world applications, especially when dealing with fine-grained parallelism and high-throughput applications.