9 projects tagged "hadoop"

Download No website Updated 15 May 2014 Hypertable

Pop 340.15
Vit 43.57

Hypertable is a high-performance, scalable database modeled after Google's Bigtable. It is designed to manage the storage and processing of information on a large cluster of commodity servers, providing resilience to machine and component failures.

No download No website Updated 14 Jul 2011 Beanstalker

Pop 21.12
Vit 32.41

Beanstalker is a set of Maven plugins for Amazon Web Services (AWS) Elastic Beanstalk and Elastic MapReduce. The plugin Mojos are suitable not only for command-line use but also for continuous integration.

Download Website Updated 04 Jun 2012 MapReduce-BitDew

Pop 60.97
Vit 26.90

MapReduce-BitDew is an implementation, for Internet desktop grids, of the MapReduce programming model proposed by Google. Using MapReduce-BitDew, you can execute MapReduce applications on resources such as desktop PCs distributed across the Internet. MapReduce-BitDew features a firewall-friendly protocol, fault tolerance, result certification, two-level schedulers, and more.

Download No website Updated 14 Apr 2014 Infovore

Pop 387.92
Vit 12.95

Infovore is a map/reduce framework for processing large RDF data sets such as Freebase and DBpedia. It is based on Hadoop.

No download Website Updated 14 May 2013 Gfarm

Pop 92.56
Vit 5.11

Gfarm is a distributed filesystem, generally used for large-scale cluster computing. It is implemented in userland and can be mounted via FUSE. It uses file locality to decide which data node to access, and supports Globus GSI for use over wide area networks. Users can explicitly control where file replicas are placed on Gfarm. Gfarm can serve as an alternative to HDFS as the storage system for Hadoop, and can also be used with Samba, MPI-IO, and GridFTP. Monitoring via Zabbix and Ganglia is also supported.

No download Website Updated 09 Apr 2010 Hadoop Studio

Pop 149.92
Vit 2.66

Hadoop Studio is a map-reduce integrated development environment (IDE) based on NetBeans. It makes it easy to create, understand, and debug map-reduce applications based on Hadoop, without requiring development-time access to a map-reduce cluster. The studio provides a workflow view of a map-reduce job, which displays the individual inputs, outputs, and interactions between the phases of the job. This view updates in real time as the developer changes the code. The studio then generates Java sources and compiles them into a binary jar file, which can be run on a normal Hadoop cluster.

No download No website Updated 18 Feb 2014 Telepath

Pop 83.23
Vit 2.37

Telepath provides map/reduce code for processing Wikipedia Pagecounts, which contain hourly usage data for all Wikipedia pages in all languages. Derived from the bakemono toolkit, this project can process this 3 TB data set with ease.
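
As a rough illustration of the map/reduce pattern this description refers to, the sketch below sums page views across several hourly records in plain Python. It is not Telepath's code. It assumes each pagecounts line has four space-separated fields (project code, page title, view count, bytes transferred); the function names and the sample lines are hypothetical and invented for the example.

```python
from collections import defaultdict


def map_line(line):
    """Map step: turn one pagecounts line into ((project, title), views).

    Assumed line layout: "<project> <page_title> <views> <bytes>".
    Returns None for lines that do not match that layout.
    """
    parts = line.split(" ")
    if len(parts) != 4:
        return None
    project, title, views, _size = parts
    try:
        return (project, title), int(views)
    except ValueError:
        return None


def reduce_counts(pairs):
    """Reduce step: sum the view counts for each (project, title) key."""
    totals = defaultdict(int)
    for key, views in pairs:
        totals[key] += views
    return dict(totals)


def total_views(lines):
    """Run the map step over every line, then reduce the valid results."""
    mapped = (map_line(line.strip()) for line in lines)
    return reduce_counts(pair for pair in mapped if pair is not None)


# Hypothetical sample records standing in for two hourly files.
hourly_lines = [
    "en Main_Page 120 4096",
    "en Main_Page 80 4096",
    "de Hauptseite 30 2048",
    "garbled line",
]
totals = total_views(hourly_lines)
```

On a real cluster the map and reduce steps would run as separate distributed tasks over the hourly files; here they are ordinary functions so the data flow is easy to follow.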

No download Website Updated 25 Oct 2012 dispy

Pop 65.83
Vit 2.18

dispy is a Python framework for parallel execution of computations by distributing them across multiple processors in a single machine (SMP), or among many machines in a cluster or grid. The computations can be standalone programs or Python functions. dispy is well suited to the data-parallel (SIMD) paradigm, in which a computation is evaluated independently on different (large) datasets, as with Hadoop, MapReduce, and Parallel Python. dispy's features include automatic distribution of dependencies (files, Python functions, classes, modules), client-side and server-side fault recovery, scheduling of computations to specific nodes, encryption for security, optional sharing of computation resources, and more.

Download No website Updated 18 Jun 2012 Syoncloud Logs

Pop 69.25
Vit 1.46

Syoncloud Logs processes log files from various applications and many servers. It can capture business-relevant information from everyday log files generated by Web servers, business applications, and back-office applications. It uses Flume sinks that run on the machines that produce the log files. The data is filtered, and relevant events are channeled to HBase. The HBase NoSQL database is used for the actual data analysis. The number of HBase nodes depends on the volume of log files processed. Syoncloud Logs has an easy-to-use installer that includes all necessary components, such as Hadoop, Flume, HBase, and ZooKeeper.
