mindstorms: 7 Flavors of MapReduce

I am pretty sure that those reading this post already know what MapReduce is (in case you want to refresh your memory, here is the PDF). I'm also pretty sure that you've already heard about Hadoop, the open source implementation of MapReduce that Yahoo contributed to the Apache Foundation, and you have probably also heard about Amazon Elastic MapReduce.

Until recently, that was pretty much all I knew about MapReduce and its implementations. But I have discovered a few other projects that offer a MapReduce implementation (disclaimer: I haven't tried these projects and I don't know their current status).
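As a quick refresher on the model itself, here is a minimal single-process sketch of the three phases (map, shuffle, reduce) applied to the classic word count problem. It is plain Python and does not use the API of any project listed below. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are my own, chosen for illustration only. A real implementation would distribute the map and reduce calls across many machines and handle failures.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into one result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence, all in memory (illustrative only)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

The map and reduce functions never share state, which is what lets a framework run them in parallel on different nodes. Only the shuffle step needs to see every emitted pair.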

Disco

Description: Disco is an open-source implementation of the Map-Reduce framework for distributed computing. Like the original framework, Disco supports parallel computations over large data sets on unreliable clusters of computers.

Project: http://discoproject.org/

Skynet

Description: Skynet is an open source Ruby implementation of Google’s MapReduce framework, created at Geni. With Skynet, one can easily convert a time-consuming serial task, such as a computationally expensive Rails migration, into a distributed program running on many computers.

Project: http://skynet.rubyforge.org/

FileMap

Description: FileMap is a lightweight system for applying Unix-style file processing tools to large amounts of data stored in files. It provides full map-reduce functionality without requiring that you switch your processing to any particular language or runtime environment, install any special software, or have root on your storage and processing nodes.

Project: http://mfisk.github.com/filemap/

Greenplum

Description: Greenplum Database is a software solution built to support the next generation of data warehousing and large-scale analytics processing. Supporting SQL and MapReduce parallel processing, Greenplum Database offers industry-leading performance at low cost for companies managing terabytes to petabytes of data.

Project: http://www.greenplum.com/

Hadoop

Description: The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop Core, the flagship sub-project, provides the Hadoop Distributed Filesystem (HDFS) and support for the MapReduce distributed computing framework.

Project: http://hadoop.apache.org/

Amazon Elastic MapReduce

Description: Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

Project: http://aws.amazon.com/elasticmapreduce/

There is also a project from Microsoft Research that seems to be related to MapReduce: Dryad (investigating programming models for writing parallel and distributed programs that scale from a small cluster to a large data center) and its DryadLINQ module (making large-scale, distributed cluster computing simple enough for ordinary programmers).

Do you know of any others? Also, if you have any experience with any of these projects, I'd really appreciate it if you could share it with us. Links to posts covering any of the projects are welcome.

2 comments:

Adinel said... 11:29 AM: Greenplum is PostgreSQL on steroids... I think. Cloudera http://www.cloudera.com/ is gathering some community, and the tutorials are awesome!
Mahesh Iyer said... 11:46 PM: Alex, make it 8 flavors of MapReduce - Aster Data's SQL-MapReduce is another innovative approach to MapReduce. The SQL-MR approach brings together the benefits of both worlds to create an easy-to-use MapReduce platform for business analysts and developers alike. Aster Data nCluster is the industry's first Massively Parallel Data Application Server, with an IDE for custom MapReduce development and also provides a Hadoop connector between nCluster and HDFS. Aster Data's SQL-MR offers several custom functions out of the box that accelerate MapReduce algorithms for commonly used analytical routines. More at www.asterdata.com
