Is It Apache Hadoop or Yahoo Hadoop?

Hadoop is probably the most complete and widely used of the 7 MapReduce implementations I have counted. The project was initiated at Yahoo! and some time ago it was contributed to the Apache Software Foundation. According to the committers page, 13 out of 22 committers are from Yahoo!. That being said, I cannot stop wondering: what is Yahoo! Hadoop?

First answer

Yahoo! is opening up its investment in Hadoop quality engineering to benefit the larger ecosystem and to increase the pace of innovation around open and collaborative research and development.

I'll let you decide whether this is just PR BS or not.

Second answer

The Yahoo! Distribution of Hadoop has been tested and deployed at Yahoo! on the largest Hadoop clusters in the world.

Now, this makes a lot of sense: Yahoo! Hadoop is a version of Hadoop that has been tested and patched internally.

Still confused?

But my confusion persists:

  1. Why haven't the patches been applied to the source base hosted by Apache?
  2. Why is it not a tag on the source base hosted by Apache? Or at most a separate branch?
  3. Why has Yahoo! decided to host it completely separately on GitHub?

And I guess there are only two possible answers:

  1. either Yahoo! has no idea how to run an open source project (NB: this is hard to believe)
  2. or Yahoo! has decided to fork Hadoop and take back full control over it.

What do you think?