NoSQL: Simhashing in Hadoop with MapReduce, Cascalog and Cascading

Alex Popescu | 2011/05/11

Simhashing in Hadoop with MapReduce, Cascalog and Cascading

Simhashing in MapReduce is a quick way to find clusters of similar items in very large data sets. With Cascading and Cascalog, we can work with MapReduce jobs at the level of functions rather than individual map and reduce phases.
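
To make the idea concrete, here is a minimal sketch of simhash-based clustering in plain Python, with the map and reduce steps written as ordinary functions. It is not the code from the linked article and does not use the Cascading or Cascalog APIs. The whitespace tokenizer, the use of MD5 as the per-token hash, the 64-bit fingerprint, the 4-band split, and all function names (`simhash`, `map_phase`, `reduce_phase`) are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def simhash(text, bits=64):
    """Build a fingerprint in which similar texts share most bits."""
    counts = [0] * bits
    for token in text.lower().split():
        # Stable per-token hash (assumption: first 8 bytes of MD5).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def map_phase(doc_id, text, bands=4, bits=64):
    """Emit one (band key, (doc_id, fingerprint)) pair per band."""
    fp = simhash(text, bits)
    width = bits // bands
    mask = (1 << width) - 1
    for band in range(bands):
        yield (band, (fp >> (band * width)) & mask), (doc_id, fp)


def reduce_phase(pairs, max_distance=3):
    """Group by band key, then compare only documents sharing a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    similar = set()
    for docs in groups.values():
        for i in range(len(docs)):
            for j in range(i + 1, len(docs)):
                (id_a, fp_a), (id_b, fp_b) = docs[i], docs[j]
                if hamming_distance(fp_a, fp_b) <= max_distance:
                    similar.add(tuple(sorted((id_a, id_b))))
    return similar


docs = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the quick brown fox jumps over the lazy dog",
    "d3": "simhashing finds clusters in large data sets",
}
pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
clusters = reduce_phase(pairs)
```

With 4 bands and a distance threshold of 3, any two fingerprints that differ in at most 3 bits must agree on at least one whole band, so the reducer sees every qualifying pair without comparing all documents against each other. In a real Hadoop job the grouping by band key would be done by the shuffle rather than by an in-memory dictionary.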

tags: Hadoop, MapReduce

via NoSQL databases