Simhashing in Hadoop with MapReduce, Cascalog and Cascading
Simhashing in MapReduce is a quick way to find clusters of similar items in very large datasets. By using Cascading and Cascalog, we can work with MapReduce jobs at the level of functions rather than individual map and reduce phases.
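To make the idea concrete, here is a minimal sketch of simhashing in plain Python, not the Cascalog code from the linked post. It assumes the usual simhash construction (a 64-bit fingerprint built from per-token hashes, with near-duplicates detected by Hamming distance) and imitates the map and reduce steps with ordinary functions. The names `map_phase`, `reduce_phase`, and `candidate_pairs`, the MD5-based token hash, the whitespace tokenizer, and the parameters `bands=4` and `max_distance=3` are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

BITS = 64  # fingerprint width (an assumption; 64 is a common choice)


def token_hash(token):
    """Hash a token to a 64-bit integer (MD5 chosen only for illustration)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(tokens):
    """Build a fingerprint: each bit is the majority vote of the token hashes."""
    counts = [0] * BITS
    for tok in tokens:
        h = token_hash(tok)
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i, c in enumerate(counts):
        if c > 0:
            fp |= 1 << i
    return fp


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def map_phase(doc_id, text, bands=4):
    """Emit one (band index, band value) key per band of the fingerprint."""
    fp = simhash(text.lower().split())
    width = BITS // bands
    mask = (1 << width) - 1
    for b in range(bands):
        yield (b, (fp >> (b * width)) & mask), (doc_id, fp)


def reduce_phase(values, max_distance=3):
    """Compare only the documents that share a bucket."""
    for (id_a, fp_a), (id_b, fp_b) in combinations(values, 2):
        if hamming(fp_a, fp_b) <= max_distance:
            yield tuple(sorted((id_a, id_b)))


def candidate_pairs(docs, bands=4, max_distance=3):
    """Run the map, shuffle, and reduce steps locally over a dict of documents."""
    buckets = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in map_phase(doc_id, text, bands):
            buckets[key].append(value)
    pairs = set()
    for values in buckets.values():
        pairs.update(reduce_phase(values, max_distance))
    return pairs
```

With four bands and a distance threshold of three, any two fingerprints that differ in at most three bits must agree on at least one whole band, so every qualifying pair meets in some bucket. That is what lets the reduce step avoid comparing all documents against each other.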
via NoSQL databases