NoSQL: Naive Bayes Classification in Ruby using Hadoop and HBase



I couldn't find anything that could possibly handle many terabytes of data, though. Most Ruby libraries, like the classifier gem, offer only a simplistic implementation […]. I decided to create a better Naive Bayes implementation (for instance, one using Laplace smoothing) that could also handle many terabytes of corpus data. We already have a Hadoop cluster with HBase running, and HBase is well suited to storing data like word counts.
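To make the Laplace smoothing idea concrete, here is a minimal sketch in Ruby. It is not the author's implementation: the class name `TinyNaiveBayes`, its methods, and the tokenizer are hypothetical, and a plain in-memory Hash stands in for the HBase table of word counts, so nothing here touches Hadoop or HBase.

```ruby
# Hypothetical sketch: a Hash stands in for the HBase word-count table.
class TinyNaiveBayes
  def initialize(alpha: 1.0)
    @alpha = alpha # Laplace (add-alpha) smoothing constant
    # category => { word => count }; the kind of data the post suggests keeping in HBase
    @word_counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
    @doc_counts = Hash.new(0) # category => number of training documents
    @vocabulary = {}          # every distinct word seen during training
  end

  # Record one labelled document by incrementing its word counts.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocabulary[word] = true
    end
  end

  # Return the category with the highest log-probability score.
  def classify(text)
    scores(text).max_by { |_category, score| score }.first
  end

  # Log prior plus smoothed log likelihood for each trained category.
  def scores(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @doc_counts.each_with_object({}) do |(category, docs), result|
      counts = @word_counts[category]
      # Smoothed denominator: words in the category plus alpha per vocabulary entry.
      denominator = counts.values.sum + @alpha * @vocabulary.size
      log_likelihood = words.sum { |word| Math.log((counts[word] + @alpha) / denominator) }
      result[category] = Math.log(docs / total_docs) + log_likelihood
    end
  end

  private

  # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

classifier = TinyNaiveBayes.new
classifier.train(:spam, "cheap pills buy now")
classifier.train(:spam, "buy cheap watches")
classifier.train(:ham, "meeting at noon tomorrow")
classifier.train(:ham, "lunch meeting notes")
puts classifier.classify("buy cheap pills")
```

Because every count has `alpha` added to it, a word that never appeared in a category lowers that category's score instead of driving its probability to zero. Summing logarithms rather than multiplying probabilities avoids floating-point underflow on long documents.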

tags: hadoop, hbase

via NoSQL databases