Basho: New Financing, New CEO
This couldn't go unmentioned:
via NoSQL databases
Skip the intro (until you see the MongoDB logo) to get a short intro on how to use MongoDB in a 3-tier Java EE 6 app:
tags:MongoDB
via NoSQL databases
The advantage of Riak over Mongo is that Riak automatically replicates and rebalances. The advantage of MongoDB over Riak is that Mongo supports secondary indexes and a more robust query language. Both Riak and MongoDB support MapReduce via JavaScript, and both use the SpiderMonkey JavaScript engine. However, Riak's MapReduce framework is more powerful than MongoDB's framework because Riak allows you to run MapReduce jobs on a filtered set of keys. By contrast, in Mongo, you have to run MapReduce jobs across an entire database.
via NoSQL databases
When Zawodny [1], a MySQL performance guru, gets behind MongoDB, it's time for enterprises interested in NoSQL to consider MongoDB.
tags:MongoDB
via NoSQL databases
Curt Monash offers a 4-point evaluation guide to answer the question in the title:
tags:BigData
via NoSQL databases
This weekend I attempted to figure out how HBase writes perform in multi-threaded environments. […] I used three variants to write records into HBase: (1) use one HTable for every write, where (a) the HTable is created using a singleton instance of HBaseConfiguration, or (b) the HTable is created using a new instance of HBaseConfiguration (Why? - I wanted to check how the behavior changes if connections to servers are not shared. Reference: HTable and HConnectionManager documentation); (2) use HTablePool.
tags:HBase
via NoSQL databases
GigaOm breaks the news of the Yahoo! Hadoop engineering spinoff, HortonWorks:
tags:Hadoop
via NoSQL databases
The architecture for offline biodiversity processing, based on Sqoop, Hadoop, Oozie, and Hive:
via NoSQL databases
The architecture of a fault-tolerant ad network built on top of HAProxy, Apache with mod_wsgi and Python, Redis, a bit of PostgreSQL and ActiveMQ deployed on AWS:
tags:Redis
via NoSQL databases
Too many solutions for BigData? Charlie Quinn [1] doesn't think so. He actually talks about too many homegrown solutions. That means a highly fragmented market, which poses many problems, ranging from hiring experienced people to reusing proven solutions, for the future of companies and organizations betting on BigData:
tags:BigData
via NoSQL databases
Step by step data modeling with Cassandra:
tags:Cassandra
via NoSQL databases
Justin Sheehy (CTO Basho) answers Michael Coté's (RedMonk) questions about Basho, the current state of NoSQL, and polyglot persistence adoptions among developers.
via NoSQL databases
A screencast on Resque, the Redis-backed background jobs tool, by the Railscasts peeps. The 12-minute video is available after the break and you can download the source code from the Railscasts website.
tags:Redis
via NoSQL databases
I won't say that everyone should do like Oren Eini and migrate their own blog to a NoSQL engine, but I agree that NoSQL-based blogs are usually a better version of "hello world" apps.
tags:CouchDB
via NoSQL databases
A good basic intro to MongoDB:
tags:MongoDB
via NoSQL databases
First there was the Google File System, described in this 2003 paper (PDF):
via NoSQL databases
Good slide deck from Chris Westin (10gen engineer). I particularly liked the slides summarizing some of the limitations of relational databases:
tags:MongoDB
via NoSQL databases
If you are waiting for a financial services version of the powerful artificial intelligence system that won a game of Jeopardy against two of the highest winning champions of all time — Brad Rutter and Ken Jennings — don't hold your breath … yet. Unfortunately for the Wall Street techno-geeks and quants looking for another tool to add to their algorithmic arsenal, IBM isn't working on a financial services version of Watson at this time, according to Dr. David Ferrucci […]
tags:Hadoop
via NoSQL databases
So there you have the two approaches to handling machine-generated data. If you have vast archives, EMC, IBM Netezza, and Teradata all have purpose-built appliances that scale into the petabytes. You also could use Hadoop, which promises much lower cost, but you'll have to develop separate processes and applications for that environment. You'll also have to establish or outsource expertise on Hadoop deployment, management, and data processing. For fast-query needs, EMC, IBM Netezza, and Teradata all have fast, standard appliances and faster, high-performance appliances (and companies including Kognitio and Oracle have similar configuration choices). Column-oriented database and appliance vendors including HP Vertica, InfoBright, ParAccel, and Sybase have speed advantages inherent in their database architectures.
tags:BigData
via NoSQL databases
Many of our systems use Amazon's S3 as a backup repository for log data. Our data became too large to process by traditional techniques, so we started using Amazon's Elastic MapReduce (EMR) to do more expensive queries on our data stored in S3. The major advantage of EMR for us was the lack of operational overhead. With a simple API call, we could have a 20 or 40 node cluster running to crunch our data, which we shut down at the conclusion of the run. We had two systems interacting with EMR. The first consisted of shell scripts to start an EMR cluster, run a Pig script, and load the output data from S3 into our data warehousing system. The second was a Java application that launched Pig jobs on an EMR cluster via the Java API and consumed the data in S3 produced by EMR.
tags:Hadoop
via NoSQL databases
Salvatore Sanfilippo about why Redis 2.2.11 is a recommended upgrade:
tags:Redis
via NoSQL databases
A problem everyone using a NoSQL database faces (nb: actually I think this applies to most storage engines that don't support full text indexing):
tags:CouchDB
via NoSQL databases
Just a Hacker News poll, nothing scientific. MySQL has twice the votes of PostgreSQL, which is in second place.
via NoSQL databases
Like any developer, I had the idea of using the limit, skip, and sort query mechanisms to achieve pagination. But in the long run, if a collection is going to hold more than a few thousand records, relying on limit, skip, and sort is not good practice, because they degrade performance on large collections.
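The usual alternative to skip-based paging is range-based (keyset) paging: remember the last key the client saw and continue right after it. The sketch below is only an in-memory illustration of the two access patterns, not MongoDB code; `SortedCollection`, its `_id` field, and both method names are hypothetical, and the `bisect` lookup merely stands in for an index seek.

```python
import bisect


class SortedCollection:
    """Hypothetical in-memory stand-in for a collection indexed on a unique `_id`."""

    def __init__(self, records):
        self.records = sorted(records, key=lambda r: r["_id"])
        self.keys = [r["_id"] for r in self.records]  # plays the role of the index

    def page_by_skip(self, page, page_size):
        # Offset paging: a database has to walk past `page * page_size`
        # entries before it can return anything.
        start = page * page_size
        return self.records[start:start + page_size]

    def page_after(self, last_seen_id, page_size):
        # Range paging: seek straight to the first key greater than the
        # last one the client saw, then read `page_size` records.
        start = 0 if last_seen_id is None else bisect.bisect_right(self.keys, last_seen_id)
        return self.records[start:start + page_size]


posts = SortedCollection({"_id": i, "title": f"post-{i}"} for i in range(1, 1001))
first_page = posts.page_after(None, 10)
next_page = posts.page_after(first_page[-1]["_id"], 10)
```

Both methods return the same pages; the difference is that the range variant does not get slower as the reader moves deeper into the result set, at the price of only supporting "next page" navigation on a sortable, unique key.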
tags:MongoDB
via NoSQL databases
Very good post by Emmanuel Bernard on the Hibernate blog explaining why Hibernate OGM was created, how it works, and what the plan is.
via NoSQL databases
Following the Spring Data model , RedHat wants to bring JPA support to NoSQL solutions:
via NoSQL databases
An IDC Report about the impact of cloud computing on the IT market:
via NoSQL databases
"We tried using NoSQL, but we are moving to Relational Databases because they are easier…"
via NoSQL databases
Very interesting customer base numbers for Sybase IQ, Vertica, SAND Technology, and Infobright published by Curt Monash: most are in the hundreds, except for Sybase IQ.
via NoSQL databases
Avoiding storing and maintaining a second copy of large volumes of data is always a good thing. And if the analysis doesn't require joining with data from another source, using the original source data can be advantageous. There are always questions about performance impacts on the operational source, and sometimes security implications as well. However, the main question is around the types of query possible against a NoSQL store in general or a document-oriented database in this case. It is generally accepted that normalizing data in a relational database leads to a more query-neutral structure, allowing a wider variety of queries to be handled. On the other hand, as we saw with the emergence of dimensional schemas and now columnar databases, query performance against normalized databases often leaves much to be desired. In the case of Operational BI, however, most experience indicates that the queries are usually relatively simple, and closely related to the primary access paths used operationally for the data concerned. The experience with MongoDB bears this out, at least in the initial analyses users have required.
tags:MongoDB
via NoSQL databases
After seeing the excerpt from Jonathan Harris' talk at Data Scientist Summit I really wanted to post a link to some of the videos. But they are all behind a registration gateway. Just in case you want to watch them (there are indeed some interesting titles), you'll find them here.
tags:BigData
via NoSQL databases
The fine guys from 10gen have given me permission to publish here videos from their Mongo events organized across the US and Europe. Thanks, Meghan.
tags:MongoDB
via NoSQL databases
Oren Eini, creator of RavenDB, is drinking his own champagne and has migrated his own blog to an engine using RavenDB: RaccoonBlog.
tags:RavenDB
via NoSQL databases
A step by step intro to VMware CloudFoundry, MongoDB and Node.js.
tags:MongoDB
via NoSQL databases
If you check the quick review of existing graph databases and the NoSQL graph databases matrix you'll notice that most of these came under either an AGPL license or a commercial one.
tags:Sones
via NoSQL databases
Very interesting idea in the latest Infobright release:
via NoSQL databases
Educative post from TellApart explaining how HBase—with the right data modeling and filtering processes—had them covered for the following requirements coming from the need of log analysis:
tags:HBase
via NoSQL databases
Couchbase Single Server is the CouchDB packaging offered by Couchbase. But I think this is the first time this product came out under this name. At least the first Couchbase Server release didn't mention it.
via NoSQL databases
The best technology companies in the world, be they in hardware, software or Web services, didn't get there by brashly saying they were going to take out #1, kill them or stomp on their grave. If you think about it, or do your share of reading up on Web history, you'll find that for the most part, the companies who we associate with leadership, be it through quality, market share or traffic stats, reached their position with a near purity of thought on their ability to deliver something new and differentiated. So when you read about company X targeting company Y or setting up to take them down, you can almost guarantee they either won't make it, or company Y is going to change the game again.
via NoSQL databases
The echo chamber is reacting:
tags:BigData
via NoSQL databases
I analysed all the popular ones but none fitted my requirements. I had one criterion for selecting a database: I must be able to code in Java. Most available systems were non-Java based, which would be a significant issue for a one-man project. Even if they had a Java interface, the installation, setup, etc. were a tedious process. Having a database developed purely in Java has many advantages: easy packaging with other applications; easy to install and run; can be embedded; can run in the same or a different VM; easy to debug; easy to test. After much searching, I came across OrientDB.
tags:OrientDB
via NoSQL databases
Remember when everyone was suggesting solutions for Twitter architecture? Now the Library of Congress is trying to figure out what technologies to use to store the Twitter archive:
via NoSQL databases
jaql was created and is used by IBM InfoSphere BigInsights—the IBM Apache Hadoop distribution:
via NoSQL databases
I wondered just how much faster read/write operations could be with Redis (if at all) over the H2 database so I set out to write a little test app to see for myself.
tags:Redis
via NoSQL databases
Alon Salant forked and upgraded the Instagram realtime demo source code to allow anyone to play with the latest versions of Node.js, Redis, and the Instagram real-time API: 3 cool projects du jour.
tags:Redis
via NoSQL databases
How many NoSQL databases can be described in
via NoSQL databases
InformationWeek quoting Steve Ballmer:
tags:BigData
via NoSQL databases
One of the clearest uses of BigData I've read about, if it delivers:
tags:BigData
via NoSQL databases
Nice metaphor on how CouchDB can be used to make sense of unstructured data:
tags:CouchDB
via NoSQL databases
What to look for when hiring a data geek, a different name for the now-established data scientist role
via NoSQL databases
InfoWorld published a non-scientific survey of the hottest new jobs in IT, and data scientists and cloud architects made it into the top 6.
via NoSQL databases
IT analytics company Splunk has received a patent for its method of organizing and presenting big data to mirror the experience of browsing links on the web. The patent validates Splunk's unique approach to the problem of analyzing mountains of machine-generated data and hints at a future where writing big data applications doesn't require a Ph.D.
tags:BigData
via NoSQL databases
Lorenzo Alberton with an overview of the NoSQL landscape:
tags:CouchDB,MongoDB,Riak,Redis,Membase,Neo4j,Cassandra,HBase,Hypertable
via NoSQL databases
A guide to using the mobile Couchbase Xcode project templates by Marty Schoch. It takes only 5 minutes to get started.
tags:CouchDB
via NoSQL databases
Not much of a designer myself, but mokk.me is built on CouchDB alone using only HTML, CSS, and JavaScript:
tags:CouchDB
via NoSQL databases
Bloomberg reports on EMC's planned budget for acquisitions in the BigData market:
via NoSQL databases
I only missed the 7th and 9th:
tags:MongoDB
via NoSQL databases
No idea how this would work:
via NoSQL databases
Migrating to a NoSQL database is not a free ride. There are some costs and complexity involved in this process. I’ve found a good list of the costs involved in a slide from Tom Melendez’ (a bit old) presentation (embedded below):
via NoSQL databases
Using Redis as a triple store back-end requires an interesting combination of data types, operations, and multi-commands:
tags:Redis
via NoSQL databases
BigData was defined as the 3 Vs: volume, variety, velocity, but Brian Hopkins (Forrester principal analyst) is adding a fourth V, variability:
tags:BigData
via NoSQL databases
Merriman's vision was to build a DB that would scale on commodity hardware and in the cloud. Their customers include names like Intuit, Shutterfly and foursquare. Like other NoSQL DBs, they really excel when the data needs and read/write loads are big. For this reason, they are perfectly suited to much of the social networking and big web based apps and networks that we all use today.
tags:MongoDB
via NoSQL databases
Good guide on how to translate logical operators in Redis set commands:
tags:Redis
via NoSQL databases
The Unisphere/MarkLogic survey also found that 86% of respondents admit that unstructured data is important to their organization, yet only 11% have clear procedures and policies for managing unstructured data in place. In addition, 80% of respondents know the amount of unstructured data will rise in the next three years, but only 24% of respondents believe their current infrastructure will be able to adequately manage it.
tags:BigData
via NoSQL databases
Ted Yu explains the internals of the HBase load balancing with references to corresponding JIRA tickets and the latest improvements:
tags:HBase
via NoSQL databases
I've been exploring MongoDB primarily for its map-reduce functionality lately. I've found a few shortcomings that I'm not crazy about.
tags:MongoDB
via NoSQL databases
I have used the word geeky before, for Smalltalk and CouchDB and for the Smalltalk client for Riak. But this story is the really geeky one. You must read it to believe me:
tags:MongoDB
via NoSQL databases
These slides have generated quite a reaction on Twitter. I'll let you decide for yourself the reasons:
tags:MongoDB
via NoSQL databases
John Battelle ranking a series of data rich/intensive companies based on
tags:BigData
via NoSQL databases
Jean-Pierre Dijcks (Oracle):
tags:BigData
via NoSQL databases
The 310 billion triple result that Franz is announcing today was achieved in only two weeks of access (actual loading time of just over 78 hours) to an 8-socket Intel Xeon E7-8870 processor-based server system configured with 2 terabytes of physical memory and 22 terabytes of physical disk. "We're confident that with additional time, another terabyte of memory, and a bit more storage capacity, the previously unreachable goal of 1 trillion triples can be achieved. Even double that is not out of the question," stated Dr. Jans Aasman, CEO of Franz Inc.
tags:AllegroGraph,Hadoop
via NoSQL databases
How fast can you set up a demo cluster:
tags:Riak
via NoSQL databases
RavenDB gets filtered replication, like the one CouchDB has had for a while
via NoSQL databases
All features in CouchDB 1.1.0 nicely and cleanly documented.
tags:CouchDB
via NoSQL databases
Chuck Hollis (VP EMC) explores the idea of financially quantifying a corporation's BigData:
tags:BigData
via NoSQL databases
A hosted monitoring tool for Redis:
tags:Redis
via NoSQL databases
Jonathan Ellis explains hinted handoff and its implications on consistency, availability, and performance:
tags:Cassandra
via NoSQL databases
Just as I speculated, RethinkDB has finally launched the 1.0 version with Memcached compatibility only. Jason Kincaid (TechCrunch) writes:
via NoSQL databases
Robert Newson just announced a new version of Apache CouchDB, 1.1.0, featuring native SSL, HTTP range requests, and other features and improvements listed below:
tags:CouchDB
via NoSQL databases
Dhanji R. Prasanna leaving Google:
via NoSQL databases
Martin Schneider (Basho) trying to answer the question in the title:
tags:Riak,Cassandra,HBase,Hypertable,Hadoop
via NoSQL databases
Dhruba Borthakur started a series of posts (part 1 and part 2) describing both the process that led Facebook to using HBase and Hadoop and the projects where these are used, along with their requirements:
tags:Hadoop
via NoSQL databases
Rate limiting can be an effective way of conserving resources and preventing automated or nefarious activities on your site. The key issues to address when designing a solution are: How do we incorporate time given that it's a continuous variable? How can we efficiently expire old data? How can we scale the solution so that it can handle many hundreds of subjects and/or actions per second?
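One common way to answer the first two questions is to chop time into fixed-size buckets and keep a counter per bucket, dropping buckets once they fall outside the window. The sketch below shows that idea in plain Python, in memory; it is not the implementation from the quoted post, and `BucketedRateLimiter`, its parameters, and the injectable `clock` are all hypothetical names. In Redis, the per-subject dictionary could plausibly become a hash of bucket counters with an expiry, but that mapping is an assumption, not something the quote specifies.

```python
import time


class BucketedRateLimiter:
    """Hypothetical in-memory sketch: allow `limit` actions per `window_seconds`."""

    def __init__(self, limit, window_seconds, bucket_seconds=1, clock=time.time):
        self.limit = limit
        self.buckets_per_window = window_seconds // bucket_seconds
        self.bucket_seconds = bucket_seconds
        self.clock = clock  # injectable so the behavior can be tested deterministically
        self.counts = {}    # subject -> {bucket index: number of actions}

    def allow(self, subject):
        # Continuous time is discretized into integer bucket indexes.
        now_bucket = int(self.clock() // self.bucket_seconds)
        oldest_valid = now_bucket - self.buckets_per_window + 1
        buckets = self.counts.setdefault(subject, {})

        # Expire old data: drop every bucket that fell out of the window.
        for index in [b for b in buckets if b < oldest_valid]:
            del buckets[index]

        if sum(buckets.values()) >= self.limit:
            return False
        buckets[now_bucket] = buckets.get(now_bucket, 0) + 1
        return True


limiter = BucketedRateLimiter(limit=100, window_seconds=60)
allowed = limiter.allow("user:42")
```

Smaller buckets track the window more precisely at the cost of more counters per subject; the third question, scaling to many subjects, is what moving this state into a shared store would address.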
tags:Redis
via NoSQL databases
Bradford Stephens [1] talking distributed systems:
via NoSQL databases
MongoHQ, a MongoDB hosting solution:
tags:MongoDB
via NoSQL databases
Cassandra 0.8.0 was announced today and it brings quite a few exciting new features:
tags:Cassandra
via NoSQL databases
Todd Hoff puts together some good references on data deduplication:
tags:BigData
via NoSQL databases
Just in case you ever run out of readings on distributed systems, Dr. Indranil Gupta has put together an extensive list of papers including all the classics, but also some you might not have heard about. Here they are for you to enjoy.
via NoSQL databases
Basho released a minor update for both Riak and Riak Search. Release notes for Riak and Riak Search are available at the following links: Riak 0.14.2 and Riak Search 0.14.2.
tags:Riak
via NoSQL databases
From an article touting the value of data mining in the era of BigData:
tags:BigData
via NoSQL databases
Alex Handy (SDTimes) names Cascading, Mahout, Hive, Avro, and Storm the most powerful Hadoop projects.
tags:Hadoop
via NoSQL databases
Analysts David Reine and Mike Kahn researching long-term (decade-scale) storage of digital data:
via NoSQL databases
The only NoSQL name making the list is Couchbase which is still working on merging their CouchDB and Membase products.
via NoSQL databases
There lies the answer! We have a requirement of recreating the cluster in case we accidentally delete all the data or lose our master. In such a case a reliable backup can only be taken if your HDFS data does not reside on the root devices. A reliable backup of the root device cannot be taken without rebooting the device. Furthermore, it's stored as an AMI, which means you have to create a new AMI every day and delete the old one. This means that to solve all of our problems we need both the HBase installation and its data stored on attached EBS volumes that are not the root devices.
tags:HBase
via NoSQL databases
About me: Software architect, Web Aficionado, Cloud Computing Fanboy, Geek Entrepreneur, Speaker, Co-founder and CTO of InfoQ.com, Writing also about NoSQL on the myNoSQL blog
