A Definition of Big Data
By Forrester's Boris Evelson and Brian Hopkins:
tags:BigData
via NoSQL databases
Jorge Lopez:
tags:BigData
via NoSQL databases
A three-part tutorial on using MongoDB, PostgreSQL/PostGIS, and JavaScript libraries for building interactive maps by Hans Kuder:
tags:MongoDB
via NoSQL databases
A new maintenance release from CouchDB:
tags:CouchDB
via NoSQL databases
A 12-minute screencast introducing the basics of Couchbase Mobile for Android applications:
tags:Couchbase
via NoSQL databases
I've been posting a lot about deployments in the cloud and especially about deploying MongoDB in the Amazon cloud:
tags:MongoDB
via NoSQL databases
Published by a group from Los Alamos National Lab (Hristo Djidjev, Gary Sandine, Curtis Storlie, Scott Vander Wiel):
via NoSQL databases
Fast scan-ability. For very large JSON documents, scanning can be slow. To skip a nested document or array we have to scan through the intervening field completely. In addition, as we go we must count nestings of braces, brackets, and quotation marks. In BSON, the size of these elements is at the beginning of the field's value, which makes skipping an element easy.
Easy manipulation. We want to be able to modify information within a document efficiently.
Additional data types. JSON is potentially a great interchange format, but it would be nice to have a few more data types. Most important is the addition of a "byte array" data type. This avoids any need to do base64 encoding, which is awkward.
tags:MongoDB
via NoSQL databases
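The "fast scan-ability" point in the BSON item above can be illustrated with a small sketch. It hand-encodes a subset of BSON (int32, string, embedded document) and then looks up a top-level field by jumping over every other value using its size prefix instead of parsing it. The helper names `encode_document`, `value_size`, and `find_field` are made up for this illustration, and only a few type codes are handled; this is not a full BSON implementation.

```python
import struct

# BSON element type codes used in this sketch (a small subset of the format).
TYPE_STRING, TYPE_DOCUMENT, TYPE_ARRAY, TYPE_INT32 = 0x02, 0x03, 0x04, 0x10


def encode_document(fields):
    """Encode a dict of str/int/dict values as BSON (subset only)."""
    body = b""
    for name, value in fields.items():
        key = name.encode("utf-8") + b"\x00"
        if isinstance(value, dict):
            body += bytes([TYPE_DOCUMENT]) + key + encode_document(value)
        elif isinstance(value, str):
            raw = value.encode("utf-8") + b"\x00"
            body += bytes([TYPE_STRING]) + key + struct.pack("<i", len(raw)) + raw
        elif isinstance(value, int):
            body += bytes([TYPE_INT32]) + key + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # The total size counts the 4-byte length prefix and the trailing null byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def value_size(type_code, data, pos):
    """Return the byte length of the value starting at pos, without parsing it."""
    if type_code == TYPE_INT32:
        return 4
    (prefix,) = struct.unpack_from("<i", data, pos)
    if type_code == TYPE_STRING:
        return 4 + prefix  # the prefix counts the string bytes plus the null
    if type_code in (TYPE_DOCUMENT, TYPE_ARRAY):
        return prefix      # the prefix already counts the whole sub-document
    raise ValueError(f"unsupported type code: {type_code:#x}")


def find_field(data, wanted):
    """Return (type_code, value_offset) of a top-level field, skipping the rest."""
    pos = 4                # skip the document's own length prefix
    while data[pos] != 0:  # a null byte terminates the element list
        type_code = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if name == wanted:
            return type_code, pos
        pos += value_size(type_code, data, pos)  # jump over the value in one step
    raise KeyError(wanted)
```

Looking up a field that sits after a large nested document costs one length read and one jump, however deep the nested document is. A JSON scanner would have to walk every brace and quotation mark in between.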
A while ago I read in a PR announcement that Infolinks, an in-text advertising company, uses Hadoop and HBase. Lior Schachter [1], Infolinks' Software Architect, has been kind enough to answer my questions.
via NoSQL databases
Puma.com and other related web properties are using Redis hashes, lists, and sets (sorted and unsorted) for fragment caching and for caching third-party responses:
via NoSQL databases
Three great posts on the Hortonworks blog, part 1, part 2, and part 3, detailing the most important new features included with the Apache Pig 0.9 release:
via NoSQL databases
Interesting question raised by Hugo on the Redis group about modeling closure tables [1] in Redis. Didier Spezia offers a solution based on sorted sets.
tags:Redis
via NoSQL databases
That's the gist of this application. It is non-trivial and had a very rich design and interaction. My team had an excellent QA, excellent front end dev, and me who was the only one who knew MarkLogic. The other team chose to implement theirs using a Javascript front-end architecture communicating with CouchDB (later Java with MongoDB) on the backend. The two teams involved very skilled people. If these two technology approaches were going to go head-to-head, these were the people to do it.
via NoSQL databases
Amazon today announced a new service, Amazon ElastiCache: Memcached in the cloud. The new service is still in beta and available only in the US East (Virginia) Region.
tags:Memcached
via NoSQL databases
Spring Data continues its march to "be all do all":
tags:Neo4j
via NoSQL databases
There's no new data in this infographic about BigData, but the amount of stored data by sector is very interesting:
tags:BigData
via NoSQL databases
Nuno Job's NanoCouch Node.js driver for CouchDB:
tags:CouchDB
via NoSQL databases
MongoDB 1.8.3 was pushed out minutes ago and it includes just a couple of small bug fixes and improvements:
tags:MongoDB
via NoSQL databases
Patrick Durusau mentioned on his blog a new record set by Franz's AllegroGraph: 1 trillion RDF triples. This comes only 2 months after the previous AllegroGraph record of 310 billion triples.
tags:AllegroGraph
via NoSQL databases
Dion Hinchcliffe's excellent article analysing the complexity and the opportunities of Big Data:
tags:BigData
via NoSQL databases
Don Rippert interviewed by Derrick Harris (GigaOm):
via NoSQL databases
Dave Kellogg [1]:
tags:BigData
via NoSQL databases
Using MongoDB replica sets:
tags:MongoDB
via NoSQL databases
It's been over a year since Backblaze revealed the designs of our first generation (67 terabyte) storage pod. During that time, we've remained focused on our mission to provide an unlimited online backup service for $5 per month. To maintain profitability, we continue to avoid overpriced commercial solutions, and we now build the Backblaze Storage Pod 2.0: a 135-terabyte, 4U server for $7,384. It's double the storage and twice the performance—at lower cost than the original.
via NoSQL databases
Bob Zurek lists the top 5 priorities of BI people:
tags:BigData
via NoSQL databases
One such method is a tool called Sqoop that automatically imports data between these formats, given the schema of both source and destination. We did some in-depth research on this tool and concluded it could work well if you have relational data. In addition to this tool, Map/Reduce allows various other options, including a bulk import method. Bulk importing bypasses the HBase API and writes contents, properly formatted as HBase data files (HFiles), directly to the file system. […] There are two steps involved in a bulk import:
1. Preparing the data (HFiles) using Map/Reduce
2. Importing the prepared data into an HBase table
via NoSQL databases
This article by Bob Warfield was the first NoSQL post to be Techmemed. Main point:
via NoSQL databases
Yesterday I tweeted three simple rules to learning NoSQL. […] The rules are: 1: Use MongoDB. 2: Take 20 minutes to learn Redis. 3: Watch this video to understand Dynamo.
via NoSQL databases
Denial. Resistance. Acceptance. Embrace. Survival.
tags:BigData
via NoSQL databases
Cloudera has created a set of tools named Hoop allowing access through HTTP/S to HDFS. My first question was why would you use HTTP to access HDFS? Here is the answer:
tags:Hadoop
via NoSQL databases
Cassandra has a peer-to-peer architecture and is typically deployed on a large number of servers. Deploying, managing, and upgrading these systems takes more administrative time, especially as your cluster grows. Puppet provides a simple way to install Cassandra.
tags:Cassandra
via NoSQL databases
After linking to the MongoDB in the Amazon cloud, MongoDB and EC2, and the older MongoDB on Amazon EC2 with EBS volumes posts, Arnout Kazemier commented:
tags:MongoDB
via NoSQL databases
If you read the story of the MongoDB Erlang driver, you'll probably enjoy reading about Riak's revamped Java client or the improvements in Riak's Python client.
tags:Riak
via NoSQL databases
As a follow-up to MongoDB positioning itself as a solution for Big Data and agile development, I found this bit of data on Curt Monash's blog [1]:
via NoSQL databases
Mike Miller [1] names 3 issues that Hadoop and its ecosystem are facing: investment, data complexity, and
via NoSQL databases
[…] we've built up a Doctor Who data model that shows how Neo4j can be used to address several different data and domain concerns. For example, part of the dataset includes timeline data, comprising seasons, stories and episodes; elsewhere we've social network-like data, with characters connected to one another through being companions, allies or enemies of the Doctor. It's a messy and densely-connected dataset – much like the data you might find in a real-world enterprise. Some of it is of high quality, some of it is lacking in detail. And for every seeming invariant in the domain, there's an awkward exception – again, much like the real world.
tags:Neo4j
via NoSQL databases
An interesting addition for the upcoming sones GraphDB 2.1:
tags:sones
via NoSQL databases
Couchbase has raised $14 million in a new round of funding, bringing the company's total capital raised to $30 million. This capital will be used to:
tags:Couchbase
via NoSQL databases
Even when we do start to be able to integrate and correlate event, configuration, vulnerability or logging data, it's very IT-centric. It's very INFRASTRUCTURE-centric. It doesn't really include much value about the actual information in use/transit or the implication of how it's being consumed or related to. The major issue with data "lakes" is that for data to evolve into intelligence and knowledge requires a good understanding of the data itself – how else would one reconcile artifact 'A' with variable 'B' and context 'C' generated from 3 separate data sources?
via NoSQL databases
IT decision-makers need to become familiar with the strengths and weaknesses of non-relational systems so they can make informed decisions as to their possible place in the IT infrastructure. The "one size fits all" RDBMS has made database technology decisions relatively easy; in a hybrid future, picking the right database tool may become more complex.
via NoSQL databases
From scripting to MapReduce:
via NoSQL databases
I'm not sure how long ago I forgot about this, but I just remembered about a nice little feature when I was moving some data around on a Riak cluster. As the post title points out, this feature is streaming list keys in a bucket. […] The useful bit of this is because list_keys can be a lengthy operation, you can begin doing work on the data before you receive all of the keys.
tags:Riak
via NoSQL databases
Log analysis has become a difficult task in our production environment at work because logs are distributed on different machines and in different files. So, we wanted all the exception logs from all of our apps to be tracked centrally and viewed in a single console. And once again, we found another good use case for Redis. Our strategy is to dump all of the critical logs into a Redis List and have a background worker which continuously pulls logs from the Redis List and writes them to a log file.
tags:Redis
via NoSQL databases
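The log-collection strategy in the Redis item above (apps push to a list, a background worker pops and writes to a file) might look roughly like the sketch below. It is written against any client object that exposes `lpush` and `brpop` with the shapes redis-py's `redis.Redis` uses; the `InMemoryLists` class is a stand-in so the example runs without a server, and the key name `app:critical-logs` and the function names are assumptions made for this illustration.

```python
class InMemoryLists:
    """Stand-in for the two list commands used below (shapes mirror redis-py)."""

    def __init__(self):
        self._lists = {}

    def lpush(self, name, *values):
        items = self._lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)  # LPUSH adds at the head of the list
        return len(items)

    def brpop(self, keys, timeout=0):
        # Real BRPOP blocks up to `timeout`; this stand-in returns None at once.
        names = [keys] if isinstance(keys, str) else keys
        for name in names:
            items = self._lists.get(name)
            if items:
                return name, items.pop()  # pop from the tail of the list
        return None


LOG_KEY = "app:critical-logs"  # hypothetical key name


def report(client, app, message):
    """Called by each application: push one log entry onto the shared list."""
    client.lpush(LOG_KEY, f"[{app}] {message}")


def drain(client, sink, timeout=1):
    """Worker loop: pop entries oldest-first and write them until none are left."""
    written = 0
    while True:
        item = client.brpop(LOG_KEY, timeout=timeout)
        if item is None:  # nothing arrived within the timeout
            return written
        _, entry = item
        if isinstance(entry, bytes):  # redis-py returns bytes by default
            entry = entry.decode("utf-8")
        sink.write(entry + "\n")
        written += 1
```

Pushing with LPUSH and popping with BRPOP gives first-in, first-out order, so the central log keeps entries in the order they were reported. Against a real server the same functions would be called with a `redis.Redis(...)` client, and a long-running worker would loop around `drain` rather than stop when the list is empty.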
I was just preparing for a long trip when Michael Stonebraker created a new storm. I only caught Domas Mituzas' sharp reply and Werner Vogels' comment:
via NoSQL databases
The story of migrating 30PB of HDFS stored data:
tags:Hadoop
via NoSQL databases
A complete example of how prioritization of queues in BLPOP works:
tags:Redis
via NoSQL databases
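For the BLPOP item above, this is not the linked example but a small in-memory model of the behavior it relies on: when BLPOP is given several keys, Redis checks them in the order they are given and pops from the first non-empty list. Listing the high-priority queue first is therefore enough to get prioritization. The function names and queue names are made up for this sketch, and the model returns `None` where the real command would block.

```python
from collections import deque


def blpop_nonblocking(queues, keys):
    """Scan keys in the given order and pop the head of the first non-empty list."""
    for key in keys:
        queue = queues.get(key)
        if queue:
            return key, queue.popleft()
    return None  # real BLPOP would block here until a push or the timeout


def drain_by_priority(queues, keys):
    """Serve items until every listed queue is empty; return them in served order."""
    served = []
    while (item := blpop_nonblocking(queues, keys)) is not None:
        served.append(item)
    return served
```

Reversing the key order reverses the priority: the same data drained with `["low", "high"]` serves the low queue first.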
Knowing the strengths and weaknesses of each of them could help in making a decision. But do not fall for comparing them:
via NoSQL databases
blitz.io, the winner of the Best CouchDB app:
tags:CouchDB
via NoSQL databases
Cassandra 0.8 included the first version of the Cassandra Query Language, or CQL. Eric Evans gave a talk at Cassandra SF 2011 introducing CQL as an alternative to, not a replacement for, the current Cassandra API:
tags:Cassandra
via NoSQL databases
Max Schireson positions MongoDB as a solution for Big Data and development agility:
via NoSQL databases
A short video talking about an interesting storage technology from Nimble Storage:
via NoSQL databases
Ask and you'll be answered. Nathan Marz announces that Twitter will open source Storm, the Hadoop-like real-time data processing tool developed at BackType:
tags:Hadoop
via NoSQL databases
Starting from the architecture of Facebook's realtime analytics presented in the paper Apache Hadoop Goes Realtime at Facebook and Dhruba Borthakur's excellent posts HDFS: Realtime Hadoop and HBase Usage at Facebook, Nati Shalom describes an alternative approach to real-time analytics using data grids, making the following assumptions:
via NoSQL databases
Cocoafish's [1] move to NoSQL started from this analysis:
tags:MongoDB
via NoSQL databases
Even if Heroku's post is about polyglot programming and commodifying deployments, many of the points apply to polyglot persistence. Especially this one:
via NoSQL databases
I've been pretty excited about Google's LevelDB, not to mention there are some really old tanks already in the battle field like BDB, Tokyo Cabinet (Kyoto Cabinet as new one), HamsterDB etc. Fortunately I've already worked with Kyoto Cabinet and when I looked at the benchmarks I was totally blown away.
via NoSQL databases
Boris Lublinsky and Michael Segel's series of articles about Oozie, the Hadoop workflow framework, published on InfoQ:
via NoSQL databases
A while ago Google open sourced LevelDB, a C++ library that provides an ordered key-value store. LevelDB's performance convinced the Basho guys to experiment with adding LevelDB as a storage engine for Riak. And there's also a benchmark comparing LevelDB with SQLite and Kyoto Cabinet.
tags:Riak
via NoSQL databases
eBay is a prime example of the benefits of flash. Nimbus Data CEO Thomas Isakovich told me that eBay had only 2.5TB of flash installed six months ago before recently upgrading to 100TB. Within the PayPal division, where Nimbus is deployed, Isakovich said eBay has cut power costs by 78 percent, cut its rack space by half and is able to better meet performance demand overall by spinning up virtual machines even faster.
via NoSQL databases
An infographic with the largest data storage centers from Mozy via ReadWriteWeb:
tags:BigData
via NoSQL databases
The architecture of blitz.io, a geo-distributed load and performance testing service built on top of CouchDB:
tags:CouchDB
via NoSQL databases
The article doesn't get into the technical details [1], but this sounds like a BigData scenario with offline batch processing, where Hadoop is "the solution":
via NoSQL databases
About me: Software architect, Web Aficionado, Cloud Computing Fanboy, Geek Entrepreneur, Speaker, Co-founder and CTO of InfoQ.com, Writing also about NoSQL on the myNoSQL blog
