Cassandra User Frustrations and Improvement Suggestions
From a very civilized thread on such a delicate topic:
tags:Cassandra
via NoSQL databases
A discussion on the MongoDB group about EBS snapshot backups of journaled MongoDB reminded me of Jared Rosoff's slides "MongoDB on EC2 and EBS", which cover many important aspects of running MongoDB on the Amazon cloud:
tags:MongoDB
via NoSQL databases
When prototyping an application it's common to need to change the database frequently and MongoDB allows me to do that without the need of generating migrations and keeping the DB schema in the model layer.
tags:MongoDB
via NoSQL databases
From the "Node.js Convincing the boss guide":
via NoSQL databases
From the series “follow the money”, “cherchez la femme”, now via Jim Harris [1], follow the data:
via NoSQL databases
The following diagram visualizes the data sets in the LOD cloud as well as their interlinkage relationships. Each node in this cloud diagram represents a distinct data set published as Linked Data. The arcs indicate that RDF links exist between items in the two connected data sets. Heavier arcs roughly correspond to a greater number of links between two data sets, while bidirectional arcs indicate the outward links to the other exist in each data set.
via NoSQL databases
Liam Green-Hughes concluding after briefly presenting the basic features in CouchDB:
tags:CouchDB
via NoSQL databases
Today Netflix can be seen as a leader in what can be achieved by combining cloud computing and polyglot persistence. Not only that, but Netflix has chosen to share their experience with everyone else so we can all learn from their experience.
via NoSQL databases
The picture should speak for Digg's polyglot persistency approach:
via NoSQL databases
A conversation about administering and scaling up a Riak cluster on Amazon EC2, captured in Mark Phillips's Riak recap:
tags:Riak
via NoSQL databases
Ravel, an Austin, Texas-based company, wants to provide a supported, open-source version of Google's Pregel software called GoldenOrb to handle large-scale graph analytics.
via NoSQL databases
The new architecture of Evident ClearStone APM is using both Cassandra and Neo4j:
tags:Neo4j
via NoSQL databases
As can be seen, whether the off-host process that manages the cache-data is MongoD or MemcacheD or Terracotta-Server, architecturally they all look equivalent - i.e. a pure-L2 with no-L1 - so that all data needs to be retrieved from over the network and then massaged into a POJO for consumption by the application.
via NoSQL databases
The usual suspects mixed together: node.js, socket.io, Redis, express, backbone.
tags:Redis
via NoSQL databases
Karl Seguin has put together a 32 page ebook answering some common questions related to MongoDB:
tags:MongoDB
via NoSQL databases
The best panel from Structure Big Data 2011. Featuring Amr Awadallah [1], Mike Hoskins [2], Dwight Merriman [3], Todd Papaioannou [4], Ben Werther [5], the DataStax Brisk official announcement, and a cool parallel between Hadoop processing and cooking approaches from Amr. A must see.
via NoSQL databases
10gen continued its MongoDB popularization tour around the world with three events in Europe: London, Paris, and Berlin. SkillsMatter, the organizers of MongoUK, have recorded all the sessions and made them available.
tags:MongoDB
via NoSQL databases
Video and slides of the latest webinar from Basho guys about using Riak from node.js:
tags:Riak
via NoSQL databases
Virtual hosts and URL rewrites were introduced over a year ago in CouchDB 0.11. And they have been documented in getting ready for CouchDB.
tags:CouchDB
via NoSQL databases
For those who read my blog and follow my research then you know I chose MongoDB as my backend database to store my PDFs […] Why not standard SQL? Well, I wanted the data to be returned without having to parse a blob every time (JSON/BSON), PDF files contain a lot of data that are often unique to themselves (document based storing) and Mongo also made it easy to handle dynamic content (no columns). […] I wanted to highlight an interesting way of collecting data and answering questions about my malware using Map/Reduce.
via NoSQL databases
They are said to be building a proprietary replacement for the Hadoop Distributed File System that's allegedly three times faster than the current open-source version. It comes with snapshots and no NameNode single point of failure (SPOF), and is supposed to be API-compatible with HDFS, so it can be a drop-in replacement.
tags:Hadoop
via NoSQL databases
If you'd ask me this question, I'm sure my initial answer would be: "absolutely". And I guess I would not be alone. But is that the right answer?
via NoSQL databases
My ideal database would borrow from RDBMS (like SQL Server), Document databases (like MongoDB), Graph Databases and Semantic Web Triple Stores; it would be the perfect hybrid of all of these and it would configure itself to be as efficient as possible answering queries.
via NoSQL databases
I am starting a joint venture with my very good friend, where I am doing the coding, and I decided that I will be doing it in Node.js with a MongoDB backend. […] Node and MongoDB are built for scaling, but I'm not too concerned about that at the moment. […] Although it is 10% for the learning factor, 90% for the cool factor.
tags:MongoDB
via NoSQL databases
A story of using Riak for its Dynamo-like building blocks:
via NoSQL databases
Now that's a title: The Brangelina of Big Data: Cassandra mates with Hadoop. Open source celebrity supercouple. The article is a genealogy tree: Hadoop, Hive, Cassandra, DataStax.
via NoSQL databases
To show you how to achieve this we are going to build a quick Tic Tac Toe application using SocketIO, gevent, redis, and Django.
tags:Redis
via NoSQL databases
I just heard the announcement from DataStax, the company offering Cassandra services, about Brisk, a Hadoop and Hive distribution built on top of Cassandra:
tags:Hadoop
via NoSQL databases
John Nunemaker shares a couple of tricks using MongoDB ObjectID:
tags:MongoDB
via NoSQL databases
The Percona guys [1] have tested, analyzed, and drawn conclusions about VoltDB scalability:
tags:VoltDB
via NoSQL databases
Mike Minelli: Working with big data can be classified into three basic categories […] One is information management, a second is business intelligence, and the third is advanced analytics. Information management captures and stores the information, BI analyzes data to see what has happened in the past, and advanced analytics is predictive, looking at what the data indicates for the future.
via NoSQL databases
Donald Feinberg, vice president and distinguished analyst at Gartner:
via NoSQL databases
Not only is Mozilla celebrating the release of Firefox 4, but they took the time to set up a nice visualization for downloads.
tags:HBase
via NoSQL databases
Interesting way to compare data marketplaces:
via NoSQL databases
The architecture of the future based on polyglot persistence:
via NoSQL databases
Klint Finley (RWW) connecting the dots between Microsoft Research projects Trinity, Dryad, Probase, Bing and competition (Google, Facebook):
via NoSQL databases
SQL Azure provides relational database functionality as a utility service. Cloud-based database solutions such as SQL Azure can provide many benefits, including rapid provisioning, cost-effective scalability, high availability, and reduced management overhead.
via NoSQL databases
As of now, there are 5 hours left until we get Globals, yet another NoSQL database from InterSystems, the Caché object database creators.
via NoSQL databases
The subject of scaling graph databases is popping up every now and then proving that sharding highly connected data is still an unresolved problem.
tags:Neo4j
via NoSQL databases
Trinity is a graph database and computation platform over distributed memory cloud. As a database, it provides features such as highly concurrent query processing, transaction, consistency control. As a computation platform, it provides synchronous and asynchronous batch-mode computations on large scale graphs. Trinity can be deployed on one machine or hundreds of machines.
via NoSQL databases
I'm still distilling what happened at Reddit the other day, when failures of EBS in a single availability zone took Reddit down for many hours:
via NoSQL databases
James Governor reporting from the HP CEO Leo Apotheker keynote at the HP Analyst Summit:
via NoSQL databases
Maybe I'm over-simplifying it, but I'm reading Ketan Karia's [1] "11 BigData analytics predictions for 2011" as in:
via NoSQL databases
It turns out the reason it was the right choice was that Mongo is quite traditional. It relies on you having a middle tier where your app logic resides. Couch is more radical and it not only replaces the traditional database but also the middle tier. If, as we did on this particular project, you already have a middle tier then using Couch adds complexity to the solution. Some of the app functionality ends up in the mid-tier and some in Couch. With Mongo you keep doing what you were doing. Query result processing and Map Reduce functions are all defined within the Java, Python, etc. code. When we needed a data import script the logic was all in the script and not split between script and database.
via NoSQL databases
SourceForge was the first major MongoDB case study; you can check Mark Ramm's talk about MongoDB at SourceForge. Now they are releasing Allura, which uses MongoDB:
tags:MongoDB
via NoSQL databases
The last couple of posts were about BigData, and Jeffrey Horner's presentation is in line with this topic:
via NoSQL databases
Just in case you needed yet another example of building analytics systems on top of MongoDB: Patrick Stokes' [1] presentation about how Buddy Media is implementing their entire platform analytics engine on MongoDB:
tags:MongoDB
via NoSQL databases
The Informatica accord is Cloudera's second partnership this year with a leading DI player. Back in August, Cloudera cemented a deal with open source software (OSS) data integration (DI) specialist Talend. It also has partnerships with Teradata Corp., the former Netezza Inc., the former Greenplum Software Corp., Aster Data Systems Inc., Vertica Inc., and Pentaho. One thing's for sure: Cloudera is certainly attracting attention.
tags:Hadoop
via NoSQL databases
You've probably read everywhere about the Couchbase Server first release.
via NoSQL databases
Drizzle aims to be different from MySQL, stripping out "unnecessary" features loved by enterprise and OEMs in the name of greater speed and simplicity and for reduced management overhead. Drizzle has no stored procedures, triggers, or views […]
tags:VoltDB
via NoSQL databases
Tailable cursors are a cool feature of MongoDB. They allow you to set up scripts that run forever and constantly process new data that gets inserted into the collection. You need a capped collection in order to tail a cursor […]
tags:MongoDB
via NoSQL databases
A couple of tricks for using MongoDB's $set operator, dealing with properties that contain dots for creating a basic rating system:
tags:MongoDB
via NoSQL databases
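The linked tricks are not reproduced above, so here is only a minimal sketch of the general idea, under my own assumptions: `$set` uses dot notation to address a nested field, so a rating keyed by user can be written as `ratings.<user>`, and any dot inside the key itself has to be escaped first. The helper names (`escape_key`, `rating_update`), the `ratings` field, and the choice of placeholder character are hypothetical.

```python
# Hypothetical placeholder for dots inside a key (fullwidth full stop).
# Any character that cannot appear in your keys would do.
DOT_PLACEHOLDER = "\uff0e"


def escape_key(key: str) -> str:
    """Replace dots so the key is not read as a path into nested fields."""
    return key.replace(".", DOT_PLACEHOLDER)


def rating_update(user_id: str, stars: int) -> dict:
    """Build a $set update document that stores one user's rating.

    The 'ratings' field name and the 1-5 range are assumptions
    made for this sketch.
    """
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    return {"$set": {f"ratings.{escape_key(user_id)}": stars}}
```

The resulting dictionary can be passed as the update argument of a driver call such as pymongo's `update_one`; setting the same key again simply overwrites that user's previous rating.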
Google paper presented at Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, 2010:
via NoSQL databases
In a guest post hosted by the Cloudera blog, Bob Gourley [1] enumerates the characteristics of working with Big Data from the perspective of federal agencies.
via NoSQL databases
This book is about a new, fourth paradigm for science based on data-intensive computing. In such scientific research, we are at a stage of development that is analogous to when the printing press was invented. Printing took a thousand years to develop and evolve into the many forms it takes today. Using computers to gain understanding from data created and stored in our electronic data stores will likely take decades, or less. In Jim Gray's last talk to the Computer Science and Telecommunications Board on January 11, 2007, he described his vision of the fourth paradigm of scientific research. He outlined a two-part plea for the funding of tools for data capture, curation, and analysis, and for a communication and publication infrastructure. He argued for the establishment of modern stores for data and documents that are on par with traditional libraries.
via NoSQL databases
The project is called mongo-emf […] One feature that I hope you will find attractive is that there are no annotations or XML configuration files required.
tags:MongoDB
via NoSQL databases
Simple as it may be at its core, Berkeley DB can be configured to provide concurrent non-blocking access or support transactions, scaled out as a highly available cluster of master-slave replicas, or in a number of other ways. Berkeley DB is a pure storage engine that makes no assumptions about an implied schema or structure to the key-value pairs. Therefore, Berkeley DB easily allows for higher level API, query, and modeling abstractions on top of the underlying key-value store.
via NoSQL databases
Data gathered and sold by RapLeaf can be very specific. According to documents reviewed by the Journal, RapLeaf's segments recently included a person's household income range, age range, political leaning, and gender and age of children in the household, as well as interests in topics including religion, the Bible, gambling, tobacco, adult entertainment and "get rich quick" offers. In all, RapLeaf segmented people into more than 400 categories, the documents indicated.
via NoSQL databases
It started as an embedded database. Then it became a server. Now it is available on Microsoft Azure:
tags:Neo4j
via NoSQL databases
Inspired by Auto Complete with Redis, Soulmate uses sorted sets to build an index of partially completed words and the corresponding top matching items, and provides a simple sinatra app to query them.
tags:Redis
via NoSQL databases
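To make the Soulmate idea of "an index of partially completed words" concrete, here is a toy in-memory sketch. It is not Soulmate's code and does not talk to Redis: a plain dictionary of `{item_id: score}` stands in for each sorted set, and the class name, method names, and minimum prefix length of 2 are assumptions of mine.

```python
from collections import defaultdict


def prefixes(phrase: str, min_len: int = 2) -> list:
    """Return every prefix (of at least min_len characters) of every word."""
    result = []
    for word in phrase.lower().split():
        for end in range(min_len, len(word) + 1):
            result.append(word[:end])
    return result


class PrefixIndex:
    """Maps each prefix to scored items; each inner dict mimics a sorted set."""

    def __init__(self):
        self._index = defaultdict(dict)  # prefix -> {item_id: score}

    def add(self, item_id: str, term: str, score: float) -> None:
        # Register the item under every prefix of every word in the term.
        for prefix in prefixes(term):
            self._index[prefix][item_id] = score

    def search(self, query: str, limit: int = 5) -> list:
        # Highest score first; ties broken by item id for a stable order.
        members = self._index.get(query.lower(), {})
        ranked = sorted(members.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item_id for item_id, _ in ranked[:limit]]
```

With a real Redis backend, each inner dictionary would become one sorted set per prefix, and `search` would become a single ranged read of the top-scored members.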
That said it can be more than a little painful to watch it go through the normal growing pains associated with a promising project. […] When we learned that Redis VM wasn't performing at scale in production environments it really spooked us. When antirez said he was looking down the barrel of implementing his own BTree (not even the best solution for modern storage backends) from scratch I started to get upset. Like angry upset. When the news started to float about the filesystem datastore (one file per key) I started to look for other solutions.
tags:Redis
via NoSQL databases
Foursquare’s move from querying the production databases to a data analytics system using Hadoop and Hive with Redis playing the role of a cache:
via NoSQL databases
Using as a pretext a comparison with MongoDB — why MongoDB? — Sergei Tsarev provides some details about Clustrix data distribution, fault tolerance, and availability models.
via NoSQL databases
Madhu Reddy [1] comparing the commercial and not yet released Dryad with the open source, widely used Hadoop:
tags:Hadoop
via NoSQL databases
I didn't know too much about RethinkDB until watching Tim Anglade's interview with Slava Akhmechet and Mike Glukhovsky. There were mainly three things that caught my attention:
via NoSQL databases
Three presentations covering the various NoSQL usages at Twitter:
via NoSQL databases
From the paper (PDF) by Alon Halevy, Peter Norvig, and Fernando Pereira:
via NoSQL databases
We were already aware of Riak before we started using CouchDB, but we weren't sure about trusting a new product at this point, so we decided, after some benchmarks, to go for CouchDB. After the first couple of months, it was obvious that this was a bad choice. Our main problems with CouchDB are scalability, versioning and stability.
via NoSQL databases
There are plenty of CouchDB tutorials and the CouchDB definitive guide is available online, but having CouchDB basics — setup, admin interface, user management, common access patterns — presented in a single page can still be useful. Gavin Cooper has published such a short guide to CouchDB:
tags:CouchDB
via NoSQL databases
I wasn't aware that RDF stores are the silver bullet for storage:
via NoSQL databases
The notion of a replication-friendly storage system inspired me to build a filesystem around it, using the FUSE system.
tags:Redis
via NoSQL databases
About Mozilla Grouperfish architecture and choosing a scalable storage solution:
via NoSQL databases
The challenge for NoSQL members going forward — if they care to — will be to keep the community spirit strong as commercial interests grow even stronger. Given the relative immaturity of the market, there's certainly something to be said about maintaining the communal vibe and openly discussing their various projects in mass meetups in Silicon Valley and across the country. A rising tide lifts all boats, after all.
via NoSQL databases
Schemaless, JSON, Speed:
tags:MongoDB
via NoSQL databases
Jeremy Zawodny (Craigslist) explains how to optimize the import of data that far exceeds the amount of RAM available in a sharded MongoDB cluster:
tags:MongoDB
via NoSQL databases
A Quora.com thread asking about profitable big data opportunities. While nobody can predict the future, here are some suggestions from Rahul Sood:
via NoSQL databases
If you are using MongoDB on a dedicated server then you generally want it to use all the memory it can, but if you want to use it on a server shared with other processes then you will want to put a cap on how much it uses to ensure memory is kept available for the other processes. So is it possible if you are not on a virtualized environment? Yes, and we'll explore how.
tags:MongoDB
via NoSQL databases
More applications of HBase at Facebook, after the new messaging system:
tags:HBase
via NoSQL databases
Just in case you still don't believe that everyone (and I mean everyone) is building a "chat" application using node.js and some flavor of NoSQL database, here is another one: chattrr. It uses node.js and Redis and the code is available on GitHub.
tags:Redis
via NoSQL databases
The 5 R: Ruby on Rails, Redis, Resque, Rufus.
tags:Redis
via NoSQL databases
You think you are ready to scale:
tags:MongoDB
via NoSQL databases
[SQL Azure] Federations bring great benefits of NoSQL model into SQL Azure where it is needed most. I have a special love for RDBMSs after having worked on 2, Informix and SQL Server, but I also have a great appreciation for NoSQL qualities after having worked on challenging web platforms. These web platforms need flexible app models with elasticity to handle unpredictable capacity requirements and need the ability to deliver great computational capacity to handle peaks and at the same time deliver that with great economics. NoSQL does bring advantages in this space and I'd argue SQL Azure is inheriting some of these properties of NoSQL through federations.
via NoSQL databases
On its way towards the 1.0 version, OrientDB announced a new release featuring:
tags:OrientDB
via NoSQL databases
A long series of articles by Todd Anderson on building a DHTML application for mobiles using jQuery Mobile and CouchDB:
tags:CouchDB
via NoSQL databases
Using an external reliable storage for web sessions makes sense when you don't want to use sticky balancers:
tags:Riak
via NoSQL databases
I love how this sounds in French:
tags:Redis,memcached,memcacheDB,Membase
via NoSQL databases
This snippet uses a Riak bucket as a Django session store.
tags:Riak
via NoSQL databases
Data warehousing giant Teradata today agreed to acquire Aster Data, a data analytics provider, proving it's no longer enough to be able to store and access a lot of data quickly; one must also be able to analyze it quickly. This move toward the real-time analysis of data has helped propel other big data acquisitions, from EMC's purchase of Greenplum to IBM's $1.7 billion buy of data warehousing giant Netezza.
via NoSQL databases
I love how this sounds in French:
tags:Redis,MemcacheDB,Membase
via NoSQL databases
So you know why CouchDB on mobile is an awesome fit, but getting CouchDB working on the device is only the first step, for it to be useful CouchDB needs to provide an easy and convenient way to be used on mobile devices. This is the first in a series of introductions to CouchDB on Android. In this post I will take a very basic CouchApp and turn it into a native Android App.
tags:CouchDB
via NoSQL databases
The 8 6 reasons [1] Adku prefers Cassandra to HBase:
tags:HBase
via NoSQL databases
Written in C, supporting HTTP 1.1 pipelining, using only GET and POST and making Redis commands part of the URI, with output in multiple formats (json, bson, txt, raw), ready to be forked or used on GitHub
tags:Redis
via NoSQL databases
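The "Redis commands part of the URI" idea can be sketched as a small URL builder. The exact layout below (`/COMMAND/arg1/arg2.format`), the function name, and the host and port in the usage note are my assumptions for illustration; check the project's own documentation for the real conventions.

```python
from urllib.parse import quote

# Output formats named in the description above.
FORMATS = {"json", "bson", "txt", "raw"}


def command_url(base: str, command: str, *args, fmt: str = "json") -> str:
    """Build a URL whose path carries a Redis command and its arguments.

    Assumed layout: <base>/<COMMAND>/<arg1>/<arg2>.<fmt>
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # Percent-encode each argument so a '/' inside a key or value
    # is not mistaken for a path separator.
    parts = [command.upper()] + [quote(str(arg), safe="") for arg in args]
    return f"{base.rstrip('/')}/{'/'.join(parts)}.{fmt}"
```

For example, `command_url("http://127.0.0.1:7379", "set", "hello", "world")` yields a URL ending in `/SET/hello/world.json`, which could then be fetched with any HTTP client using GET.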
You basically have two options in how to store RDF data in wide-column databases like HBase and Cassandra: the resource-centric approach and the statement-centric approach. In the statement-oriented approach, each RDF statement corresponds to a row key (for instance, a UUID) and contains subject, predicate and object columns. In Cassandra, each of these would be supercolumns that would then contain subcolumns such as type and value, to differentiate between RDF literals, blank nodes and URIs. If you needed to support named graphs, each row could also have a context column that would contain a list of the named graphs that the statement was part of. […] In view of the previous considerations, the resource-oriented approach is generally a better natural fit for storing RDF data in wide-column databases. In this approach, each RDF subject/resource corresponds to a row key, and each RDF predicate/property corresponds to a column or supercolumn. Keyspaces can be used to represent RDF repositories, and column families can be used to represent named graphs.
tags:HBase
via NoSQL databases
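The two RDF layouts from the quote above can be compared with a toy model in which nested dictionaries stand in for rows and columns. This is only a sketch under my own assumptions: no HBase or Cassandra client is involved, and the sample triples, function names, and `type`/`value` dictionaries are illustrative.

```python
import uuid
from collections import defaultdict

# Sample triples: (subject, predicate, object). Each object records its
# kind (literal or URI) alongside its value, like the type/value subcolumns.
TRIPLES = [
    ("http://example.org/alice", "foaf:name",
     {"type": "literal", "value": "Alice"}),
    ("http://example.org/alice", "foaf:knows",
     {"type": "uri", "value": "http://example.org/bob"}),
    ("http://example.org/bob", "foaf:name",
     {"type": "literal", "value": "Bob"}),
]


def statement_centric(triples):
    """One row per statement, keyed by a UUID, with fixed columns."""
    rows = {}
    for subject, predicate, obj in triples:
        rows[str(uuid.uuid4())] = {
            "subject": subject, "predicate": predicate, "object": obj,
        }
    return rows


def resource_centric(triples):
    """One row per subject; each predicate becomes a column of objects."""
    rows = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        rows[subject][predicate].append(obj)
    return rows


def describe(rows, subject):
    """Everything known about a subject comes from a single row lookup."""
    columns = rows.get(subject, {})
    return {pred: [o["value"] for o in objs] for pred, objs in columns.items()}
```

In the statement-centric layout, collecting all facts about one resource means scanning every row, whereas `describe` reads a single row in the resource-centric layout, which is one way to see why the quote calls the latter the more natural fit.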
This is a guest post by Greg Luck, Founder and CTO, Ehcache.
via NoSQL databases
We're against complexity. We believe designing systems is a fight against complexity. We'll accept to fight the complexity when it's worthwhile but we'll try hard to recognize when a small feature is not worth 1000s of lines of code. Most of the time the best way to fight complexity is by not creating it at all.
tags:Redis
via NoSQL databases
Geek fun: take node.js and a NoSQL database — usually it is MongoDB, CouchDB, or Redis, but adventurous types could even try Riak, HBase, or Cassandra — and create a "real-time" chat or collaborative editor:
tags:Redis
via NoSQL databases
You might wonder why I'm posting about this glitch in Gmail. It is this part that makes it relevant for data folks:
via NoSQL databases
I know Futon is nice, but as a developer I need more power. And even though HTTP is ubiquitous, I need simple ways of sending requests to CouchDB, without dealing with setting the headers and JSON-encoding the parameters. So I wrote a simple, irb-based CouchDB console; we call it Couchup.
tags:CouchDB
via NoSQL databases
The Jetty module jetty-session-redis uses Jedis, the Java client of Redis. The configuration is completely transparent for a Webapp since you only need to modify the jetty.xml server configuration plus the webapp context files. We have also implemented several serializers for your session attributes: XStream, JSON, JBoss Serializer, and JDK Serializer.
tags:Redis
via NoSQL databases
The code is purposely a naive implementation, to test how fast each back end is without resorting to optimizations, hacks or tricks. There are probably ways of making it much faster. And even though the production code will be very different to this early experiment, it is not an evil, synthetic micro-benchmark: on the contrary, it is a real application!
tags:MongoDB
via NoSQL databases
About me: Software architect, Web Aficionado, Cloud Computing Fanboy, Geek Entrepreneur, Speaker, Co-founder and CTO of InfoQ.com, Writing also about NoSQL on the myNoSQL blog
