MongoDB at Craigslist
Craigslist is moving 10TB of data, stored in 5 billion documents, to MongoDB:
tags:MongoDB
via NoSQL databases
The workshop walks you through creating a Sinatra application using sample code from here. Once the Sinatra application, which leverages Twitter, is working, the workshop takes you through adding Redis to it. Finally, the workshop ends by taking you through scaling your application instances up and then back down.
tags:Redis
OJ Reeves discussing Riak on Talking Shop Down Under:
tags:Riak
A bunch of Rails add-ons that enhance Mongoid-mapped models with timestamps, versioning, history, tagging, search, geo, trees, etc. Note how many of these libraries call themselves mapping tools or even ORMs, and ask yourself whether there really is no impedance mismatch:
tags:MongoDB
In a long article, David Floyer (a former IDC analyst) covers the major forces and trends in the storage industry, and those that will define IT development over the coming decade:
tags:BigData
Foursquare describes the 3 different replica set setups for their MongoDB servers:
tags:MongoDB
Little trick to get access to your logs:
tags:Redis
A while back, Eric Florenzano published a post describing how Convore uses Redis Pub/Sub. A couple of things in that post made me reach out to Eric for more details, and Eric obliged:
tags:Redis
EMC plans to bring MapR's proprietary replacement for the Hadoop Distributed File System to Greenplum HD, its enterprise-ready Apache Hadoop distribution:
tags:Hadoop
The Register quoting Dwight Merriman, 10gen founder, in a post titled "MongoDB daddy: My baby beats Google BigTable":
tags:MongoDB
Business intelligence applications are moving from the traditional connection to an OLAP data source based on relational database systems toward the ability to link to and consume data from a variety of disparate sources, including social networks. The ability of a modern BI application to use mashups of data, providing agility when integrating multiple types of data sources, has led many to promote NoSQL as the next big thing in BI. Does this mean we have seen the end of the SQL-style RDBMS in the BI area? There are many pros and cons for both systems, but I believe there is still a place for both in the BI arena.
It hasn't happened yet, but it's a question of when, not if, active monitoring of websites for availability and performance will become obsolete. My prediction is that it will happen within the next 5 years, though if everything lines up it could be as soon as 2 years away. By active monitoring I mean testing a website at a regular interval, potentially from several locations, to see whether it is working and how long it takes to load (along with the alerting, reporting, etc. that goes with it).
An interesting idea for emulating MongoDB's db.runCommand( { getlasterror : 1 , w : 2 } ) [1] in Redis, to verify that writes have propagated:
Iris Couch, the CouchDB hosting spin-off, showing how to set up replication from a hosted CouchDB:
tags:CouchDB
Researchers have set a new record for the rate of data transfer using a single laser: 26 terabits per second.
ThriftDB presented today at TechCrunch Disrupt:
RDBMSs use a table-based normalization approach to data, and that's a limited model. Certain data structures cannot be represented without tampering with the data, the programs, or both. They allow versioning or activities like Create, Read, Update and Delete. For databases, updates should never be allowed, because they destroy information. Rather, when data changes, the database should just add another record and duly note the previous value for that record. Performance falls off as RDBMSs normalize data. The reason: normalization requires more tables, table joins, keys and indexes, and thus more internal database operations to implement queries. Pretty soon the database grows into the terabytes, and that's when things slow down.
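A minimal sketch of the append-only idea from the quote, in Python: every change appends a new record that keeps the previous value, so nothing is ever overwritten. The `AppendOnlyStore` and `Version` names are hypothetical and only illustrate the concept; they are not taken from any particular database.

```python
import time
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Version:
    key: str
    value: Any
    previous: Any          # value this record supersedes (None for the first write)
    recorded_at: float     # wall-clock timestamp of the change


class AppendOnlyStore:
    """Hypothetical store that never updates in place: changes are appended."""

    def __init__(self) -> None:
        self._log: List[Version] = []

    def put(self, key: str, value: Any) -> None:
        # Instead of overwriting, append a new record noting the previous value
        self._log.append(Version(key, value, self.get(key), time.time()))

    def get(self, key: str) -> Optional[Any]:
        # The current value is the most recent record for the key
        for record in reversed(self._log):
            if record.key == key:
                return record.value
        return None

    def history(self, key: str) -> List[Any]:
        # Every value the key has ever had, oldest first
        return [r.value for r in self._log if r.key == key]


store = AppendOnlyStore()
store.put("price", 10)
store.put("price", 12)
print(store.get("price"), store.history("price"))  # 12 [10, 12]
```

The linear scan in `get` keeps the sketch short; a real system would index the latest version of each key.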
There are some advanced CouchApp frameworks already. What reupholster does is allow you to experience writing a CouchApp as fast as possible, with very little learning curve. It just feels like you are editing a normal web project. The other advantage is that you get to pick the frameworks you want for your application: Microjs, SproutCore, JavaScriptMVC, jQuery, or just bare-bones JavaScript. No lock-in.
tags:CouchDB
Dynamic columns allows you to store a different set of columns for every row in the table. […] Dynamic columns works by storing the extra columns in a blob and having a small set of functions to manipulate it. The functions exist both in SQL and in the MariaDB client library to allow you to manipulate the data where it suits you best.
Big Data comes first. Then the list contains: active data warehouse, data governance, master data management, data as an asset, data quality. The NoSQL term is not mentioned.
tags:BigData
Enter the nginx modules Redis2 and Lua. The former allows you to make any call you like to Redis, as opposed to HttpRedis, which only allows the plain old GET command. The latter allows you to embed Lua scripts in your nginx config, effectively giving nginx a bigger brain and allowing you to do some pretty fancy stuff. In this post we'll set up nginx and Redis to serve as a reverse-proxy LRU cache.
tags:Redis
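The post does this with nginx's Redis2 and Lua modules; the Python sketch below only illustrates the caching logic such a setup implements: look up the request path in the cache, and on a miss fetch the page from the backend and store it, evicting the least recently used entry when the cache is full. `LRUCache`, `serve`, and `fake_backend` are hypothetical names. In a real deployment Redis itself can do the eviction when configured with `maxmemory-policy allkeys-lru` (Redis's LRU is an approximation based on sampling).

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class LRUCache:
    """Tiny LRU cache standing in for Redis in this sketch."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry


def serve(path: str, cache: LRUCache,
          fetch_from_backend: Callable[[str], bytes]) -> Tuple[bytes, str]:
    # Reverse-proxy cache: answer from the cache, otherwise go to the backend
    cached = cache.get(path)
    if cached is not None:
        return cached, "HIT"
    body = fetch_from_backend(path)
    cache.set(path, body)
    return body, "MISS"


def fake_backend(path: str) -> bytes:
    # Hypothetical upstream server
    return f"page {path}".encode()


cache = LRUCache(capacity=2)
print(serve("/a", cache, fake_backend)[1])  # MISS
print(serve("/a", cache, fake_backend)[1])  # HIT
```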
We obtained several benefits by moving to CouchDB:
- Performance: Loading call analysis data into the CouchDB database is way faster than putting the same data in a MySQL database. Our preliminary results show a speed-up factor of about 100 (this does not take the loading of audio recordings into account, though). OK, we are comparing apples and oranges. CouchDB does not update the view indexes until they are requested, while MySQL updates its indexes as rows are inserted. And only a single document is inserted in CouchDB, compared to lots of rows in more than 15 tables in SQL. On the other hand, if insertions are done at application runtime (after the completion of the call), you had better do them fast, especially if the IVR handles many hundreds (if not thousands) of ports.
- Evolution: Making modifications to a complex schema is painful, especially when you have applications deployed in the field. As documents do not have to follow a rigid schema, it is much easier to adapt our code to multiple versions.
- Attachments: Even if audio recordings can be stored in a traditional SQL database as blobs, a custom application is still required to access them. With CouchDB, recordings are stored as attachments to the JSON document for the corresponding call. Moreover, these recordings are easily accessible by other tools, since CouchDB is itself a web server and all documents and attachments have a URL.
tags:CouchDB
MongoLab's offering capitalizes on two important trends we see impacting the vast majority of our portfolio companies: the rapid adoption of the cloud deployment model and the increasing use of "Big Data" and NoSQL tools. […] Of the many offerings in the "Big Data" and NoSQL universe, we like the fact that MongoLab has chosen to specialize in 10gen's MongoDB. MongoDB's scalability (in size and read/write volume), its ability to run MapReduce jobs and its accelerating adoption among developers are all compelling aspects of the MongoDB platform.
tags:MongoDB
Mark Pollack (VMware) and Emil Eifrem (Neo Technology) answering why and how to use Spring Data with Neo4j:
tags:Neo4j
An academic from Drexel University decided to take a novel approach to answering this question. He opened up his HIV dataset to a competition, requiring participants to pick markers in a series of HIV genetic sequences that correlate with a change in viral load (a measure of the severity of infection). Participants would download the data, model it and make their predictions; those predictions were compared with actual outcomes, and feedback was presented on a live leaderboard. Amazingly, within a week and a half, the best submission had already outdone the best methods in the scientific literature. By the end of the competition, the state of the art had been outperformed by 10 per cent.
tags:BigData
An interesting finding from Kresten Krab Thorup [1] on how key distribution affects performance:
tags:Riak
Curt Monash published a brief overview of the OODBMS world that covers most of the object databases. According to him, InterSystems Caché is the most successful object-oriented database:
The company also cemented its commitment to the Hadoop open source data analytics tool, identifying it as "the cornerstone of [IBM's] big data strategy" in a statement. IBM is the latest in a line of enterprises to stress their commitment to Hadoop. Enterprise storage vendor EMC put a tweaked Hadoop distribution at the heart of a recently updated range of data analytics Greenplum appliances, while business intelligence company Jaspersoft announced plans to better integrate its products with Hadoop in February.
tags:Hadoop
From Scotland Ruby 2011, Martyn Loughran:
tags:Redis
I don't know how many are going to deploy Hadoop on Microsoft Azure, but at least we know it is possible:
tags:Hadoop
Mark Nottingham has a great post about benchmarking HTTP servers. All 11 rules laid out in the post apply as-is to NoSQL benchmarks, and to storage benchmarks in general:
InterSystems, producers of the Caché database, launched Globals, a fast, proven, simple, flexible and free database, 2 months ago. But after the initial announcement, I couldn't find and didn't hear much about it. That changed when Rob Tweed [1] and K.S. Bhaskar [2] took the time to explain some of the differences between InterSystems Globals and GT.M, both systems being implemented on top of MUMPS Global Persistent Variables.
Tim Berglund's apt comments about NoSQL technology:
We decided to go with a 2-server solution; each server has 16GB of memory and a 100GB EBS volume attached to it. Both will have the latest stable version of Membase installed and act as a cluster in case one goes down or anything else happens: a fail-safe, if you will. In this post, I will walk you through what was done and exactly how it was done on the Amazon cloud.
tags:Membase
Here is an overview of the replicator pipeline to move data from MySQL to MongoDB. Pipelines are message processing flows within the replicator.
tags:MongoDB
Here are the notes I've made while watching a webinar about building applications with VoltDB.
tags:VoltDB
Over the last twelve months, we tried and failed to achieve scale and speed with relational databases (Greenplum, Infobright, MySQL) and NoSQL offerings (HBase). Stepping back from our two failures, let's examine why these systems failed to scale for our needs:
1. Relational Database Architectures
- Full table scans were slow, regardless of the storage engine used
- Maintaining proper dimension tables, indexes and aggregate tables was painful
- Parallelization of queries was not always supported, or was non-trivial
2. Massive NoSQL With Pre-Computation
- Supporting high-dimensional OLAP requires pre-computing an exponentially large amount of data
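To see why pre-computation blows up: without dimension hierarchies, a fully pre-computed data cube over d dimensions holds one aggregate (group-by) for every subset of the dimensions, i.e. 2^d of them, before even counting the rows inside each one. A small Python illustration; the dimension names are made up.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def cuboids(dimensions: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every group-by combination a fully pre-computed OLAP cube has to store."""
    return [combo
            for k in range(len(dimensions) + 1)
            for combo in combinations(dimensions, k)]


dims = ["date", "country", "publisher", "advertiser"]   # hypothetical dimensions
print(len(cuboids(dims)))                                # 16, i.e. 2 ** 4
for d in (10, 20, 30):
    print(d, "dimensions ->", 2 ** d, "group-bys")
```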
Active Archive is a combined solution of open systems applications and disk and tape hardware that gives users access to all of their data, along with an effortless means to store and manage it.
tags:BigData
There are a couple of contradictory points in Sean Coates' story of migrating Gimme Bar from CouchDB to MongoDB:
Justin Sheehy's slides on how Erlang/OTP made it possible to build a robust, flexible, and simple Riak:
tags:Riak
Why Graylog2 uses MongoDB:
tags:MongoDB
From an interview with Uri Cohen and Yaron Parasol of GigaSpaces:
On GitHub:
tags:Redis
Two interesting quotes about what BigData is:
tags:BigData
Ryan Rosario summarizing a panel from the Data Scientist Summit, featuring Pete Skomoroch (LinkedIn), Sharon Franks Chiarella (Amazon Mechanical Turk), Gil Elbaz (Factual) and Toby Segaran (Google):
tags:BigData
An upcoming EU report "will say that geo-location data has to be considered as personal data. … The rules on personal data apply to them," an EU official tells the Wall Street Journal. The implication is that data collected by cellphones, Twitter, Facebook and others must be handled like names, birth dates, and other personal information: obtaining user consent, deleting it after a certain period, and keeping it anonymous. This is absolutely preposterous.
tags:BigData
35 minutes of Riak Search with Dan Reverri, who walks you from the Riak Search basics to running a sample application:
When you choose an eventually consistent data store you're prioritizing availability and partition tolerance over consistency, but this doesn't mean your application has to be inconsistent. What it does mean is that you have to move your conflict resolution from writes to reads. Riak does almost all of the hard work for you, but if it's not acceptable to discard some writes then you will have to set allow_mult to true on your bucket(s) and handle siblings from your application. In some cases, this might be trivial. For example, if you have a set and only support adding to that set, then a merge operation is just the union of those two sets.
tags:Riak
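A minimal sketch of the add-only set case from the quote, in Python: with `allow_mult` set to true, concurrent writes come back as siblings, and the application resolves them on read by taking their union. The `AddOnlySetBucket` class is a hypothetical in-memory stand-in for a Riak bucket, not the Riak client API.

```python
from functools import reduce
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set


def merge_siblings(siblings: Iterable[Iterable[Hashable]]) -> Set[Hashable]:
    """Resolve siblings of an add-only set: the merge is simply their union."""
    return reduce(lambda acc, s: acc | set(s), siblings, set())


class AddOnlySetBucket:
    """Hypothetical in-memory stand-in for a Riak bucket with allow_mult=true."""

    def __init__(self) -> None:
        self._siblings: Dict[str, List[FrozenSet[Hashable]]] = {}

    def concurrent_put(self, key: str, value: Iterable[Hashable]) -> None:
        # Concurrent writes without a common causal context each become a sibling
        self._siblings.setdefault(key, []).append(frozenset(value))

    def get_resolved(self, key: str) -> Set[Hashable]:
        # Read-time conflict resolution: merge siblings, then write the result back
        merged = merge_siblings(self._siblings.get(key, []))
        self._siblings[key] = [frozenset(merged)]
        return merged


bucket = AddOnlySetBucket()
bucket.concurrent_put("cart", {"book"})            # client A
bucket.concurrent_put("cart", {"book", "lamp"})    # client B, concurrently
print(sorted(bucket.get_resolved("cart")))         # ['book', 'lamp']
```

The union works only because elements are never removed: a removal in one sibling would simply be undone by the merge, which is why supporting removals needs extra bookkeeping such as tombstones.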
One of the most often mentioned issues reported by software engineers working with relational databases from object-oriented languages is the object-relational impedance mismatch. Document database adopters say that one benefit of document stores is that there is no impedance mismatch between the object and document worlds.
Long, impressive list of new features (notably authorization and authentication support) and improvements in Apache Hive 0.7.0, released at the end of March.
This is pure speculation on my side based on the program of the first Couchbase developer conference .
tags:Couchbase
Q: Can you tell us why you chose MongoDB? A: But briefly, MongoDB is a good fit for a lot of reasons. The schema-less document structure provides a lot of flexibility, while the search capabilities provide a lot more value than you get with key-value stores. At the same time, from an operational perspective, it feels very much like MySQL, so our systems and DBA groups are quite comfortable managing the deployment.
tags:MongoDB
Philip Russom:
tags:BigData
Jim Webber (Neo4j):
tags:Neo4j
Bill Cook (President and GM, Data Computing Division, EMC):
tags:Hadoop
GigaOm and RWW have coverage of the 5 Hadoop-related announcements:
tags:Hadoop
Simhashing in MapReduce is a quick way to find clusters in a huge amount of data. By using Cascading and Cascalog we're able to work with MapReduce jobs at the level of functions rather than individual map-reduce phases.
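The post uses Cascading and Cascalog; as a language-neutral illustration, here is a plain-Python sketch of the same idea. Each document gets a Charikar-style simhash fingerprint (near-duplicate texts get fingerprints with a small Hamming distance), the map phase emits the fingerprint split into bands, and the reduce phase groups documents that share a band into candidate clusters. The banding scheme, MD5 as the token hash, and the function names are my assumptions, not the post's code.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def simhash(tokens: Iterable[str], bits: int = 64) -> int:
    """Charikar-style simhash: per-bit vote over token hashes, keep the sign."""
    counts = [0] * bits
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest(), "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def map_phase(doc_id: str, text: str, bands: int = 4,
              bits: int = 64) -> Iterator[Tuple[Tuple[int, int], str]]:
    # Emit (band index, band bits) -> doc id; similar docs likely share a band
    fp = simhash(text.lower().split(), bits)
    width = bits // bands
    for b in range(bands):
        yield (b, (fp >> (b * width)) & ((1 << width) - 1)), doc_id


def reduce_phase(pairs: Iterable[Tuple[Tuple[int, int], str]]) -> List[Set[str]]:
    # Group doc ids by key; any bucket with 2+ docs is a candidate cluster
    buckets: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for key, doc_id in pairs:
        buckets[key].add(doc_id)
    return [docs for docs in buckets.values() if len(docs) > 1]


docs = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the quick brown fox jumps over the lazy dog",
    "d3": "completely unrelated text about databases",
}
pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
print(reduce_phase(pairs))
```

Candidates found this way would normally be checked with `hamming` before being accepted as a cluster.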
Some limitations and bugs in MongoDB, mostly related to its MapReduce and import/export:
Lily, the only CMS built on top of HBase and using Solr as its search engine, has reached the 1.0 version.
tags:HBase
Flavio Percoco Premoli offers concise answers to when and why to use MongoDB:
tags:MongoDB
OpenCredo created it; Jussi Heinonen shares the details:
tags:Neo4j
In preparation for the EMC Hadoop related announcement:
tags:Hadoop
DataStax kept its promise and released Brisk: the Hadoop and Hive distribution built on Cassandra, also known as Brangelina.
So the other day I wanted to quickly check something in BigCouch, and thanks to Vagrant, chef(-solo), and a couple of cookbooks (courtesy of Cloudant), this was exceptionally easy.
Now, whenever we need to pass a message to the XMPP server from the webapp, we stick it into a special Redis list. The proxy-component is now connected to both the XMPP server and Redis. Using Redis' BLPOP feature, the proxy-component 'listens' to the list and forwards any new messages to the XMPP server. BLPOP is especially suited for this setup since it blocks the Redis connection until a new item shows up in the list, making it quite zippy (otherwise you'd have to poll the list every few seconds, which isn't as great or as fast).
tags:Redis
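A sketch of the pattern described above, in Python. To keep it self-contained it uses an in-process queue as a stand-in for the Redis list: `rpush` plays the role of the webapp's push and `blpop` blocks the way Redis BLPOP does. With the real redis-py client the two calls would be `r.rpush(key, msg)` and `r.blpop([key])`, the latter returning a `(key, value)` tuple. The `BlockingList` class, the `STOP` sentinel, and `run_demo` are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, List


class BlockingList:
    """In-process stand-in for a single Redis list (hypothetical)."""

    def __init__(self) -> None:
        self._q: "queue.Queue[Any]" = queue.Queue()

    def rpush(self, item: Any) -> None:
        self._q.put(item)                   # like RPUSH: append to the tail

    def blpop(self) -> Any:
        return self._q.get()                # like BLPOP: block until an item exists


STOP = object()  # sentinel so the demo can shut the proxy down cleanly


def proxy_component(outbox: BlockingList, forward_to_xmpp: Callable[[Any], None]) -> None:
    # No polling: the call blocks until the webapp pushes a new message
    while True:
        msg = outbox.blpop()
        if msg is STOP:
            return
        forward_to_xmpp(msg)


def run_demo(messages: List[str]) -> List[str]:
    outbox = BlockingList()
    forwarded: List[str] = []
    worker = threading.Thread(target=proxy_component, args=(outbox, forwarded.append))
    worker.start()
    for m in messages:                      # the webapp side
        outbox.rpush(m)
    outbox.rpush(STOP)
    worker.join(timeout=5)
    return forwarded


print(run_demo(["<message to='a'/>", "<message to='b'/>"]))
```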
July 2010 (approximately 1 year ago) :
tags:MongoDB
In this tutorial, you will learn how to create your own CouchApp using HTML, CSS, and JavaScript. Your application will perform database operations using Ajax powered by the jQuery framework. The application you will build is a contact manager that allows you to view, create, edit, and delete your contacts. Finally, you will learn how to replicate this application between two Apache CouchDB instances.
tags:CouchDB
Marketers can break down and manage this information: Distinguish the unique identifier across all the data sources. Connect ad cookie data with web analytics cookie data to build the profile of each unique identifier. Connect that profile with data already logged in from other sources, including profiles with Facebook and Twitter IDs. Continue to build on this basic profile, adding new data from sources like Foursquare as they become available.
tags:BigData
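One way to implement the "connect identifiers into one profile" steps above is a union-find (disjoint-set) structure: every time two identifiers are observed together (say, an ad cookie and a web analytics cookie in the same session), they are linked, and each resulting set is one profile. A minimal Python sketch; the `IdentityGraph` class and the identifier strings are hypothetical.

```python
from typing import Dict, List, Set


class IdentityGraph:
    """Hypothetical union-find linking identifiers that belong to one person."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def _find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a: str, b: str) -> None:
        # Two identifiers seen together end up in the same profile
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same_profile(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def profiles(self) -> List[Set[str]]:
        groups: Dict[str, Set[str]] = {}
        for x in list(self.parent):
            groups.setdefault(self._find(x), set()).add(x)
        return list(groups.values())


g = IdentityGraph()
g.link("adcookie:123", "webcookie:abc")     # seen together in one session
g.link("webcookie:abc", "twitter:@jane")    # user logged in with Twitter
g.link("adcookie:999", "facebook:42")
print(g.same_profile("adcookie:123", "twitter:@jane"))  # True
```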
A GCS [Golomb Compressed Set] mirrors the structure of the compressed Bloom filter: we're hashing the elements into a space of size n/p. But, while a compressed Bloom filter treats this as a bitmap, a GCS treats it as a list of values. Since the values are the result of hashing, we can assume that they are uniformly distributed, sort them and build a list of differences. The differences will be geometrically distributed with a parameter of p. Golomb coding is the optimal encoding for geometrically distributed values: you divide by 1/p, encode that in unary then encode the remainder in binary.
The problems I found with the HOLES were that small spaces weren't reused at all and there was huge fragmentation. This caused global slowness and growth of the database on disk (in some cases to many times the original size). After 2 weeks of work I've published the new version of the OrientDB storage to SVN and Maven, with:
- In-line defrag: similar to what some file systems already do, joining small holes together. In-line defrag works while the database is online and in use
- Improved management of small changes to records
- 2 configurable strategies for finding the best hole to join during the defrag process
- Configurable hole distance, to decide when to join multiple holes together
tags:OrientDB
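The post doesn't show OrientDB's code, so the following is only a hypothetical Python sketch of the general technique: freed regions ("holes") are kept in a sorted list, adjacent holes are joined into one bigger hole as soon as they appear, and allocation picks a hole using one of two strategies, first-fit or best-fit. It does not model OrientDB's configurable hole distance, and `HoleManager` is an invented name.

```python
import bisect
from typing import List, Optional, Tuple


class HoleManager:
    """Hypothetical free-space manager for a record file (not OrientDB's code)."""

    def __init__(self, strategy: str = "best_fit") -> None:
        if strategy not in ("best_fit", "first_fit"):
            raise ValueError("strategy must be 'best_fit' or 'first_fit'")
        self.strategy = strategy
        self.holes: List[Tuple[int, int]] = []   # sorted (offset, size) pairs

    def free(self, offset: int, size: int) -> None:
        # A deleted or moved record leaves a hole; join it with adjacent holes
        bisect.insort(self.holes, (offset, size))
        self._coalesce()

    def _coalesce(self) -> None:
        merged: List[Tuple[int, int]] = []
        for off, size in self.holes:
            if merged and merged[-1][0] + merged[-1][1] == off:
                prev_off, prev_size = merged[-1]
                merged[-1] = (prev_off, prev_size + size)   # contiguous: join
            else:
                merged.append((off, size))
        self.holes = merged

    def allocate(self, size: int) -> Optional[int]:
        # Reuse a hole if one is big enough; None means "append at end of file"
        candidates = [(i, h) for i, h in enumerate(self.holes) if h[1] >= size]
        if not candidates:
            return None
        if self.strategy == "best_fit":
            i, (off, hole_size) = min(candidates, key=lambda c: c[1][1])
        else:
            i, (off, hole_size) = candidates[0]
        if hole_size == size:
            del self.holes[i]
        else:
            self.holes[i] = (off + size, hole_size - size)   # shrink the hole
        return off


hm = HoleManager("best_fit")
hm.free(0, 10)
hm.free(10, 5)         # adjacent to the first hole, so they merge
hm.free(100, 4)
print(hm.holes)        # [(0, 15), (100, 4)]
print(hm.allocate(4))  # 100: best fit is the exact-size hole
```

Best-fit tends to keep large holes intact for large records, while first-fit is cheaper to compute; which works better depends on the record size distribution.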
A two-part article (part 1 and part 2) introducing Redis with .NET:
tags:Redis
Relative to the LAMP stack, NoSQL databases and other open source technologies, Microsoft technology is sometimes viewed as stodgy, non-innovative and expensive. For some, the Microsoft .NET Framework, SQL Server, SharePoint and certainly Windows and Office are impressive and reliable, but not the things that Web breakthroughs are made of.
Markus Perdrizat on the RainStor and HP announcement:
Riak has a love affair with JavaScript, and we embrace it in various ways across the codebase. That's one of the many reasons Basho chose to sponsor both JSConf and NodeConf. And, since we are head over heels for JS, we thought it appropriate to assemble this page (complete with state-of-the-art animated GIF) aimed at getting developers up to speed with what Riak has to offer the JavaScript community.
tags:Riak
Kresten Krab Thorup continues his experiments, becoming more and more of a Riak advocate:
tags:Riak
An intro to Spring Data for French readers:
The most common complaint against NoSQL is that if you know how to write good SQL queries then SQL works fine. If SQL is slow you can always tune it and make it faster.
One of the reasons I'm positive about integrating scripting into Redis in the near future (but don't take this as a promise!) is that it is almost our only salvation from making Redis bloated. […] But everybody has a different problem. How many commands should we add? With scripting, all these specific problems are solved in a general way, without making the Redis server a mess with a big number of commands, and without trying to implement our own "little language" that would later turn into an ill-conceived real language.
tags:Redis
Mobile Couchbase for iOS is delivered as an embeddable library with seamless Apple Xcode IDE integration, ensuring a familiar development experience for developers building iPhone and iPad apps. PR announcement
tags:Couchbase
About me: Software architect, Web Aficionado, Cloud Computing Fanboy, Geek Entrepreneur, Speaker, Co-founder and CTO of InfoQ.com, Writing also about NoSQL on the myNoSQL blog
