NoSQL

database

https://www.youtube.com/watch?v=qI_g07C_Q5I
https://www.youtube.com/watch?v=XPqrY7YEs0A
https://www.youtube.com/watch?v=gJFG04Sy6NY

Why NoSQL

https://www.sitepoint.com/premium/screencasts/real-world-use-cases-of-nosql-databases
http://www.sitepoint.com/sql-vs-nosql-choose/
http://www.sitepoint.com/sql-vs-nosql-differences/
https://drill.apache.org/
http://www.sitepoint.com/a-look-at-orientdb-the-graph-document-nosql/
MapReduce
Google Percolator
http://gora.apache.org/

http://couchdb.apache.org/
http://www.sitepoint.com/rethinkdb-ruby-map-reduce-joins/
http://www.rethinkdb.com/
http://radar.oreilly.com/2014/09/scaling-nosql-databases-5-tips-for-increasing-performance.html
http://www.moneylife.in/business-wire-news/aerospike-expands-access-to-next-generation-nosql-database-with-startup-special-and-trade-in-program/40724.html
http://aerospike.com/press-releases/aerospike-open-sources-visionary-database/
http://martinfowler.com/nosql.html
http://en.wikipedia.org/wiki/NoSQL
NoSQL
http://gigaom.com/cloud/cloud-databases-101-who-builds-em-and-what-they-do/
http://www.facebook.com/note.php?note_id=24413138919
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/
http://blog.adku.com/2011/02/hbase-vs-cassandra.html
http://whynosql.com/cassandra-vs-hbase/
http://www.quora.com/Why-did-Facebook-pick-HBase-instead-of-Cassandra-for-the-new-messaging-platform
http://www.readwriteweb.com/enterprise/2011/08/from-big-data-to-nosql-the-rea-2.php
http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
http://horicky.blogspot.com/2010/10/bigtable-model-with-cassandra-and-hbase.html
http://www.devx.com/dbzone/Article/45636
http://zef.me/1990/nosql-db-comparison
http://en.wikipedia.org/wiki/Ehcache
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis/
http://perfectmarket.com/blog/not_only_nosql_review_solution_evaluation_guide_chart
http://nosql.mypopescu.com/post/1659476530/another-nosql-comparison-evaluation-guide
http://www.readwriteweb.com/cloud/2010/11/nosql-comparison.php
http://www.thoughtworks.com/articles/nosql-comparison
http://www.informationweek.com/news/software/enterprise_apps/231601449
http://www.infoq.com/news/2011/12/relational-nosql-databases/
http://gigaom.com/cloud/database-superstar-jim-starkey-touts-nuodbs-new-patent/
http://hal2020.com/2012/02/09/a-perspective-on-big-data-nosql-and-relational-databases/
http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html
http://www.codeproject.com/Articles/375413/RaptorDB-the-Document-Store (Nice explanation of document store)
http://en.wikipedia.org/wiki/Comparison_of_object_database_management_systems
http://en.wikipedia.org/wiki/Comparison_of_structured_storage_software
http://en.wikipedia.org/wiki/Faceted_search
http://en.wikipedia.org/wiki/List_of_object_database_management_systems
http://en.wikipedia.org/wiki/Triplestore
http://en.wikipedia.org/wiki/RDF_Database
http://en.wikipedia.org/wiki/Distributed_cache
http://en.wikipedia.org/wiki/MultiValue
http://en.wikipedia.org/wiki/Object_database
http://db.lcs.mit.edu/projects/cstore/vldb.pdf
http://www.cs.umb.edu/~poneil/TPC_Talk082409.pdf
http://www-db.cs.wisc.edu/cidr/cidr2009/Paper_2.pdf
http://arxiv.org/abs/0901.3751
http://arxiv.org/abs/0909.1346
http://www.vldb.org/pvldb/1/1454174.pdf
http://www.youtube.com/watch?v=UTKcAdrjt1Y
http://t.co/6Xb5PRvH
http://community.tableausoftware.com/thread/110876
http://infinidb.org/component/content/article/53/218
http://publications.lib.chalmers.se/records/fulltext/123839.pdf
http://unqlspec.org/display/UnQL/Home
http://www.infoq.com/news/2011/08/UnQL
http://db.cs.berkeley.edu/claremont/claremontreport08.pdf
http://about.digg.com/blog/looking-future-cassandra
http://ensemble.jrc.ec.europa.eu/
http://www.facebook.com/note.php?note_id=24413138919&id=9445547199&index=9
http://cs.ucsb.edu/~ckrintz/papers/ieeecloud11.pdf
http://www.globule.org/publi/CSTWAC_ircs53.html
http://www.usenix.org/events/osdi10/tech/full_papers/Peng.pdf
http://www.cs.uwaterloo.ca/~c15zhang/ZhangDeSterckGrid2010.pdf
http://chenzhang.info/HBaseSI.pdf
http://static.last.fm/johan/nosql-20090611/cassandra_nosql.pdf
http://www.globule.org/publi/CJQCDS_ircs68.html
https://www.facebook.com/notes/facebook-engineering/mysql-and-database-engineering-mark-callaghan/10150599729938920
http://engineering.twitter.com/2012/04/mysql-at-twitter.html
http://dba.stackexchange.com/a/619
http://dba.stackexchange.com/questions/607/what-is-a-key-value-store-database
http://blog.marc-seeger.de/assets/papers/Ultra_Large_Sites_SS09-Seeger_Key_Value_Stores.pdf
http://blog.marc-seeger.de/2009/09/21/key-value-stores-a-practical-overview/
https://wiki.basho.com/
http://www.mgateway.com/docs/universalNoSQL.pdf
http://www.christof-strauch.de/nosqldbs.pdf
http://www.infoq.com/articles/graph-nosql-neo4j
http://www.alachisoft.com/resources/articles/managing-data-relationships.html
http://blog.datagraph.org/2010/04/rdf-nosql-diff
http://www.bigdbahead.com/?cat=32
NoSQL Tutorial
Presentation: An Introduction to FluidDB
http://www.aosabook.org/en/nosql.html
http://nosql.mypopescu.com/
http://blog.zawodny.com/2011/07/23/nosql-is-what/
http://www.nicholasgoodman.com/bt/blog/2011/08/29/nosql-now-2011-review-of-adhoc-analytic-architectures/
http://www.nicholasgoodman.com/bt/blog/2011/08/23/splunk-is-nosql-eee-and-queryable-via-sql/
http://www.nicholasgoodman.com/bt/blog/2011/06/24/pushdown-query-access-to-hivehadoop-data/
http://www.infoq.com/articles/virtual-panel-nosql-database-patterns
http://blog.heroku.com/archives/2010/7/20/nosql/
http://en.wikipedia.org/wiki/UnQL
http://en.wikipedia.org/wiki/XQuery
http://en.wikipedia.org/wiki/SPARQL
http://en.wikipedia.org/wiki/Apache_Cassandra
http://www.cmswire.com/cms/enterprise-cms/infinidb-20-supports-big-data-analytics-009089.php
http://www.enterpriseirregulars.com/28857/calpont%E2%80%99s-infinidb-%E2%80%93-another-adbms-insurgent-arises/
http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/
http://www.infinidb.org/blogs/view/81-infinidb-to-1-trillion-rows-1039909436172-
http://www.calpont.com/about/blog/view/18-a-behind-the-scenes-look-at-infinidb-part-1-of-3
http://www.calpont.com/about/blog/view/17-a-behind-the-scenes-look-at-infinidb-part-2-of-3
http://www.calpont.com/about/blog/view/41-a-behind-the-scenes-look-at-infinidb-ease-of-use-part-3-of-3
http://www.calpont.com/warner-music-group-case-study-video
http://www.information-management.com/issues/21_3/calpont-infinidb-10020717-1.html
http://www.business-intelligence.net/blog/calpont-infinidb-becomes-strategic-component-of-the-skysql-reference-architecture
http://rpbouman.blogspot.com/2009/10/calpont-opens-up-infinidb-open-source.html
http://survivalguides.wordpress.com/category/it-survival/linux/infinidb/
http://dave-stokes.blogspot.com/2010/11/calpont-infinidb-20-and-bi-quickstarts.html
Cassandra
Lily
MongoDB Revisited
Solandra
Vertica See also: http://www.dba-oracle.com/oracle_news/news_vertica.htm
http://dlutzy.wordpress.com/2012/07/30/opentsdb/
http://eric.lubow.org/2012/architecture/pros-and-cons-of-redis-resque-and-sqs
http://net.tutsplus.com/tutorials/getting-started-with-couchdb/

NoSQL does not mean "no SQL". It is usually read as "Not Only SQL".

Next-generation databases mostly address some (if not all) of these points: being non-relational, distributed, open-source, and horizontally scalable. They are often also schema-free, with easy replication support, a simple API, eventual consistency / BASE (not ACID), and support for huge amounts of data.

The term NoSQL encompasses a lot of things, and may include relational / SQL databases. Oracle also has support for NoSQL.

Wide Column Store / Column Families:

Hadoop / HBase
Cassandra
Hypertable
Accumulo
Amazon SimpleDB
SciDB: Array Data Model for Scientists
HPCC: from LexisNexis
Stratosphere: (research system) massive parallel & flexible execution, M/R generalization and extension.

Document Store:

MongoDB
CouchDB
RavenDB
Terrastore
RaptorDB
SisoDB

Key Value / Tuple Store:

DynamoDB
MEMBASE
Redis
GenieDB
Dynomite: open-source implementation of the Amazon Dynamo key-value store.
MemcacheDB
HamsterDB

See http://nosql-database.org/

Document store:
Clusterpoint (XML, geared toward full-text search)
Apache CouchDB (JSON)
MongoDB (binary JSON)
RavenDB (binary JSON, full ACID)
SimpleDB

Key-value store:
Apache Cassandra
Dynamo
Voldemort
Riak
LevelDB
MemcacheDB
Tarantool

Key-value in-memory cache:
memcached
Redis
http://www.readwriteweb.com/hack/2011/01/video-nosql-comparison.php

http://ravendb.net/
http://codeofrob.com/entries/ravendb---an-introduction.html
http://msdn.microsoft.com/en-us/magazine/hh547101.aspx
http://ayende.com/blog/153026/embracing-ravendb
http://ayende.com/blog/4499/why-raven-db

http://en.wikipedia.org/wiki/Protocol_Buffers
http://en.wikipedia.org/wiki/Action_Message_Format
http://en.wikipedia.org/wiki/Apache_Thrift
http://en.wikipedia.org/wiki/MessagePack
http://en.wikipedia.org/wiki/Document-oriented_database
http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One

http://www.youtube.com/watch?v=LhnGarRsKnA
http://www.readwriteweb.com/hack/2011/05/why-facebook-uses-apache-hadoo.php
http://www.infoq.com/presentations/HBase-at-Facebook
http://nosql.mypopescu.com/post/3657671463/facebook-builds-hbase-based-real-time-analytics
http://nosql.mypopescu.com/post/2668434779/hbase-at-facebook-the-underlying-technology-of
http://nosql.mypopescu.com/post/1582886261/facebook-replacing-cassandra-with-hbase-in-new
http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html
http://www.facebook.com/UsingHbase
http://files.meetup.com/1350427/Optimizing_HBase_scanner_performance.pptx
http://www.facebook.com/notes/hbase-at-facebook/online-schema-changes/207754852620078
http://www.facebook.com/notes/hbase-at-facebook/improving-startup-time/194582493937314
http://www.oscon.com/oscon2011/public/schedule/detail/17982
http://people.apache.org/~nspiegelberg/HBase--FBMessagesHBasePoster.pdf
http://www.facebook.com/notes/hbase-at-facebook/hbase-schema-versioning/197541776974719
http://hbase.apache.org/book.html

There is a relational database named NoSQL. By design, it does not use SQL, but it is a relational database nonetheless.

Aside from that, NoSQL is a generic term that describes big distributed database systems.

Before venturing into the world of distributed databases, keep this in mind: relational databases have long been the backbone of banks, financial systems, big corporations, governments, and research institutions, and even of some Web 2.0 companies. With proper configuration, replication, and sharding, we can still use relational databases. Learning a technology takes time. We will not know all there is to know about a technology the first time we read a book about it. We learn more as we keep reading about it, or as we gain first-hand experience with it (which may not be a good experience). The point is: don't go blindly with a technology. Consider your skill set and resources. If we really need to use a distributed database, hire someone who has experience managing such databases. If we are interested in the technology, learn from that person so that we have some experience with it before we use it for another project.

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, and a simple query language called HiveQL, which is based on SQL and lets users familiar with SQL query the data. At the same time, the language allows traditional map/reduce programmers to plug in custom mappers and reducers for more sophisticated analysis that the built-in capabilities of the language may not support.
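
To make the relationship between a SQL-style query and map/reduce more concrete, here is a minimal single-process Python sketch of the map, shuffle, and reduce phases that a query such as "count hits per page" could be broken into. It does not use Hive or Hadoop; the function names and the sample log lines are made up for illustration.

```python
from collections import defaultdict

# Made-up log lines in the form "<user> <page>".
log_lines = ["alice /home", "bob /home", "alice /about", "carol /home"]


def mapper(line):
    # Emit (key, value) pairs: one count per page hit.
    _user, page = line.split()
    yield page, 1


def shuffle(pairs):
    # Group all values by key, as a framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    # Collapse all values for one key into a single result.
    return key, sum(values)


pairs = [pair for line in log_lines for pair in mapper(line)]
hits_per_page = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(hits_per_page)  # {'/home': 3, '/about': 1}
```

In a real cluster the mappers and reducers would run on many machines in parallel; the sketch only shows the shape of the computation.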

CloudBase is a data warehouse system for terabyte- and petabyte-scale analytics. It is built on top of the map-reduce architecture. The current code has been developed against Hadoop’s map-reduce implementation. CloudBase allows you to query flat log files using ANSI SQL. It comes with a JDBC driver, so you can use any JDBC database manager application (e.g. SQuirreL) as a front end.

What is polyglot persistence?

Part of the NoSQL message is: pick the right tool for the job. One part of a system can use one database while another part uses a different one. You do not have to pick just one.
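
As a minimal sketch of this idea, the following Python uses the standard library's `sqlite3` as a stand-in for a relational database holding billing data, and an in-memory counter as a stand-in for a key-value store holding hit counts. The class and method names (`PolyglotStore`, `record_payment`, `count_hit`) are hypothetical, and a real deployment would talk to separate database servers; only the routing idea is illustrated.

```python
import sqlite3
from collections import Counter


class PolyglotStore:
    """Hypothetical facade that routes each kind of data to a different store."""

    def __init__(self):
        # Stand-in for a relational database: low-volume, high-value data.
        self.billing = sqlite3.connect(":memory:")
        self.billing.execute(
            "CREATE TABLE payments (customer TEXT NOT NULL, cents INTEGER NOT NULL)"
        )
        # Stand-in for a key-value store: high-volume, low-value data.
        self.hits = Counter()

    def record_payment(self, customer, cents):
        # The connection context manager commits on success, rolls back on error.
        with self.billing:
            self.billing.execute(
                "INSERT INTO payments (customer, cents) VALUES (?, ?)",
                (customer, cents),
            )

    def total_paid(self, customer):
        row = self.billing.execute(
            "SELECT COALESCE(SUM(cents), 0) FROM payments WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]

    def count_hit(self, page):
        self.hits[page] += 1


store = PolyglotStore()
store.record_payment("alice", 1999)
store.record_payment("alice", 501)
store.count_hit("/home")
store.count_hit("/home")
print(store.total_paid("alice"), store.hits["/home"])  # 2500 2
```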

How do you pick the right database for your architecture?

To understand why NoSQL is important, consider the use cases:

  • Frequently written, rarely read statistical data should use an in-memory key/value store like Redis, or an update-in-place document store like MongoDB
  • Big Data (like weather stats or business analytics) will work best in a freeform, distributed db system like Hadoop
  • Binary assets (such as MP3 and PDFs) find a good home in a datastore that can be served directly to the browser like Amazon S3 (or should be stored on distributed file systems)
  • Transient data (web sessions, locks, or short-term stats) should be kept in a transient data store like Memcached
  • If you need to be able to replicate your data set to multiple locations, you will want the replication features of CouchDB
  • High availability applications, where minimizing downtime is critical, will find great utility in the automatically clustered, redundant setup of data stores like Cassandra and Riak
  • MySQL for low-volume, high-value data like billing information
  • MongoDB for high-volume, low-value data like hit counts and logs
  • Amazon S3 for user-uploaded assets like photos and documents (Amazon S3 is not a database; it is an object storage service)
  • Memcached for temporary counters and rendered HTML
  • Key-value stores offer fast lookup of data by key
  • A document database can be a key-value store
  • A column store is a fancy key-value store
  • A graph database is a document database on steroids
  • Some document databases can handle graphs
  • Range queries can be hard
  • Complex ad hoc queries are almost impossible, and they don't scale well across N nodes
  • Transactions don't scale well in a distributed system
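
The points about fast key lookups and hard range queries can be illustrated with a small sketch. It assumes a hash-partitioned key-value store modeled as a list of Python dicts; the names `shard_for`, `put`, `get`, and `range_query` are hypothetical. A lookup by key touches one shard, while a range query has to visit every shard because hashing scatters neighboring keys.

```python
import zlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key):
    # Deterministic hash so the example behaves the same on every run.
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    # A lookup by key touches exactly one shard.
    return shards[shard_for(key)].get(key)


def range_query(low, high):
    # Hash partitioning scatters adjacent keys, so every shard is scanned.
    touched = 0
    found = {}
    for shard in shards:
        touched += 1
        for key, value in shard.items():
            if low <= key <= high:
                found[key] = value
    return dict(sorted(found.items())), touched


for day in range(1, 11):
    put(f"2012-07-{day:02d}", day * 10)

print(get("2012-07-03"))  # 30
result, touched = range_query("2012-07-03", "2012-07-05")
print(result, "shards scanned:", touched)
```

Stores that keep keys in sorted order can answer range queries more cheaply, typically at the cost of less even load distribution.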

References:
I Can't Wait for NoSQL to Die
Visual Guide to NoSQL Systems
NoSQL, Heroku, and You - Heroku
http://nosql-berlinbuzzwords2010.heroku.com/#1

What does BASE stand for?

Basically Available, Soft-state, Eventually consistent.

Are all implementations of eventual consistency equal?

No. An eventually consistent database may also elect to provide the following guarantees:

  • Causal consistency: One application session signals another that a change has occurred. From that point on, the receiving session will always see the updated value.
  • Read your own writes: A session that performs a change to the database will immediately see that change, even if other sessions experience a delay.
  • Monotonic consistency: A session will never see data revert to an earlier point in time. Once we read a value, we will never see an earlier value.
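
A minimal sketch of how a client session could enforce "read your own writes" and monotonic reads, assuming each replica tags its copy with a version number. The `Replica` and `Session` classes are hypothetical, and computing the next version from all replicas is a simplification that a real distributed system could not rely on.

```python
class Replica:
    """One copy of a single data item, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Ignore updates older than what this replica already holds.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Hypothetical client session tracking the newest version it has seen."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.min_version = 0  # highest version this session has written or read

    def write(self, value, targets):
        # Simplification: derive the next version by looking at every replica.
        version = max(r.version for r in self.replicas) + 1
        for index in targets:  # only some replicas receive the write right away
            self.replicas[index].apply(version, value)
        self.min_version = max(self.min_version, version)

    def read(self):
        # Skip replicas that are staler than anything this session has seen.
        for replica in self.replicas:
            if replica.version >= self.min_version:
                self.min_version = replica.version
                return replica.value
        raise RuntimeError("no sufficiently fresh replica is reachable")


replicas = [Replica(), Replica(), Replica()]
writer = Session(replicas)
other = Session(replicas)

writer.write("v1", targets=[2])  # replicas 0 and 1 have not received it yet
print(writer.read())             # v1: the writer skips the stale replicas
print(other.read())              # None: another session may still see old data
```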

What is the NRW notation?

NRW notation describes at a high level how a distributed database will trade off consistency, read performance, and write performance. NRW stands for:

  • N: the number of copies of each data item that the database will maintain
  • R: the number of copies that the application will access when reading the data item
  • W: the number of copies of the data item that must be written before the write can complete

When N=W, the database will always write every copy before returning control to the client. This is more or less what traditional databases do when implementing synchronous replication. If you are more concerned about write performance, you can set W=1 and R=N. Then each read must access all copies to determine which is correct, but each write only has to touch a single copy of the data.

Most NoSQL databases use N > W > 1: more than one write must be completed, but not all nodes need to be updated immediately. You can increase the level of consistency in roughly three stages:

  1. If R=1, the database will accept whatever value it reads first. This might be out-of-date if not all updates have propagated through the system.
  2. If R>1, the database will read more than one value and pick the most recent one.
  3. If W+R > N, then a read will always retrieve the latest value, although it may be mixed with older values.

In other words, the number of copies you write and number of copies you read is high enough to guarantee that you will always have at least one copy of the latest version in your read set. This is sometimes referred to as quorum assembly.

NRW configuration | Outcome
W=N, R=1 | Read-optimized strong consistency
W=1, R=N | Write-optimized strong consistency
W+R<=N | Weak eventual consistency. A read may not see the latest update
W+R>N | Strong consistency through quorum assembly. A read will see at least one copy of the most recent update
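
The quorum rule can be checked by brute force. The sketch below (function names are hypothetical) labels a configuration with the W+R>N rule from the NRW table and then enumerates every possible write set and read set to confirm whether they always share at least one node.

```python
from itertools import combinations


def classify(n, r, w):
    """Label an NRW configuration using the W+R>N rule."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return "strong (quorum)" if r + w > n else "weak (eventual)"


def always_overlaps(n, r, w):
    """Brute-force check: does every read set share a node with every write set?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


for n, r, w in [(3, 1, 3), (3, 3, 1), (3, 2, 2), (3, 1, 1), (5, 2, 3)]:
    print(f"N={n} R={r} W={w}: {classify(n, r, w)}, overlap={always_overlaps(n, r, w)}")
```

With N=5, R=2, W=3 the sum equals N, so a write to nodes 0, 1, 2 and a read from nodes 3, 4 can miss each other entirely.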

NoSQL databases generally try hard to be as consistent as possible, even when configured for weak consistency. For example, the read repair algorithm is often implemented to improve consistency when R=1. Although the application does not wait for all the copies of a data item to be read, the database will read all known copies in the background after responding to the initial request. If the application asks for the data item again, it will therefore see the latest version.
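
A minimal sketch of read repair with R=1, assuming each copy carries a version number. The `VersionedCopy` class and `read_with_repair` function are hypothetical, and the repair runs inline here, whereas a real database would run it in the background after responding.

```python
class VersionedCopy:
    """One replica's copy of a data item."""

    def __init__(self, version, value):
        self.version = version
        self.value = value


def read_with_repair(copies, first=0):
    """Answer from a single copy (R=1), then bring stale copies up to date."""
    answer = copies[first].value  # returned without waiting for other copies

    # Repair step: find the newest copy and overwrite any older ones.
    newest = max(copies, key=lambda c: c.version)
    for copy in copies:
        if copy.version < newest.version:
            copy.version, copy.value = newest.version, newest.value
    return answer


copies = [VersionedCopy(1, "old"), VersionedCopy(2, "new"), VersionedCopy(1, "old")]
print(read_with_repair(copies))  # old: the first read may be stale
print(read_with_repair(copies))  # new: the second read sees the repaired copy
```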

What is the vector clock algorithm?

The vector clock algorithm can be used to ensure that updates are processed in order (monotonic consistency). With vector clocks, each node maintains a change number (or event count) similar to the System Change Number used in some RDBMSs. The "vector" is a list containing the current node's change number as well as the change numbers it has received from other nodes. When an update is transmitted, the vector is included with the update, and the receiving node compares it with the vectors it has already received to determine whether updates are arriving out of sequence. Out-of-sequence updates can be held until the preceding updates appear.
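
A minimal sketch of vector clocks, representing each clock as a Python dict that maps a node name to its change number. The function names are hypothetical; the comparison reports whether one update happened before another or whether the two are concurrent and therefore need conflict resolution.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum, used when a node receives another node's clock."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a and b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# Node A makes an update; node B receives it and then makes its own update.
a1 = increment({}, "A")             # {'A': 1}
b1 = increment(merge({}, a1), "B")  # {'A': 1, 'B': 1}
# Node A makes a second update without having seen B's.
a2 = increment(a1, "A")             # {'A': 2}

print(compare(a1, b1))  # before: a1 is an ancestor of b1
print(compare(b1, a2))  # concurrent: neither node saw the other's update
```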

Conferences:

http://www.biganalytics2012.com/
http://www.nosql-matters.org/
http://nosql2012.dataversity.net/

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License