LAMP is a great software stack for getting started on a reliable and efficient web application, but there are many instances when a program’s complexity requires a database technology other than MySQL, or an RDBMS in general, to handle information storage and retrieval. This is increasingly the case as firms find new ways to leverage the Internet and associated technologies in building novel applications that solve industry- and domain-specific problems. For example, mapping and GIS-related applications require complex information storage, processing, and retrieval for geospatial analysis, using data models that are multi-level and more complicated in nature. Additionally, more sophisticated technologies are required to process the massive data explosion that has occurred within the last 5-10 years due to mobile devices, social media, and the exponential growth of the Internet. All of these have contributed to the rise of Big Data: data sets so massive that traditional RDBMS tools have difficulty processing them.
RDBMS technology is over four decades old, with the first commercial RDBMS released by RSI (later to become Oracle Corporation) back in 1979. The original developers of the RDBMS did not anticipate Big Data, and designed it to store all information in a neatly structured, tabular data model that made for streamlined storage and retrieval. This worked for a time, but today the volume and nature of data call for scaling out rather than scaling up. That is, instead of increasing the processing power, memory, and storage capacity of one database server, it makes more sense to purchase several inexpensive servers and distribute the load across them as required. This may sound familiar, as it is one of the main tenets of Cloud Computing. NoSQL fits the cloud model well, as the two are contemporary technologies: designed in the same era and solving the same problems through horizontal scaling.
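To make the idea of scaling out more concrete, here is a minimal sketch of one common way to spread records across several servers: hashing each record's key and using the result to pick a server. The server names and the pick_server and distribute helpers are hypothetical, invented purely for illustration; real NoSQL systems use more elaborate placement schemes.

```python
import hashlib

# Hypothetical server names, used only for illustration.
SERVERS = ["db-node-1", "db-node-2", "db-node-3"]


def pick_server(key, servers):
    """Map a record key to one server using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and wrap it onto the server list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def distribute(keys, servers):
    """Group keys by the server each one is assigned to."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[pick_server(key, servers)].append(key)
    return placement


if __name__ == "__main__":
    keys = ["user:%d" % i for i in range(12)]
    for server, assigned in distribute(keys, SERVERS).items():
        print(server, assigned)
```

Because the hash is stable, the same key always lands on the same server, so a lookup needs to contact only one machine. A plain modulo like this reshuffles most keys whenever a server is added or removed, which is one reason production systems tend to use more careful schemes.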
Comparing RDBMSs and NoSQL Databases
It’s worth noting that NoSQL is usually read as “Not Only SQL,” not “No SQL.” That is to say, SQL (Structured Query Language) is not entirely absent from NoSQL databases, and its presence varies from vendor to vendor. Indeed, some NoSQL solutions allow the use of SQL-like query languages to manipulate data, with Apache Cassandra’s CQL as a prime example. Rather, the main differences between the two database technologies relate to how NoSQL solves the scaling issues inherent in an RDBMS.
We mentioned before that RDBMSs require a fixed, predetermined, tabular data model to perform optimally. NoSQL databases are the opposite: they are schema-less, and thus more flexible. Data can be inserted without conforming to a specific structure, so the format can often be changed without modifying the database or rewriting the application. NoSQL databases store data as key-value pairs, columnar data, or documents, rather than as a relational set of tables (and constituent rows and columns).
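The difference between a fixed schema and a schema-less store can be sketched in a few lines. The example below uses SQLite (bundled with Python) as a stand-in for an RDBMS, and a plain dictionary of JSON strings as a stand-in for a document store. The users table, the twitter field, and the helper functions are all hypothetical names chosen for illustration; this is not how any particular NoSQL product is implemented.

```python
import json
import sqlite3

# Relational side: the table's columns are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))


def insert_with_extra_column(connection):
    """Try to store a field the schema does not declare; return the error text."""
    try:
        connection.execute(
            "INSERT INTO users (id, name, twitter) VALUES (?, ?, ?)",
            (2, "Grace", "@grace"),
        )
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


# Document side: a dict of JSON strings stands in for a document store.
documents = {}


def put_document(store, doc_id, doc):
    """Store any JSON-serializable dict; no structure is enforced."""
    store[doc_id] = json.dumps(doc)


def get_document(store, doc_id):
    """Load a stored document back into a dict."""
    return json.loads(store[doc_id])


put_document(documents, 1, {"name": "Ada"})
put_document(documents, 2, {"name": "Grace", "twitter": "@grace"})

if __name__ == "__main__":
    print("relational insert error:", insert_with_extra_column(conn))
    print("document 2:", get_document(documents, 2))
```

The relational insert fails until the table is altered to add the new column, whereas the document store accepts the second record as-is. The trade-off is that the application now has to cope with documents that may or may not carry a given field.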
Different Database Chemistry: ACID vs. BASE
When comparing an RDBMS and a NoSQL database, one may encounter the terms ACID and BASE, and the relative benefits of each. ACID stands for “atomicity, consistency, isolation, and durability,” and refers to the standard attributes of an RDBMS, whereas BASE stands for “basically available, soft state, eventual consistency.” The attributes detailed in ACID were initially regarded as imperative to optimal database operations, but as systems become larger and more distributed, they are harder to guarantee, especially with the immense volumes of data and simultaneous transactions that are typical of today’s large systems (think Amazon or Google search). BASE, therefore, describes the attributes of NoSQL databases: data will eventually be in sync, and this is good enough. A given read may be approximate and not exactly up to date, but at least the system is highly available, performing and responding to requests.
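Eventual consistency is easier to picture with a toy simulation. The sketch below models two replicas of the same data: a write lands on one replica, a read from the other is briefly stale, and a synchronization pass brings them back into agreement. The Replica class, the sync function, and the version-number rule (highest version wins) are simplifying assumptions made for illustration, not a description of any specific database.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Keep the incoming value only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(replicas):
    """Merge all replicas so each ends up with the newest version of every key."""
    merged = {}
    for replica in replicas:
        for key, (version, value) in replica.data.items():
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    for replica in replicas:
        for key, (version, value) in merged.items():
            replica.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], version=1)  # the write reaches replica a only
stale = b.read("cart")                # b has not seen the write yet
sync([a, b])                          # background synchronization pass
fresh = b.read("cart")                # b now agrees with a

if __name__ == "__main__":
    print("before sync:", stale)
    print("after sync:", fresh)
```

Both replicas keep answering requests throughout, which is the availability that BASE favors. The cost is the window before the sync, during which a reader on replica b sees out-of-date data.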
In short, due to their simpler data models and distributed nature, NoSQL databases are generally faster and more scalable than RDBMSs. That being said, NoSQL is still a developing technology, while RDBMSs are mature, stable, and reliable under most circumstances. Realistically, most organizations and firms will never scale beyond the performance threshold of a standard RDBMS. It is therefore not a question of which database technology is superior to the other, but rather which is appropriate for the scenario at hand. RDBMSs such as MySQL, PostgreSQL, Oracle, and SQL Server are still the most widely supported and understood, and will hold up in the vast majority of use cases. NoSQL databases such as MongoDB, CouchDB, Bigtable, and Apache Cassandra should be used when highly available, distributed database systems are necessary to handle massive volumes of requests and data processing.