The False Flexibility of MongoDB

MongoDB has long been touted for ease of development and deployment thanks to its flexible JSON-like data model. Document schemas are not predefined, which avoids time spent defining a data schema that will likely undergo large changes. Likewise, the argument goes, deployments are made easier by skipping expensive schema changes.

This initial benefit, encountered generally in the prototyping stage of an application, almost always results in large headaches down the road for the following reasons:

1. An open schema does not mean you have no data schema.

Once you're past initial prototyping and have started storing input from users, any significant data model change is expensive, whether you're using SQL or NoSQL. A common data model change in Mongo is to take a few fields in a document and group them together as an embedded relation. In such a case you have two choices: write a script to migrate this data (sounds an awful lot like SQL), or handle both new and old versions of the schema (the path of pain). Most choose to migrate the data, but without the benefit of database-level constraints.
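To make the two choices concrete, here is a rough sketch in Python. Plain dicts stand in for MongoDB documents, and the field names (street, city, zip, address) and both function names are made up for illustration; nothing here is a MongoDB API.

```python
# Hypothetical field names; plain dicts stand in for MongoDB documents.
ADDRESS_FIELDS = ("street", "city", "zip")


def migrate_user(doc):
    """Choice 1: return a copy of doc with the flat fields grouped under 'address'."""
    if "address" in doc:
        # Already migrated, so the script is safe to run twice.
        return dict(doc)
    migrated = {k: v for k, v in doc.items() if k not in ADDRESS_FIELDS}
    migrated["address"] = {k: doc[k] for k in ADDRESS_FIELDS if k in doc}
    return migrated


def read_city(doc):
    """Choice 2: every reader has to understand both document shapes."""
    if "address" in doc:
        return doc["address"].get("city")
    return doc.get("city")


old = {"_id": 1, "name": "Ann", "street": "1 Main St", "city": "Oslo", "zip": "0150"}
new = migrate_user(old)
```

Note that nothing stops a stale code path from writing the old shape again after the migration has run. That check lives in your application, not in the database.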

2. With its lack of joins, Mongo encourages coupling data storage and access.

Imagine modeling a faux-Twitter in MongoDB. Its core data model is quite simple - you create a users collection with their tweets as an embedded relation. It initially works great - every time you view a user's profile you've already got their tweets fetched in the same query. However, your user traction is not what you hoped, so you create a retweet feature to encourage users to interact with each other. Retweets are embedded inside their parent tweet, which is of course part of the user document. On each user's profile you want to display not only their tweets but also their retweets. You now have to search through a doubly nested collection, and your whole database has come to a grinding halt.
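Here is roughly what that access pattern looks like, sketched in Python. Plain dicts stand in for the users collection, and the document shape and the retweets_by helper are assumptions for illustration. This shows the shape of the problem, not how MongoDB actually executes such a query.

```python
# Hypothetical document shape: users embed tweets, tweets embed retweets.
users = [
    {"name": "ann", "tweets": [
        {"text": "hello", "retweets": [{"by": "bob"}]},
        {"text": "again", "retweets": []},
    ]},
    {"name": "bob", "tweets": [
        {"text": "hi", "retweets": [{"by": "ann"}, {"by": "bob"}]},
    ]},
]


def retweets_by(users, name):
    """Collect (author, text) for every tweet that `name` retweeted."""
    found = []
    for user in users:                    # every user document
        for tweet in user["tweets"]:      # every embedded tweet
            for rt in tweet["retweets"]:  # every embedded retweet
                if rt["by"] == name:
                    found.append((user["name"], tweet["text"]))
    return found


bobs_retweets = retweets_by(users, "bob")
```

Fetching a user's own tweets touches one document. Fetching their retweets means looking inside every other user's document, because the data is stored where it was convenient for the first feature, not the second.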

Ironically, this basic deficiency is acknowledged if you dig into Mongo's documentation:

"References provides more flexibility than embedding. However, client-side applications must issue follow-up queries to resolve the references. In other words, normalized data models can require more round trips to the server."

Most data is relational. Save MongoDB for those special cases where the relational model doesn't fit.
