Which database should I use for a killer web application: MongoDB, PostgreSQL, or MySQL?

Fan Out On Write
… is best if the timeline most resembles Twitter: a roughly permanent, append-only, per-user aggregation of a set of source feeds. When an incoming event is recorded, it is stored in the source feed, and then a record containing a copy is created for each timeline in which it must be visible. Each timeline is essentially a materialized view of every event the end user has visibility into.
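As a rough illustration of that write path, here is a minimal in-memory sketch in Python; the structures (source_feeds, timelines, followers_of) are hypothetical stand-ins for whatever store actually holds the data:

    # Minimal fan-out-on-write sketch (illustrative only).
    from collections import defaultdict

    source_feeds = defaultdict(list)   # author_id -> list of events
    timelines = defaultdict(list)      # user_id   -> materialized timeline
    followers_of = defaultdict(set)    # author_id -> set of follower ids

    def record_event(author_id, event):
        # 1. Store the event in the author's source feed.
        source_feeds[author_id].append(event)
        # 2. Copy it into every timeline that must be able to see it.
        for follower in followers_of[author_id] | {author_id}:
            timelines[follower].append(event)

    def read_timeline(user_id):
        # Reads are cheap: the timeline is already materialized.
        return timelines[user_id]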

.. The fan-out-on-read strategy is best if the timeline most resembles Facebook’s news feed, in that it is temporal and/or you want dynamic features like relevance (the “Top News” feed on Facebook) or event aggregation. When an event is recorded, it is stored only in the source feed. Every time an end user requests their individual timeline, the system must read all of the source feeds the end user has visibility into and aggregate them together.
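A matching sketch of the read-side aggregation, again with hypothetical in-memory structures in place of real storage and a plain reverse-chronological merge standing in for any fancier ranking:

    # Minimal fan-out-on-read sketch (illustrative only).
    from collections import defaultdict

    source_feeds = defaultdict(list)   # author_id -> list of (timestamp, event)
    following = defaultdict(set)       # user_id   -> set of author_ids

    def record_event(author_id, timestamp, event):
        # Writes touch only the author's own feed.
        source_feeds[author_id].append((timestamp, event))

    def read_timeline(user_id, limit=50):
        # Reads must aggregate every feed the user can see, at request time.
        visible = following[user_id] | {user_id}
        merged = [item for a in visible for item in source_feeds[a]]
        merged.sort(key=lambda e: e[0], reverse=True)   # newest first
        return merged[:limit]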

If you’re doing fan-out-on-read, you can’t really ever afford to go to disk, even if the disks are SSDs. All of the timeline data must be in memory, all the time. This means you’ll probably have to monitor the size of the data set and actively purge the oldest data as the feed data grows beyond a safe threshold.
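One possible shape for that purge pass, assuming timeline entries are (timestamp, event) pairs and the "safe threshold" is a simple per-timeline length cap (both assumptions, not prescribed by the quote above):

    # Hypothetical purge pass: keep only the newest MAX_TIMELINE_LEN entries
    # per timeline so the whole working set stays in memory.
    MAX_TIMELINE_LEN = 800

    def purge_old_entries(timelines):
        for user_id, entries in timelines.items():
            if len(entries) > MAX_TIMELINE_LEN:
                entries.sort(key=lambda e: e[0])        # oldest first
                del entries[:-MAX_TIMELINE_LEN]         # drop all but the newest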

.. There is however an advanced use case that PostgreSQL isn’t that good for: graphs.

.. MongoDB is not a graph database, or even a relational database. Like CouchDB, it’s a document database and it represents the other end of the scale. What if you want to store all details of a movie in one place and you aren’t interested in being able to cross-reference the data? The relational databases want you to “normalise” your data by storing every little detail in a separate table, but you find that annoying. You just want to store your data in one place without having to think it through too much. You’re still figuring out what data you want to store, so for now you just want to dump it somewhere and worry about querying it later. That’s exactly what document databases are for and MongoDB is king of that hill.
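For instance, the whole movie could be written as one self-contained document. The sketch below uses pymongo; the database, collection, and field names are made up for illustration:

    # A movie stored as one self-contained document.
    from pymongo import MongoClient

    db = MongoClient()["media"]
    db.movies.insert_one({
        "title": "Example Movie",
        "year": 1999,
        "cast": [
            {"actor": "A. Person", "role": "Lead"},
            {"actor": "B. Person", "role": "Villain"},
        ],
        "ratings": {"critics": 7.8, "audience": 8.4},
    })
    # In a normalized relational schema, the cast and ratings would live in
    # separate tables joined back to the movie with foreign keys.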

.. Just use MySQL or Postgres. Why? If your entire _active set_ fits in a single machine’s main memory (which can be 128GB or more on modern commodity machines), you don’t have a horizontal scalability problem: there is absolutely no reason for you to partition (“shard”) your database and give up relations. If your active data set fits in memory, most any properly tuned database with an index will perform well enough to saturate your Ethernet card before the database itself becomes a limitation.

.. MySQL’s InnoDB (or Postgres’s storage engine), however, will still allow you to maintain (depending on your request distribution) a roughly 2:1 to 5:1 data-to-memory ratio on a spinning disk. Once you’ve gone beyond that, performance begins to fall off rapidly (as you’re making multiple disk seeks for every request). At that point, your best course of action is simply to upgrade to SSDs (solid state drives), which, again, allow you to saturate your Ethernet card *before* the database becomes a limitation.

.. the non-relational model has palpable advantages for the typical webapp and opens new opportunities when occupying the position in the stack where MySQL used to sit. The reason for this is the dynamic nature of webapp development: features often change with iterations on user feedback, and these new and different features often have different persistence requirements that need to be fitted into the existing DB schema.

MongoDB is perfectly suited for that. In some ways, it is the best drop-in non-relational replacement for a relational DB like MySQL, in that it puts the most effort into making the transition smooth. What you get is the freedom to change your data schema on the fly: there are no named columns with fixed datatypes. You can essentially put any “stuff” into a document, and documents within a collection (the Mongo equivalent of a table) don’t need to have consistent structures. One document may have many keys where another has only a few.
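A small sketch of what that looks like in practice with pymongo; the collection and fields are invented for illustration:

    # Two documents in the same collection with different keys; no ALTER TABLE,
    # no fixed column types.
    from pymongo import MongoClient

    events = MongoClient()["app"]["events"]
    events.insert_one({"type": "signup", "user": "alice", "plan": "free"})
    events.insert_one({"type": "purchase", "user": "bob",
                       "items": [{"sku": "X1", "qty": 2}], "total_cents": 1998})
    # Later iterations can start writing new fields immediately; older documents
    # simply won't have them, and queries can filter on field existence:
    new_style = events.find({"total_cents": {"$exists": True}})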

.. If your data is not append-only like logs and is instead updated randomly, MongoDB is a disaster.

.. If you’re just starting out, MongoDB would be helpful for things like log files. Being flexible about your documents means you can change fields on the fly and not have to worry about four-day ALTER TABLE statements on a 1-billion-row log table, as you would with MySQL. It’s also handy if you have known data structures that can be represented in a single document (note: the 4MB limit on document size means you should never add things like comments, or any expanding data set, to your document, as you WILL fail on the next insert that pushes it over 4MB).
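The contrast might look like the hedged sketch below: pushing comments into the parent document grows it without bound, while a separate collection keeps each document small. The collection and field names are hypothetical:

    from pymongo import MongoClient

    db = MongoClient()["app"]
    posts, comments = db["posts"], db["comments"]
    post_id = posts.insert_one({"title": "hello"}).inserted_id

    # Risky: pushing every comment into the parent post's document grows it
    # without bound and will eventually hit the per-document size limit.
    posts.update_one({"_id": post_id},
                     {"$push": {"comments": {"user": "carol", "text": "nice"}}})

    # Safer: give comments their own collection and reference the post.
    comments.insert_one({"post_id": post_id, "user": "carol", "text": "nice"})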

Upserts for counters are useful too.
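For example, a per-page view counter can be maintained with a single upsert; the collection name and key are made up for illustration:

    # Upsert-style counter: one atomic operation creates the document on the
    # first hit and increments it on every hit after that.
    from pymongo import MongoClient

    counters = MongoClient()["app"]["page_counters"]
    counters.update_one({"_id": "/pricing"},      # one counter per page path
                        {"$inc": {"views": 1}},
                        upsert=True)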

.. I think you should hold off on optimizing your website to use a NoSQL database until you actually discover a performance problem. Your database will have to be handling a huge number of writes before a relational database becomes a problem; any number of reads can be handled using replication.

Breaking Down Data Silos with Foreign Data Wrappers

At the same time, developers are also facing added pressure to speed application delivery and respond more directly to line-of-business needs. This has given rise to ‘shadow IT’ efforts, a term used to describe IT systems and solutions built or used within an organization without the involvement of IT leadership.

.. Integrating Foreign Data Types in Postgres

Postgres has a solution for this dilemma: a feature called a Foreign Data Wrapper (FDW), the first of which was developed four years ago by PostgreSQL community leader Dave Page and based on the SQL/MED (SQL Management of External Data) part of the SQL standard. FDWs provide a SQL interface for accessing large data objects in remote data stores, enabling DBAs to integrate data from disparate sources and bring it into a common model under the Postgres database.

Through this, DBAs can access and manipulate data managed by other systems as if it were in a native Postgres table. For example, using the FDW for MongoDB, database administrators can query data from the document database and join it with data from a local Postgres table using SQL. Using this method, users can view, sort and group the data as rows and columns or as JSON documents. They can even write (insert, update or delete) data directly in the source document database from Postgres as if it were a single seamless deployment. The same can be done with a Hadoop cluster or a MySQL deployment. FDWs allow Postgres to function as a central federated database, or ‘hub,’ for enterprises.
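A sketch of that setup, shown here with postgres_fdw since its options are the most widely documented; the MongoDB FDW follows the same CREATE SERVER / CREATE FOREIGN TABLE pattern with its own connection options. All connection details and table names below are placeholders:

    import psycopg2

    with psycopg2.connect("dbname=hub") as conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
        cur.execute("""
            CREATE SERVER IF NOT EXISTS legacy_srv
                FOREIGN DATA WRAPPER postgres_fdw
                OPTIONS (host 'legacy-db.example.com', dbname 'legacy')
        """)
        cur.execute("""
            CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER
                SERVER legacy_srv OPTIONS (user 'report', password 'secret')
        """)
        cur.execute("""
            CREATE FOREIGN TABLE IF NOT EXISTS legacy_orders (
                id bigint, customer_id bigint, total numeric
            ) SERVER legacy_srv OPTIONS (schema_name 'public', table_name 'orders')
        """)
        # Foreign rows join against local Postgres tables like any other table.
        cur.execute("""
            SELECT c.name, sum(o.total)
            FROM customers c JOIN legacy_orders o ON o.customer_id = c.id
            GROUP BY c.name
        """)
        for name, total in cur.fetchall():
            print(name, total)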

.. According to Gartner, “By 2017, 50% of data stored in NoSQL DBMSs will be damaging to the business due to a lack of applied information governance policies and programs.”

Modern SQL in PostgreSQL [and other databases]

“SQL has gone out of fashion lately—partly due to the NoSQL movement, but mostly because SQL is often still used like 20 years ago. As a matter of fact, the SQL standard continued to evolve during the past decades resulting in the current release of 2011. In this session, we will go through the most important additions since the widely known SQL-92, explain how they work and how PostgreSQL supports and extends them. We will cover common table expressions and window functions in detail and have a very short look at the temporal features of SQL:2011 and the related features of PostgreSQL.”
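By way of illustration, here is a query combining a common table expression (standardized in SQL:1999) with a window function (SQL:2003), run from Python against an assumed users table with a created_at column; the schema is invented for the example:

    import psycopg2

    QUERY = """
    WITH daily AS (                          -- common table expression
        SELECT date_trunc('day', created_at) AS day, count(*) AS signups
        FROM users
        GROUP BY 1
    )
    SELECT day,
           signups,
           sum(signups) OVER (ORDER BY day)  -- window function
               AS running_total
    FROM daily
    ORDER BY day;
    """

    with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
        cur.execute(QUERY)
        for day, signups, running_total in cur.fetchall():
            print(day, signups, running_total)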

Why You Should Never Use MongoDB

For quite a few years now, the received wisdom has been that social data is not relational, and that if you store it in a relational database, you’re doing it wrong.

.. When Diaspora decided to store social data in MongoDB, we were conflating a database with a cache. Databases and caches are very different things. They have very different ideas about permanence, transience, duplication, references, data integrity, and speed.

.. MongoDB’s ideal use case is even narrower than our television data. The only thing it’s good at is storing arbitrary pieces of JSON. “Arbitrary,” in this context, means that you don’t care at all what’s inside that JSON. You don’t even look. There is no schema, not even an implicit schema, as there was in our TV show data. Each document is just a blob whose interior you make absolutely no assumptions about.