The precision — or lack thereof — of time

Exactly how long is a month? 30 days?  31? Likewise you never want to try and deal with a time without a date, because then you can’t handle daylight savings or any other timezone changes.

.. Where things really fall apart is when you try and sequence events in a database, because the data from a transaction won’t be visible to other transactions until it commits. If transaction A runs from 12:00 to 12:03 and transaction B runs from 12:01 to 12:02, which transaction happened first? You might want to say A because it started first. But A can potentially see data from B after B commits. Conversely, you could say B happened first because it commits first, but clearly some events in transaction A happened before B even started.

.. A simple example is the “midnight problem.” Say you’re tracking visitors to your website, and you want to know how many people visit per day. Someone that first hits your site at 11:59PM and stays for a while will end up counted twice if you’re not careful. Once at 11:59 and then a second time after midnight. Technically all those visitors did visit on both days; but you may not have wanted to actually double-count them.

.. Another strong variation is to actually create a cryptographic hash for each record that includes the hash of the previous record. That makes it impossible for old data to be erased or changed without detecting it.

 

 

PostgreSQL anti-patterns: Unnecessary json/hstore dynamic columns

json is the new EAV – a great tool when you need it, but not something you should use as a first choice.

.. For example, sometimes application users want to be able to add arbitrary attributes to an entity. It’s a business requirement. The client don’t like it when you say that the database doesn’t do that so they can’t have notes in their application, and they’re muttering things about “just doing it in the application” or “we didn’t have these problems with MongoDB”.

How to decide when to use json

Use json if your data won’t fit in the database using a normal relational modelling. If you’re choosing between using EAV, serializing a Java/Ruby/Python object into a bytea field, or storing a key to look up an external structured object somewhere else … that’s when you should be reaching for json fields.

 

Google goes back to the future with SQL F1 database: ‘Can you have a truly scalable database without going NoSQL? Yes!’

The AdWords system includes “100s of applications and 1000s of users,” which all share a database over 100TB serving up “hundreds of thousands of requests per second, and runs SQL queries that scan tens of trillions of data rows per day,” Google said. And it’s got five nines of availability.

..  Google describes it as a “a hybrid, combining the best aspects of traditional relational databases and scalable NoSQL systems”.

.. The technology comes with a cost, as Google said due to its design choices it resulted in “higher latency for typical reads and writes,” though the company has developed workarounds for this. It has “relatively high” commit latencies of 50-150 ms, Google writes.

.. We’ve moved this huge Adwords system onto this new DB and proved it actually works. The system is way more scalable than what it was before – better availability than MySQL, better consistency than MySQL, we’ve got SQL-query that’s just as good as what we started with.”

.. “We also have a lot of experience with eventual consistency systems at Google,” they write in the paper. “In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date.”

.. Right now, companies such as Cloudera, ParAccel, MongoDB, and Cascading are all trying to layer SQL-like query engine over a datastore like HDFS or MongoDB. But though these can scale well they lack the transactional capabilities of some systems. This, for enterprises, is an unpleasant pill to swallow.

 

What’s left of NoSQL?

One possible explanation for the lost appreciation of SQL among developers is the increasing popularity of object-relational mapping tools (ORM) that generally tend to reduce SQL databases to pure storage media (“persistence layer”). The possibility to refine data using SQL is not encouraged but considerably hindered by these tools. The result of the excessive use is a step-by-step processing by the application. Under this circumstances SQL does indeed not deliver any additional value and it becomes understandable why so many developers sympathise with the term NoSQL.

 

.. But the problem is that the term NoSQL was not aimed against SQL in the first place. To make that clear the term was defined to mean “not only SQL” later on. Thus, NoSQL is about complementary alternatives. To be precise it is not even about alternatives to SQL but about alternatives to the relational model and ACID. In the meantime the CAP theorem revealed that the ACID criteria will inevitably reduce the availability of distributed databases. That means that traditional, ACID compliant, databases cannot benefit from the virtually unlimited resources available in cloud environments. This is what many NoSQL systems provide a solution for: instead of sticking to the very rigid ACID criteria to keep data 100% consistent all the time they accept temporary inconsistencies to increase the availability in a distributed environment. Simply put: in doubt they prefer to deliver wrong (old) data than no data. A more correct but less catchy term would therefore be NoACID.

Deploying such systems only makes sense for applications that don’t need strict data consistency. These are quite often applications in the social media field that can still fulfil their purpose with old data and even accept the loss of some updates in case of service interruption.

.. One might believe the success of NoSQL is also based on the fact that it solves a problem that everybody would love to have. In all reality this problem is only relevant to a very small but prominent community, which managed to get a lot of attention.