1M rows/s from Postgres to Python

asyncpg is a new, fully featured, open-source Python client library for PostgreSQL. It is built specifically for asyncio and Python 3.5's async/await. asyncpg is the fastest driver among common Python, NodeJS and Go implementations.

.. The result is that asyncpg is, on average, at least 3x faster than psycopg2 (or aiopg). This result is remarkable, as psycopg2, written in C and optimized, is not slow at all. See the benchmarks section below for more.
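For readers new to the library, here is a minimal sketch of the async/await API the benchmark exercises (the connection parameters and the users table are hypothetical placeholders):

    import asyncio
    import asyncpg

    async def main():
        # Connection details below are hypothetical placeholders.
        conn = await asyncpg.connect(user='postgres', database='testdb',
                                     host='127.0.0.1')
        try:
            # fetch() runs a query and returns a list of Record objects;
            # $1-style placeholders are native PostgreSQL parameter syntax.
            rows = await conn.fetch(
                'SELECT id, name FROM users WHERE id < $1', 100)
            for row in rows:
                print(row['id'], row['name'])
        finally:
            await conn.close()

    asyncio.get_event_loop().run_until_complete(main())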

2ndQuadrant: Postgres Scalability

Whatever the requirements, 2ndQuadrant will produce an effective solution that considers both factors and is aligned with the needs of your business. With a staff of performance experts including Gregory Smith, author of the book “PostgreSQL High Performance”, and scalability experts including Hannu Krosing, who, while at Skype, engineered a database architecture to address its unparalleled PostgreSQL scalability challenges, you will be in good hands.

On Uber’s Choice of Databases

If you basically need a key/value store and occasionally want to run a simple SQL query, MySQL (or MariaDB) is a reasonable choice. I guess it is at least a better choice than any random NoSQL key/value store that just started offering an even more limited SQL-ish query language. Uber, on the other hand, just builds their own thing (“Schemaless”) on top of InnoDB and MySQL.

.. By now, I believe Uber did not replace PostgreSQL with MySQL, as their article suggests. It seems that they actually replaced PostgreSQL with their tailor-made solution, which happens to be backed by MySQL/InnoDB (at the moment).

It seems that the article just explains why MySQL/InnoDB is a better backend for Schemaless than PostgreSQL. For those of you using Schemaless, take their advice! Unfortunately, the article doesn't make this very clear, because it never mentions how their requirements changed with the introduction of Schemaless compared to 2013, when they migrated from MySQL to PostgreSQL.

Why Uber Engineering Switched from Postgres to MySQL

When we updated the birth year for al-Khwārizmī, we didn't actually change his primary key, nor did we change his first and last name. However, these indexes must still be updated when a new row tuple is created for the record. For tables with a large number of secondary indexes, these superfluous steps can cause enormous inefficiencies. For instance, if we have a table with a dozen indexes defined on it, an update to a field that is only covered by a single index must be propagated into all 12 indexes to reflect the ctid for the new row.
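To make the mechanics concrete, here is a sketch in the spirit of the article's example (the table, index names and connection details are invented for illustration). Because Postgres MVCC writes a new physical row version on UPDATE, every index on the table, not just the one covering birth_year, gains an entry pointing at the new tuple's ctid:

    import asyncio
    import asyncpg

    async def main():
        conn = await asyncpg.connect(database='testdb')  # hypothetical DSN
        try:
            # A table with several secondary indexes, per the article's example
            # (names invented for illustration).
            await conn.execute('''
                CREATE TABLE IF NOT EXISTS users (
                    id         serial PRIMARY KEY,
                    first_name text,
                    last_name  text,
                    birth_year int
                );
                CREATE INDEX IF NOT EXISTS users_first_idx ON users (first_name);
                CREATE INDEX IF NOT EXISTS users_last_idx  ON users (last_name);
                CREATE INDEX IF NOT EXISTS users_birth_idx ON users (birth_year);
            ''')
            # Only birth_year changes, but MVCC creates a whole new row version,
            # so the primary key and both name indexes must also be updated to
            # point at the new tuple's ctid -- the write amplification at issue.
            await conn.execute(
                'UPDATE users SET birth_year = $1 WHERE last_name = $2',
                780, 'al-Khwārizmī')
        finally:
            await conn.close()

    asyncio.get_event_loop().run_until_complete(main())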

.. This write amplification issue naturally translates into the replication layer as well.

.. In cases where Postgres replication happens purely within a single data center, the replication bandwidth may not be a problem.

.. However, the verbosity of the Postgres replication protocol can still result in an overwhelming amount of data for a database that uses a lot of indexes.
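One way to observe this effect directly is to measure how much WAL a single update generates, since streaming replication ships exactly those bytes to every replica. A sketch using PostgreSQL's built-in LSN functions (PostgreSQL 10+ function names; reuses the hypothetical users table from the sketch above; on a busy server, concurrent activity will inflate the number):

    import asyncio
    import asyncpg

    async def main():
        conn = await asyncpg.connect(database='testdb')  # hypothetical DSN
        try:
            # Record the current write-ahead-log position before the update.
            before = await conn.fetchval('SELECT pg_current_wal_lsn()::text')
            await conn.execute(
                'UPDATE users SET birth_year = $1 WHERE last_name = $2',
                781, 'al-Khwārizmī')
            # pg_wal_lsn_diff() gives the WAL bytes written in between --
            # the volume replication has to push to each replica.
            wal_bytes = await conn.fetchval(
                'SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '
                '                       $1::text::pg_lsn)', before)
            print('WAL bytes generated:', wal_bytes)
        finally:
            await conn.close()

    asyncio.get_event_loop().run_until_complete(main())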