Fastest Way to Load Data Into PostgreSQL Using Python

From two minutes to less than half a second!

Data written to an unlogged table is not recorded in the write-ahead log (WAL), which makes unlogged tables ideal for intermediate tables. Note that UNLOGGED tables will not be restored in case of a crash, and will not be replicated.
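As a minimal sketch of what creating such an intermediate table could look like: the function name, table name, and two-column schema below are hypothetical and only serve the illustration. The cursor is assumed to be any DB-API cursor with an execute() method, such as one from psycopg2.

```python
def create_unlogged_staging_table(cursor, table_name: str = "staging_beers") -> None:
    """Create an empty UNLOGGED table; writes to it skip the WAL.

    `table_name` is interpolated directly into the SQL, so it must come
    from trusted code, never from user input.
    """
    cursor.execute(f"DROP TABLE IF EXISTS {table_name}")
    cursor.execute(f"""
        CREATE UNLOGGED TABLE {table_name} (
            id   INTEGER,  -- hypothetical columns, for illustration only
            name TEXT
        )
    """)
```

Because the table is emptied after a crash and is not replicated, this only fits data that can be reloaded from its source.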

.. Copy Data From a String Iterator with Buffer Size

In an attempt to squeeze one final drop of performance, we notice that just like page_size, the copy command also accepts a similar argument called size:

size – size of the buffer used to read from the file.

Let’s add a size argument to the function:

@profile
def copy_string_iterator(connection, beers: Iterator[Dict[str, Any]], size: int = 8192) -> None:
    with connection.cursor() as cursor:
        create_staging_table(cursor)
        beers_string_iterator = StringIteratorIO((
            '|'.join(map(clean_csv_value, (
                beer['id'],
                beer['name'],
                beer['tagline'],
                parse_first_brewed(beer['first_brewed']).isoformat(),
                beer['description'],
                beer['image_url'],
                beer['abv'],
                beer['ibu'],
                beer['target_fg'],
                beer['target_og'],
                beer['ebc'],
                beer['srm'],
                beer['ph'],
                beer['attenuation_level'],
                beer['brewers_tips'],
                beer['contributed_by'],
                beer['volume']['value'],
            ))) + '\n'
            for beer in beers
        ))
        cursor.copy_from(beers_string_iterator, 'beers', sep='|', size=size)

The default value for size is 8192, which is 2 ** 13, so we will keep sizes in powers of 2:

>>> copy_string_iterator(connection, iter(beers), size=1024)
copy_string_iterator(size=1024)
Time   0.4536
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=8192)
copy_string_iterator(size=8192)
Time   0.4596
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=16384)
copy_string_iterator(size=16384)
Time   0.4649
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=65536)
copy_string_iterator(size=65536)
Time   0.6171
Memory 0.0
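To get a feel for what a buffer size means for a file-like object backed by an iterator, here is a small sketch. IteratorReader and count_reads are hypothetical names, and this is not the StringIteratorIO class used above. It assumes the consumer pulls data by calling read(size) repeatedly, and simply counts how many non-empty reads are needed for a given size.

```python
import io
from typing import Iterable, Iterator, Optional


class IteratorReader(io.TextIOBase):
    """Read-only text stream that pulls lines lazily from an iterator."""

    def __init__(self, lines: Iterator[str]) -> None:
        self._lines = lines
        self._buffer = ""

    def readable(self) -> bool:
        return True

    def read(self, size: Optional[int] = -1) -> str:
        if size is None or size < 0:
            # No limit requested: drain everything that is left.
            data = self._buffer + "".join(self._lines)
            self._buffer = ""
            return data
        # Pull lines until the buffer can satisfy the request or input ends.
        while len(self._buffer) < size:
            try:
                self._buffer += next(self._lines)
            except StopIteration:
                break
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


def count_reads(lines: Iterable[str], size: int) -> int:
    """Count the non-empty read(size) calls needed to consume all lines."""
    reader = IteratorReader(iter(lines))
    reads = 0
    while reader.read(size):
        reads += 1
    return reads


if __name__ == "__main__":
    rows = ["1|a\n"] * 1000  # 4,000 characters in total
    for size in (1024, 8192):
        print(size, count_reads(rows, size))
```

A larger size means fewer, bigger reads, but each read has to gather more rows before it returns, so a bigger buffer is not automatically faster.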

Pyodide: Bringing the scientific Python stack to the browser

.. Unfortunately, the “language we all have” in the browser, JavaScript, doesn’t have a mature suite of data science libraries, and it’s missing a number of features that are useful for numerical computing, such as operator overloading. We still think it’s worthwhile to work on changing that and moving the JavaScript data science ecosystem forward. In the meantime, we’re also taking a shortcut: we’re meeting data scientists where they are by bringing the popular and mature Python scientific stack to the browser.

It’s also been argued more generally that Python not running in the browser represents an existential threat to the language—with so much user interaction happening on the web or on mobile devices, it needs to work there or be left behind. Therefore, while Pyodide tries to meet the needs of Iodide first, it is engineered to be useful on its own as well.

.. After a discussion with some of Mozilla’s WebAssembly wizards, we saw that the key to building this was emscripten and WebAssembly: technologies to port existing code written in C to the browser.  That led to the discovery of an existing but dormant build of Python for emscripten, cpython-emscripten, which was ultimately used as the basis for Pyodide.

.. WebAssembly is a new language that runs in modern web browsers, as a complement to JavaScript.  It’s a low-level, assembly-like language that runs with near-native performance and is intended as a compilation target for low-level languages like C and C++.  Notably, the most popular interpreter for Python, called CPython, is implemented in C, so this is the kind of thing emscripten was created for.

Pyodide is put together by:

  • Downloading the source code of the mainstream Python interpreter (CPython) and the scientific computing packages (NumPy, etc.)
  • Applying a very small set of changes to make them work in the new environment
  • Compiling them to WebAssembly using emscripten’s compiler

If you were to just take this WebAssembly and load it in the browser, things would look very different to the Python interpreter than they do when running directly on top of your operating system. For example, web browsers don’t have a file system (a place to load and save files). Fortunately, emscripten provides a virtual file system, written in JavaScript, that the Python interpreter can use. By default, these virtual “files” reside in volatile memory in the browser tab, and they disappear when you navigate away from the page.  (emscripten also provides a way for the file system to store things in the browser’s persistent local storage, but Pyodide doesn’t use it.)

By emulating the file system and other features of a standard computing environment, emscripten makes moving existing projects to the web browser possible with surprisingly few changes. (Some day, we may move to using WASI as the system emulation layer, but for now emscripten is the more mature and complete option.)
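One consequence of the file-system emulation is that ordinary file-handling code should not need to know where it runs. The sketch below uses only the standard library; round_trip and the file name are hypothetical. Natively the calls reach the operating system, while under Pyodide the same calls would presumably be served by emscripten’s in-memory file system.

```python
import os
import tempfile


def round_trip(directory: str, text: str) -> str:
    """Write text to a file in `directory`, then read it back."""
    path = os.path.join(directory, "notes.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # The directory and its contents are removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        print(round_trip(tmp, "hello from Python"))
```

In the browser, anything written this way lives in the tab’s memory and is gone once the page is closed.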

.. We run CPython’s unit tests as part of Pyodide’s continuous testing to get a handle on what features of Python do and don’t work.  Some things, like threading, don’t work now, but with the newly-available WebAssembly threads, we should be able to add support in the near future.

.. How fast is it?

Running the Python interpreter inside a JavaScript virtual machine adds a performance penalty, but that penalty turns out to be surprisingly small — in our benchmarks, around 1x-12x slower than native on Firefox and 1x-16x slower on Chrome. Experience shows that this is very usable for interactive exploration.

Notably, code that runs a lot of inner loops in Python tends to be slower by a larger factor than code that relies on NumPy to perform its inner loops. Below are the results of running various pure Python and NumPy benchmarks in Firefox and Chrome compared to natively on the same hardware.
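The difference between interpreter-bound and library-bound loops can be tried out with a small sketch. This is not the benchmark suite behind the numbers quoted here; the function names are hypothetical, and built-in sum/map stand in for NumPy so that the example needs only the standard library. The first function runs its loop in Python bytecode, the second pushes the loop into compiled code.

```python
import operator
import timeit


def sum_squares_loop(n: int) -> int:
    """Inner loop executed step by step by the Python interpreter."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n: int) -> int:
    """Same result, but sum/map/operator.mul do the looping in C."""
    return sum(map(operator.mul, range(n), range(n)))


def time_call(fn, n: int = 100_000, repeat: int = 3) -> float:
    """Best-of-`repeat` wall-clock time in seconds for one call to fn(n)."""
    return min(timeit.repeat(lambda: fn(n), number=1, repeat=repeat))


if __name__ == "__main__":
    for fn in (sum_squares_loop, sum_squares_builtin):
        print(f"{fn.__name__}: {time_call(fn):.4f}s")
```

Running the same script natively and in the browser gives a rough idea of how much each style of code is affected; actual timings will vary by machine and browser.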

Interaction between Python and JavaScript

If all Pyodide could do is run Python code and write to standard out, it would amount to a cool trick, but it wouldn’t be a practical tool for real work.  The real power comes from its ability to interact with browser APIs and other JavaScript libraries at a very fine level. WebAssembly has been designed to easily interact with the JavaScript running in the browser.  Since we’ve compiled the Python interpreter to WebAssembly, it too has deep integration with the JavaScript side.

Pyodide implicitly converts many of the built-in data types between Python and JavaScript.  Some of these conversions are straightforward and obvious, but as always, it’s the corner cases that are interesting.
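As a hedged illustration of what crossing that boundary can look like: Pyodide exposes the JavaScript global scope to Python as a module named js. The helper below is a hypothetical sketch that reads the page title through it and falls back to a default when no browser document is available, so the same file also runs under a native interpreter.

```python
try:
    import js  # provided by Pyodide; normally absent in a native interpreter
except ImportError:
    js = None


def page_title(default: str = "(no browser document available)") -> str:
    """Return the browser page title, or `default` outside a browser page."""
    document = getattr(js, "document", None) if js is not None else None
    if document is None:
        return default
    # document.title is a JavaScript string; str() makes the intent explicit.
    return str(document.title)


if __name__ == "__main__":
    print(page_title())
```

Simple values such as strings and numbers are the straightforward cases; how containers and objects cross the boundary depends on the Pyodide version and is better checked against its documentation.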

Building Good Communities: here’s what’s wrong with the internet today

Hey there,

Have you ever interacted with an online community and got a horrible reaction that made you feel like crap?

You’re not alone.

In a nutshell, here’s what’s wrong with public communities on the internet:

[Image: screenshot]

If you can’t see the screenshot, here’s what happened:

There’s a motivated fledgling developer (16 years old!) who decides to contribute back to the community by creating a series of Python video tutorials on YouTube.

He or she posts these free tutorials to Reddit…

And what kinds of supportive comments does he or she get?

Well, check it out:

~~~

“You lack CS/development experience to properly teach people. No offense but your videos don’t bring anything new. The topics of your videos have all been covered before by experienced developers. The Flask quickstart tutorial does a pretty good job of this. You will most likely end up teaching beginner’s bad practices because of this.”

~~~

Maybe these tutorials weren’t the greatest tutorials ever made.

But WHAT ON EARTH justifies this incredibly negative, berating smackdown of a response from some jerk hiding behind a pseudonym?

I mean, I get it—we software developers are a critical bunch and sometimes we get a little carried away and maybe don’t realize there’s a real person sitting at the other end.

I generally try to appreciate critical feedback because it can help me grow.

But getting smacked in the face with aggressive reactions out of nowhere feels awful, no matter what—

This kind of exchange HURTS.

And the fact that stuff like that happens on a regular basis on public communities like Reddit, Stack Overflow, GitHub, etc. frustrates me to no end.

Actually, it pisses me off.

Not only out of self-pity because I’ve experienced stuff like that myself—

But for the sake of countless developers who are seeking community and want to CONTRIBUTE and then get BULLIED by some prick who had a bad day.

Can you imagine working up the courage to ask a question on a forum like that as a beginner, or sharing your first real blog post or open-source project…and then getting punched in the stomach with such a reaction?

It sucks the joy and motivation right out of you…

Now, I’m not trying to knock sites like Reddit or Stack Overflow. They provide immense value. It’s just that at the scale they operate there’s NO WAY they can keep the jerks at bay.

But even a 10:1 ratio of good vs bad interactions FEELS terrible.

You never know what reaction you’re going to get, and as a result people need to keep their guards up constantly.

It doesn’t create a safe environment for learning and long-term growth. Over time, being a member of a “community” like that becomes a net-negative for your energy and motivation.

Slowly but surely the good people leave and what remains is often a cesspool of personal attacks, unbounded negativity, and one-upmanship.

And it sucks.

Going through a similar experience led me to eventually create PythonistaCafe with a group of like-minded Python developers.

A good way to think of PythonistaCafe is to see it as a club of mutual improvement for Python enthusiasts.