Postgres TOAST

TOAST is a mechanism PostgreSQL uses to keep physical data rows from exceeding the size of a data block (typically 8KB). Postgres does not support physical rows that cross block boundaries, so the block size is a hard upper limit on row size. To allow user tables to have rows wider than this, the TOAST mechanism breaks up wide field values into smaller pieces, which are stored “out of line” in a TOAST table associated with the user table.

Each table you create has its own associated (unique) TOAST table, which may or may not ever end up being used, depending on the size of rows you insert. All of this is transparent to the user, and enabled by default.

When a row that is to be stored is “too wide” (the threshold for that is 2KB by default), the TOAST mechanism first attempts to compress any wide field values. If that isn’t enough to get the row under 2KB, it breaks up the wide field values into chunks that get stored in the associated TOAST table. Each original field value is replaced by a small pointer that shows where to find this “out of line” data in the TOAST table. TOAST will attempt to squeeze the user-table row down to 2KB in this way, but as long as it can get below 8KB, that’s good enough and the row can be stored successfully.

All standard Postgres data types that could possibly have values wider than 2KB support being “TOASTed” in this way, and so do most potentially-wide extension data types.

You can view the current TOAST options for a table by opening psql and running

\d+ table_name
for example:

=> \d+ test_table_name
Table “name_space.test_table_name”
Column | Type | Modifiers | Storage | Stats target | Description
——–+——————-+———–+———-+————–+————-
foo | character(100000) | | extended | |
You can modify the storage type like this:

alter table test_blob alter column column_name set storage EXTENDED;

The default TOAST option is EXTENDED, which tells the system to first attempt to compress the data for that column. If that isn’t enough, then it out-of-line’s it to the TOAST table. There are other options that you can pick if you don’t want compression or out-of-line storage. Be careful about disabling out-of-line storage entirely, because that will cause an attempt to store an oversize row to fail altogether. (Note that columns that are of non-TOASTable data types will show the storage option as “plain”, and you can’t change it.)

Fastest Way to Load Data Into PostgreSQL Using Python

From two minutes to less than half a second!

Data written to an unlogged table will not be logged to the write-ahead-log (WAL), making it ideal for intermediate tables. Note that UNLOGGED tables will not be restored in case of a crash, and will not be replicated.

.. Copy Data From a String Iterator with Buffer Size

In an attempt to squeeze one final drop of performance, we notice that just like page_size, the copy command also accepts a similar argument called size:

size – size of the buffer used to read from the file.

Let’s add a size argument to the function:

@profile
def copy_string_iterator(connection, beers: Iterator[Dict[str, Any]], size: int = 8192) -> None:
    with connection.cursor() as cursor:
        create_staging_table(cursor)
        beers_string_iterator = StringIteratorIO((
            '|'.join(map(clean_csv_value, (
                beer['id'],
                beer['name'],
                beer['tagline'],
                parse_first_brewed(beer['first_brewed']).isoformat(),
                beer['description'],
                beer['image_url'],
                beer['abv'],
                beer['ibu'],
                beer['target_fg'],
                beer['target_og'],
                beer['ebc'],
                beer['srm'],
                beer['ph'],
                beer['attenuation_level'],
                beer['brewers_tips'],
                beer['contributed_by'],
                beer['volume']['value'],
            ))) + '\n'
            for beer in beers
        ))
        cursor.copy_from(beers_string_iterator, 'beers', sep='|', size=size)

The default value for size is 8192, which is 2 ** 13, so we will keep sizes in powers of 2:

>>> copy_string_iterator(connection, iter(beers), size=1024)
copy_string_iterator(size=1024)
Time   0.4536
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=8192)
copy_string_iterator(size=8192)
Time   0.4596
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=16384)
copy_string_iterator(size=16384)
Time   0.4649
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=65536)
copy_string_iterator(size=65536)
Time   0.6171
Memory 0.0