Add “Faux” Predicates: Postgres Optimization


The query fetches sales that were modified before 2019. There is no index on the modified field, so the optimizer generates an execution plan that scans the entire table.
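For context, this is roughly what the query looks like before any extra condition is added (reconstructed from the two-predicate version shown next):

```sql
-- With no index on "modified", the planner has no choice
-- but a sequential scan of the entire sale table.
SELECT *
FROM sale
WHERE modified < '2019-01-01 asia/tel_aviv';
```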

Let’s say you have another field in this table with the time the sale was created. Since it’s not possible for a sale to be modified before it was created, adding a similar condition on the created field won’t change the result of the query. However, the optimizer might use this information to generate a better execution plan:

db=# EXPLAIN
SELECT *
FROM sale
WHERE modified < '2019-01-01 asia/tel_aviv'
AND   created < '2019-01-01 asia/tel_aviv';
                                           QUERY PLAN
--------------------------------------------------------------------------------------------------
Index Scan using sale_created_ix on sale (cost=0.44..4.52 rows=1 width=276)
  Index Cond: (created < '2019-01-01 00:00:00+02'::timestamp with time zone)
  Filter: (modified < '2019-01-01 00:00:00+02'::timestamp with time zone)

After we added the “Faux Predicate”, the optimizer decided to use the index on the created field, and the query got much faster! Note that the predicate on the modified field is still being evaluated, but it is now applied to far fewer rows.

A “Faux Predicate” should not change the result of the query. It should only be used to provide more information to the optimizer that can improve the query performance. Keep in mind that the database has to evaluate all the predicates, so adding too many might make a query slower.

Best Practices for Optimizing Postgres Query Performance

Over the last 5 years, we’ve learned a lot about how to optimize Postgres performance. In this eBook, we wrote down our key learnings on getting the most out of your database.

Have you ever been asked by your team why your product’s application is running slowly? You probably have. But did you ever consider whether your database was actually at fault?

In our experience: Database Performance = Application Performance.

Fastest Way to Load Data Into PostgreSQL Using Python

From two minutes to less than half a second!

Data written to an unlogged table will not be logged to the write-ahead log (WAL), making it ideal for intermediate tables. Note that UNLOGGED tables will not be restored in case of a crash, and will not be replicated.
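The staging table used by the loading function below is created by a helper defined earlier in the article. A minimal sketch of what it might look like, assuming a table named `beers` (matching the `copy_from` call below) with columns for the fields the function writes; the exact DDL and types are illustrative:

```python
def create_staging_table(cursor) -> None:
    # UNLOGGED: rows bypass the write-ahead log, so bulk loads are faster,
    # but the table is truncated after a crash and is not replicated.
    cursor.execute("""
        DROP TABLE IF EXISTS beers;
        CREATE UNLOGGED TABLE beers (
            id                  INTEGER,
            name                TEXT,
            tagline             TEXT,
            first_brewed        DATE,
            description         TEXT,
            image_url           TEXT,
            abv                 DECIMAL,
            ibu                 DECIMAL,
            target_fg           DECIMAL,
            target_og           DECIMAL,
            ebc                 DECIMAL,
            srm                 DECIMAL,
            ph                  DECIMAL,
            attenuation_level   DECIMAL,
            brewers_tips        TEXT,
            contributed_by      TEXT,
            volume              INTEGER
        );
    """)
```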

Copy Data From a String Iterator with Buffer Size

In an attempt to squeeze out one final drop of performance, we notice that, just like page_size, the copy command also accepts a similar argument called size:

size – size of the buffer used to read from the file.

Let’s add a size argument to the function:

@profile
def copy_string_iterator(connection, beers: Iterator[Dict[str, Any]], size: int = 8192) -> None:
    with connection.cursor() as cursor:
        create_staging_table(cursor)
        beers_string_iterator = StringIteratorIO((
            '|'.join(map(clean_csv_value, (
                beer['id'],
                beer['name'],
                beer['tagline'],
                parse_first_brewed(beer['first_brewed']).isoformat(),
                beer['description'],
                beer['image_url'],
                beer['abv'],
                beer['ibu'],
                beer['target_fg'],
                beer['target_og'],
                beer['ebc'],
                beer['srm'],
                beer['ph'],
                beer['attenuation_level'],
                beer['brewers_tips'],
                beer['contributed_by'],
                beer['volume']['value'],
            ))) + '\n'
            for beer in beers
        ))
        cursor.copy_from(beers_string_iterator, 'beers', sep='|', size=size)
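The function relies on StringIteratorIO, a read-only file-like object defined earlier in the article (not shown in this excerpt) that lazily consumes an iterator of strings. If you don’t have that definition handy, a minimal version consistent with the usage above looks roughly like this:

```python
import io
from typing import Iterator, Optional

class StringIteratorIO(io.TextIOBase):
    """A read-only, file-like wrapper around an iterator of strings."""

    def __init__(self, iter: Iterator[str]):
        self._iter = iter
        self._buff = ''

    def readable(self) -> bool:
        return True

    def _read1(self, n: Optional[int] = None) -> str:
        # Refill the buffer from the iterator, then hand back up to n chars.
        while not self._buff:
            try:
                self._buff = next(self._iter)
            except StopIteration:
                break
        ret = self._buff[:n]
        self._buff = self._buff[len(ret):]
        return ret

    def read(self, n: Optional[int] = None) -> str:
        chunks = []
        if n is None or n < 0:
            # Drain the entire iterator.
            while True:
                m = self._read1()
                if not m:
                    break
                chunks.append(m)
        else:
            # Read at most n characters, spanning iterator items as needed.
            while n > 0:
                m = self._read1(n)
                if not m:
                    break
                n -= len(m)
                chunks.append(m)
        return ''.join(chunks)
```

copy_from reads from this object in chunks of the given size, which is exactly where the buffer-size argument comes into play.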

The default value for size is 8192, which is 2 ** 13, so we will keep sizes in powers of 2:

>>> copy_string_iterator(connection, iter(beers), size=1024)
copy_string_iterator(size=1024)
Time   0.4536
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=8192)
copy_string_iterator(size=8192)
Time   0.4596
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=16384)
copy_string_iterator(size=16384)
Time   0.4649
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=65536)
copy_string_iterator(size=65536)
Time   0.6171
Memory 0.0

The timings barely change up to the default of 8192, and a much larger buffer is actually slower, so there is little to gain from tuning size here.