Made by Developers and Non-Developers
The query fetches sales that were modified before 2019. There is no index on this field, so the optimizer generates an execution plan to scan the entire table.
Let’s say you have another field in this table with the time the sale was created. Since it’s not possible for a sale to be modified before it was created, adding a similar condition on the created field won’t change the result of the query. However, the optimizer might use this information to generate a better execution plan:

db=# EXPLAIN
SELECT *
FROM sale
WHERE modified < '2019-01-01 asia/tel_aviv'
AND   created < '2019-01-01 asia/tel_aviv';

                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
Index Scan using sale_created_ix on sale  (cost=0.44..4.52 rows=1 width=276)
  Index Cond: (created < '2019-01-01 00:00:00+02'::timestamp with time zone)
  Filter: (modified < '2019-01-01 00:00:00+02'::timestamp with time zone)

After we added the “Faux Predicate”, the optimizer decided to use the index on the created field, and the query got much faster! Note that the previous predicate on the modified field is still being evaluated, but it’s now applied to far fewer rows.

A “Faux Predicate” should not change the result of the query. It should only provide additional information the optimizer can use to improve query performance. Keep in mind that the database has to evaluate every predicate, so adding too many can make a query slower.
Best Practices for Optimizing Postgres Query Performance
Over the last five years, we’ve learned a lot about how to optimize Postgres performance. In this eBook, we wrote down our key learnings on how to get the most out of your database.
Have you ever been asked by your team why your product’s application is running slowly? Most probably you have. But did you ever consider whether your database was actually at fault?
In our experience: Database Performance = Application Performance.
Fastest Way to Load Data Into PostgreSQL Using Python
From two minutes to less than half a second!
Data written to an unlogged table will not be logged to the write-ahead log (WAL), making it ideal for intermediate tables. Note that UNLOGGED tables will not be restored in case of a crash, and will not be replicated...

Copy Data From a String Iterator with Buffer Size
In an attempt to squeeze one final drop of performance, we notice that just like page_size, the copy command also accepts a similar argument called size:

size – size of the buffer used to read from the file.

Let’s add a size argument to the function:

@profile
def copy_string_iterator(connection, beers: Iterator[Dict[str, Any]], size: int = 8192) -> None:
    with connection.cursor() as cursor:
        create_staging_table(cursor)
        beers_string_iterator = StringIteratorIO((
            '|'.join(map(clean_csv_value, (
                beer['id'],
                beer['name'],
                beer['tagline'],
                parse_first_brewed(beer['first_brewed']).isoformat(),
                beer['description'],
                beer['image_url'],
                beer['abv'],
                beer['ibu'],
                beer['target_fg'],
                beer['target_og'],
                beer['ebc'],
                beer['srm'],
                beer['ph'],
                beer['attenuation_level'],
                beer['brewers_tips'],
                beer['contributed_by'],
                beer['volume']['value'],
            ))) + '\n'
            for beer in beers
        ))
        cursor.copy_from(beers_string_iterator, 'beers', sep='|', size=size)

The default value for size is 8192, which is 2 ** 13, so we will keep sizes in powers of 2:

>>> copy_string_iterator(connection, iter(beers), size=1024)
copy_string_iterator(size=1024)
Time   0.4536
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=8192)
copy_string_iterator(size=8192)
Time   0.4596
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=16384)
copy_string_iterator(size=16384)
Time   0.4649
Memory 0.0

>>> copy_string_iterator(connection, iter(beers), size=65536)
copy_string_iterator(size=65536)
Time   0.6171
Memory 0.0
BEATING UBER WITH A POSTGRESQL PROTOTYPE
Our version is around 40 times faster than Uber’s. Clearly, Uber should consider using PostgreSQL instead of custom code. Given that we invested around 30 minutes to get this done, even developing the business logic is faster with PostgreSQL.