Fastest Way to Load Data Into PostgreSQL Using Python
From two minutes to less than half a second!
Data written to an unlogged table will not be logged to the write-ahead-log (WAL), making it ideal for intermediate tables. Note that
<span style="color: #010101;">UNLOGGED</span>
tables will not be restored in case of a crash, and will not be replicated... Copy Data From a String Iterator with Buffer Size▶
In an attempt to squeeze one final drop of performance, we notice that just like
<span style="color: #010101;">page_size</span>
, the<span style="color: #010101;">copy</span>
command also accepts a similar argument called<span style="color: #010101;">size</span>
:size – size of the buffer used to read from the file.
Let’s add a
<span style="color: #010101;">size</span>
argument to the function:@profile def copy_string_iterator(connection, beers: Iterator[Dict[str, Any]], size: int = 8192) -> None: with connection.cursor() as cursor: create_staging_table(cursor) beers_string_iterator = StringIteratorIO(( '|'.join(map(clean_csv_value, ( beer['id'], beer['name'], beer['tagline'], parse_first_brewed(beer['first_brewed']).isoformat(), beer['description'], beer['image_url'], beer['abv'], beer['ibu'], beer['target_fg'], beer['target_og'], beer['ebc'], beer['srm'], beer['ph'], beer['attenuation_level'], beer['brewers_tips'], beer['contributed_by'], beer['volume']['value'], ))) + '\n' for beer in beers )) cursor.copy_from(beers_string_iterator, 'beers', sep='|', size=size)The default value for size is 8192, which is
<span style="color: #010101;">2 ** 13</span>
, so we will keep sizes in powers of 2:>>> copy_string_iterator(connection, iter(beers), size=1024) copy_string_iterator(size=1024) Time 0.4536 Memory 0.0 >>> copy_string_iterator(connection, iter(beers), size=8192) copy_string_iterator(size=8192) Time 0.4596 Memory 0.0 >>> copy_string_iterator(connection, iter(beers), size=16384) copy_string_iterator(size=16384) Time 0.4649 Memory 0.0 >>> copy_string_iterator(connection, iter(beers), size=65536) copy_string_iterator(size=65536) Time 0.6171 Memory 0.0