Analyzing S3 and CloudFront Access Logs with AWS RedShift

Log data is an interesting case for RedShift. In our environment as mentioned previously we have so much log data from our CloudFront and S3 usage that nobody could conceivably work with those datasets using standard text tools such as grep or tail. Many people load their access logs into databases, but we have not found this to be feasible using MySQL or PostgreSQL due to the fact that ad-hoc queries run against sets with billions of rows can take hours. Once imported into RedShift the same queries take minutes at the most.

.. For our simple example though, we’ll just load one month of logs from just one of our CloudFront distributions:

<span class="k">COPY</span> <span class="n">cf_logentries</span>
  <span class="k">FROM</span> <span class="s1">'s3://cloudfront-logs/E1DHT7QI9H0ZOB.2014-04-'</span>
  <span class="n">CREDENTIALS</span> <span class="s1">'aws_access_key_id=;aws_secret_access_key='</span>
  <span class="k">DELIMITER</span> <span class="s1">'\t'</span> <span class="n">MAXERROR</span> <span class="mi">200</span> <span class="n">FILLRECORD</span> <span class="n">IGNOREHEADER</span> <span class="mi">2</span> <span class="n">gzip</span><span class="p">;

.. With CloudFront you really should care about your cache hit ratio - maybe it's obvious, but the load on your origin systems decrease as your content becomes easier to cache. This query will look at the most used URLs and give you a cache hit ratio:</span>

 

PostgreSQL + WAL-E + Cloudfiles = Awesome

Instead of just using a simple socket to communicate, WAL-E sends these base backups and WAL files across the internet with the help of a cloud object store, like Cloudfiles (or any OpenStack Swift deployment). This gives you the advantage that, in addition to just being replication, you have a durable backup of your database for disaster recovery. Further, you have effectively infinite read scalability from the archives, you can keep adding more followers without putting more stress on the leader.