The easiest way to understand CAP is to think of two nodes on opposite sides of a partition. Allowing at least one node to update state will cause the nodes to become inconsistent, thus forfeiting C. Likewise, if the choice is to preserve consistency, one side of the partition must act as if it is unavailable, thus forfeiting A. Only when nodes communicate is it possible to preserve both consistency and availability, thereby forfeiting P. The general belief is that for wide-area systems, designers cannot forfeit P and therefore have a difficult choice between C and A. In some sense, the NoSQL movement is about creating choices that focus on availability first and consistency second; databases that adhere to ACID properties (atomicity, consistency, isolation, and durability) do the opposite. The “ACID, BASE, and CAP” sidebar explains this difference in more detail.
.. Finally, all three properties are more continuous than binary. Availability is obviously continuous from 0 to 100 percent, but there are also many levels of consistency, and even partitions have nuances, including disagreement within the system about whether a partition exists.
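The two-node picture from the CAP excerpt can be made concrete with a toy model. This is a minimal sketch, not a real replication protocol: `Node`, `write`, `run`, and the `prefer` flag are hypothetical names invented here, and "replication" is just copying a value to the peer when the link is up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    value: int = 0


class Unavailable(Exception):
    """Raised when a node refuses a write to protect consistency."""


def write(node, peer, new_value, partitioned, prefer):
    """Apply a write to `node`, copying it to `peer` when the link is up.

    prefer="C": refuse the write during a partition (forfeits availability).
    prefer="A": accept the write locally during a partition (forfeits consistency).
    """
    if not partitioned:
        # Nodes can communicate, so both C and A are preserved.
        node.value = new_value
        peer.value = new_value
        return
    if prefer == "C":
        raise Unavailable(f"{node.name} cannot reach {peer.name}")
    # prefer == "A": the peer is not updated, so the replicas diverge.
    node.value = new_value


def run(prefer):
    """Return (write_accepted, replicas_consistent) after a partitioned write."""
    a, b = Node("a"), Node("b")
    write(a, b, 1, partitioned=False, prefer=prefer)
    try:
        write(a, b, 2, partitioned=True, prefer=prefer)
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, a.value == b.value
```

Under these assumptions, `run("A")` accepts the partitioned write and leaves the nodes inconsistent, while `run("C")` rejects the write and keeps them consistent.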
Explain Extended: SQL Explained
This series of articles is inspired by multiple questions asked by the site visitors and Stack Overflow users, including Tony, Philip, Rexem and others.
Which method (NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL) is best to select values present in one table but missing in another one?
This:
    SELECT l.*
    FROM t_left l
    LEFT JOIN t_right r
        ON r.value = l.value
    WHERE r.value IS NULL
, this:
    SELECT l.*
    FROM t_left l
    WHERE l.value NOT IN
        (
        SELECT value
        FROM t_right r
        )
or this:
    SELECT l.*
    FROM t_left l
    WHERE NOT EXISTS
        (
        SELECT NULL
        FROM t_right r
        WHERE r.value = l.value
        )
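The three forms are not always interchangeable: when `t_right.value` contains a NULL, standard SQL makes `NOT IN` return no rows, while the other two are unaffected. The sketch below checks this using Python's built-in `sqlite3` module, which is an assumption of convenience and not the database the article benchmarks. It selects `l.value` instead of `l.*` so the results are easy to compare, `anti_join_results` is a hypothetical helper name, and it says nothing about which form is fastest.

```python
import sqlite3

# The three anti-join forms from the question, selecting only l.value.
QUERIES = {
    "left_join": """
        SELECT l.value FROM t_left l
        LEFT JOIN t_right r ON r.value = l.value
        WHERE r.value IS NULL
    """,
    "not_in": """
        SELECT l.value FROM t_left l
        WHERE l.value NOT IN (SELECT value FROM t_right r)
    """,
    "not_exists": """
        SELECT l.value FROM t_left l
        WHERE NOT EXISTS (SELECT NULL FROM t_right r WHERE r.value = l.value)
    """,
}


def anti_join_results(left_values, right_values):
    """Run each query on fresh in-memory tables and return sorted results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_left (value INTEGER)")
    conn.execute("CREATE TABLE t_right (value INTEGER)")
    conn.executemany("INSERT INTO t_left VALUES (?)", [(v,) for v in left_values])
    conn.executemany("INSERT INTO t_right VALUES (?)", [(v,) for v in right_values])
    results = {
        name: sorted(row[0] for row in conn.execute(sql))
        for name, sql in QUERIES.items()
    }
    conn.close()
    return results
```

With `t_left` holding 1, 2, 3 and `t_right` holding only 2, all three queries agree. Adding a NULL to `t_right` leaves the LEFT JOIN and NOT EXISTS results unchanged but empties the NOT IN result, because `1 NOT IN (2, NULL)` evaluates to NULL, not true.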
Analyzing S3 and CloudFront Access Logs with AWS RedShift
Log data is an interesting case for RedShift. In our environment, as mentioned previously, we have so much log data from our CloudFront and S3 usage that nobody could conceivably work with those datasets using standard text tools such as grep or tail. Many people load their access logs into databases, but we have not found this feasible with MySQL or PostgreSQL because ad-hoc queries against sets with billions of rows can take hours. Once the data is imported into RedShift, the same queries take minutes at most.
.. For our simple example though, we’ll just load one month of logs from just one of our CloudFront distributions:
    COPY cf_logentries FROM 's3://cloudfront-logs/E1DHT7QI9H0ZOB.2014-04-'
    CREDENTIALS 'aws_access_key_id=;aws_secret_access_key='
    DELIMITER '\t' MAXERROR 200 FILLRECORD IGNOREHEADER 2 gzip;
.. With CloudFront you really should care about your cache hit ratio. Maybe it's obvious, but the load on your origin systems decreases as your content becomes easier to cache. This query will look at the most used URLs and give you a cache hit ratio:
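The article's query is not part of this excerpt. As a rough stand-in, here is a minimal sketch of how such a query could look, run against SQLite from Python so it can be tried locally. The column names `uri_stem` and `edge_result_type` are assumptions (the real `cf_logentries` schema is not shown here), as is the choice to count both `Hit` and `RefreshHit` as cache hits. Redshift syntax may differ in places.

```python
import sqlite3

# Hypothetical column names; the real cf_logentries schema is not shown.
HIT_RATIO_SQL = """
    SELECT uri_stem,
           COUNT(*) AS requests,
           AVG(CASE WHEN edge_result_type IN ('Hit', 'RefreshHit')
                    THEN 1.0 ELSE 0.0 END) AS hit_ratio
    FROM cf_logentries
    GROUP BY uri_stem
    ORDER BY requests DESC
    LIMIT 10
"""


def top_urls_with_hit_ratio(rows):
    """Load (uri_stem, edge_result_type) rows and return the top URLs.

    Each result row is (uri_stem, request_count, hit_ratio), most requested first.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cf_logentries (uri_stem TEXT, edge_result_type TEXT)"
    )
    conn.executemany("INSERT INTO cf_logentries VALUES (?, ?)", rows)
    result = conn.execute(HIT_RATIO_SQL).fetchall()
    conn.close()
    return result
```

Averaging a 1.0/0.0 flag per row gives the hit fraction directly, which avoids integer-division surprises when dividing two counts.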
AWS RDS Provisioned IOPS really worth it?
If you’re OK with running replicas, we recommend running a read-only replica as a non-RDS instance on a regular EC2 instance. You can get better read IOPS at a much cheaper price by managing the replica yourself. We even set up replicas outside AWS using stunnel, with SSD drives as the primary block device, and we get ridiculous read speeds for our reporting systems: literally 100 times faster than we get from RDS.