Varnish Logging Examples

Log failing (client-side) requests, i.e. those answered with a 50x status:

  • varnishlog -c -m TxStatus:^50
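
The backend-facing counterpart is a reasonable sketch of the same idea (an assumption based on Varnish 3.x tag names, where under -b the RxStatus tag holds the status received from the backend):

  • varnishlog -b -m RxStatus:^50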

Log only POST requests:

  • varnishlog -c -m RxRequest:POST

Log only the User-Agent string:

  • varnishlog -c -i RxHeader -I User-Agent
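
Since -I takes a regular expression, a small variation on the same flags (an untested sketch, just adding alternation to the pattern) would capture both the User-Agent and Referer headers:

  • varnishlog -c -i RxHeader -I 'User-Agent|Referer'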

Log only the requests where Varnish passes the browser's Cookie header through to the backend:

  • varnishlog -b -i TxURL,TxHeader -o TxHeader Cookie
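
A client-side sketch along the same lines (using the same -o tag/regex matching the command above relies on) would show every transaction in which the browser sent a Cookie header at all:

  • varnishlog -c -o RxHeader Cookie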

Log the entire request for every hit on the homepage where the User-Agent contains Mozilla/5.0 (based on http://err.no/personal/blog/2008/Dec/17#2008-12-17-10-14_poor_mans_filtering_language). Because -o emits each transaction as one blank-line-separated block, Perl's paragraph mode ($/ = "") reads a whole transaction at a time, so both patterns are tested against the same request:

  • varnishlog -o -c | perl -ne 'BEGIN { $/ = "";} print if (/RxURL.*\/$/m and /RxHeader.*Mozilla\/5.0/);'

Log all requests that take more than 10 seconds to generate:

  • varnishlog -o -i Backend,RxURL,ReqEnd,RxHeader | perl -ne 'BEGIN { $/ = "";} print if (/ReqEnd(?:[\sc]+)\d+\s\d+\.\d+\s+\d+\.\d+\s+\d+\.\d+\s+(\d+\.\d+)/ and $1 > 10.0);'

Logging in Varnish 4.0

The weakness of logging everything is that so much information becomes available that the administrator can easily be overwhelmed by it. It's a figurative firehose of information, and drinking from it can be painful.

Martin has implemented a new logging framework in Varnish Cache 4.0. Of all the new features in Varnish Cache 4.0, this might be the most significant one. It's also the most complex, requiring quite a bit of time to fully understand how it works; a sketch of the new query syntax follows below.
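
As a rough illustration (a sketch assuming the vsl-query syntax that ships with Varnish 4.0, not taken from the examples above), the new framework replaces regex-matching raw log lines with structured queries over parsed records; the 50x, POST-only, and slow-request examples might look like:

  • varnishlog -q 'RespStatus >= 500'
  • varnishlog -q 'ReqMethod eq "POST"'
  • varnishlog -g request -q 'Timestamp:Resp[2] > 10.'

Here -q selects whole transactions whose records satisfy the query, and -g controls how records are grouped (e.g. by request or by session).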