Amazon Athena

Serverless, no ETL
Athena is serverless. You can quickly query your data without having to set up and manage any servers or data warehouses. Just point to your data in Amazon S3, define the schema, and start querying using the built-in query editor. Amazon Athena allows you to tap into all your data in S3 without the need to set up complex processes to extract, transform, and load the data (ETL).
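The "point to your data, define the schema, start querying" flow can also be sketched from the command line instead of the query editor. This is a minimal illustration only: the function names (athena_ddl, athena_run), the database, table, and column names, and the S3 paths are all hypothetical placeholders. It assumes the AWS CLI is installed and authorized, and that the files are plain comma-separated text with a header row and no quoted fields.

```shell
#!/bin/bash
# Hypothetical names throughout: table, columns, and S3 paths are placeholders.

# Print a CREATE EXTERNAL TABLE statement that maps a schema onto CSV files
# stored under an S3 prefix. No data is moved or transformed.
function athena_ddl {
  local table=$1
  local location=$2
  cat <<SQL
CREATE EXTERNAL TABLE IF NOT EXISTS $table (
  id BIGINT,
  amount DOUBLE,
  created_at STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '$location'
TBLPROPERTIES ('skip.header.line.count'='1')
SQL
}

# Submit a statement to Athena; query results are written to the S3 output path.
function athena_run {
  local sql=$1
  local output=$2
  aws athena start-query-execution \
    --query-string "$sql" \
    --result-configuration "OutputLocation=$output"
}

# Usage (needs AWS credentials, so it is left commented out):
# athena_run "$(athena_ddl 'mydb.transactions' 's3://my-bucket/transactions/')" 's3://my-bucket/athena-results/'
# athena_run 'SELECT count(*) FROM mydb.transactions' 's3://my-bucket/athena-results/'
```

Keeping the DDL in its own function makes it easy to inspect the statement before anything is sent to Athena.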

Big Tech’s Harvest of Sorrow?

At the same time that science and technology have vastly improved human lives, they have also given certain visionaries the means to transform entire societies from above. Ominously, what was true of Soviet central planners is true of Big Tech today: namely, the assumption that society can be improved through pure “rationality.”

CAMBRIDGE – Digital technology has transformed how we communicate, commute, shop, learn, and entertain ourselves. Soon enough, technologies such as artificial intelligence (AI), Big Data, and the Internet of Things (IoT) could remake health care, energy, transportation, agriculture, the public sector, the natural environment, and even our minds and bodies.

Applying science to social problems has brought huge dividends in the past. Long before the invention of the silicon chip, medical and technological innovations had already made our lives far more comfortable – and longer. But history is also replete with disasters caused by the power of science and the zeal to improve the human condition.

For example, efforts to boost agricultural yields through scientific or technological augmentation in the context of collectivization in the Soviet Union or Tanzania backfired spectacularly. Sometimes, plans to remake cities through modern urban planning all but destroyed them. The political scientist James Scott has dubbed such efforts to transform others’ lives through science instances of “high modernism.”

An ideology as dangerous as it is dogmatically overconfident, high modernism refuses to recognize that many human practices and behaviors have an inherent logic that is adapted to the complex environment in which they have evolved. When high modernists dismiss such practices in order to institute a more scientific and rational approach, they almost always fail.

Frontier technologies such as AI, Big Data, and IoT are often presented as panaceas for optimizing work, recreation, communication, and health care. The conceit is that we have little to learn from ordinary people and the adaptations they have developed within different social contexts.

The problem is that an unconditional belief that “AI can do everything better,” to take one example, creates a power imbalance between those developing AI technologies and those whose lives will be transformed by them. The latter essentially have no say in how these applications will be designed and deployed.

The current problems afflicting social media are a perfect example of what can happen when uniform rules are imposed with no regard for social context and evolved behaviors. The rich and variegated patterns of communication that exist off-line have been replaced by scripted, standardized, and limited modes of communication on platforms such as Facebook and Twitter. As a result, the nuances of face-to-face communication, and of news mediated by trusted outlets, have been obliterated. Efforts to “connect the world” with technology have created a morass of propaganda, disinformation, hate speech, and bullying.

But this characteristically high-modernist path is not preordained. Instead of ignoring social context, those developing new technologies could actually learn something from the experiences and concerns of real people. The technologies themselves could be adaptive rather than hubristic, designed to empower society rather than silence it.

Two forces are likely to push new technologies in this direction. The first is the market, which may act as a barrier against misguided top-down schemes. Once Soviet planners decided to collectivize agriculture, Ukrainian villagers could do little to stop them. Mass starvation ensued. Not so with today’s digital technologies, the success of which will depend on decisions made by billions of consumers and millions of businesses around the world (with the possible exception of those in China).

That said, the power of the market constraint should not be exaggerated. There is no guarantee that the market will select the right technologies for widespread adoption, nor will it internalize the negative effects of some new applications. The fact that Facebook exists and collects information about its 2.5 billion active users in a market environment does not mean we can trust how it will use that data. The market certainly doesn’t guarantee that there won’t be unforeseen consequences from Facebook’s business model and underlying technologies.

For the market constraint to work, it must be bolstered by a second, more powerful check: democratic politics. Every state has a proper role to play in regulating economic activity and the use and spread of new technologies. Democratic politics often drives the demand for such regulation. It is also the best defense against the capture of state policies by rent-seeking businesses attempting to raise their market shares or profits.

Democracy also provides the best mechanism for airing diverse viewpoints and organizing resistance to costly or dangerous high-modernist schemes. By speaking out, we can slow down or even prevent the most pernicious applications of surveillance, monitoring, and digital manipulation. A democratic voice is precisely what was denied to Ukrainian and Tanzanian villagers confronted with collectivization schemes.

But regular elections are not sufficient to prevent Big Tech from creating a high-modernist nightmare. Insofar as new technologies can thwart free speech and political compromise and deepen concentrations of power in government or the private sector, they can frustrate the workings of democratic politics itself, creating a vicious circle. If the tech world chooses the high-modernist path, it may ultimately damage our only reliable defense against its hubris: democratic oversight of how new technologies are developed and deployed. We as consumers, workers, and citizens should all be more cognizant of the threat, for we are the only ones who can stop it.

Historically, high-modernist schemes have been most damaging in the hands of an authoritarian state seeking to transform a prostrate, weak society. In the case of Soviet collectivization, state authoritarianism originated from the self-proclaimed “leading role” of the Communist Party, and pursued its schemes in the absence of any organizations that could effectively resist them or provide protection to peasants crushed by them.

Yet authoritarianism is not solely the preserve of states. It can also originate from any claim to unbridled superior knowledge or ability. Consider contemporary efforts by corporations, entrepreneurs, and others who want to improve our world through digital technologies. Recent innovations have vastly increased productivity in manufacturing, improved communication, and enriched the lives of billions of people. But they could easily devolve into a high-modernist fiasco.

Loading Terabytes of Data From Postgres Into BigQuery

To load data into BigQuery we’re going to use the BigQuery CLI, which is a very versatile tool. You can install it using these instructions. As we’re on Linux, we’ll use a bash script to perform all the work. I assume the BigQuery CLI is installed and authorized.

Let’s create bigquery-upload.sh and add the following function in order to upload a single day from a specific table:

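#!/bin/bash
# Upload a single day of rows from one Postgres table into BigQuery.
# Arguments: table name, column list for the SELECT, day (YYYY-MM-DD).
function upload_day {
  table=$1
  sel=$2
  day=$3
  # The half-open interval [day, next_day) selects exactly one day of rows.
  next_day=$(date -d "$day+1 days" +%Y-%m-%d)
  # Suffix for the BigQuery table name, e.g. 20180301.
  bq_suffix=$(date -d "$day" +%Y%m%d)
  echo "Uploading $table: $day..."
  # Export the day's rows to a local CSV file, then compress it.
  psql -c "\\copy (select $sel from $table where created_at >= '$day' and created_at < '$next_day') TO '$table-$day.csv' WITH CSV HEADER"
  gzip "$table-$day.csv"
  # Supply your project ID after --project_id and your dataset name before the dot.
  bq load --allow_quoted_newlines --project_id --replace --source_format=CSV --autodetect --max_bad_records 100 ".$table$bq_suffix" "$table-$day.csv.gz"
  rm "$table-$day.csv.gz"
}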

This function takes three arguments: the table, the columns to select, and the date to upload. As you can see, it uses the \copy operation to download a CSV from Postgres and then compresses it with gzip. The BigQuery docs say that loading a compressed CSV is slower than loading an uncompressed one, but in practice uploading the uncompressed data almost always seems slower.

You can call this function by simply adding a line at the end of the script:

upload_day 'transactions' '*' '2018-03-01'
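
A single call handles one day. For a larger backfill, one option is to generate the list of days and call upload_day once per day. The following is a minimal sketch rather than part of the original script: each_day is a hypothetical helper, and it assumes GNU date (the same date -d the function above relies on) and dates in YYYY-MM-DD form.

```shell
#!/bin/bash
# Print every day in the half-open range [start, end), one per line.
# Hypothetical helper; assumes GNU date and YYYY-MM-DD arguments.
function each_day {
  local day=$1
  local end=$2
  # ISO dates sort correctly as plain strings, so a string comparison is enough.
  while [[ "$day" < "$end" ]]; do
    echo "$day"
    day=$(date -d "$day + 1 day" +%Y-%m-%d)
  done
}

# Usage (left commented out because upload_day needs Postgres and BigQuery access):
# for day in $(each_day 2018-03-01 2018-03-08); do
#   upload_day 'transactions' '*' "$day"
# done
```

Because the end date is exclusive, consecutive ranges such as 2018-03-01 to 2018-03-08 and 2018-03-08 to 2018-03-15 do not upload any day twice.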

On Hold for 45 Minutes? It Might Be Your Secret Customer Score

Retailers, wireless carriers and others crunch data to determine what shoppers are worth for the long term—and how well to treat them

Two people call customer service at the same time to complain about the same thing. One waits a few seconds before a representative gets on the line. The other stays on hold. Why the difference?

There’s a good chance it has something to do with a rating known as a customer lifetime value, or CLV. That secret number is used by all manner of companies to measure the potential financial value of their customers.

Your score can determine the prices you pay, the products and ads you see and the perks you receive.

.. “There’s no free lunch,” says Sunil Gupta, a marketing professor at Harvard Business School who has researched models for calculating lifetime value. “The more profitable you are, the better service you will get.”

.. Everyone with a bank account, cellphone or online shopping habit has at least one CLV score, more likely several. And most people have no inkling they even exist, let alone how they are used, what goes into them or how accurate they are. Unlike credit scores, CLVs aren’t available to consumers and aren’t monitored by any government agency.

.. “Not all customers deserve a company’s best efforts,” says Peter Fader, a marketing professor at the University of Pennsylvania’s Wharton School who helped popularize lifetime value scores. His scoring method is based on transaction history, which he says is all companies need to determine how customers will behave in the future.
.. Some companies deduct points from shoppers who exhibit costly behaviors. Banks sometimes take into account the calls people make to customer-service agents or the number of times they visit branches. Online retailers track shoppers who buy things only when they are deeply discounted. People expected to cost more than they spend can have a negative score.
.. At some carriers, high-value customers who are at risk of switching to another carrier are prioritized and get routed to a top-rated call center.
.. his e-commerce clients use scores, including CLV, to respond to email inquiries. “If you’ve got an angry shopper with a high lifetime value, you might want to bump up the priority,” he says.
.. Shoppers with higher scores, however, won’t necessarily get the best deals all the time, says Jerry Jao, chief executive of Retention Science, which has worked for companies such as Target Corp. and Procter & Gamble Co. Retailers sometimes withhold discounts to high-value customers until they are at risk of losing them. “Why waste a 25% offer when the person is going to buy anyway?”
.. The scoring helps dealerships weed out costly customers. “This is what you call grinders—people who visit 16 stores to get the absolute lowest price,” he explains.
.. his firm develops scores by crunching data on things such as previous car purchases, whether a household has a teenager, where else a person has shopped and ZIP Codes, which can be used as a proxy for income. Someone who has a Neiman Marcus credit card is going to be more valuable for a car dealership than someone with a credit card from a discount chain.
.. The company tracks, for example, the number of times a person calls to complain over the prior 90 days, which can affect the CLV. An airline can compare how often a shopper complains with his or her lifetime value and customer experience score, which measures inconveniences such as number of times in the middle seat, flight delays and lost bags.

“A high-value customer who had a real service disruption and never calls to complain should be compensated more quickly than someone who is complaining and costing time and money,” Mr. Srinivasan says.