AWS Public Datasets

AWS hosts a variety of public datasets that anyone can access for free.

Previously, large datasets such as satellite imagery or genomic data have required hours or days to locate, download, customize, and analyze. When data is made publicly available on AWS, anyone can analyze any volume of data without needing to download or store it themselves. These datasets can be analyzed using AWS compute and data analytics products, including Amazon EC2, Amazon Athena, AWS Lambda, and Amazon EMR.

How Trump Consultants Exploited the Facebook Data of Millions

The firm had secured a $15 million investment from Robert Mercer, the wealthy Republican donor, and wooed his political adviser, Stephen K. Bannon, with the promise of tools that could identify the personalities of American voters and influence their behavior. But it did not have the data to make its new products work.

So the firm harvested private information from the Facebook profiles of more than 50 million users without their permission, according to former Cambridge employees, associates and documents, making it one of the largest data leaks in the social network’s history.

The breach allowed the company to exploit the private social media activity of a huge swath of the American electorate, developing techniques that underpinned its work on President Trump’s campaign in 2016.

Christopher Wylie, who helped found Cambridge and worked there until late 2014, said of its leaders: “Rules don’t matter for them. For them, this is a war, and it’s all fair.”

“They want to fight a culture war in America,” he added. “Cambridge Analytica was supposed to be the arsenal of weapons to fight that culture war.”

.. Cambridge not only relied on the private Facebook data but still possesses most or all of the trove.

Cambridge paid to acquire the personal information through an outside researcher who, Facebook says, claimed to be collecting it for academic purposes.

.. suspending Cambridge Analytica, Mr. Wylie and the researcher, Aleksandr Kogan, a Russian-American academic, from Facebook.

.. the company acknowledged that it had acquired the data, though it blamed Mr. Kogan for violating Facebook’s rules and said it had deleted the information as soon as it learned of the problem two years ago.

.. In Britain, Cambridge Analytica is facing intertwined investigations by Parliament and government regulators into allegations that it performed illegal work on the “Brexit” campaign.

.. Mr. Mercer’s daughter, Rebekah, a board member, Mr. Bannon and Mr. Nix received warnings from their lawyer that it was illegal to employ foreigners in political campaigns

.. WikiLeaks founder, Julian Assange, disclosed in October that Mr. Nix had reached out to him during the campaign in hopes of obtaining private emails belonging to Mr. Trump’s Democratic opponent, Hillary Clinton.

.. The data Cambridge collected from profiles, a portion of which was viewed by The Times, included details on users’ identities, friend networks and “likes.”

.. “Protecting people’s information is at the heart of everything we do,” Mr. Grewal said. “No systems were infiltrated, and no passwords or sensitive pieces of information were stolen or hacked.”

.. recruiting Mr. Wylie, then a 24-year-old political operative with ties to veterans of President Obama’s campaigns. Mr. Wylie was interested in using inherent psychological traits to affect voters’ behavior and had assembled a team of psychologists and data scientists, some of them affiliated with Cambridge University.

.. The group experimented abroad, including in the Caribbean and Africa, where privacy rules were lax or nonexistent

.. Mr. Nix and his colleagues courted Mr. Mercer, who believed a sophisticated data company could make him a kingmaker in Republican politics

.. Mr. Bannon was intrigued by the possibility of using personality profiling to shift America’s culture and rewire its politics

.. Building psychographic profiles on a national scale required data the company could not gather without huge expense. Traditional analytics firms used voting records and consumer purchase histories to try to predict political beliefs and voting behavior.

.. But those kinds of records were useless for figuring out whether a particular voter was, say, a neurotic introvert, a religious extrovert, a fair-minded liberal or a fan of the occult. Those were among the psychological traits the firm claimed would provide a uniquely powerful means of designing political messages.

.. Mr. Wylie found a solution at Cambridge University’s Psychometrics Centre. Researchers there had developed a technique to map personality traits based on what people had liked on Facebook. The researchers paid users small sums to take a personality quiz and download an app, which would scrape some private information from their profiles and those of their friends

.. When the Psychometrics Centre declined to work with the firm, Mr. Wylie found someone who would: Dr. Kogan, who was then a psychology professor at the university and knew of the techniques

.. All he divulged to Facebook, and to users in fine print, was that he was collecting information for academic purposes, the social network said.

.. Dr. Kogan declined to provide details of what happened, citing nondisclosure agreements with Facebook and Cambridge Analytica

.. He ultimately provided over 50 million raw profiles to the firm

.. Only about 270,000 users — those who participated in the survey — had consented to having their data harvested.

.. Mr. Wylie said the Facebook data was “the saving grace” that let his team deliver the models it had promised the Mercers.

.. The firm was effectively a shell.

.. But in July 2014, an American election lawyer advising the company, Laurence Levy, warned that the arrangement could violate laws limiting the involvement of foreign nationals in American elections.

.. In a BBC interview last December, Mr. Nix said that the Trump efforts drew on “legacy psychographics” built for the Cruz campaign.

.. By early 2015, Mr. Wylie and more than half his original team of about a dozen people had left the company. Most were liberal-leaning, and had grown disenchanted with working on behalf of the hard-right candidates the Mercer family favored.

.. Mr. Nix has mentioned some questionable practices. This January, in undercover footage filmed by Channel 4 News in Britain and viewed by The Times, he boasted of employing front companies and former spies on behalf of political clients around the world, and even suggested ways to entrap politicians in compromising situations.

.. Mr. Nix is seeking to take psychographics to the commercial advertising market. He has repositioned himself as a guru for the digital ad age — a “Math Man,” he puts it. In the United States last year, a former employee said, Cambridge pitched Mercedes-Benz, MetLife and the brewer AB InBev, but has not signed them on.

PostgreSQL Domain Integrity In Depth

URIs

For example, consider URIs. The naive way to store them is in a text field. One step better is to add a regex to match URI syntax, and wrap the check in a custom domain as we have done so many times in this article already.
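
As a rough sketch of the regex-in-a-domain approach described above: the domain name uri_text is hypothetical, and the pattern is a deliberately loose assumption (a scheme, a colon, then a non-empty remainder with no whitespace), not a full RFC 3986 validator.

```sql
-- Hypothetical domain: text that must at least look like "scheme:rest".
-- The pattern is intentionally loose and only illustrates the technique.
CREATE DOMAIN uri_text AS text
  CHECK (VALUE ~ '^[a-zA-Z][a-zA-Z0-9+.-]*:[^[:space:]]+$');

-- Accepted: has a scheme, a colon, and no whitespace.
SELECT 'https://example.com/path?q=1'::uri_text;

-- Rejected: no scheme, so the CHECK constraint raises an error.
SELECT 'not a uri'::uri_text;
```

A column declared as uri_text then gets the same check on every insert and update without repeating the regex in each table definition.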

.. To extract the pieces-to-be from a raw blob of address data you can use PostgreSQL extension helper functions. PostGIS provides a basic parser, and then there’s the pgsql-postal extension powered by libpostal. The latter is the big gun, but may consume large amounts of RAM during its operation.

Email Addresses

Email addresses are case insensitive, so it makes sense to represent them that way. It’s not a concern of domain integrity per se, but Web applications often add a uniqueness constraint on user emails, because it doesn’t make sense to share the same email address across multiple user accounts. The constraint ought to prevent duplicates, including those that differ only in case. One way to do this is to make a unique index on a text expression, like

CREATE UNIQUE INDEX users_lower_email_key
ON users (LOWER(email));

Unfortunately it’s touchy. Any queries filtering the users table by email must remember to lowercase the prospective value. It would be better if any comparison between emails was case insensitive so that nobody has to remember to explicitly lowercase the values.

This situation is perfect for the citext (aka Case Insensitive Text) extension. It’s a type that stores text verbatim, but compares without regard to case.
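
A minimal sketch of how this might look, assuming the citext contrib module is available on the server. The users_demo table is hypothetical and exists only for illustration.

```sql
-- citext ships as a contrib extension and must be enabled per database.
CREATE EXTENSION IF NOT EXISTS citext;

-- Hypothetical table: the UNIQUE constraint now ignores case on its own.
CREATE TABLE users_demo (
  id    bigserial PRIMARY KEY,
  email citext NOT NULL UNIQUE
);

INSERT INTO users_demo (email) VALUES ('Alice@Example.com');

-- Finds the row above with no explicit LOWER() call;
-- the stored value keeps its original capitalization.
SELECT id, email FROM users_demo WHERE email = 'alice@example.com';

-- Fails with a unique violation: same address, different case.
INSERT INTO users_demo (email) VALUES ('ALICE@example.COM');
```

Compared with the LOWER(email) index, the case handling lives in the column type, so queries cannot forget it.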

Enumerations

A few RDBMSes (PostgreSQL and MySQL) have a special enum type that ensures a variable or column must be one of a certain list of values. This is also enforceable with custom domains.

CREATE TYPE log_level AS ENUM
('notice', 'warning', 'error', 'severe');

-- allows convenient queries like
SELECT * FROM log WHERE level >= 'warning';

This is the kind of logic you would otherwise have to implement yourself.

If you do want to use the enum type, then changing its values requires some special DDL commands.
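
For reference, the DDL in question looks roughly like the following, applied to the log_level type defined earlier. The new value names are made up for illustration.

```sql
-- Add a value at a specific position in the sort order.
ALTER TYPE log_level ADD VALUE 'debug' BEFORE 'notice';

-- Add a value at the end, skipping the change if it already exists.
ALTER TYPE log_level ADD VALUE IF NOT EXISTS 'fatal' AFTER 'severe';

-- Rename an existing value (PostgreSQL 10 and later).
ALTER TYPE log_level RENAME VALUE 'severe' TO 'critical';
```

PostgreSQL has no command for removing a value from an enum. Dropping one generally means creating a new type and migrating the columns that use it.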

Normally CHECK conditions in domains have limited complexity. They can’t contain subqueries, or reference other rows. However using stored procedures we can get around these restrictions.
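
One way this could look is a domain whose CHECK calls a function that consults another table. This is a sketch only: the table, function, and domain names are hypothetical, and PostgreSQL evaluates the check when a value is converted to the domain, so it does not re-validate existing rows if the lookup table changes later.

```sql
-- Hypothetical lookup table of permitted values.
CREATE TABLE allowed_country (code text PRIMARY KEY);

-- The subquery lives inside a function, which a CHECK is allowed to call.
-- STRICT: a NULL argument yields NULL, so the CHECK does not reject NULLs.
CREATE FUNCTION is_allowed_country(c text) RETURNS boolean
LANGUAGE sql STABLE STRICT AS $$
  SELECT EXISTS (SELECT 1 FROM allowed_country WHERE code = c);
$$;

CREATE DOMAIN country_code AS text
  CHECK (is_allowed_country(VALUE));

INSERT INTO allowed_country (code) VALUES ('NZ');

SELECT 'NZ'::country_code;  -- accepted
SELECT 'XX'::country_code;  -- rejected by the CHECK constraint
```

When the referenced rows can change, a foreign key is usually the sturdier tool. The function-based check suits rules that cannot be expressed as a key.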

Datasette: instantly create and publish an API for your SQLite databases

A key feature of datasette is that the API it provides is very deliberately read-only. This provides a number of interesting benefits:

  • It lets us use SQLite in production in high traffic scenarios. SQLite is an incredible piece of technology, but it is rarely used in web application contexts due to its limitations with respect to concurrent writes. Datasette opens SQLite files using the immutable option, eliminating any concurrency concerns and allowing SQLite to go even faster for reads.
  • Since the database is read-only, we can accept arbitrary SQL queries from our users!