Gmail, machine learning, filters

We’ve gotten to the point, particularly with Google but also with the other webmail providers, where the bulk of egregious spam is blocked. What’s left is not some spammer sending 10MM messages, but a much more difficult problem. Spam that reaches the inbox is sent in much smaller quantities. It’s also heavily targeted. Spammers are trying to look like legitimate marketers but still sending mail without permission.

This targeted spam is something I’ve been thinking about a lot lately, mostly because anti-spammers did a pretty good job of making not-spamming look beneficial to senders. Many deliverability recommendations boil down to “stop spamming,” phrased in a way that makes the advice more palatable. Much of the spam that’s getting caught in the new filters follows deliverability recommendations. The piece it misses is that it’s not being sent with the permission of the recipient.

.. Believe it or not, spam filters started out as protecting users from mail they didn’t ask for. As the internet has grown and email has become a channel for crime, the focus of filters has changed. But, fundamentally, deep down, the original purpose of keeping mailboxes useful by stopping unsolicited mail is still there. The ML filters are giving Google, and others, tools to actually address that mail better.

Ask HN: Which industries will be transformed by ML in 10 years?

The bad industries will be transformed before the good ones. What I mean by that is that computer vision applied to medical imaging would be huge. But the detection/classification isn’t accurate enough for that field, just yet. Yes, results are amazing on standard datasets such as ImageNet, but they fail to become equally good when there is orders of magnitude less data. And in the field, accuracy is very important; a net classifying cancer correctly 90% of the time is likely useless.

One exception is automated language translation, which is getting very good. I’m noticing that some of the papers I’m reading are machine translated. They appear to apply machine translation to English articles and then have some editor doing manual touch-ups, which is seldom enough.

The “bad” industries such as spam and SEO can definitely benefit from ML as it exists today. There are ML algorithms (LSTM) that can generate faked web sites with images that, from Googlebot’s point of view, are completely indistinguishable from real sites. Another use would be to generate realistic-looking accounts in social media to steer the conversation, perhaps for political purposes. Porn, obviously, could also use ML due to the huge amount of data (the porn itself and user interactions) available.


.. I think it’s pretty safe to say finance will be a big one. Finance has a large number of individuals and firms researching the applications of ML methodologies to financial indicators. With the semi-recent rise of quant firms, I think this research is only going to get more aggressive, and HFT will become more lucrative and more automated as long as regulation does not get in the way.


.. HFT – yes. But longer-term investment (i.e. Buffett – or even with a horizon of a couple of years) is unlikely to be transformed soon – ML needs vast historical data, which is very slow to generate. Waiting 10 years only gives you 10 years of history, which is 5 non-overlapping 2-year forward returns, and maybe 1 or 2 economic/financial regimes.
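The arithmetic in the comment above can be checked with a minimal sketch. The function name `non_overlapping_windows` is hypothetical and used only for illustration; it assumes whole years and simply counts how many independent forward-return observations of a given horizon fit into a history.

```python
def non_overlapping_windows(history_years: int, horizon_years: int) -> int:
    """Count the independent (non-overlapping) forward-return windows
    of length horizon_years that fit into history_years of data."""
    if horizon_years <= 0:
        raise ValueError("horizon_years must be positive")
    if history_years < 0:
        raise ValueError("history_years must be non-negative")
    # Integer division: a partial window at the end is not a full observation.
    return history_years // horizon_years


# The commenter's case: 10 years of history, 2-year forward returns.
print(non_overlapping_windows(10, 2))
```

With a 10-year history and a 2-year horizon this gives 5 observations, which is the figure quoted above and far too few samples to train or validate most ML models.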

This is also a problem with new datasets being generated – there is not nearly enough history available to test them or feed them to an ML system.

Furthermore, arguably, longer-term investment requires forward-looking modelling of scenarios, based on the kinds of inputs that were not seen in history. ML is not very applicable when you get big covariate shifts.

So I would say human financial analysts are not going anywhere, and any improvements would be relatively small and incremental.

.. HFT is not profitable. It’s completely commoditized.

Hiya protects you from spam callers and phishing scams

Everybody hates spam calls. Beyond those annoying “you won a cruise” messages that interrupt your day and clog your voicemail, scammers can use your phone number in phishing schemes and even trick you into giving up precious personal information. Luckily, when it comes to blocking spam and flagging unwanted numbers, you’ve got options.

We like the app Hiya (Android, iOS) because it serves numbers with a side of context. Hiya gives a bit more information about numbers outside of your contact list, flagging them as likely spam, a colleague from work or an important call from the doctor’s office. Hiya also lets you control and update a personalized block list and report nuisance callers.

Hiya aggregates spam lists from their carrier partners to offer comprehensive protection against annoying and phishy calls and texts. It’s ad-free and easy to use. Grab the app and start protecting yourself today.

What is Spamnesty?

Spamnesty is a way to waste spammers’ time. If you get a spam email, simply forward it to, and Spamnesty will strip your email address, pretend it’s a real person and reply to the email. Just remember to strip out any personal information from the body of the email, as it will be used so the reply looks more legitimate.

That way, the spammer will start talking to a bot, and hopefully waste some time there instead of spending it on a real victim. Meanwhile, Spamnesty will send you an email with a link to the conversation, so you can watch it unfold live!
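Since the body of a forwarded email is reused in the reply, it can help to scrub email addresses from it before forwarding. The following is a minimal sketch of that idea, not Spamnesty’s actual implementation: the `redact_emails` helper and the simplified address pattern are assumptions made for illustration, and the pattern will not cover every valid address format.

```python
import re

# Simplified address pattern (an assumption; real-world addresses can be
# more exotic than this expression allows).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[redacted]") -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)


forwarded = "On Monday, jane.doe@example.com wrote: please reply soon."
print(redact_emails(forwarded))
```

Names, phone numbers and postal addresses still need to be removed by hand; a pattern like this only catches address-shaped strings.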