Google China: A very different image

January 28, 2006 by timlangeman

As someone outside the Chinese firewall, I can compare the Chinese version of Google with the regular version. For some searches, the difference in results can be quite dramatic:

Take for instance tiananmen vs tiananmen.

via the Floating Sun Weblog

No Third-Party Explicit Links

March 2, 2005 by timlangeman

I’m normally the type of person who would favor add-on tools like Google’s Autolink. But then I remembered a site I read several months ago that used Intellitxt. Google has a good reputation, but I fear their software will be imitated by others, and pretty soon we’ll have companies like Juno and Netzero adding Intellitxt-type advertising to everything through a default browser plug-in.

Here’s a sample of Intellitxt in use. Note that the green advertising links appear on my Windows PC [screenshot], but not on my Mac.

This got me thinking about where to draw the line.

I start by making a distinction between:

Explicit links – visible links that encourage the reader to follow them
Implicit links – potential, invisible links that are activated by an action of the reader (e.g., right-clicking)

If Google Autolink is only providing a useful service, Google shouldn’t mind it being part of the right-click menu instead of part of the page. But if Google Autolink is really about advertising and affiliate dollars, they will be tempted to go explicit.

I imagined a “link fight” back in November of 2000, but I didn’t anticipate that it would be initiated by a major vendor like Google.

In the future, I’d like to see new features like highlighting added (more later) to the browser, but I think it’s important to draw a line somewhere.

For me the line is: no third party explicit links.

NYTimes: Ads Embedded in Online News Raise Questions
A future post about adding “Highlighting” to the Browser

SQL Challenge: Cross-Country Scoring

March 2, 2005 by timlangeman

There are a lot of tutorials on the net that provide a basic introduction to SQL, but few that get into advanced techniques. I’m a fan of Cross-Country running and I’ve worked as a database programmer for the past 4 years. Here’s a problem that combines two of my interests into a puzzle that uses advanced joins and subqueries.

What is Cross Country?

Cross Country is a team distance-running sport. Unlike track, it is run on grass or dirt. Each course is different. Some are hilly. Others are flat.

How is it scored?

The places of each team’s first 5 runners are added together. A team’s next 2 runners can “displace” another team’s runners, raising the other team’s score. The lowest score wins. In the event of a tie, the team with the faster 6th runner wins.
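The rules above can be sketched in a few lines of Python. This is only an illustration, not the posted SQL solution: the function name `score_dual_meet`, its parameters, and the input format (a list of team names in finish order) are assumptions made for the example. It also assumes both teams field at least 5 runners, and that a team without a 6th runner loses the tie-break.

```python
from collections import defaultdict


def score_dual_meet(finish_order, team_a, team_b, scorers=5, counted=7):
    """Score two teams against each other from an overall finish order.

    finish_order: team names, one entry per finisher, fastest first.
    Returns (score_a, score_b, winner); winner is None if still tied.
    """
    places = defaultdict(list)  # team -> dual-meet places of its counted runners
    place = 0
    for team in finish_order:
        if team not in (team_a, team_b):
            continue  # runners from other teams are separated out
        if len(places[team]) >= counted:
            continue  # an 8th or later runner neither scores nor displaces
        place += 1
        places[team].append(place)

    score_a = sum(places[team_a][:scorers])
    score_b = sum(places[team_b][:scorers])
    if score_a != score_b:
        return score_a, score_b, team_a if score_a < score_b else team_b

    def sixth(team):
        # Place of the 6th runner, or infinity if the team has none (assumption).
        runners = places[team]
        return runners[scorers] if len(runners) > scorers else float("inf")

    if sixth(team_a) == sixth(team_b):
        return score_a, score_b, None
    return score_a, score_b, team_a if sixth(team_a) < sixth(team_b) else team_b


# Four A runners lead, then eight B runners, then A's 5th runner.
order = ["A", "A", "A", "A"] + ["B"] * 8 + ["A"]
print(score_dual_meet(order, "A", "B"))  # (22, 35, 'A')
```

In the usage example, B’s 8th runner does not count, so A’s 5th runner scores 12 points rather than 13.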

Dual-Meet Scoring

In a meet with 3 teams, each team is scored against each of the other teams as if each pairing were a separate dual meet.

Large Meet Scoring

When the meets get large, the scoring method is changed. Scorers no longer separate out the teams because it would be too much work and it is more likely to result in ties.

The result is that the scoring of the 4th and 5th runners becomes especially important. In a dual-meet scored match, teams are often able to win on the strength of their first 3 or 4 runners. A poor 5th runner is a limited liability because the maximum number of points a weak 5th runner can score is capped at 12 (7 opposing runners + 5).

But in a large meet, a poor 5th runner could score 200 points, effectively eliminating even the best team from medal contention.

The Challenge

The challenge is to take a large meet, separate out each team and score it the same way that 2-way meets are scored. I’ve chosen the 2003 Pennsylvania District 3 meet for the sample data. There are 55 teams, resulting in 1456 pairs of matches.
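For readers who want to try a non-SQL approach, the outer loop of the challenge might look like the following Python sketch. It only tallies results for every pair of teams; the dual-meet scoring itself is left to a caller-supplied function. The names `tally_records` and `decide` are hypothetical, and `decide(a, b)` is assumed to return the winning team or `None` for a tie.

```python
from itertools import combinations


def tally_records(teams, decide):
    """Match every team against every other team and count the results.

    teams: list of team names.
    decide: callable taking two team names and returning the winner,
            or None when the dual meet is a tie (hypothetical interface).
    """
    record = {team: {"wins": 0, "losses": 0, "ties": 0} for team in teams}
    for team_a, team_b in combinations(teams, 2):
        winner = decide(team_a, team_b)
        if winner is None:
            record[team_a]["ties"] += 1
            record[team_b]["ties"] += 1
        else:
            loser = team_b if winner == team_a else team_a
            record[winner]["wins"] += 1
            record[loser]["losses"] += 1
    return record


# Toy stand-in for real scoring: the alphabetically first team always wins.
print(tally_records(["A", "B", "C"], lambda a, b: min(a, b)))
```

Passing the scoring function in as a parameter keeps the pairing logic separate from the scoring rules, so either part can be swapped out or tested on its own.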

I’ve included SQL Server table definitions, data, a few hints, and an answer.

The Solution

There’s more than one way to solve the problem, and I’d be interested in hearing from people who have non-SQL solutions as well (Perl, Python, Lisp, etc.).

In evaluating a solution, I consider:

Simplicity – is it easy to read and understand?
Performance – does it run in under 30 seconds? Faster is better.
Portability – does it use vendor extensions to “standard” SQL?

Just so you don’t think that you’re doing my homework for me, I’ve posted a solution. I’ll eventually open the solution section up, but for now, you have to demonstrate that you’ve solved the problem yourself by answering this question:

How many wins, losses and ties did Conestoga Valley have?

If you’re looking for other similar challenges, check out the “Yak Challenge”.

Note: photos taken by the author at several PIAA District and State meets

Update: Solution

Here’s my archived answer from 2005. I also wrote and posted a few parts of an updated solution.

The Great Email “Calculation Debate”

February 19, 2005 by timlangeman

Some readers of my proposal for Spam Guarantees believe that my solution is too complex or costly. I concede that simpler approaches, such as authentication, will be tried first and will provide temporary relief, but ultimately a market-based solution will be necessary.

When such a market is created, it will appear gradually, and it will need to use real money, or it will be unable to police fraud.

It will not eliminate all spam, but it will offer a way for legitimate senders to avoid becoming spam filter false positives.

Here’s an overview of my views on whether market-based solutions should use “Computer-Time” or “Real Money”. The issue reminds me of Hayek and the “Calculation Debate”.

Anti-Spam Currencies: Computational vs Monetary

Two of the biggest obstacles to implementing a market-based solution to spam are:

Expense: the cost of administering and billing for each email transaction; and
Fraud: the inevitable attempts to capture the newly created currency

Expense

The “computational” school of thought argues that a postage or guarantee system that uses real money will cost too much to administer.

Accounts will need to be created, tracked, and billed. With so many transactions, the overhead will be enormous.

A system that requires computers to solve a math problem will require no accounts, tracking or billing.
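To make the “math problem” concrete, here is a minimal Python sketch of one common form of computational postage, a hashcash-style stamp. The post does not say which puzzle the computational school has in mind, so the use of SHA-256, the function names `mint_stamp` and `verify_stamp`, and the `difficulty_bits` parameter are all assumptions chosen for illustration.

```python
import hashlib
from itertools import count


def _meets_target(message, nonce, difficulty_bits):
    # A stamp is valid when the hash, read as a number, falls below a target.
    # Each extra difficulty bit doubles the expected work for the sender.
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def mint_stamp(message, difficulty_bits=12):
    """Sender side: try nonces until one produces a valid stamp (costly)."""
    for nonce in count():
        if _meets_target(message, nonce, difficulty_bits):
            return nonce


def verify_stamp(message, nonce, difficulty_bits=12):
    """Recipient side: a single hash checks the stamp (cheap)."""
    return _meets_target(message, nonce, difficulty_bits)


stamp = mint_stamp("to:alice@example.com")
print(verify_stamp("to:alice@example.com", stamp))  # True
```

The sender pays in CPU time, and the recipient checks the work with one hash. No account, tracking or billing is involved, which is the point the computational school is making.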

Fraud

Both approaches are vulnerable to fraud.

If the system uses real money, hackers will surely attempt to steal a sender’s key and use it like a stolen credit card.

If the system treats computer-time as currency, hackers will surreptitiously commandeer computers to poach the postage.

Scope Modesty: Certified Mail for the Web

With my proposal for email guarantees, I don’t pretend to immediately eradicate all spam forever. Imagine instead a system that starts as a premium service, much like current-day certified and registered mail.

People and businesses who really value the email they send will set up accounts to guarantee their mail. This will ensure that their messages make it through the spam filters unscathed.

Not all transactions will need to be tracked for billing purposes, only those messages unwanted by their recipients.

Monetary Incentives

Because senders have to pay for “cashed” email, they will limit the unwanted messages they send, limiting the transaction load that their Credit Company needs to process. The Credit Companies can charge a fee to cover the cost of processing the “cashed” email.

To limit the amount of fraud, Credit Companies will track the amount of guarantees that have been collected. Accounts that reach their limit will be frozen.
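As a rough illustration of that bookkeeping, a Credit Company’s account record might behave like the Python sketch below. The class name `GuaranteeAccount`, its fields, and the choice to reject a claim that would exceed the limit are hypothetical details, not part of the proposal.

```python
class GuaranteeAccount:
    """Tracks guarantees collected against one sender's account (hypothetical)."""

    def __init__(self, limit_cents):
        self.limit_cents = limit_cents
        self.collected_cents = 0
        self.frozen = False

    def cash(self, amount_cents):
        """A recipient cashes a guarantee. Returns True if it is paid."""
        if self.frozen:
            return False
        if self.collected_cents + amount_cents > self.limit_cents:
            # Assumption: a claim that would exceed the limit is refused
            # and the account is frozen, capping the damage from a stolen key.
            self.frozen = True
            return False
        self.collected_cents += amount_cents
        if self.collected_cents == self.limit_cents:
            self.frozen = True  # limit reached, no further guarantees honored
        return True


account = GuaranteeAccount(limit_cents=10)
print(account.cash(5), account.cash(5), account.cash(1))  # True True False
```

Only “cashed” messages touch the account, so the transaction load grows with the number of unwanted messages rather than with total mail volume.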

Credit Companies may offer to absorb the cost of stolen keys, provided that account holders follow certain security practices. If not, the account holder would be liable, up to the limit of their account.

Computational Incentives

In the computational model, senders and computer owners take no responsibility for securing their systems.

A hacker could install a program on my computer that allows him to offload email computations to my computer.

If my credit card is not being charged for the postage he counterfeits, and the hacker is smart enough not to make my system unusable, I am unlikely to devote the resources necessary to:

avoid getting the program in the first place
remove it once I have it

Holding someone liable for poor security creates an incentive to limit the exposure of currency and tighten up security.

Some people may decide that their security is so poor that they are unable to offer a guarantee.

Their messages may still make it through the spam filter, but they run a greater risk of being overlooked.

openpolitics.com