SQL Challenge: Cross-Country Scoring

There are a lot of tutorials on the net that provide a basic introduction to SQL, but few that get into advanced techniques. I’m a fan of Cross-Country running and I’ve worked as a database programmer for the past 4 years. Here’s a problem that combines two of my interests into a puzzle that uses advanced joins and subqueries.

What is Cross Country?

Cross Country is team distance-running sport. Unlike track, it is run on grass or dirt. Each course is different. Some are hilly. Others flat.

How is it scored?

The places of each team’s first 5 runners are added together. A teams next 2 runners can “displace” another teams runners, raising the other teams score. The lowest score wins. In the event of a tie, the team with the faster 6th runner wins.

Dual-Meet Scoring

In a meet with 3 teams, each team is matched against the other teams as if it were a dual meet.

Large Meet Scoring

When the meets get large, the scoring method is changed. Scorers no longer separate out the teams because it would be too much work and it is more likely to result in ties.

The result is that scoring of the 4th and 5th runners become especially important. In a dual-meet scored match, teams are often able to win on the strength of their first 3 or 4 runners. A poor 5th runner is a limited liability because the maximum number of points a weak 5th runner can score is capped at 12 (7 opposing runners + 5).

But in a large meet, a poor 5th runner could score 200 points, effectively eliminating even the best team from from medal contention.

The Challenge

The challenge is to take a large meet, separate out each team and score it the same way that 2-way meets are scored. I’ve chosen the 2003 Pennyslvania Distric 3 meet for the sample data. There are 55 teams, resulting in 1456 pairs of matches.

I’ve included SQL-Server table definitions, data, a few hints, and an answer.

The Solution

There’s more that one way to solve the problem and I’d be interested in hearing from people that have non-SQL solutions as well (perl, pyphon, lisp, etc).

In evaluating a solution, I consider:

  1. Simplicity – is it easy to read and understand.
  2. Performance – does it run in under 30 seconds. Faster is better.
  3. Portability – does it use vendor extensions to “standard” SQL

Just so you don’t think that you’re doing my homework for me, I’ve posted a solution. I’ll eventually open the solution section up, but for now, you have to demonstrate that you’ve solved the problem yourself by answering this question:

How many wins, losses and ties did Conestoga Valley have:

If you’re looking for other similar challenges, check out the “Yak Challenge“.

Note: photos taken by the author at several PIAA District and State meets

Update: Solution

 

 

The Great Email “Calculation Debate”

Some readers of my proposal for Spam Guarantees believe that my solution is too complex or costly. I concede that simpler approaches, such as authentication will be tried first, and will provide temporary relief, but ultimately a market-based solution will be necessary.

When such a market is created, it will appear gradually, and it will need to use real money, or it will be unable to police fraud.

It will not eliminate all spam, but offer a way for legitimate senders to avoid becoming spam filter false positives.


Friedrich Hayek, Nobel Prize in Economics

Here’s an overview of my views on whether market-based solutions should use “Computer-Time” or “Real Money”. The issue reminds me of Hayek and the “Calculation Debate“.

Anti-Spam Currencies: Computational vs Monetary

Two of the biggest obstacles to implementing a market-based solution to spam are:

  1. Expense: the cost of administering and billing for each email transaction; and
  2. Fraud: the inevitable attempts to capture the newly created currency

Expense

The “computational” school of thought argues that a postage or guarantee system that uses real money will cost too much to administer.

Accounts will need to be created, tracked, and billed. With so many transactions, the overhead will be enormous.

A system that requires computers to solve a math problem will require no accounts, tracking or billing.

Fraud

Both approaches are vulnerable to fraud.

If the system uses real money, hackers will surely attempt to steal a sender’s key and use it like a stolen credit card.

If the system treats computer-time as currency, hackers will surreptitiously commandeer computers to poach the postage.

Scope Modesty: Certified Mail for the Web

With my proposal for email guarantees, I don’t pretend to the immediately eradicate all spam forever. Imagine instead a system that starts as a premium service, much like current-day certified and registered mail.

People and businesses who really value the email they send will set up accounts to guarantee their mail. This will ensure that their messages make it through the spam filters unscathed.

Not all transactions will need to be tracked for billing purposes, only those messages unwanted by their recipients.

Monetary Incentives

Because senders have to pay for “cashed” email, they will limit the unwanted messages they send, limiting the transaction load that their Credit Company needs to process. The Credit Companies can charge a fee to cover the cost of processing the “cashed” email.

To limit the amount of fraud, Credit Companies will track the amount of guarantees that have been collected. Accounts that reach their limit will be frozen.

Credit Companies may offer to absorb the cost of stolen keys, provided that account holders follow certain security practices. If not, the account holder would be liable, up to the limit of their account.

Computational Incentives

In the computational model, senders and computer owners take no responsibility for securing their systems.

A hacker could install a program on my computer that allows him to offload email computations to my computer.

If my credit card is not being charged for the postage he counterfeits, and the hacker is smart enough not to make my system unusable, I am unlikely to devote the resources necessary to:

  • avoid getting the program in the first place
  • removing it, once I have it

Holding someone liable for poor security creates an incentive to limit the exposure of currency and tighten up security.

Some people may decide that their security is so poor that they are unable to offer a guarantee.

Their messages may still make it through the spam filter, but they run a greater risk of being overlooked.

Spam Filtering through Email Guarantees

Technical approaches to spam such as authentication and bayesian analysis are useful in the short term, but ultimately an economic solution is needed.

I propose guaranteed email: email whose sender offers to pay if the recipient is unhappy with the message.

Others have proposed electronic postage as a way to price the junk out of the market. That’s essentially what I propose except

  1. no money changes hands if the recipient is happy with the message.
  2. money is paid to the recipient, rather than a third party like the postal service or Microsoft

Satisfaction Guaranteed

Example

I write an email to Dave Winer.

Dave gets a lot of email and I want my message to stand out from the spam and other unsolicited “great ideas”.

If I were sending Dave a letter, I would be willing to pay the postal service 37 cents; but to make my email stand out I offer 67 cents. Economists call this signaling.

My email client attaches a file to my email and names it guarantee.xml. This file follows a standard format

<guarantee type>

</guarantee type>
    <guarantee amount>

</guarantee amount>
    <signature url>

    </signature url>
    <signature contents>
       </signature contents>

Dave’s email client already has filtering software built-in.
It would take extra programming, but this filtering software could be modified to parse the guarantee.xml file, lookup and verify the signature, and factor-in the guarantee amount.

To increase Dave’s trust that I won’t default on the 67 cents, my Credit Card (or some such) agency signs my key in a similar way that I signed the email, asserting that they will back up any email guarantee, for up to (say) $7.50.

“Cashing” unwanted Email

If Dave “cashes” his email, my Credit card company will bill me. If I don’t pay my bills, they will revoke their signature on my key.

Since there are a smaller number of Credit agencies, Dave’s filtering software can set up a “trusted” set up Guarantors. (This key signing is similar to the way SSL Certificates work).

Email Clients

Ideally every email program will have a button to “cash” an email. Barring that, people could forward the email on to an address to receive payment. Microsoft’s Outlook and Hotmail are the biggest email clients. Plugins to these would go a long way in gaining momentum.

Checks Cashed

Credit Companies

As for the “Credit Companies”, this is the most difficult part of this whole proposal. It’s going to take a company with experience dealing with fraud and billing, who is also willing to deal in small transactions. I think Paypal might be a good fit. They’re not a conventional bank and it would be a great way to add customers.

The upside of this is that they could charge a small percentage for every “cashed” email. That’s gotta be worth something.

Related

Transquotation with Javascript

Xanadu Logo

Ted Nelson’s Xanadu project has a vision for hypertext where you can quote a document without copying it. The goal is to have something like a pointer to the the selected portion of the original.

I was reading some of Ted’s archives and noticed his use of an  open source eprints publishing software. By counting the number of characters a selection is offset from the start of the document, Eprints allows people to construct links to specific selections. Linked selections can then be shown in context or excerpted in raw text.

Seeing how easily sections could be quoted got me excited about performing similar quotations on my website, but I soon realized that it would be more difficult than using simple html links.

The Quoting Architecture: Start with Quoting Proxies

All I want to do is a “remote include“, but html and javascript don’t allow me to do this directly, so I constructed an intermediary script that acts as a proxy.

The author performing the quotation instructs the reader’s browser to ask the proxy to serve it a remote web page:

<script LANGUAGE="JavaScript1.2" TYPE='text/javascript'
src="https://www.openpolitics.com/transquote/?url=
https://web.archive.org/web/20041123015431/http://tprints.ecs.soton.ac.uk/11/01/zifty-d9.txt;
locspec=charrange:2721/381">
</script>

The proxy script connects to the remote web site on behalf of the reader:


  //open a connection with a timeout of 30 seconds
    $fp = fsockopen($host, $port, $errno, $errstr, 30);        
    fwrite($fp, $out);
    $body = false;
    while (!feof($fp)) {
        $s = fgets($fp, 1024);
        if ( $body )
            $in .= $s;
        if ( $s == "\r\n")
            $body = true;
    }
    fclose($fp);
    (Full Source)

It retrieves the contents of the page and outputs it back to the reader in Javascript output format.

   document.write("The usual story about Xerox PARC,
    that they were trying to make the computer
    understandable to the average man, was a crock.
    They imitated paper and familiar office machines
    because that was what the Xerox executives
    could understand. Xerox was a paper-walloping company,
    and all other concepts had to be ironed onto paper,
    like toner, to be even visible in their paper paradigm.");

Result:

This javascript output method is the same technique
Google uses to deliver its advertisements.

Architecture: Putting the Quotation Logic in the Proxy

It is only because Ted’s eprint server allows offsetted quoting that I
am so easily able to fetch an excerpt.

For other sites, it may make sense to implement the eprints functionality
at the proxy level. This would allow selections to be specified and a greater context shown — all without requiring the quotee to implement any addition server-side features.

For performance and realiabilty reasons, it may make sense for the proxy to cache a copy of the quotation. For compatiblity, it would be nice if all quotations used a standard syntax to specify a quotation. That way the functionality could be added to the browser and quotation meta could be aggregated by search engines.

A Final Note on Security: Open Proxies

If this were to become common, operators of proxies would probably need to only allow registered quotations, otherwise they would be overwhelmed by requests.

Related