I'm normally the type of person who would favor add-on tools like
Google's Autolink. But then I remembered a site I read
several months ago that used Intellitxt. Google has a good reputation,
but I fear their software will be imitated by others and pretty soon we'll
have companies like Juno and Netzero adding Intellitxt-type
advertising to everything through a default browser plug-in.
This got me thinking about where to draw the line.
I start by making a distinction between:
Explicit links - visible links which encourage the reader to follow them
Implicit links - potential, invisible links which are activated
by an action of the reader (i.e., right-clicking)
If Google Autolink is only providing a useful service,
it shouldn't mind being part of the right click menu instead
of part of the page. But if Google Autolink is really about advertising and
affiliate dollars, they will be tempted to go explicit.
I imagined a "link fight" back in November of 2000, but I
didn't anticipate that it would be initiated by a major vendor
like Google.
In the future, I'd like to see new features like highlighting added (more later) to
the browser, but I think it's important to draw a line somewhere.
For me the line is: no third party explicit links.
There are a lot of tutorials on the net that provide a basic introduction to SQL, but few that get into advanced techniques. I'm a fan of Cross-Country running and I've worked as a database programmer for the past 4 years. Here's a problem that combines two of my interests into a puzzle that uses advanced joins and subqueries.
What is Cross Country?
Cross Country is a team distance-running sport. Unlike track, it is run on grass or dirt. Each course is different. Some are hilly. Others are flat.
How is it scored?
The places of each team's first 5 runners are added together. A team's next 2 runners can "displace" another team's runners, raising the other team's score. The lowest score wins. In the event of a tie, the team with the faster 6th runner wins.
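The scoring rules above can be sketched in a few lines of Python. This is only an illustration, not the posted solution: the function name `score_dual_meet` and its parameters are made up for this example, it assumes exactly two teams with at least 5 finishers each, and it assumes that a team's 8th and later runners are dropped from the placing altogether (the text above only says that the 6th and 7th runners can displace).

```python
def score_dual_meet(finish_order, scorers=5, displacers=2):
    """Score a two-team race.

    finish_order is a list of team names, one entry per finisher,
    fastest first. Returns (scores, winner); winner is None if the
    tie cannot be broken because neither team has a 6th runner.
    """
    counted = {}      # how many runners each team has had cross the line
    scores = {}       # sum of places of each team's first `scorers` runners
    sixth_place = {}  # place of each team's first displacer (tie-breaker)
    place = 0
    for team in finish_order:
        counted[team] = counted.get(team, 0) + 1
        if counted[team] > scorers + displacers:
            continue  # assumption: 8th and later runners take no place
        place += 1
        if counted[team] <= scorers:
            scores[team] = scores.get(team, 0) + place
        elif counted[team] == scorers + 1:
            sixth_place[team] = place

    a, b = sorted(scores)  # raises ValueError unless exactly two teams ran
    if scores[a] != scores[b]:
        winner = a if scores[a] < scores[b] else b
    else:
        pa, pb = sixth_place.get(a), sixth_place.get(b)
        if pa is None and pb is None:
            winner = None
        elif pb is None or (pa is not None and pa < pb):
            winner = a
        else:
            winner = b
    return scores, winner


# Team A's 5th runner finishes behind every counted runner of team B.
print(score_dual_meet(["A"] * 4 + ["B"] * 8 + ["A"]))
# A tied score, decided by the faster 6th runner.
print(score_dual_meet(list("AABBABBBABAA")))
```

In the first call, team A's fifth runner trails all seven counted runners of team B and takes place 12, and A still wins 22 to 35. In the second call both teams score 28, and B wins because its 6th runner (place 10) beats A's 6th runner (place 12).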
Dual-Meet Scoring
In a meet with 3 teams, each team is matched against the other teams as if it were a dual meet.
Large Meet Scoring
When the meets get large, the scoring method is changed. Scorers no longer separate out the teams because it would be too much work and it is more likely to result in ties.
The result is that the scoring of the 4th and 5th runners becomes especially important. In a dual-meet scored match, teams are often able to win on the strength of their first 3 or 4 runners. A poor 5th runner is a limited liability because the maximum number of points a weak 5th runner can score is capped at 12 (the 7 counted opposing runners plus the team's own 5).
But in a large meet, a poor 5th runner could score 200 points, effectively eliminating even the best team from medal contention.
The Challenge
The challenge is to take a large meet, separate out each team and score it the same way that 2-way meets are scored. I've chosen the 2003 Pennsylvania District 3 meet for the sample data. There are 55 teams, resulting in 1456 pairs of matches.
There's more than one way to solve the problem and I'd be interested in hearing from people that have non-SQL solutions as well (Perl, Python, Lisp, etc).
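To make the task concrete, here is a minimal non-SQL sketch of the "separate out each team" step in Python. It is not the posted solution. The function names `dual_scores` and `tally`, and the three teams X, Y and Z, are invented for the example. It assumes every team has at least 5 finishers, assumes 8th and later runners are dropped from the placing, and leaves out the 6th-runner tie-breaker, so equal scores are simply recorded as ties.

```python
from itertools import combinations


def dual_scores(results, team_a, team_b, scorers=5, displacers=2):
    """Re-score a large race as if only team_a and team_b had run.

    results is a list of team names, one per finisher, fastest first.
    """
    counted = {team_a: 0, team_b: 0}
    scores = {team_a: 0, team_b: 0}
    place = 0
    for team in results:
        if team not in counted:
            continue  # runner from a third team: invisible in this match-up
        counted[team] += 1
        if counted[team] > scorers + displacers:
            continue  # assumption: 8th and later runners take no place
        place += 1
        if counted[team] <= scorers:
            scores[team] += place
    return scores


def tally(results):
    """Win/loss/tie record of every team against every other team."""
    teams = sorted(set(results))
    record = {team: {"W": 0, "L": 0, "T": 0} for team in teams}
    for a, b in combinations(teams, 2):
        s = dual_scores(results, a, b)
        if s[a] == s[b]:
            record[a]["T"] += 1
            record[b]["T"] += 1
        else:
            winner, loser = (a, b) if s[a] < s[b] else (b, a)
            record[winner]["W"] += 1
            record[loser]["L"] += 1
    return record


# Hypothetical three-team race, five runners per team.
results = list("XYZXYZXYZXYZXYZ")
print(dual_scores(results, "X", "Y"))
print(tally(results))
```

With this made-up finish order, X beats Y 25 to 30 once Z's runners are filtered out, and the tally comes out as X 2-0, Y 1-1 and Z 0-2.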
In evaluating a solution, I consider:
Simplicity - is it easy to read and understand?
Performance - does it run in under 30 seconds? Faster is better.
Portability - does it use vendor extensions to "standard" SQL?
Just so you don't think that you're doing my homework for me, I've posted a solution. I'll eventually open the solution section up, but for now, you have to demonstrate that you've solved the problem yourself by answering this question:
How many wins, losses and ties did Conestoga Valley have:
W
L
T
If you're looking for other similar challenges, check out the "Yak Challenge".
Note: photos taken by the author at several PIAA District and State meets
I have never been to Japan but my father, a linguist, once told me the story of the train station in Tokyo, where the announcements were made in Japanese and English. You would hear four or five minutes of nonstop Japanese and then the English translation would be "The train to Osaka is on platform 4." It seems that in Japanese there is simply no way to say something that simple without cosseting it heavily in a bunch of formal etiquette-stuff. And it turns out the same thing applies to email messages, even in English. The moral of the story is that given two email messages with the same semantic content, the terse one is more likely to come across sounding rude.
The acronym SARG, meaning Search ARGument, was coined by IBM researchers whose System R prototype was the predecessor to DB2. (The seminal 1979 paper by now-IBM-Fellow Pat Selinger et al. is a must-read for any student of query optimization! See reference.) A SARG is a predicate that can be evaluated by the DB2 Data Manager against rows while the page containing them is momentarily “pinned” in the buffer, without having to extract the rows from that page. In DB2 for z/OS, SARGable predicates are also called “stage-1” predicates. If a predicate cannot be SARGed, then it is “residual” (or “stage-2”, in z/OS-speak), meaning each row has to be copied from its page into a private copy of the row in the DB2 run-time before the predicate can be evaluated. Since this copying takes quite a few CPU cycles, SARGable (stage-1) predicates generally perform much faster than residuals (stage-2 predicates).
There are many other less traditional ways to leverage spatial data. For example, the following figure might depict the progress of a customer's steps through a retail store. The customer's shopping cart is equipped with a Wi-Fi device that receives informational ads based on their location in the store, and broadcasts the path-to-purchase for the store's top selling products. As a customer makes different turns through the store, cross-selling promotional advertisements are displayed on the screen to try and bring the customer back to the high-margin items.
While this first generation of spatial analysis provides for rich spatial analysis features, the data is not integrated with the rest of the corporate data. There are many consequences to this type of implementation. First, it impedes corporate-wide decision making since it, by nature, fragments the single version of the truth. The data is transposed and stored away from the watchful eye of IT, which could lead to unique departmental interpretations of the data.
Know anything about bowling? Or writing hard core TSQL? Or test driven development? Even if you don't, this might be a good way to build those skills. Write a stored procedure that can score a bowling game and you might win one of our books and a shirt!
If you've bowled before you know that the scoring is a little, well, complicated.
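To show where the complication comes from, here is a short sketch of standard ten-pin scoring. It is written in Python rather than T-SQL, purely as an illustration of the rules and not as a contest entry. The function name `bowling_score` is made up, and the sketch assumes the input is a complete, valid game given as a list of pins knocked down per roll; it does no validation.

```python
def bowling_score(rolls):
    """Total score of a complete ten-frame game.

    rolls is a list of pins knocked down on each roll, in order.
    """
    total = 0
    i = 0  # index of the first roll of the current frame
    for _ in range(10):
        if rolls[i] == 10:
            # Strike: 10 plus the next two rolls; the frame uses one roll.
            total += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:
            # Spare: 10 plus the next roll; the frame uses two rolls.
            total += 10 + rolls[i + 2]
            i += 2
        else:
            # Open frame: just the pins from the two rolls.
            total += rolls[i] + rolls[i + 1]
            i += 2
    return total


print(bowling_score([10] * 12))    # twelve strikes in a row
print(bowling_score([9, 0] * 10))  # nine pins, then a miss, every frame
```

The wrinkle is that a strike or spare cannot be scored until later rolls are known, which is why the loop looks ahead in the list instead of summing frame by frame.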
In 1997 Richard Lees helped 3 customers (ASB Bank, Wrightsons and Caltex Oil) create business intelligence solutions on a beta version of Microsoft OLAP services. All three customers wanted to get business intelligence out to front line staff, which was difficult using the rich client tools available at the time. Richard recognized that OLAP cubes were rich in meta data, and that it wouldn’t be very difficult to create a server based cube browser. This first version of ThinSlicer (1998) used COM+ and Microsoft’s component services.
Return with me now to the thrilling days of yesteryear, when data processing was done on punch cards and magnetic tapes with procedural languages. Modern data processing began with punch cards. The influence of the punch card lingered on long after the invention of magnetic tapes and disks for data storage. This is why early video display terminals were 80 columns across. Even today, files that were migrated from cards to magnetic tape files or disk storage still use 80 column records. But the influence was not just on the physical side of data processing. The methods for handling data from the prior media were imitated in the new media and in the mental models of the programmers.
Most management is too cheap to invest the 2-4 years required to make a programmer into a DBA. Larry B. is right that SQL Server is an expensive piece of software, but not because of what you pay for it; it is expensive because of how much depends on it. Keeping business logic in SQL is good; it ensures that all applications agree on the rules. Putting what each programmer thought all those rules are into thousands of programs is insane.
The developer thinks that the database he sees via VIEWs, GRANTs and stored procedures was built just for him. If he gets a database performance issue, then he can present it to the database guys to solve. He cannot change the schema or mess up the data model on his own initiative.
The technical solution is more SQL, not less. The SQL-92 Standard has CREATE DOMAIN and CREATE ASSERTION statements along with multi-table CHECK() constraints that can replace almost all of the triggers in T-SQL today with declarative code.
The publisher pays for storage, the reader pays for delivery,
along with a small per-byte royalty. Nelson recommends rates in
the range of 1/10,000 of a cent per byte for text, perhaps one
cent per minute of video.
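For a sense of scale, the quoted rates can be turned into a quick calculation. This is only a sketch: the function names, the 500,000-byte text and the 90-minute video are hypothetical examples, not figures from Nelson.

```python
from fractions import Fraction

# Rates quoted above, in cents.
TEXT_CENTS_PER_BYTE = Fraction(1, 10_000)
VIDEO_CENTS_PER_MINUTE = 1


def text_royalty_cents(num_bytes):
    """Royalty in cents for delivering num_bytes of text."""
    return num_bytes * TEXT_CENTS_PER_BYTE


def video_royalty_cents(minutes):
    """Royalty in cents for delivering the given minutes of video."""
    return minutes * VIDEO_CENTS_PER_MINUTE


# Hypothetical sizes: a 500,000-byte text and a 90-minute video.
print(text_royalty_cents(500_000))
print(video_royalty_cents(90))
```

At these rates the 500,000-byte text earns 50 cents in royalties per delivery and the 90-minute video earns 90 cents.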
Most uses of computers simulate either hierarchy, paper, or both (Acrobat
and the Web). Hierarchy is notably unsuited to most human thought,
creativity, and ongoing changes of projects; paper is a form of confinement
to which we have adapted for two millennia, though the ideas have tried to
escape for a thousand years-- through footnotes, annotations, parallelisms,
and creative layout.
If we dare to challenge these traditions, the alternatives still need
structuring for implementation. The issue is the optimal representation of
ideas-- what relations among discrete structures can best replace
hierarchical directories, and what generalizations of electronic document
will allow profuse bidirectional connections, track content flow from
version to version, allow publishing of annotations and ongoing parallel
documents, and permit large-scale quotation of copyrighted material? (All
are vital for a true electronic literature.)
The alternatives Ted proposes are simple, straightforward, and deeply
different from the prevailing paradigms.