February 28, 2008

This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize

In October 2006, Netflix announced it would give a cool seven figures to whoever created a movie-recommending algorithm 10 percent better than its own. Within two weeks, the DVD rental company had received 169 submissions, including three that were slightly superior to Cinematch, Netflix's recommendation software..

..Potter likes to use what psychologists know about human behavior. "The fact that these ratings were made by humans seems to me to be an important piece of information that should be and needs to be used," he says. Potter has great respect for the technical prowess of BellKor — he is, after all, still behind the team in the rankings — but he thinks the computer science community studying this problem suffers from a bad case of groupthink. He refers to the psychological model underlying their mathematical approach as "crude.."

..A deeper part of Potter's strategy is based on the work of Amos Tversky and Nobel Prize winner Daniel Kahneman, pioneers of the science now called behavioral economics. This new field incorporates into traditional economics those features of human life that are lost when you think of a person as a rational machine, or as a list of numbers representing cinematic taste. One such phenomenon is the anchoring effect, a problem endemic to any numerical rating scheme. If a customer watches three movies in a row that merit four stars — say, the Star Wars trilogy — and then sees one that's a bit better — say, Blade Runner — they'll likely give the last movie five stars. But if they started the week with one-star stinkers like the Star Wars prequels, Blade Runner might get only a 4 or even a 3. Anchoring suggests that rating systems need to take account of inertia — a user who has recently given a lot of above-average ratings is likely to continue to do so. Potter finds precisely this phenomenon in the Netflix data; and by being aware of it, he's able to account for its biasing effects and thus more accurately pin down users' true tastes.
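
The inertia idea can be illustrated with a few lines of Python. This is only a sketch under stated assumptions, not Potter's actual model: the function name `debias_rating`, the `weight` parameter and the sample numbers are all made up for illustration. It shifts a raw rating by a fraction of how far the user's recent ratings have drifted from their long-run average.

```python
def debias_rating(rating, recent_ratings, user_mean, weight=0.5):
    """Remove a fraction of the user's recent rating drift from a raw rating.

    Hypothetical helper: `weight` is an assumed tuning knob, not a value
    taken from the article or from the Netflix data.
    """
    if not recent_ratings:
        return rating
    recent_mean = sum(recent_ratings) / len(recent_ratings)
    drift = recent_mean - user_mean  # positive after a run of generous ratings
    return rating - weight * drift


# A user whose long-run average is 3 stars rates the same film in two moods.
after_good_run = debias_rating(5, [4, 4, 4], user_mean=3.0)  # 4.5
after_bad_run = debias_rating(3, [1, 1, 1], user_mean=3.0)   # 4.0
print(after_good_run, after_bad_run)
```

After the correction the two raw ratings (5 and 3) land much closer together, which is the effect the anchoring argument predicts.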

Posted by Tim at 11:25 PM | TrackBack

Joel: Stupid Ideas That Became Successful

I could fill a pretty long book with all the stories of times I thought that an idea was stupid and could never work, only to discover that, in fact, it was pretty inspired. The two bad calls that I'm most proud of? That's easy: eBay (NYSE:EBAY) and Wikipedia.

I became aware of eBay in the mid-1990s, when a friend started using it to buy and sell comic books. I thought it was the stupidest concept ever. I absolutely could not comprehend why people would send money to complete strangers they had found on the Internet. It seemed that there was no protection against fraud and abuse, and the whole arrangement would become a hunting ground for scammers until it fell apart.

That turns out not to be what happened.

I had similar reservations about wikis, which I heard about long before Wikipedia was created..

Posted by Tim at 10:03 PM | TrackBack

Python Profiler: cProfile

If you just want to print the results immediately:
import cProfile
cProfile.run("test_function(a, b, c)", sort=1)
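
For a little more control than the one-liner gives, the profile can be collected and sorted explicitly with `pstats`. A minimal sketch: `sample_workload` is a made-up stand-in for the `test_function` above, and sorting by cumulative time is just one reasonable choice.

```python
import cProfile
import io
import pstats


def sample_workload(a, b, c):
    """Hypothetical stand-in for the function being profiled."""
    return sum((a * i + b) % c for i in range(10000))


# Collect timing data only around the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = sample_workload(3, 5, 7)
profiler.disable()

# Write the report into a string so it can be printed, logged or saved.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)  # top five entries only
report = stream.getvalue()
print(report)
```

Limiting the report with `print_stats(5)` keeps the output readable when the profiled code calls into many small functions.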
Posted by Tim at 09:15 PM | TrackBack

February 27, 2008

"National Review" and Civil Rights

In 1955, Mr. Buckley started National Review as a voice for “the disciples of truth, who defend the organic moral order” with a $100,000 gift from his father and $290,000 from outside donors. The first issue, which came out in November, claimed the publication “stands athwart history yelling Stop.”

It proved it by lining up squarely behind Southern segregationists, saying Southern whites had the right to impose their ideas on blacks who were as yet culturally and politically inferior to them. After some conservatives objected, Mr. Buckley suggested instead that both uneducated whites and blacks should be denied the vote.

Posted by Tim at 08:27 PM | TrackBack

Rope: a Python refactoring library

Rope is a Python refactoring library.
Posted by Tim at 08:10 PM | TrackBack

February 26, 2008

The Advantages of Closing a Few Doors

The next time you’re juggling options — which friend to see, which house to buy, which career to pursue — try asking yourself this question: What would Xiang Yu do?

Xiang Yu was a Chinese general in the third century B.C. who took his troops across the Yangtze River into enemy territory and performed an experiment in decision making. He crushed his troops’ cooking pots and burned their ships.

Posted by Tim at 08:55 PM | TrackBack

Integrating Movable Type with FogBugz

We were honored when FogCreek granted us a license to use FogBugz for public case tracking and project management for the Movable Type Open Source project, which is the largest deployment of FogBugz for an open source project.

In an effort to repay their kindness, but also to help keep the community as up-to-date as possible on the latest known issues with the product, Chris Hall has implemented a new Movable Type plugin known as MT Fogger.

Posted by Tim at 11:25 AM | TrackBack

February 25, 2008

Metaprogramming: Memoize Factorial

Metaprogramming is writing code that generates or modifies code (typically within the same application). Often metaprogramming takes the form of adding a member to or modifying an existing member of a class. Memoization is an example of something that is often implemented using metaprogramming (when it is available). Memoization is a functional programming technique that is used when a function will always give the same result if you give it the same input. Generating Fibonacci numbers is a great example of where memoization is useful. Here is an example in JavaScript
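
The same idea can be sketched in Python, where a decorator plays the metaprogramming role by replacing a function with a caching wrapper. This is not the JavaScript from the quoted article; the names `memoize` and `fib` are illustrative only.

```python
import functools


def memoize(func):
    """Wrap a pure single-argument function so each result is computed once."""
    cache = {}

    @functools.wraps(func)
    def wrapper(n):
        if n not in cache:
            cache[n] = func(n)
        return cache[n]

    wrapper.cache = cache  # exposed so the cache can be inspected
    return wrapper


@memoize
def fib(n):
    # The recursive calls go through the module-level name `fib`,
    # which now refers to the caching wrapper.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))         # 832040
print(len(fib.cache))  # 31 entries: one per argument from 0 to 30
```

Without the decorator the naive recursion makes over a million calls for `fib(30)`; with it, each argument is evaluated once. The standard library's `functools.lru_cache` provides the same behaviour ready-made.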
Posted by Tim at 10:44 PM | TrackBack

February 24, 2008

Django MintCache

MintCache is a caching engine for Django that allows you to get by with stale data while you freshen your breath, so to speak.

The purpose of this caching scheme is to avoid the dog-pile effect. Dog-piling is what normally happens when your cached data takes longer to generate than the interval between incoming requests. In other words, if your data takes 5 seconds to generate and you are serving 10 requests per second, then when the data expires a normal cache scheme will spawn 50 attempts at regenerating the data before the first request completes. The increased load from the 49 redundant processes may further increase the time it takes to generate the data. If this happens, you are well on your way into a death spiral.
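
The stale-while-regenerating idea can be sketched with a small in-memory cache. This is an illustration under assumptions, not the MintCache code itself: the class name `StaleWhileRefreshCache`, the `grace` parameter and the injectable `clock` are all invented here, and the sketch is not thread-safe.

```python
import time


class StaleWhileRefreshCache:
    """Serve stale data to everyone except the one caller who regenerates it."""

    def __init__(self, grace=30, clock=time.time):
        self._store = {}     # key -> (value, refresh_at)
        self._grace = grace  # seconds stale data stays available during a refresh
        self._clock = clock  # injectable so the example can fake time

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, refresh_at = entry
        if self._clock() >= refresh_at:
            # First caller after expiry: extend the deadline so later callers
            # keep receiving the stale value, and report a miss to this caller.
            self._store[key] = (value, self._clock() + self._grace)
            return None
        return value


now = [0.0]
cache = StaleWhileRefreshCache(grace=30, clock=lambda: now[0])
cache.set("report", "v1", timeout=60)

now[0] = 61.0                         # the entry has just expired
first_caller = cache.get("report")    # None: this caller regenerates the data
second_caller = cache.get("report")   # "v1": everyone else gets the stale copy
print(first_caller, second_caller)
```

Only the caller that sees the miss pays the regeneration cost and then calls `set` again, so the 49 redundant rebuilds in the example above never start.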

Posted by Tim at 06:42 PM | TrackBack

February 21, 2008

Rethinking the term "Intellectual Property"

But there's plenty of stuff out there that's valuable even though it's not property. For example, my daughter was born on February 3, 2008. She's not my property. But she's worth quite a lot to me. If you took her from me, the crime wouldn't be "theft". If you injured her, it wouldn't be "trespass to chattels". We have an entire vocabulary and set of legal concepts to deal with the value that a human life embodies.

What's more, even though she's not my property, I still have a legally recognised interest in my daughter. She's "mine" in some meaningful sense, but she also falls under the purview of many other entities..

Posted by Tim at 11:07 PM | TrackBack

February 20, 2008

YUI 2.5.0: upgrades to DataTable, etc

The YUI Team just released version 2.5.0 of the library. We’ve added six new components — Layout Manager, Uploader (multi-file upload engine combining Flash and JavaScript), Resize Utility, ImageCropper, Cookie Utility and a ProfilerViewer Control that works in tandem with the YUI Profiler. This release also contains major improvements to the DataTable Control and new Dual-Thumb Slider functionality in the Slider Control.
Posted by Tim at 09:17 PM | TrackBack

Python meta-programming: simple example

When you call a new-style class, the __new__ method is called with the user-supplied arguments, followed by the __init__ method with the same arguments.

I would like to modify the arguments after the __new__ method is called but before the __init__ method, somewhat like this:
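
One way to get between the two calls is to override `__call__` on a metaclass, since that is where `__new__` and `__init__` are normally invoked in sequence. A sketch in Python 3 syntax, with hypothetical names (`ArgRewritingMeta`, `rewrite_args`, `Point`); it is not the code from the quoted post.

```python
class ArgRewritingMeta(type):
    """Metaclass that rewrites constructor arguments between __new__ and __init__."""

    def __call__(cls, *args, **kwargs):
        # Step 1: __new__ sees the arguments exactly as the user supplied them.
        instance = cls.__new__(cls, *args, **kwargs)
        if isinstance(instance, cls):
            # Step 2: let the class rewrite the arguments.
            args, kwargs = cls.rewrite_args(*args, **kwargs)
            # Step 3: __init__ sees the rewritten arguments.
            instance.__init__(*args, **kwargs)
        return instance


class Point(metaclass=ArgRewritingMeta):
    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)

    @staticmethod
    def rewrite_args(x, y):
        # Hypothetical hook: coerce both coordinates to float.
        return (float(x), float(y)), {}

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point("1", 2)
print(p.x, p.y)  # 1.0 2.0
```

The `isinstance` check mirrors what `type.__call__` does by default: `__init__` is skipped when `__new__` returns an object of a different class.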

Posted by Tim at 08:53 PM | TrackBack

February 19, 2008

Step by Step - Show and explain visitors what your page has for them

You might have encountered interactive demos created with screencasting and screengrabbing software that explain an interface to users in a step-by-step manner. This is exactly what this script does for web sites.

If all went well when you loaded this page, you will have seen the examples, download and first paragraph sections highlighted in succession, each with a small information panel. This is done with this script.

Posted by Tim at 08:21 PM | TrackBack

Lightbox 2.0: Lightview

Lightview was built to change the way you overlay content on a website.
Posted by Tim at 08:19 PM | TrackBack

Django Full Text Search

I recently stumbled across the djapian project over at Google Code. I've been looking for a full-text indexer for Django for quite a while but hadn't found anything that I felt did what I wanted. The dead search-branch in Django's repository was so tempting but incomplete. I noticed the django-sphinx project during my initial searching months ago, but what I felt it was lacking (which was a deal breaker for me) was the ability to search across multiple models. Djapian, however, has the ability to do just this. What I ultimately wanted to be able to do was something like this:

>>> from search import indexer
>>> search = indexer.search('some search term')
>>> search.count()
2
>>> result = search[0]
>>> result.score
100
>>> for result in search:
...     result.get_object()
...
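
To make the shape of that interface concrete, here is a toy pure-Python indexer that searches across two unrelated classes and returns results with `count()`, indexing, iteration, `score` and `get_object()`. Everything in it is hypothetical (the `Indexer`, `ResultSet`, `Result`, `Article` and `Comment` names and the word-overlap scoring); it does not use or describe Djapian's real API.

```python
class Result:
    def __init__(self, obj, score):
        self._obj = obj
        self.score = score

    def get_object(self):
        return self._obj


class ResultSet:
    def __init__(self, results):
        # Best matches first.
        self._results = sorted(results, key=lambda r: r.score, reverse=True)

    def count(self):
        return len(self._results)

    def __getitem__(self, index):
        return self._results[index]

    def __iter__(self):
        return iter(self._results)


class Indexer:
    def __init__(self):
        self._documents = []  # list of (object, set of lowercase words)

    def add(self, obj, *fields):
        words = set()
        for field in fields:
            words.update(str(getattr(obj, field)).lower().split())
        self._documents.append((obj, words))

    def search(self, query):
        terms = query.lower().split()
        results = []
        for obj, words in self._documents:
            hits = sum(1 for term in terms if term in words)
            if hits:
                # Score: percentage of query terms found in the object.
                results.append(Result(obj, 100 * hits // len(terms)))
        return ResultSet(results)


class Article:
    def __init__(self, title, body):
        self.title, self.body = title, body


class Comment:
    def __init__(self, text):
        self.text = text


indexer = Indexer()
article = Article("Full text search", "Indexing Django models")
indexer.add(article, "title", "body")
indexer.add(Comment("search is hard"), "text")
indexer.add(Comment("unrelated"), "text")

search = indexer.search("text search")
print(search.count(), search[0].score)  # 2 100
```

A real full-text engine would add stemming, ranking and persistent storage; the point here is only that a single result set can mix objects of different types.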


Posted by Tim at 08:14 PM | TrackBack

February 17, 2008

Obama's Illinois Legislation

Consider a bill into which Obama clearly put his heart and soul. The problem he wanted to address was that too many confessions, rather than being voluntary, were coerced -- by beating the daylights out of the accused.

Obama proposed requiring that interrogations and confessions be videotaped.

This seemed likely to stop the beatings, but the bill itself aroused immediate opposition.

..[Obama] responded with an all-out campaign of cajolery. It had not been easy for a Harvard man to become a regular guy to his colleagues. Obama had managed to do so by playing basketball and poker with them..

Posted by Tim at 09:47 PM | TrackBack

February 15, 2008

Creating Excel Files with Python and Django

Luckily, Roman Kiseliov has created an excellent library for writing binary Excel files with Python. The library is called pyExcelerator and is available from its project page on Sourceforge. The writer doesn't require Windows, or Excel, which makes it even easier to run.
Posted by Tim at 06:55 PM | TrackBack

How to prank neighbors who steal wireless

My neighbours are stealing my wireless internet access. I could encrypt it or alternately I could have fun.
Posted by Tim at 06:42 PM | TrackBack

February 14, 2008

Famous Software Disasters

Disaster: The Soviet early warning system falsely indicated the United States had launched five ballistic missiles. Fortunately the Soviet duty officer had a "funny feeling in my gut" and reasoned if the U.S. was really attacking they would launch more than five missiles, so he reported the apparent attack as a false alarm.

Cause: A bug in the Soviet software failed to filter out false missile detections caused by sunlight reflecting off cloud-tops. (more)

Posted by Tim at 06:25 PM | TrackBack

February 13, 2008

Python: Enumerating trees

After a few experiments involving Python generators, postfix expressions, and recursive trees, I've got some code to enumerate binary trees.
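
A short recursive generator gives the flavour of the approach. This is a sketch, not the code from the linked post: the function name `binary_trees` and the nested-tuple representation (with `None` as the empty tree) are assumptions made here.

```python
def binary_trees(n):
    """Yield every binary tree with n nodes as nested (left, right) tuples."""
    if n == 0:
        yield None  # the empty tree
        return
    # One node is the root; split the remaining n - 1 nodes between the subtrees.
    for left_size in range(n):
        for left in binary_trees(left_size):
            for right in binary_trees(n - 1 - left_size):
                yield (left, right)


counts = [sum(1 for _ in binary_trees(n)) for n in range(6)]
print(counts)  # [1, 1, 2, 5, 14, 42]
```

The counts are the Catalan numbers, which is a handy sanity check that no tree is missed or produced twice.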
Posted by Tim at 09:03 PM | TrackBack

February 11, 2008

Interviewing Applicants: A Bayesian Approach

A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if statement,” he said.

There are really two approaches to take in selecting candidates. The first is the approach of the if statement: You form a model of what the candidate ought to do, work out what they ought to know in order to do that, and then you work out the questions to ask (or the features to look for) that demonstrate the candidate knows those things. If they know this and this and this and if they don’t have this bad thing or that bad thing, call them in for an interview (or, if you are interviewing them and they have demonstrated their strength, hire).

The second approach is the classifier approach. Each feature you look for, each question you ask, is associated with a probability. You put them all together and you classify them as interview/no interview or hire/no hire with a certain degree of confidence.

The most important thing about most classifiers is that they can be remarkably naïve and still work. In fact, they often work better when they are naïve. Specifically, they do not attempt to draw a logical connection between the features that best classify candidates and the actual job requirements. Classifiers work by training themselves to recognize the differences that have the greatest statistical relevance to the correct classification.
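
The classifier approach can be sketched as a tiny naive Bayes model. All of it is illustrative: the class name `NaiveBayes`, the feature names and the six training examples are invented, and nothing here reflects how any company actually screens candidates.

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Bernoulli naive Bayes over sets of features, with add-one smoothing."""

    def __init__(self):
        self.label_counts = defaultdict(int)
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        self.vocabulary = set()

    def train(self, features, label):
        self.label_counts[label] += 1
        for feature in features:
            self.feature_counts[label][feature] += 1
            self.vocabulary.add(feature)

    def log_scores(self, features):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # prior for this label
            for feature in self.vocabulary:
                # Smoothed probability that an example with this label has the feature.
                p = (self.feature_counts[label][feature] + 1) / (count + 2)
                score += math.log(p if feature in features else 1 - p)
            scores[label] = score
        return scores

    def classify(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


# Made-up training data: each past applicant is a set of observed features.
model = NaiveBayes()
model.train({"open_source", "side_projects"}, "interview")
model.train({"open_source", "referral"}, "interview")
model.train({"side_projects", "referral"}, "interview")
model.train({"typos"}, "no interview")
model.train({"typos", "referral"}, "no interview")
model.train({"job_hopping"}, "no interview")

strong = model.classify({"open_source", "side_projects"})
weak = model.classify({"typos"})
print(strong, weak)  # interview no interview
```

Note that the model never asks why a feature matters; it only counts how often each feature accompanied each outcome, which is the naivety the passage describes.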

Posted by Tim at 11:13 PM | TrackBack

February 09, 2008

Fastest templating engine ever. Period.

Since web frameworks seem to be the only thing people are doing in Python these days, templating languages have become increasingly important. The problem that we at Code Irony keep running into is that they’re just not fast enough. How slow are they? We ran some profiling across one of our projects and this is what we’ve found:
Posted by Tim at 03:12 PM | TrackBack

February 08, 2008

College Profile: (Barry) Obama

Nearly three decades ago, Barack Obama stood out on the small campus of Occidental College in Los Angeles for his eloquence, intellect and activism against apartheid in South Africa. But Mr. Obama, then known as Barry, also joined in the party scene.
Posted by Tim at 07:21 PM | TrackBack

February 07, 2008

Map: Pop vs Soda

Generic name for soft drinks by county
Posted by Tim at 10:46 PM | TrackBack

Clarity Sought on Customs Electronics Searches

A few months earlier in the same airport, a tech engineer returning from a business trip to London objected when a federal agent asked him to type his password into his laptop computer. "This laptop doesn't belong to me," he remembers protesting. "It belongs to my company." Eventually, he agreed to log on and stood by as the officer copied the Web sites he had visited, said the engineer, a U.S. citizen who spoke on the condition of anonymity for fear of calling attention to himself.
Posted by Tim at 09:45 PM | TrackBack