Good Programmers Worry about Data Structures

“. . . unlike every single horror I’ve ever witnessed when looking closer at SCM products, git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful. . . .

“I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”

— Linus Torvalds, https://lwn.net/Articles/193245/
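As a rough illustration of the data-first design the quote praises: Git identifies a file's contents (a "blob") by hashing a short header plus the raw bytes, so the data structure itself determines the name. The sketch below assumes the classic SHA-1 object format; the function name `git_blob_id` is made up for this example and is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID a SHA-1 Git repository would give a blob with this content.

    Hypothetical helper for illustration; it mirrors the "blob <size>\\0<data>"
    layout of Git's object format.
    """
    # The header records the object type and the byte length of the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    # The object ID is the SHA-1 digest of header + content.
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Identical content always maps to the same ID, whatever the file is called.
    print(git_blob_id(b""))
    print(git_blob_id(b"some file contents\n"))
```

Because the ID depends only on the bytes, two files with the same contents share one stored object, and most of the surrounding code follows from that choice.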

What “Worse is Better vs The Right Thing” is really about

What prompted me to publish it now – at least the first, relatively finished part – is Steve Yegge’s post, an analogy between the “liberals vs conservatives” debate in politics and some dichotomies in the professional worldviews of software developers. The core of his analogy is risk aversion: conservatives are more risk averse than liberals, both in politics and in software.

I want to draw a similar type of analogy, but from a somewhat different angle. My angle is, in politics, one thing that people view rather differently is the role of markets and competition. Some view them as mostly good and others as mostly evil. This is loosely aligned with the “right” and the “left” (with the caveat that the political right and left are very overloaded terms).

.. I’ll claim that the view of economic evolution is what underlies the Worse Is Better vs The Right Thing opposition – and not the trade-off between design simplicity and other considerations as the essay states.

.. So the essay says one thing, and I’ll show you it really says something else. Seriously, I will.

And then I’ll tell you why it’s important to me, and why – in Yegge’s words – “this conceptual framework became one of the most important tools in my toolkit” (though of course each of us is talking about his own analogy).

Specifically, I came to think that you can be for evolution or against it, and I’m naturally inclined to be against it, and once I got that, I’ve been trying hard to not overdo it.

.. Linus Torvalds thus views competition as a source of progress more important than anyone’s ability to come up with bright ideas. Alan Kay, on the contrary, perceives market constraints as a stumbling block insurmountable for the brightest idea.

A look back: Bram Cohen vs Linus Torvalds

But the really interesting thing about their interchange is not the fireworks in the thread but the way things look like in hindsight. Having become familiar with Git over the last few days I have to say that Torvalds was right on just about every count. I knew Torvalds was smart, but seeing as I was never really more than an occasional Linux user I never realized just how smart; I’d thought he was just a good programmer who happened to be in the right place at the right time and had a few good ideas. But after closely studying Git I’m a little bit awestruck; Torvalds is a frickin’ genius, a true visionary, and somehow managed to just “get it” and instantly, in a flash of insight, come up with “the solution” for version control.

The Cathedral and the Bazaar, by Eric Raymond

Linus Torvalds’s style of development – release early and often, delegate everything you can, be open to the point of promiscuity – came as a surprise. No quiet, reverent cathedral-building here – rather, the Linux community seemed to resemble a great babbling bazaar of differing agendas and approaches (aptly symbolized by the Linux archive sites, who’d take submissions from anyone) out of which a coherent and stable system could seemingly emerge only by a succession of miracles.

The fact that this bazaar style seemed to work, and work well, came as a distinct shock. As I learned my way around, I worked hard not just at individual projects, but also at trying to understand why the Linux world not only didn’t fly apart in confusion but seemed to go from strength to strength at a speed barely imaginable to cathedral-builders.

.. 1. Every good work of software starts by scratching a developer’s personal itch.

.. 2. Good programmers know what to write. Great ones know what to rewrite (and reuse).

.. Linus Torvalds, for example, didn’t actually try to write Linux from scratch. Instead, he started by reusing code and ideas from Minix, a tiny Unix-like OS for 386 machines. Eventually all the Minix code went away or was completely rewritten – but while it was there, it provided scaffolding for the infant that would eventually become Linux.

.. you often don’t really understand the problem until after the first time you implement a solution. The second time, maybe you know enough to do it right. So if you want to get it right, be ready to start over at least once.

.. 5. When you lose interest in a program, your last duty to it is to hand it off to a competent successor.

.. 6. Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.

.. Linus’ cleverest and most consequential hack was not the construction of the Linux kernel itself, but rather his invention of the Linux development model.

.. One unexpected side-effect of FSF’s policy of trying to legally bind code into the GPL is that it becomes procedurally harder for FSF to use the bazaar mode, since they believe they must get a copyright assignment for every individual contribution of more than twenty lines in order to immunize GPLed code from challenge under copyright law. People who copyright using the BSD and MIT X Consortium licenses don’t have this problem; they’re not trying to reserve rights that anyone might have an incentive to challenge.

.. 7. Release early. Release often. And listen to your customers.

.. Most developers (including me) used to believe this was bad policy for larger than trivial projects, because early versions are almost by definition buggy versions and you don’t want to wear out the patience of your users.

This belief reinforced the general commitment to a cathedral-building style of development. If the overriding objective was for users to see as few bugs as possible, why then you’d only release one every six months (or less often), and work like a dog on debugging between releases.

.. In those early times (around 1991) it wasn’t unknown for him to release a new kernel more than once a day! Because he cultivated his base of co-developers and leveraged the Internet for collaboration harder than anyone else, this worked.

.. Linus is a damn fine hacker (how many of us could engineer an entire production-quality operating system kernel?). But Linux didn’t represent any awesome conceptual leap forward. Linus is not (or at least, not yet) an innovative genius of design in the way that, say, Richard Stallman or James Gosling (of NeWS and Java) are. Rather, Linus seems to me to be a genius of engineering, with a sixth sense for avoiding bugs and development dead-ends and a true knack for finding the minimum-effort path from point A to point B.

.. what was he maximizing? What was he cranking out of the machinery?

Put that way, the question answers itself. Linus was keeping his hacker/users constantly stimulated and rewarded – stimulated by the prospect of having an ego-satisfying piece of the action, rewarded by the sight of constant (even daily) improvement in their work.

.. Here, I think, is the core difference underlying the cathedral-builder and bazaar styles. In the cathedral-builder view of programming, bugs and development problems are tricky, insidious, deep phenomena. It takes months of scrutiny by a dedicated few to develop confidence that you’ve winkled them all out. Thus the long release intervals, and the inevitable disappointment when long-awaited releases are not perfect.

In the bazaar view, on the other hand, you assume that bugs are generally shallow phenomena – or, at least, that they turn shallow pretty quick when exposed to a thousand eager co-developers pounding on every single new release. Accordingly you release often in order to get more corrections, and as a beneficial side effect you have less to lose if an occasional botch gets out the door.

.. “Debugging is parallelizable”

.. “The total cost of maintaining a widely used program is typically 40 percent or more of the cost of developing it. Surprisingly this cost is strongly affected by the number of users. More users find more bugs.”

.. Linux kernel versions are numbered in such a way that potential users can make a choice either to run the last version designated “stable” or to ride the cutting edge and risk bugs in order to get new features.
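The excerpt does not spell out the numbering scheme. In the kernel series of that era (for example 2.0.x versus 2.1.x), an even minor number marked a stable series and an odd one a development series; the convention was later dropped. A minimal sketch under that assumption, with a hypothetical helper name:

```python
def is_stable_series(version: str) -> bool:
    """Return True if the minor number is even, the historical 'stable' marker.

    Assumes a dotted version string such as "2.0.36"; this helper is
    illustrative and only reflects the old even/odd convention.
    """
    parts = version.split(".")
    minor = int(parts[1])
    return minor % 2 == 0


if __name__ == "__main__":
    for v in ("2.0.36", "2.1.132"):
        label = "stable" if is_stable_series(v) else "development"
        print(v, label)
```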

.. Brooks, Chapter 9: “Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won’t usually need your [code]; it’ll be obvious.”

.. 10. If you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource.

.. 11. The next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.

.. if you are completely and self-deprecatingly truthful about how much you owe other people, the world at large will treat you like you did every bit of the invention yourself and are just being becomingly modest about your innate genius.

.. 13. “Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away.”

.. With the SMTP forwarding feature, it pulled far enough in front of the competition to potentially become a “category killer”, one of those classic programs that fills its niche so competently that the alternatives are not just discarded but almost forgotten.

I think you can’t really aim or plan for a result like this. You have to get pulled into it by design ideas so powerful that afterward the results just seem inevitable, natural, even foreordained. The only way to try for ideas like that is by having lots of ideas – or by having the engineering judgment to take other peoples’ good ideas beyond where the originators thought they could go.

.. most science and engineering and software development isn’t done by original genius, hacker mythology to the contrary.

.. 14. Any tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected.

.. I believe the fetchmail project succeeded partly because I restrained my tendency to be clever;

.. A bazaar project coordinator or leader must have good people and communications skills.

This should be obvious. In order to build a development community, you need to attract people, interest them in what you’re doing, and keep them happy about the amount of work they’re doing.

.. It is not a coincidence that Linus is a nice guy who makes people like him and want to help him. It’s not a coincidence that I’m an energetic extrovert who enjoys working a crowd and has some of the delivery and instincts of a stand-up comic. To make the bazaar model work, it helps enormously if you have at least a little skill at charming people.

.. He argued that the complexity and communication costs of a project rise with the square of the number of developers, while work done only rises linearly. This claim has since become known as “Brooks’s Law” and is widely regarded as a truism. But if Brooks’s Law were the whole picture, Linux would be impossible.
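To make the "square versus linear" claim concrete, one common way to model communication cost is to count the pairwise channels among n developers, n(n-1)/2. That model is an assumption here rather than something the excerpt states, and the function name is hypothetical.

```python
def communication_paths(developers: int) -> int:
    """Number of distinct developer pairs, i.e. potential communication channels."""
    return developers * (developers - 1) // 2


if __name__ == "__main__":
    # Headcount (a stand-in for work done) grows linearly;
    # the number of pairwise channels grows roughly with its square.
    for n in (2, 10, 100, 1000):
        print(f"{n:>5} developers -> {communication_paths(n):>7} channels")
```

Going from 10 to 100 developers multiplies headcount by 10 but the number of channels by 110, which is why a project the size of Linux looks impossible if every developer must coordinate with every other.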

.. In his discussion of “egoless programming”, Weinberg observed that in shops where developers are not territorial about their code, and encourage other people to look for bugs and potential improvements in it, improvement happens dramatically faster than elsewhere.

.. the traditional Unix world was prevented from pushing this approach to the ultimate by several factors. One was the legal constraints of various licenses, trade secrets, and commercial interests. Another (in hindsight) was that the Internet wasn’t yet good enough.

.. Linux was the first project to make a conscious and successful effort to use the entire world as its talent pool. I don’t think it’s a coincidence that the gestation period of Linux coincided with the birth of the World Wide Web, and that Linux left its infancy during the same period in 1993-1994 that saw the takeoff of the ISP industry and the explosion of mainstream interest in the Internet. Linus was the first person who learned how to play by the new rules that pervasive Internet made possible.

.. While cheap Internet was a necessary condition for the Linux model to evolve, I think it was not by itself a sufficient condition. Another vital factor was the development of a leadership style and set of cooperative customs that could allow developers to attract co-developers and get maximum leverage out of the medium.

“Having been brought up in a serf-owner’s family, I entered active life, like all young men of my time, with a great deal of confidence in the necessity of commanding, ordering, scolding, punishing and the like. But when, at an early stage, I had to manage serious enterprises and to deal with [free] men, and when each mistake would lead at once to heavy consequences, I began to appreciate the difference between acting on the principle of command and discipline and acting on the principle of common understanding. The former works admirably in a military parade, but it is worth nothing where real life is concerned, and the aim can be achieved only through the severe effort of many converging wills.”

.. The “utility function” Linux hackers are maximizing is not classically economic, but is the intangible of their own ego satisfaction and reputation among other hackers. (One may call their motivation “altruistic”, but this ignores the fact that altruism is itself a form of ego satisfaction for the altruist). Voluntary cultures that work this way are not actually uncommon; one other in which I have long participated is science fiction fandom ..

.. Perhaps in the end the open-source culture will triumph not because cooperation is morally right or software “hoarding” is morally wrong (assuming you believe the latter, which neither Linus nor I do), but simply because the commercial world cannot win an evolutionary arms race with open-source communities that can put orders of magnitude more skilled time into a problem.