slashdot has rejected my work. Oh well, here it is.
****************************
Beautiful Code is a collection of thirty-three essays by notable computer programmers, all dared by editors Greg Wilson and Andy Oram to hold forth about a piece of source code they have found notably beautiful, apparently with intention of producing what may become a standard computer science textbook. Such an experiment has not been carried out before.<p>
Were Don Knuth dead, this book would probably be dedicated to his memory. Beautiful Code collects "cultural wisdom" of computer programmers. If one were to consider The Art of Computer Programming to be something of a Pentateuch, Beautiful Code could the accompanying Midrash. Having obtained an e-copy from O'Reilly and processed it with pdf2txt and then run <code> grep -ci knuth Chapter*.txt</code>, I found that not everyone mentioned Knuth. Those who did have a K number greater than zero by their name in the synopses below. (Asterisks before the author's name represent other chapters mentioning the author.)
<p>Speaking of experiments, O'reilly is trying to develop an online community based on the book, with <a href="http://beautifulcode.wiki.oreilly.com/wiki/index.php/Main_Page">a wiki.</a><p>
I had several programming projects prioritized ahead of writing this book review, but after leafing through the six hundred pages and reading a few of the pieces in it at random I realized that my professional work would benefit from making reading it cover to cover my primary free-time action item.<p>
To be systematic and thorough, here are quick summaries of the essays, which are more prolix than the editor's version of the same exercise in the preface. Proceeding in this fashion is really horrible book review practice, but there really isn't any cleaner way to approach this particular text.<p>
*** Brian Kernighan (K0) discusses Rob Pike's minimal regular expression engine optimized for pedagogy. <p>
Karl Fogel (K0) tells the origin story of Subversion's "delta editor" internal interface, including an abandoned branch. He introduces a human-factors aspect to software beauty: by having a strongly defined interface, development time which might be squandered on debating interface points can be put to more productive use. <p>
** Jon Bentley (K4), the author of Programming Pearls, holds forth about quicksort, instrumenting quicksort, and generating mathematical forumlae which give the same results. His essay nicely demonstrates the tension between procedural (pseudocode) and descriptive (mathematics) language. <p>
Tim Bray (K0), writing in a tone suitable for a more general audience, uses Ruby's regexps to analyze his Apache server logs from his blog, which he encourages the reader to check out, to build a case for his suspicion that Google uses binary search. <p>
* Elliotte Rusty Harold (K1) draws on experience writing XML verifiers to reinforce the truism that early passes at a problem should focus on correctness, saving speed for later. <p>
Michael Feathers (K0) explains his appreciation for Ward Cunningham's FIT Java testing framework's noncompliance against Java coders' rules of thumb, revealing a tension between closed (final classes, designed-in hooks and extension points) and open (working stub base classes) approaches to designing for extensibility. FIT-derived systems take HTML system documentation with tests embedded in it in tables as input and fill in one table cell with a pass/fail result. By carefully selecting what to hold constant and deferring extensibility to authors of derived classes, FIT allows Java developers, jaded and wounded by corporate infighting, to appreciate the fundamental elegance of a solid, basic, O-O approach. <p>
* Alberto Savoia (K1) plays fast and loose with math jargon. He discusses JUnit as an introduction to an exploration of how to test a binary search function for correctness. He performs a math proof -- claiming that a conclusion follows from implications of known-true axioms -- without calling it that, and a few pages later uses the word "induction" in the non-mathematical sense. He gives us a quick course in software testing best practices, including smoke testing, boundary testing, and what he calls "test theories" which are statements made about what the tested system is supposed to do, and some techniques for deriving additional test theories off the "happy path." Apparently 2006 was the year that data sets reached the size where the (a+b)/2 method of finding the midpoint between a and b started overflowing using signed 32-bit integers, which wreaked all kinds of weird havoc. <p>
Charles Petzold (K0) uses and endorses dynamic code generation for efficient image processing, reminiscing about the feats of the team who wrote Windows 1.0 in 8086 assembly language and applying similar techniques in his C# code thanks to the System.Reflection.Emit namespace. The results aren't easy on the eyes, but they run in a quarter of the time of equivalent code that makes decisions inside loops. It is not clear if the MSIL compiler is going to be so brazen as to perform loop unrolling on bytecodes emanating from ILGenerator objects; if not, I expect that the optimal acceleration of the dynamic method would occur with the loops unrolled into a modified Duff's device large enough to use almost all of the CPU cache. That would remove many loop bounds comparisons, which aren't that many instructions compared with the operation being performed on each pixel, but which may necessitate CPU pipeline stalls. To save another two percent of the time, and possibly hurt general system performance during the operation. Small processes take less effort to switch between. So never mind. Unless you want to benchmark it for something to do. <p>
Douglas Crockford (K1), who explains why LISP has never been accepted by the mainstream, and why the LISP community doesn't care, proposes to use a technique from a 1973 paper (anticipating object orientation, no less) by Vaughn Pratt ("Top Down Operator Precedence") that modified a technique published by Robert W. Floyd (for whom Knuth wrote a glowing eulogy) to write a parser that can process a subset of ECMA-262 (a.k.a. Javascript) entirely within that subset, which includes functions, objects, and JSON literals. It's all about the left binding power. And the "nud" and the "led," terms which when given to a search engine yield a slightly easier-going version of chapter nine. Null denotation and left denotation. Truthy and falsy. Don't repeat yourself, write a macro. A parser yields a parse tree, and hands that off to something else. A proposal for accommodating the needs of both language designers who want to reserve a lot of symbols and programmers who don't want to have to synonymize around them. Would extending Pratt's technique to support languages where not everything is a functional expression really be as easy as Crockford makes it look? Later in the book, R. Kent Dybvig (K0) explains how Scheme's syntax-case macro expander lets the programmer forget the whole deal about using quasi-reserved names in macros, because it abstracts them out for you, while offering an extra-special syntax for when you really truly do want to interfere with the internal operation of a macro. I am very pleased that O'Reilly sent me this book to review while I have been working on a macro system for Perl, and having read the Dybvig and Crockford articles, Macrame.pm will be better that it would have been had I not read these articles, when it finally does get completed. <p>
Henry S. Warren, Jr. (K1) rants about a fundamental problem, the best way to count set bits. Stunningly, divide-and-conquer approaches using bit shifting beat optimized brute force approaches, although calling them opaque is an understatement. Population count processing is applicable to a wide variety of basic data processing infrastructure problems. Warren's elucidation of the situation is a perfect example of the best practice development process: correct and fast, in that order. <p>
A USABILITY GURU: Ashish Gulhati (K0), who prefers to work remotely from the Himalayas, talks about how helping some social idealists inspired Cryptonite, his end-user-friendly encrypted webmail system, then launches into the history of the project, including architecture implementation details and discussion of security concerns and benchmarking, and how using well defined interfaces made replacing one component with a more efficient one with comparable functionality a relative breeze. Along with Lincoln Steins's chapter, this chapter is a nice introduction to object oriented Perl and how the reusability aspect of a good interface design effectively optimizes programmer time in any language. <p>
BIOINFORMATICIANS: Lincoln Stein (K0), without encumbering his narrative with O-O terminology, presents his Bio::Graphics system for displaying genetics research results as a example of usabilty done right, this time usability by one's fellow programmers. The advantage of using standard interfaces becomes apparent when it became possible to support many graphics formats by drop-in replacement of the GD library, which outputs in only one format, with a different library with the same interface but configurability to output in other formats too. The tension between perfection and deliverability is discussed, as Stein reveals his big wishlist item for his library but opines that the level of rewriting required is expected to prevent its reprioritization, especially as there is an effective workaround. Immediately following that, Jim Kent (K0) tells us how readability and understandability lead to extensibility in the Gene Sorter, including a short tutorial on writing polymorphic objects in C. <p>
MATH GEEKS: Adam Kolawa (K0) uses and endorses the CERN math library, which evolved into LAPACK. The well-commented core routines, in Fortran with bindings for use by other languages, are in regular use after three decades. Jack Dongarra and Piotr Luszczek (K0) discuss recent modifications made to LAPACK to take full advantage of modern hardware. <p>
KERNEL HACKERS:Greg Kroah-Hartman (K0) appreciates the discipline involved in maintaining the Linux kernel, where the acrobats operate in full view of the world without the benefit of nets such as dynamic type checking; Diomidis Spinellis (K0) uses the BSD virtual file system as the springboard for presenting general truths about indirection and maintainability; and Bryan Cantrill (K0) likens software bugs to sewage, in particular an embarrassingly unforseen race condition in a released version of Solaris. <p>
PYTHON MONGERS:* Andrew Kuchling (K0) discusses Python's hash tables, Travis E. Oliphant (K0) discusses the optimized n-dimensional iterator that lets the NumPy python math library do matrix operations without blowing its cache, and Rogerio Atem de Carvalho and Rafael Monnerat (K0) hype the flexibility of ERP5, an enterprise resource planning system that runs on python/zope. <p>
ROBERT HEINLEIN FANS: Ronald Mak (K0) uses J2EE beans to communicate both with Mars rovers and their legions of fans who want to download terabytes. <p>
PUTTING THE GOO IN GOOGLE: Jeffrey Dean and Sanjay Ghemawat (K0) discuss MapReduce. If you don't already know what MapReduce is, you probably know what to do to find out. <p>
THE NETWORK IS THE COMPUTER: Simon Peyton Jones (K0) introduces STM (Software Transactional Memory) which is an allegedly fool-proof replacement for locking discipline in multithreaded Haskell, and then discusses his solution to Trono's "Santa Claus problem" without explaining why the problem is supposed to be difficult.
<p>
William R. Otte and Douglas C. Schmidt (K0) manage to abstract concurrency models out of their framework for networked services. <p>
Andrew Patzer (K0) reveals some practical best practices as he tells the story of doing a straightfoward piece of systems integration using RESTful web services. <p>
RADICAL USABILITY Andreas Zeller (K0), apparently a devotee of the Einsteinian craft-a-tool-to-straighten-the-bent-paperclip-from-the-good-paperclip-you-have-found methodology, uses an automated divide-and-conquer approach to discovering which of several thousand patches in the delta between two versions of gdb was responsible for a mysterious loss of function in the ddd graphical front end. Had ddd a debugging mode that logged the full interaction between itself and gdb, the world might not be the better for his automated technique. <p>
* Yukihiro Matsumoto (K0), the author of Ruby, might be annoyed that his essay on the nature of elegance, including the DRY (Don't Repeat Yourself) principle and other similarly useful tips, was not selected as a forward instead of buried towards the end of the collection. <p>
COMPUTER ACCESS FOR ALL: * T. V. Raman (K0) shares his amazement at how straightforwardly the extensible design of emacs Lisp, with its advice system that allows insertion of function before, after, or fully replacing any of the existing emacs functions, allowed his system to adapt emacs for eyes-free use to bush out over the extended development of emacspeak as a hobby project. Takeaways include the advice that when adapting a communication for another medium, it is important to communicate the message, not the artifacts of the previous medium. He is justifiably proud of his eyes-free calendar widget, among others. Bizarrely, there is no mention of Morse Code in Arun Mehta(K0)'s retrospective on a project to develop an improved system for the use of Stephen Hawking, who could only use one button (but he could control how long he pressed it, and could watch a screen as he did so.)
<p> BEAUTIFUL CAN MEAN EASY ON THE EYES: Laura Wingerd and Christopher Seiwald (K0), from perforce, discuss readable layout, including selection of line break points, and reveal the strong correlation between bugs and nested ifs. (Takeaway: prefer case statements, even if it means repeating yourself)
<p> CASTLES MADE OF SAND
Brian Hayes (K0) demonstrates the truth to the saying "while in other disciplines the researchers stand on the shoulders of giants, in computer science we stand on each other's shoes" as he flails around in search of, and eventually finds, the easy way to find co-linearity of three points in order to analyze satellite photos of the lower Mississippi river, which tends to change course during high water.
<p> IN CONCLUSION
<p> I think if I had been asked to write a chapter for this book, I would talk about normalized Huffman tables, as discussed in Managing Gigabytes, which Hans Reiser recommended in response to my relating to him a conversation about ReiserFS had by the Kansas City LUG. Beautiful code has a lot of different things happening in it, with snippets of source code in various languages,\and nice diagrams where appropriate. It was not a fast read, although some of the pieces do flow nicely, due to the necessity of retooling one's entire brain from one essay to the next.
<p> Were I still a single nerd, and my girlfriend were to present me with Beautiful Code, perhaps as a birthday present, I would not hesitate to select her as my personal and permanent nerdess, all other things being equal.
<p> David Nicol, inventor of the online tip jar and always looking for collaborators with spare tuits, can be contacted through the comment form at tipjar.com.