Unsexy Development - Port 25: The Open Source Community at Microsoft
Unsexy Development by anandeep on October 27, 2006 12:27PM

I loved doing development in a research and university environment. You got to write cool code, prove new ideas, break new ground and generally ended up with bragging rights to say “I did an image recognition algorithm on a multi-layer architecture implementing reactive and planning parallelism on an autonomous robot!”  The code had to work on your workstation or maybe on a demo machine once.  Once you wrote the code, the only people who touched the system were hapless graduate students implementing the next big idea. They had to come to you and you could then dazzle them with your insight!  This was “sexy development”!

When I moved to industry and wrote software for day to day use – things changed.  Now you had all those people with “manager” titles telling you what to do, and those people called “testers” who told you why your code sucked (you couldn’t logically argue your way out of that because the weasels usually had proof!).  Of course, being consummate professionals, you adapted. You got the religion of “bullet proof code” and worked on making sure the testers only had “fit and finish” bugs filed against you, which the intern could work on.  That was still fun - a different challenge, maybe not as “pure” as designing a neat new algorithm, but pretty good nevertheless!

You got past the testers, but when they integrated the components that you had bullet-proofed to run end-to-end or user acceptance tests, unexpected stuff happened. Who would have thought that they would configure the machine that way, or that another non-surface component could pass you null strings? Now you had to plan not only for the testers – but also for other developers and those pesky sys admin guys.  How did they become sys admins? They couldn’t tell a polynomial solution from a log n solution anyway!  But being nothing if not adaptable, you adapted.  You now built bullet proof AND idiot proof code.  (My father, a military pilot and flight instructor, used to say when teaching flight safety, “Nothing is foolproof because fools are so ingenious!”)  It got a little boring at times but you still had the satisfaction of building something that was “engineered”.

I thought I had shipped the product, but I found I couldn’t sit back and relax. The support guys were making insinuations against my code. It didn’t work, they said – and I hadn’t put the right level of granularity in the logs for them to do a diagnosis.  This had nothing to do with Computer Science – any bozo could write stuff to the log. Why didn’t the intern do it? What do you mean he can’t make sense of my code? Yeah, I do know my code best. I guess it’s the right thing to do. Certainly not as fun as designing, bullet proofing and idiot proofing new code, but good supportability is a “sine qua non” for a well done project!

Is that the end of it? No, further design and coding needs to be done to make the software more manageable, to make the logs more systematic, and to make sure that the product works when it’s deployed to multiple configurations, performs well and fails gracefully.

Unless you specialize in a certain aspect of manageability, reliability or diagnosis – this is not “sexy” development.  I probably wouldn’t get as much satisfaction from designing event logs as I would from designing a new search algorithm. 

I was getting paid to do all this (ok, so it was my own startup but I was getting paid in VC money!) and it was still very hard. We did do it but it took lots of coaxing of our developers to pay attention to this.  They all preferred to work on the next release that had all the sexy features. Even though they knew that to make the startup successful and still have a job, the unsexy stuff needed to be done and done RIGHT!

When you are working for the “love of the game” and not money, like in Open Source – who coaxes you?  Who does the unsexy stuff? Are there enough people who specialize in the esoteric aspects of event logs that this is not a problem? Or do users who need the feature “just do it” and add the code to the community version? Or are things slipping through the cracks?

I did a sweep of the usual suspect Linux developer mailing lists and found that there is concern about whether unsexy stuff gets done. Here is a typical comment that I saw:

“I think that the only issue with Open Source boils down to this:

The things that nobody wants to do, but somebody has to.

Nobody wants to think about documentation. Or user interfaces. These things are hard, tedious, and a hell of a lot more boring than actually coming up with stuff to "make things work".”  (from here)

Documentation is famously one of those things that is considered “unsexy” (well, ok in commercial software too).  There are efforts like Grokdoc to make documentation of Open Source projects sexy by making it a priority. But the “who does unsexy?” issue is a real concern in Open Source.

We ran into a similar issue with event logs. You know, the text stuff you write so that you can find out later what happened.  At the lab we just did an investigation of whether we could tell from the syslog and from console messages if one of our boxes had crashed. We were a little taken aback by how many times we couldn’t tell what states the machine had gone through.
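
To make the kind of check we attempted concrete, here is a minimal sketch in Python, not the tooling we actually used. It assumes a boot shows up in the syslog as a line containing "kernel: Linux version" and a clean shutdown as a line containing "exiting on signal 15". Both marker strings are assumptions; the real text varies by distribution and syslog daemon, which is a big part of the problem.

```python
from typing import Iterable, List

# Assumed markers; the real text varies by distribution and syslog daemon.
BOOT_MARKER = "kernel: Linux version"
SHUTDOWN_MARKER = "exiting on signal 15"


def find_unclean_boots(lines: Iterable[str],
                       boot_marker: str = BOOT_MARKER,
                       shutdown_marker: str = SHUTDOWN_MARKER) -> List[str]:
    """Return boot lines that were not preceded by a clean shutdown line."""
    suspicious = []
    seen_boot = False        # the first boot in the file has nothing before it to judge
    clean_shutdown = False   # has a shutdown marker been seen since the last boot?
    for line in lines:
        if shutdown_marker in line:
            clean_shutdown = True
        elif boot_marker in line:
            if seen_boot and not clean_shutdown:
                suspicious.append(line)
            seen_boot = True
            clean_shutdown = False
    return suspicious
```

A boot flagged this way only says that something went wrong between two boots. It says nothing about which states the machine went through, and that was exactly the information we were missing.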

On doing some investigation we found that the most influential project addressing this issue, the Evlog project (mostly supported by IBM), has been quiet since 2004. This code is used internally within IBM but was not mainstreamed into the Linux kernel.

How does one get unsexy stuff like this into the Linux kernel so that it is comparable to UNIX/VMS/Windows?

I contend that it is critical to Open Source that attention be paid to the event logs. They are critical in making any operating system reliable. VMS/UNIX/Windows all went through the process of making their event logs more meaningful – and this has helped make them much more reliable.
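
One way to picture a “more meaningful” event, as opposed to a free-text line, is a record with a stable event ID plus the parameters that were in memory when it fired. The sketch below uses Python's standard logging and json modules. The helper log_event, the service name "diskd" and the event ID "DISK_MOUNT_FAILED" are all made up for illustration and are not taken from Evlog or any other project.

```python
import json
import logging
from io import StringIO


def log_event(logger: logging.Logger, event_id: str, severity: int, **params) -> None:
    """Emit one structured event: a stable ID plus the in-memory parameters."""
    record = {"event_id": event_id, **params}
    logger.log(severity, json.dumps(record, sort_keys=True))


# Capture the output in a buffer so the example is self-contained.
buffer = StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))

logger = logging.getLogger("diskd")  # "diskd" is a hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False             # keep the example's output out of the root logger

log_event(logger, "DISK_MOUNT_FAILED", logging.ERROR, device="/dev/sdb1", errno=5)
```

Because the ID is fixed and the parameters are keyed, a support engineer or a script can search for the event without guessing at the wording a developer happened to choose.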

We will be addressing this further in the next couple of weeks – stay tuned!

Comments
  1. rhorn said:

    Automation is the keyword here.  A lot of the unsexy stuff is unsexy because it's not automated well enough.

    OSS in general needs to be more gracious to CVS and GNU because they did the hard work to create the tools that automate coding, compiling, and versioning (specifically, GNU wrote autotools, and CVS was amongst the first well-written networkable versioning systems).  Without them, I don't think OSS would be as big as it is today.

    As an example, check out KDE's Wiki (http://wiki.kde.org/).  It's got tons of great documentation on plenty of different KDE applications, but not a whole lot of it gets incorporated into the help system of KDE because no one has automated the process of converting between a wiki page and a KDE help page.

    Also, I think that wikis will be the key in the future to documenting a lot of OSS software.  I think a lot of people want to write documentation, but only in bits and pieces in relation to what other people have written.

    posted at 06:13PM 10/27/2006
  2. anandeep said:

    Rhorn,

    Good points. Often automating something is a challenge and is a project in itself. That might be one way of making sure that unsexy work gets done.

    Documentation may be different from event logs - because you can write documentation independent of the executing code - it describes the code (kind of a meta-data).

    Event logs have to be closely linked to the code - since only the code knows when an event happens and only executing code has the parameters in memory that are useful diagnostic information. How do we get people in OSS to change the way they write code? Maybe the answer is automation in tools where the dev tool generates event stubs or something. But that would mean that all dev tools have that facility.

    I will be addressing why it is important to have those events in forthcoming blogs, interviews, etc.

    Anandeep

    posted at 10:47AM 10/28/2006
  3. rhorn said:

    Actually, I would argue that a lot of event logging gets done in OSS projects, but there's a number of problems with the way it's done in Linux and numerous other OSS projects:

    a. It's hard to sift through (all of my logs are in /var/log, but there's so many log files in there, it can be hard to figure out which log files to check).

    b. There's no consistency.

    What needs to happen is that there needs to be centralization--not just separation as there is right now.  Not only do logs need to be kept in separate files, as they are right now, but there also needs to be one central file that keeps some sense of sequitur-ness (to the extent that it's possible).  This would probably benefit from a new type of file system since it would be silly to keep duplicates of so many event logs.

    posted at 10:57PM 10/28/2006
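
The central, time-ordered view described in the last comment can be sketched as a merge over the existing per-service files, rather than a second copy of them. This is only an illustration under two assumptions: every line starts with an ISO 8601 timestamp (which sorts correctly as plain text), and each file is already in time order. Classic syslog timestamps carry no year, so real logs would need parsing first. The function name merge_logs and the sample file names are made up.

```python
import heapq
from typing import Dict, Iterable, Iterator


def merge_logs(sources: Dict[str, Iterable[str]]) -> Iterator[str]:
    """Yield lines from all sources in timestamp order, tagged with their origin.

    Assumes every line starts with an ISO 8601 timestamp and that each
    source is already sorted by time.
    """
    def tagged(name, lines):
        for line in lines:
            timestamp, _, rest = line.rstrip("\n").partition(" ")
            yield (timestamp, name, rest)

    streams = [tagged(name, lines) for name, lines in sources.items()]
    # heapq.merge reads lazily, so nothing is duplicated on disk or held in memory.
    for timestamp, name, rest in heapq.merge(*streams):
        yield f"{timestamp} [{name}] {rest}"


sources = {
    "auth.log": ["2006-10-28T22:00:05 sshd: session opened",
                 "2006-10-28T22:00:09 sshd: session closed"],
    "kern.log": ["2006-10-28T22:00:07 kernel: eth0 link down"],
}
for line in merge_logs(sources):
    print(line)
```

Since the merged stream is computed on demand from the separate files, it sidesteps the duplicate-storage worry without needing a new file system, though it does nothing about the lack of consistency between log formats.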