Learning to Read! - Port 25: The Open Source Community at Microsoft
Learning to Read! by anandeep on November 16, 2006 07:39PM

One of the great things about my job at the Open Source Software Lab (OSSL) here at Microsoft (besides being able to work with both Linux and Windows!) is that I get to go to computer science research conferences. I try to avoid the purely academic ones and attend those that address both industry and academic research issues.

I just got back from ISSRE (pronounced “is-ree”), the 17th IEEE International Symposium on Software Reliability Engineering (2006). This conference covers everything that affects the reliability of computer systems, from “drivers of reliability” to “testing to ensure reliability” to “doing static analysis of programs”.

Skeptical that anything they talk about here would be useful to y’all? Well, think again! They have all kinds of practical advice on doing things right. The talks I really enjoyed included:

  • empirical evidence on the positive impact of using assertions during development (a very development-oriented talk)
  • considering stabilization time of an application (i.e. stability during installation and immediately after) as part of a reliability metric
  • the use of “operational profiles” to reduce the number of test cases by over 30% without significantly impacting the reliability of the tested product
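An operational profile, in the reliability-engineering sense, is a set of a system's operations together with how often each one occurs in real use. The talk did not spell out its method here, so the following is only a minimal sketch, under that assumption, of the basic idea: split a fixed test budget across operations in proportion to usage, so rarely used operations get few or no test cases. The function name `allocate_tests`, the operation names, and the usage numbers are all hypothetical.

```python
from math import floor


def allocate_tests(profile: dict[str, float], budget: int) -> dict[str, int]:
    """Split a test budget across operations in proportion to their usage weight.

    `profile` maps each operation to an occurrence weight (a probability or an
    observed count). The largest-remainder method makes the allocations sum
    exactly to `budget`.
    """
    total = sum(profile.values())
    # Ideal (fractional) number of tests each operation deserves.
    shares = {op: budget * w / total for op, w in profile.items()}
    alloc = {op: floor(s) for op, s in shares.items()}
    leftover = budget - sum(alloc.values())
    # Give the remaining tests to the operations with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda op: shares[op] - alloc[op], reverse=True)
    for op in by_remainder[:leftover]:
        alloc[op] += 1
    return alloc


# Hypothetical profile: how often each operation was observed in the field.
usage = {"browse": 600, "search": 250, "checkout": 100, "admin": 50}
print(allocate_tests(usage, 7))
# → {'browse': 4, 'search': 2, 'checkout': 1, 'admin': 0}
```

With a budget of 7, the rarely used `admin` operation gets no test at all. That is how a profile trims the test count; whether it is acceptable depends on how costly a failure in that operation would be, which a pure usage weighting does not capture.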

Only one of the above talks came from an academic institution; the other two were based on experience with software widely used in the consumer and application-server space.

The thing I enjoyed most was a tutorial on “Software Productivity and Reliability – Tools and Techniques” given by Prof. S. C. Kothari of Iowa State University. The title is apt, but I think it should have been “Learn to Read Programs Properly!”

Kothari believes that a lot of attention has been paid to what he calls “Program Writing”: developer tools and the like. This has resulted in very complex software artifacts, and most real-world applications today are built on top of these existing complex systems.

The problem is that almost all academic institutions and programs focus on the inventive aspects of programming. They teach algorithms and techniques on the assumption that everything will be written from scratch. Real life, of course, is never like this: it is difficult, if not impossible, to be a software professional these days and work only with your own code. More often than not, developers have to wade through other people’s code to understand, use, or modify it. Developing software today involves a lot more than just writing it.

The skills to “read programs” are acquired the hard way, and sometimes never fully mastered. Kothari suggested that training needs to emphasize program reading, and that tools need to be built to help people read programs and form the proper mental model of them. The barrier to future software productivity is not machines or algorithms but human mastery of the complexity of the vast amount of critical software out there.

Program reading is not easy, as most people in open source know! The difficulty comes from:

  1. The complexity of the semantic analysis of the program: figuring out what a module is trying to do. Is it part of the scaffolding put in place to support execution, or is it domain knowledge embodied in the module?
  2. A lack of domain knowledge: how would a programmer know how a complex business or legal transaction needs to be carried out, or how a certain application-level protocol is executed? Yet this information has to be embodied in the code the programmer writes.
  3. Non-localized relations between software artifacts: a module does one thing in one context and another thing in another. In some contexts it has to maintain data integrity, and in others it has to undo something that has occurred elsewhere in the program.

There are some tools available to assist in program reading, such as CScope (BTW, Hank Janssen of our lab wrote parts of CScope), but not much attention has been paid to WHAT program reading needs in order to address the complexity issues raised above. Kothari's company, Ensoft, provides some very cool tools for the kinds of things needed to read complex programs. The tools are based on abstractions used in program comprehension (there is an IEEE Conference on Program Comprehension held every year).

Kothari illustrated one abstraction that he called the “matching pair” (MP). A matching pair is defined by a syntactic pattern, which can describe artifacts (such as matching parentheses) or events (such as locking and unlocking a resource). There are many types of matching pairs, and for checking program correctness a matching pair can be defined with respect to control flow, data flow, or both. A control-flow matching pair means that a function f must be followed by a function f-inverse on EVERY execution path the program could take. Examining every execution path is hard, especially in something as large as the Linux kernel; it has been proven that doing this via automated static analysis of programs is an intractable problem.
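To make the control-flow matching-pair property concrete, here is a deliberately naive sketch that enumerates every entry-to-exit path through a tiny control-flow graph and reports the paths on which a `lock` is not followed by an `unlock`. It is not how Ensoft's tools work; the graph representation (a dict of successor lists, assumed acyclic), the function names, and the node names are all hypothetical.

```python
def all_paths(cfg: dict[str, list[str]], node: str, exit_node: str) -> list[list[str]]:
    """Enumerate every path from `node` to `exit_node` (assumes an acyclic graph)."""
    if node == exit_node:
        return [[node]]
    paths = []
    for succ in cfg.get(node, []):
        for rest in all_paths(cfg, succ, exit_node):
            paths.append([node] + rest)
    return paths


def unmatched_paths(cfg, events, entry, exit_node, f="lock", f_inv="unlock"):
    """Return the entry-to-exit paths on which `f` is not always followed by `f_inv`."""
    bad = []
    for path in all_paths(cfg, entry, exit_node):
        held = 0  # number of f events still waiting for their f_inv
        ok = True
        for node in path:
            event = events.get(node)
            if event == f:
                held += 1
            elif event == f_inv:
                if held == 0:  # f_inv with no preceding f is also a mismatch
                    ok = False
                    break
                held -= 1
        if not ok or held != 0:
            bad.append(path)
    return bad


# Hypothetical function body: the error branch returns without unlocking.
cfg = {
    "entry": ["acquire"],
    "acquire": ["check"],
    "check": ["work", "error"],
    "work": ["release"],
    "release": ["exit"],
    "error": ["exit"],
}
events = {"acquire": "lock", "release": "unlock"}
print(unmatched_paths(cfg, events, "entry", "exit"))
# → [['entry', 'acquire', 'check', 'error', 'exit']]
```

The catch is path count: k independent if/else branches in sequence produce 2^k paths, so this brute-force approach collapses long before it reaches code the size of the kernel.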

In the tool Kothari demonstrated, a call graph is generated and a “query language” is defined over call graphs. Looking for matching pairs with the tool became unbelievably simple. This is just one of the ways to reduce the complexity of, and the time needed for, figuring out what a very complex program is doing.
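The flavor of querying a call graph for matching pairs can be sketched in a few lines. This is an illustrative stand-in, not Ensoft's query language: the call graph is assumed to be a Python dict mapping each caller to the set of functions it calls directly, and `reachable`, `query_unpaired`, and every function name in the graph are hypothetical.

```python
def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """All functions transitively called from `start` (iterative depth-first search)."""
    seen: set[str] = set()
    stack = list(graph.get(start, ()))
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(graph.get(fn, ()))
    return seen


def query_unpaired(graph: dict[str, set[str]], f: str, f_inv: str) -> list[str]:
    """Query: which functions can reach `f` in the call graph but never `f_inv`?"""
    hits = []
    for fn in graph:
        calls = reachable(graph, fn)
        if f in calls and f_inv not in calls:
            hits.append(fn)
    return sorted(hits)


# Hypothetical call graph: caller -> functions it calls directly.
graph = {
    "read_config": {"lock", "parse", "unlock"},
    "parse": {"alloc"},
    "update_cache": {"lock", "write"},
    "write": set(),
    "flush": {"update_cache", "unlock"},
}
print(query_unpaired(graph, "lock", "unlock"))
# → ['update_cache']
```

A call-graph query like this ignores call order and individual paths, so it is a coarse filter rather than a proof. Here it flags `update_cache`, but its caller `flush` releases the lock afterward: a non-localized pairing that the reader still has to confirm by reading the code, only now knowing exactly where to look.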

I think this is a real breakthrough, and I am now a confirmed advocate of program reading. I am hoping to work with Prof. Kothari to do more in this area, and I hope to share the results if I end up doing that.

Why do I mention this on this forum? Because this is something that open source developers and IT Pros have been doing for a long time. Open source developers have a culture in which code reading is encouraged, and IT Pros constantly have to update and upgrade the scripts they use to control and run their infrastructure. The cultural advantage lies with open source developers and IT Pros, but given that the complexity of software is increasing exponentially, everyone could do with a little help.

Comments
  1. This topic (code reading) is mentioned in at least two of the major computer science textbooks in my personal library: Tanenbaum's "Operating Systems: Design and Implementation" and Aho, Sethi, and Ullman's "Compilers: Principles, Techniques, and Tools".

    Both books were written at a time when the commercial culture of source code secrecy had almost succeeded in wiping out code reading in tertiary institutions, and both of them regret this.

    As far as training goes, a little bit of code reading goes quite a long way. For a newbie, seeing what a given snippet of code does puts him or her in a frame of mind where writing usable programs seems possible, and therefore achievable.

    Code reading in order to start maintaining a source tree that has been unused for a while: now that is a major challenge, particularly if it isn't well documented.

    Since that was exactly the situation with the Y2K bug, code reading should be more of a priority among programmers and other software professionals.

    posted at 08:36PM 11/20/2006
  2. anandeep said:

    Wesley,

    Thanks for your comments.

    I have actually read both the books you mention and have the OS book on my shelf in the office! The other one is a prized possession in my home library.

    It is interesting that Linux followed in the tradition of Tanenbaum's Minix. I remember reading the C code, but I don't think I could have waded through it without the commentary in the book.

    There hasn't been much CS support for reducing complexity in reading code; techniques and tools that work haven't seen the kind of attention they deserve, IMHO. That's why I was so excited to see Kothari's work.

    My theory is that the community aspect of open source reduces the complexity of code reading, since the forums and individuals (not to mention O'Reilly!) coach you on what to read. Reading code without this community around you would be a very onerous and complex task.

    posted at 12:33PM 11/21/2006