Dan Wood: The Eponymous Weblog (Archives)

Dan Wood Dan Wood is co-owner of Karelia Software, creating programs for the Macintosh computer. He is the father of two kids, lives in the Bay Area of California USA, and prefers bicycles to cars. This site is his older weblog, which mostly covers geeky topics like Macs and Mac Programming. Go visit the current blog here.

Useful Tidbits and Egotistical Musings from Dan Wood

Categories: Business · Mac OS X · Cocoa Programming · General · All Categories

Sat, 30 Aug 2008

Clang's Empirical Demonstration of a Need for Defensive Coding Guidelines

At the job that I had before starting Karelia Software, around the turn of the century, our software engineering department had a fairly intense set of coding guidelines. Not just the formatting rules that people had to follow (e.g. no K&R braces, indentation rules, etc.) but also many guidelines that were about programming defensively. The idea was that you had to write code that was less likely to have errors in it, and was less likely to have errors introduced later on.

Although the guidelines were originally written for C++ (for old-school Mac development), a lot of the rules transferred easily to Java (as did the many of the developers!), and then, with me, when I went "indie," to Objective-C.

I don't have a copy of that document, alas, but I did manage to internalize many of those rules, so that it became habit for me (for the most part), and I've been able to add on to some of those rules to be specific to Objective-C/Cocoa programming.

A few days ago, I found out about the LLVM/Clang Static Analyzer — "Clang" for short — from a number of developers on Twitter (where I am "danwood", BTW, and a lot more active there than on this blog these days). I ran our codebase (Sandvox, along with the iMedia Browser and other bits of code) through the analyzer, and what I found was very interesting: Most of the bugs that it found, if we had better applied the guidelines we've been trying to follow, would not have been there. I consider this proof of the utility of these kinds of guidelines, which I'll list later on here.

Clang, at this early stage in its development, is able to catch several kinds of bugs, including the following:

Memory Leaks
Clang is great at finding places where you have allocated (or copied) an object and neglected to release it; it even follows branches in your code to find that corner case where memory leaks. Most of these can be fixed with a CFRelease or release or autorelease, though I'll explain which one I prefer later on.
Dead Store
If you store a value to a local variable, but then never make use of that variable, it's not a big deal. However, in a lot of cases this may actually point out a flaw in your logic; why were you saving a value away and then not using it?
Use After Release
This can be pretty nasty if your code encounters it, so it's good to find places where you are making use of a variable that you have already released. Can you say "Zombie"?
Bad Receiver
If you have a variable that is not initialized, and then you send a message to it, you could crash your program.
Null Dereference
If you reference a pointer that is nil, this is a bad thing.
Missing Dealloc
This test in Clang isn't quite ready for prime time yet, but it's a great concept. Ideally, if you have a class with instance variables, they should be released in the dealloc method. I look forward to this being fully implemented because this is one area where I find myself making a lot of mistakes; I haven't figured out any kind of way to code defensively when it comes to getting instance variables deallocated.

(There may be other conditions that Clang catches; I just haven't run across them yet. And I'm sure the team is working on some other cool things to analyze!)

So now that we have this tool available, it's great that it can help clean up our mess for us, but that's no excuse for sloppy coding. It's still a lot easier just to get it right the first time. So what are some ways you can program defensively?

Before I get into a list, I should point out that you have to write code with this mindset: that someday, sooner or later, somebody — you or somebody else — is going to touch your code. If it's somebody else, they are not going to know what you know right now as you type in code. And if it's future-you, you aren't going to remember what is in your head right now. So don't try to be clever. Whoever it is, they are going to have to understand what you were doing, and then they are going to do something to change the code which you thought you had made just perfectly. Don't make it easy for them to mess something up.

For the "understanding" part, it obviously makes sense to use clear naming and comment your code. We've all heard that, and maybe we even follow those guidelines occasionally. Well, do that stuff more.

The "changing" part is a lot more dangerous. Somebody might go in and put an autorelease pool around the outside of your algorithm to deal with memory management. Or maybe they will find that they have to insert a line of code in there. Or they will rearrange the order of the lines for some reason. You have to write your code with that in mind, because you want it to be hard for that person to make mistakes.

Without further ado, here are some guidelines that I try to follow. Some of these, but not all, are directly related to issues that Clang has caught. (If you have any of your own to suggest, add them to the comments please!)

Manage your memory
Use autorelease for crying out loud. Yes, it seems more efficient to just allocate your object, do something with it, and then release it below. Easy, right? Except that time after time in our code, Clang found cases where this didn't work. Maybe there was a break in a loop in a certain branch that prevented the release from being reached, or there was a return under unusual conditions that exited the method without cleaning up. And most of the time, somebody just forgot to put in the release. So just get in the habit of autoreleasing all the time. (If you have to work with Core Foundation objects, cast and then autorelease those too if you can!) Don't prematurely optimize; you can always put autorelease pool into action if memory management is an issue. And if you really have to explicitly retain and release something (such as the autorelease pool itself), be sure to boldly comment what you are up to; following the next guideline below will also help avoid problems.
Avoid early return
It's harder to look at a block of code riddled with return statements and figure out the flow of things, and therefore it's easier to make mistakes. Clang proved this quite definitively! If you have cleanup to do (such as releasing an autorelease pool), this will prevent your cleanup from happening. It's just a lot clearer if you let your code flow to the end of the method where you can return the result that you have stored into a variable. (I tend to always have a "result" variable.) Remember, your code may be fine now, but somebody may come along and change things, and not notice all of your early return statements and clean things up properly.
Initialize your local variables.
When you declare a local variable, initialize it to something safe. Yes, you may get a few "dead store" warnings from Clang, but it's better than to get "bad receiver" warnings — or worse — crashes at runtime. If somebody comes along and modifies your code later, your uninitialized variable could come back to haunt you.
Release using setters
You can avoid zombies if you get in the habit of releasing your instance variables only in their setters. If you call [foo release] in your method, there's a chance that somebody may later try to make use of foo and cause bad things to happen. If you call a method like [self setFoo:nil] then you're safer. Clang found a few cases where we didn't do this; if we had then this problem would not have happened.
Use your return codes
Methods or functions that return something to you are probably returning something useful, so it's not a good idea to just ignore a result, especially if it could help you track down a problem. You can also tell what happened if you are stepping through your code with a debugger. If you really don't need the result of calling something, cast its return value to (void) so it's clear that you intended to ignore the result.
Always use { } braces in nested blocks
Sure, it's possible to put just one statement after an if or while... but you are just asking for trouble. That person who modifies your code in the future may want to insert another statement in that block of code, not notice that there weren't braces there. Your program logic could get really screwed up. It's better just to put the braces there in the first place, even if it's just a single line in between. (Don't expect Clang to catch this kind of problem; unless it starts analyzing the indentation levels of your source files!)
Put constants on the left side of the ==
This is a good habit in order to avoid the mistake of only a single = when you meant the double ==. A constant to the left of a single = will alert you right away.

There you have it! Check out Clang and see what it can find in your own code, and try to think of how you can change your habits so that those kinds of problems won't crop up again.