September 10, 2024
Now available! It's a light release as I learn more about formatting a nice-looking book. You can see some of the differences between v2 and v3 here.
Code is written in a structured machine language, comments are written in an expressive human language. The "human language" bit makes comments more expressive and communicative than code. Code has a limited amount of something like human language contained in identifiers. "Comment the why, not the what" means to push as much information as possible into identifiers. Not all "what" can be embedded like this, but a lot can.
In recent years I see more people arguing that whys do not belong in comments either, that they can be embedded into LongFunctionNames or the names of test cases. Virtually all "self-documenting" codebases add documentation through the addition of identifiers.
So what's something in the range of human expression that cannot be represented with more code?
Negative information, drawing attention to what's not there. The "why nots" of the system.
A Recent Example
This one comes from Logic for Programmers. For convoluted technical reasons the epub build wasn't translating math notation (\forall) into symbols (∀). I wrote a script to manually go through and replace tokens in math strings with unicode equivalents. The easiest way to do this is to call string = string.replace(old, new) for each one of the 16 math symbols I need to replace (some math strings have multiple symbols).
This is incredibly inefficient and I could instead do all 16 replacements in a single pass. But that would be a more complicated solution. So I did the simple way with a comment:
Does 16 passes over each string BUT there are only 25 math strings in the book so far and most are <5>
You can think of this as a "why I'm using slow code", but you can also think of it as "why not fast code". It's calling attention to something that's not there.
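For illustration, here is a minimal sketch of what the simple version might look like. The function name fix_math and the four table entries are my assumptions for the example; the real script covers 16 symbols and may be organized differently.

```python
# Hypothetical subset of the symbol table; the real script handles 16 symbols.
REPLACEMENTS = {
    r"\forall": "∀",
    r"\exists": "∃",
    r"\land": "∧",
    r"\lor": "∨",
}


def fix_math(string: str) -> str:
    """Replace LaTeX math tokens with unicode equivalents, one token at a time."""
    # Does one pass over the string per symbol (16 in the real script) BUT
    # there are only 25 math strings in the book so far, so a single-pass
    # version is not worth the extra complexity yet.
    for old, new in REPLACEMENTS.items():
        string = string.replace(old, new)
    return string
```

The comment is the only place the rejected alternative shows up. Delete it and nothing in the loop tells you a single-pass version was ever considered.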
Why the comment
If the slow code isn't causing any problems, why have a comment at all?
Well first of all the code might be a problem later. If a future version of LfP has hundreds of math strings instead of a couple dozen then this build step will bottleneck the whole build. Good to lay a signpost now so I know exactly what to fix later.
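If that day comes, the fix could look something like the sketch below: one compiled regex that matches any of the tokens, so each string is scanned once. This is not code from the actual script. The name fix_math_single_pass and the table entries are assumptions made for the example.

```python
import re

# Hypothetical subset of the symbol table; the real script handles 16 symbols.
REPLACEMENTS = {
    r"\forall": "∀",
    r"\exists": "∃",
    r"\in": "∈",
    r"\infty": "∞",
}

# Longest tokens first, so a token that is a prefix of another one
# (\in vs \infty) cannot win the alternation by accident.
_TOKEN_PATTERN = re.compile(
    "|".join(
        re.escape(token)
        for token in sorted(REPLACEMENTS, key=len, reverse=True)
    )
)


def fix_math_single_pass(string: str) -> str:
    """Replace every known token in one scan of the string."""
    return _TOKEN_PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], string)
```

Even in this small sketch there's more to get right (escaping, token ordering) than in a loop of replace calls, which is the kind of complexity the comment says I decided against.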
But even if the code is fine forever, the comment still does something important: it shows I'm aware of the tradeoff. Say I come back to my project two years from now, open epub_math_fixer.py and see my terrible slow code. I ask "why did I write something so terrible?" Was it inexperience, time crunch, or just a random mistake?
The negative comment tells me that I knew this was slow code, looked into the alternatives, and decided against optimizing. I don't have to spend a bunch of time reinvestigating only to come to the same conclusion.
Why this can't be self-documented
When I was first playing with this idea, someone told me that my negative comment isn't necessary, just name the function RunFewerTimesSlowerAndSimplerAlgorithmAfterConsideringTradeOffs. Aside from the issues of being long, not explaining the tradeoffs, and that I'd have to change it everywhere if I ever optimize the code... This would make the code less self-documenting. It doesn't tell you what the function actually does.
The core problem is that function and variable identifiers can only contain one clause of information. I can't store "what the function does" and "what tradeoffs it makes" in the same identifier.
What about replacing the comment with a test? I guess you could make a test that greps for math blocks in the book and fails if there's more than 80? But that's not testing EpubMathFixer directly. There's nothing in the function itself you can hook into.
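To make that concrete, such a test might look roughly like this. It's a sketch under loud assumptions: I'm pretending math strings are delimited by single dollar signs, and the names count_math_strings and check_math_string_budget are invented for the example.

```python
import re

# Assumption: math strings look like $\forall x$ in the book source.
# The real markup may use a different delimiter.
MATH_STRING = re.compile(r"\$[^$]+\$")

# Threshold from the text above: fail once the book has more than 80.
MAX_MATH_STRINGS = 80


def count_math_strings(book_text: str) -> int:
    """Count the delimited math strings in the book source."""
    return len(MATH_STRING.findall(book_text))


def check_math_string_budget(book_text: str) -> None:
    """Fail when the book has outgrown the slow-but-simple fixer."""
    count = count_math_strings(book_text)
    assert count <= MAX_MATH_STRINGS, (
        f"{count} math strings: time to revisit the replace loop"
    )
```

Notice that the check only ever looks at the book text. It never calls the fixer, so it documents a fact about the input rather than the tradeoff made in the code.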
That's the fundamental problem with self-documenting negative information. "Self-documentation" rides along with written code, and so describes what the code is doing. Negative information is about what the code is not doing.
End of newsletter speculation
I wonder if you can think of "why not" comments as a case of counterfactuals. If so, are "abstractions of human communication" impossible to self-document in general? Can you self-document an analogy? Uncertainty? An ethical claim?