Why Not Comments

Content

September 10, 2024

A new release of Logic for Programmers is now available! It's a light release as I learn more about formatting a nice-looking book. You can see some of the differences between v2 and v3 here.

Code is written in a structured machine language; comments are written in an expressive human language. The "human language" bit makes comments more expressive and communicative than code. Code has a limited amount of something like human language contained in identifiers. "Comment the why, not the what" means to push as much information as possible into identifiers. Not all "what" can be embedded like this, but a lot can.

In recent years I've seen more people arguing that whys do not belong in comments either, that they can be embedded into LongFunctionNames or the names of test cases. Virtually all "self-documenting" codebases add documentation through the addition of identifiers.

So what's something in the range of human expression that cannot be represented with more code?

Negative information, drawing attention to what's not there. The "why nots" of the system.

A Recent Example

This one comes from Logic for Programmers. For convoluted technical reasons the epub build wasn't translating math notation (\forall) into symbols (∀). I wrote a script to manually go through and replace tokens in math strings with unicode equivalents. The easiest way to do this is to call string = string.replace(old, new) for each one of the 16 math symbols I need to replace (some math strings have multiple symbols).
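As a rough sketch of that simple approach (not the actual script): the table below is a hypothetical four-symbol subset standing in for the real 16, and the names REPLACEMENTS and fix_math are made up for illustration.

```python
# Hypothetical subset of the symbol table; the real script handles 16 symbols.
REPLACEMENTS = {
    r"\forall": "∀",
    r"\exists": "∃",
    r"\land": "∧",
    r"\lor": "∨",
}


def fix_math(string: str) -> str:
    """Replace LaTeX-style tokens with unicode, one full pass per token."""
    for old, new in REPLACEMENTS.items():
        # Each call scans the whole string again, so N symbols means N passes.
        string = string.replace(old, new)
    return string


print(fix_math(r"\forall x \exists y"))
```

Every symbol costs another scan of the string, which is where the "16 passes" comes from.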

This is incredibly inefficient and I could instead do all 16 replacements in a single pass. But that would be a more complicated solution. So I did the simple way with a comment:

Does 16 passes over each string BUT there are only 25 math strings in the book so far and most are <5>

You can think of this as a "why I'm using slow code", but you can also think of it as "why not fast code". It's calling attention to something that's not there.
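For contrast, here is roughly what the missing "fast code" could look like. This is only a sketch, assuming Python's re module and the same kind of hypothetical four-symbol table (the real one has 16 entries); fix_math_single_pass is an illustrative name, not something from the actual script.

```python
import re

# Hypothetical subset of the symbol table; the real script handles 16 symbols.
REPLACEMENTS = {
    r"\forall": "∀",
    r"\exists": "∃",
    r"\land": "∧",
    r"\lor": "∨",
}

# Build one alternation pattern. Longest tokens go first so that a token
# which is a prefix of another token cannot shadow it.
_PATTERN = re.compile(
    "|".join(
        re.escape(token)
        for token in sorted(REPLACEMENTS, key=len, reverse=True)
    )
)


def fix_math_single_pass(string: str) -> str:
    """Replace every known token in one scan of the string."""
    return _PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], string)


print(fix_math_single_pass(r"\forall x \exists y: x \land y"))
```

It scans each string once, but it also brings in regex escaping and token-ordering concerns that the plain replace loop never has to think about. That extra machinery is the "more complicated solution" being traded away.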

Why the comment

If the slow code isn't causing any problems, why have a comment at all?

Well first of all the code might be a problem later. If a future version of LfP has hundreds of math strings instead of a couple dozen then this build step will bottleneck the whole build. Good to lay a signpost now so I know exactly what to fix later.

But even if the code is fine forever, the comment still does something important: it shows I'm aware of the tradeoff. Say I come back to my project two years from now, open epub_math_fixer.py and see my terrible slow code. I ask "why did I write something so terrible?" Was it inexperience, time crunch, or just a random mistake?

The negative comment tells me that I knew this was slow code, looked into the alternatives, and decided against optimizing. I don't have to spend a bunch of time reinvestigating only to come to the same conclusion.

Why this can't be self-documented

When I was first playing with this idea, someone told me that my negative comment isn't necessary, just name the function RunFewerTimesSlowerAndSimplerAlgorithmAfterConsideringTradeOffs. Aside from the issues of being long, not explaining the tradeoffs, and that I'd have to change it everywhere if I ever optimize the code... This would make the code less self-documenting. It doesn't tell you what the function actually does.

The core problem is that function and variable identifiers can only contain one clause of information. I can't store "what the function does" and "what tradeoffs it makes" in the same identifier.

What about replacing the comment with a test? I guess you could make a test that greps for math blocks in the book and fails if there are more than 80. But that's not testing EpubMathFixer directly. There's nothing in the function itself you can hook into.
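A sketch of what such a test might look like, assuming (hypothetically) that math strings are delimited by single dollar signs and that the book's text is available as one string. The names count_math_strings and check_math_budget are invented for illustration.

```python
import re

MAX_MATH_STRINGS = 80  # the threshold mentioned above

# Assumption: inline math is written as $...$ in the book source.
MATH_BLOCK = re.compile(r"\$[^$]+\$")


def count_math_strings(book_text: str) -> int:
    """Count the math strings found in the book source."""
    return len(MATH_BLOCK.findall(book_text))


def check_math_budget(book_text: str) -> None:
    """Fail once the book has more math strings than the slow fixer was sized for."""
    count = count_math_strings(book_text)
    if count > MAX_MATH_STRINGS:
        raise AssertionError(
            f"{count} math strings found; the 16-pass replacement "
            f"was only judged acceptable up to {MAX_MATH_STRINGS}"
        )


check_math_budget(r"Let $x$ be given, and suppose $\forall y$ holds.")
```

Note that nothing here imports or calls the fixer. The check inspects the book, not the code whose tradeoff it is supposed to document.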

That's the fundamental problem with self-documenting negative information. "Self-documentation" rides along with written code, and so describes what the code is doing. Negative information is about what the code is not doing.

End of newsletter speculation

I wonder if you can think of "why not" comments as a case of counterfactuals. If so, are "abstractions of human communication" impossible to self-document in general? Can you self-document an analogy? Uncertainty? An ethical claim?


My new book, Logic for Programmers, is now in early access! Get it here.

Summary
The article discusses the importance of comments in code, particularly focusing on the concept of 'negative information'—what is not present in the code. It highlights the author's experience with a slow code solution for replacing math symbols in an eBook, where a comment was added to explain the inefficiency. The author argues that while self-documenting code is valuable, it often fails to convey trade-offs and negative aspects of decisions made during coding. Identifiers can only encapsulate one piece of information, making it difficult to express both what a function does and the reasoning behind its inefficiencies. The author suggests that comments serve as essential signposts for future reference, helping developers understand their past decisions. The article concludes with a reflection on whether certain aspects of human communication, like counterfactuals or ethical claims, can ever be fully self-documented in code. Overall, it emphasizes the necessity of comments to provide context and understanding beyond what code alone can convey.