Sometimes I want to use comments because I'm doing something vaguely algorithmic, and I know some readers won't follow the code.
I'm trying to think of a good example, maybe something like a pointer-window-based function (off the top of my head):
(This isn't real code. Don't get hung up on it)
func DedupeStrings(ss []string) []string {
	if len(ss) < 2 {
		return ss
	}
	sort.Strings(ss) // sorts in place; needs import "sort"
	u := 1 // index of end of "uniques" set
	for i := 1; i < len(ss); i++ {
		// Consume until new value
		if ss[i] == ss[i-1] {
			continue
		}
		// Put new value in 'uniques' set
		ss[u] = ss[i]
		u++
	}
	// Final state: all unique items are positioned
	// left of 'u' index
	return ss[:u]
}
People will quibble, but:
- I'm not convinced you could change the variable names without harming clarity. Would a name like uniquesEndIndex really be any clearer? It adds noise to the code and still doesn't satisfy a confused reader.
- I don't want to use function calls for documentation, e.g. putInUniques(). I'm doing it this way because I want it to run really fast.
I'll be honest, this code is easier to read for me without the comments. Also sorting feels like it's going to be slower than having some kind of set structure? You don't need ordering, just collocation of duplicates. If not or if it's a wash, that is also a good thing to comment. Also I'm not sure about the semantics of Go but it seems this mutates the argument AND returns a value, something I consider dangerous.
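To make the mutate-and-return worry concrete, here is a minimal sketch of a variant that copies first. It assumes Go 1.21+ for the standard slices package, and SortedUnique is a made-up name, not something from the post above:

```go
package main

import "slices"

// SortedUnique returns the distinct strings of ss in sorted order.
// It works on a copy, so the caller's slice is left untouched.
// (Hypothetical name, chosen for this sketch only.)
func SortedUnique(ss []string) []string {
	out := slices.Clone(ss)    // copy first: no surprise mutation of the argument
	slices.Sort(out)           // sorting puts duplicates next to each other
	return slices.Compact(out) // drop adjacent duplicates
}
```

The cost is one extra allocation. Whether that matters depends on the caller, which is exactly the kind of trade-off a comment can record.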
Otherwise I agree, people have a weird hang up about short variable names. Somehow not a problem in mathematics...
There's probably a better example. The point is sometimes the What needs explanation, and finding a better What isn't practical.
I have slightly unorthodox opinions about short variables. I used to hate them. Then I posted a question on one of the PL design forums (it might have been Reddit's r/programminglanguages): why is there a history of single-letter names for type variables, i.e. T, U, etc. for generics? The answer I got back was that sometimes you want code to focus on structure rather than identities. That stuck with me, because it helped me understand why so much C code (including Linux) uses similar naming practices. Names can lie, and sometimes expressing the structure is the absolutely critical thing.
Back when I programmed in Haskell, I also had a similar question about the extremely terse variable names that pop up everywhere. I'd wonder, why is this "x" and "xs" instead of "item" and "items" or "businessName" and "businessNames" or whatever. Eventually I found this (paraphrased) answer that made it all click:
The specificity or abstractness of a (variable) name relates to the values that it can hold. So when you have a very abstract function whose inputs can be of almost any type, naming those inputs in an overly specific manner is the exact inverse of the failure of giving an overly generic name to a highly constrained parameter.
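A rough sketch of that idea in Go, just to illustrate. Map, Invoice, and TotalCents are invented for this example and are not from any real codebase:

```go
package main

// Map is fully generic, so short structural names (T, U, xs, f)
// say everything there is to say about its inputs.
func Map[T, U any](xs []T, f func(T) U) []U {
	out := make([]U, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

// Invoice is a hypothetical domain type, made up for this example.
type Invoice struct {
	CustomerName string
	AmountCents  int
}

// TotalCents only ever sees invoices, so specific names are accurate here.
func TotalCents(invoices []Invoice) int {
	total := 0
	for _, invoice := range invoices {
		total += invoice.AmountCents
	}
	return total
}
```

Calling the parameter of Map "invoices" would claim more than the type allows, and calling the parameter of TotalCents "xs" would hide what the type already guarantees.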
All this said, I do agree with your original take on the comments. I much prefer having human-readable explanations inline with any non-trivial code. If nothing else, they really make it easier to correctly fix the code if a bug is found much later.
What is ss supposed to mean? Also, I only know what "u" means because of your "uniques" comment.
Those comments also don't really help me quickly understand the code. I'd do a small doc comment along the lines of "Removes repeated elements from the supplied list, returning the remaining items in original order"
I agree this is easy enough to follow but I'd like to quibble about something else:
Comments should answer the question of why you are not using some kind of hash set and doing a single pass over the data, and why it's OK to reorder the strings. One could reasonably expect that Dedupe shows first occurrences in order.
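For comparison, the single-pass hash-set version might look roughly like this. It is only a sketch, and DedupeKeepOrder is a hypothetical name rather than the original poster's function:

```go
package main

// DedupeKeepOrder returns the first occurrence of each string, in input order.
// It allocates a new slice and leaves the argument untouched.
// (Hypothetical name; a sketch of the hash-set alternative.)
func DedupeKeepOrder(ss []string) []string {
	seen := make(map[string]struct{}, len(ss)) // set of values already emitted
	out := make([]string, 0, len(ss))
	for _, s := range ss {
		if _, dup := seen[s]; dup {
			continue // already emitted earlier in the input
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	return out
}
```

This trades extra memory for the map against the in-place sort. Which one is faster probably depends on the data, so it is worth measuring and then writing the result down in a comment.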